Rust 2020: Growth

Unicode is used for more than technical specifications. Many people, including on this forum, use emoji to convey emotional content or as shortcuts for long phrases that convey common sentiments. We use those emoji in e-mail and texts, and sometimes even in areas of more technical discussion such as here. :grin:

Unicode is an attempt to unify, if not overcome, the tower of Babel that is the set of forms of human written communication. I for one welcome it; I long ago tired of switching code pages just to communicate with colleagues who live elsewhere in the world. :clap:   [And yes, I can and do write in a number of languages.]

7 Likes

You know, the primary reason emoji were added to Unicode is that they started out as a unique selling point for DoCoMo's featurephones and wound up becoming an interop hazard.

1 Like

I may be older than I am young but I do get the idea. You may have noticed that even I will tack a smiley onto the end of a phrase. What I said was that emoji do not work as described.

Had I used a poo emoji when it was introduced as rendered by MS and Google, featuring stink lines and flies hovering above it, it would have been to convey disgust. Anyone reading that today would not get the same feeling seeing that cute happy little character as typically rendered. I don't like having my meaning changed by those conveying the message.

The smileys I do use are there to indicate a joke or that I'm not being entirely serious. Of course people often miss that and get offended.

Unicode, with the hundreds of languages it now supports and the web it mostly inhabits, is the tower of Babel: fracturing rather than uniting. Thank goodness for Google Translate.

I don't want to argue against that. It's a different argument though.

We know. Crazy, is it not?

Hey, we are way off topic now...

As a newbie these different string types are very confusing.
I understand the difference between them, but find the way to differentiate them by different types a bit distracting.

Comparing String to &str, I would have assumed that String is like a Python/Java/Julia string, but it isn't: it's a kind of StringBuffer that lets you shrink and extend the char buffer, whereas &str is what you would expect from a "normal" string type.
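To make that distinction concrete, here is a minimal sketch. The function names `shout` and `first_word` are made up for illustration; only `String`, `&str` and their standard methods come from the standard library.

```rust
// Hypothetical helper: needs ownership because it grows the buffer.
fn shout(mut owned: String) -> String {
    owned.push('!');
    owned
}

// Hypothetical helper: only reads, so a borrowed &str view is enough.
// It accepts string literals and borrowed Strings alike.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

fn main() {
    let literal: &str = "hello world"; // borrowed, fixed-size view
    let mut buffer = String::from(literal); // owned, heap-allocated, growable
    buffer.push_str(" again"); // extend the buffer
    buffer.truncate(5); // shrink it in place
    println!("{}", first_word(&buffer)); // &String coerces to &str
    println!("{}", shout(buffer));
}
```

So `String` plays the StringBuffer role and `&str` is the read-only view you pass around.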

And for OsString, OsStr, CString and CStr, can't this be solved with #repr tags over the str/String type?
One #repr would state that the string is null terminated, another that it may contain nulls in between, and another would specify some padding bytes...

Many people have long maintained that String should have been called StringBuf (like PathBuf is), so you're in good company there. It's too late now though.

The difference between Os strings and C strings is more complicated if you're new to low level languages, but both have to do with FFI in different ways. C strings are an array of bytes that cannot contain null but must be terminated with null. They're used when calling C functions and provide some safety.
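A small sketch of those two C string rules, using `std::ffi::CString`. The helper name `fits_in_c_string` is made up for illustration.

```rust
use std::ffi::CString;

// Hypothetical helper: true when `text` has no interior NUL byte,
// i.e. when it can be represented as a C string at all.
fn fits_in_c_string(text: &str) -> bool {
    CString::new(text).is_ok()
}

fn main() {
    let ok = CString::new("hello").expect("no interior NUL");
    // The terminating NUL is appended for you.
    assert_eq!(ok.as_bytes_with_nul(), &b"hello\0"[..]);
    // An interior NUL is rejected instead of silently truncating the string.
    assert!(!fits_in_c_string("hel\0lo"));
    println!("ok");
}
```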

Os strings are a bit of a misnomer* but the essential point is they are an array of bytes that can hold any string (after a conversion) that the OS may use. These strings can include nulls and may not be null terminated. Also unlike C strings these are opaque.

One problem with your repr idea is that repr only changes how something is represented in memory but C strings and Os strings are represented the same way (as an array of bytes). It's how you use them that's different. C strings have to keep the null on the end no matter what you add to them. They also need to prevent you inserting a null. Os strings don't but they do need to stop you reading their contents. Using an Os string to call a C function could be dangerous if the function expects it to be null terminated.
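As a rough sketch of why the two can't simply be swapped: an Os string may hold a null in the middle, so turning one into a C string has to be a checked conversion. The function name `os_to_c_string` is made up, and going through `to_str` is just one portable way to do it (it also rejects Os strings that are not valid Unicode).

```rust
use std::ffi::{CString, OsStr, OsString};

// Hypothetical helper: None if the Os string is not valid Unicode
// or contains an interior NUL that a C string cannot represent.
fn os_to_c_string(os: &OsStr) -> Option<CString> {
    let text = os.to_str()?;
    CString::new(text).ok()
}

fn main() {
    let plain = OsString::from("file.txt");
    let with_nul = OsString::from("a\0b"); // fine for an Os string
    println!("{:?}", os_to_c_string(&plain).is_some());
    println!("{:?}", os_to_c_string(&with_nul).is_some());
}
```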


Aside: Os strings are better thought of as "Unix-compatible strings" or "raw C string". I'm only grumbling because I keep having to explain the difference to other Windows programmers and the name is a bit misleading tbh.

2 Likes

As far as I can tell strings are confusing in all programming languages.

C is about the simplest language there is but it has at least 4 different string types. You can't mutate a string literal but you can mutate other strings. Strings are made of char, which might be signed or not. That is four string types right there. The C99 standard provided some wide character support that was ill-defined. If you really want to work with Unicode you need some external libraries.

Pascal started out with strings of bytes with a length byte at the beginning. Thus limiting string length. Later it got a String class.

C++ has all the C string types plus its String class, which ends up being a wrapper for an array of bytes. Again, if you want to deal with Unicode you need help.

Javascript handles some subset of Unicode just fine. When you want to interface to the C world and such you have work to do.

I'm sure other languages have other ideas. For sure when you move from one to another it's confusing.

2 Likes

You are right concerning the idea with #repr. And yes, the name OsString is misleading because it could mean anything. Why not just rename it to NonZeroString?

And why does it need to be fixed for each OS at all? The OS can decide whether it treats that kind of ubyte array as UTF-8 or UTF-16. Am I missing something?

It's a bit sad that we can't unify each String/Str type into one trait, but that's how it is.

The trouble is when doing low level programming, the type of string you use is important. You can use what you like within your own program but as soon as you start calling OS APIs, or someone else's C functions, then you need to use an agreed format for the string. Otherwise you'd both interpret the same array differently.

For example, you can pass a UTF-8 encoded byte array to something that expects a u16 array but what happens if your UTF-8 array does not have an even length? They might read past the end. What happens if they were expecting null termination? They might read way past the end. What happens if they were expecting UTF-16? They'd read the string wrong.
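To put numbers on that, here is a small sketch. The helper names `utf8_byte_len` and `to_wide_nul` are made up; the null-terminated u16 layout is an assumption about what a typical wide-string C API wants.

```rust
// Hypothetical helper: number of bytes in the UTF-8 encoding.
fn utf8_byte_len(text: &str) -> usize {
    text.len()
}

// Hypothetical helper: UTF-16 code units followed by a terminating 0.
fn to_wide_nul(text: &str) -> Vec<u16> {
    text.encode_utf16().chain(std::iter::once(0)).collect()
}

fn main() {
    let text = "h\u{e9}llo"; // "hello" with an e-acute as the second letter
    // Same text, different array lengths depending on the agreed format.
    println!("UTF-8 bytes: {}", utf8_byte_len(text));
    println!("UTF-16 units incl. terminator: {}", to_wide_nul(text).len());
    // "abc" is 3 bytes: read as u16 pairs, the last read runs past the end.
    println!("odd byte count: {}", utf8_byte_len("abc") % 2 == 1);
}
```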

So I do think it makes sense to have different strings for different purposes. The standard library doesn't always make it obvious when to use different strings but they do serve a purpose.

5 Likes

I understand your case, but afaict there is only one OsString type, not two, so updating it with wrong values or extending the array to an uneven length may irreversibly destroy the meaning of your data.

To be safe in this case would require some kind of refinement type that takes into account the runtime length of an array, I think.

[moderator note: the "unicode good"/"unicode bad" sidebar is off-topic. hid a couple posts.]

6 Likes

I just read the documentation at
https://doc.rust-lang.org/std/ffi/struct.CString.html
and it made me smirk.
It explained that you must use CString::as_ptr(), which is something that I never expected. And it seems my fellow Rust programmers didn't either.
So it actually confirms that Rust is hard to learn because it introduces so many concepts that are completely new to veteran programmers.
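For anyone else surprised by this, a minimal sketch of the pattern the docs describe: keep the CString in a named binding for as long as the pointer is in use. The function `c_style_len` is a made-up stand-in for a real C function taking `const char *`, and `measure` is a hypothetical caller.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// Hypothetical stand-in for a C function that takes `const char *`.
fn c_style_len(ptr: *const c_char) -> usize {
    // SAFETY: callers must pass a pointer to a live, NUL-terminated string.
    let c_str = unsafe { CStr::from_ptr(ptr) };
    c_str.to_bytes().len()
}

// Hypothetical caller: None if `text` contains an interior NUL.
fn measure(text: &str) -> Option<usize> {
    // The binding keeps the buffer alive; the pointer is only valid
    // while `owned` exists.
    let owned = CString::new(text).ok()?;
    Some(c_style_len(owned.as_ptr()))
    // `owned` is dropped here, after the call has returned.
}

fn main() {
    println!("{:?}", measure("hello"));
}
```

The trap is writing `CString::new(text).unwrap().as_ptr()` in one expression: the temporary CString is freed at the end of that statement and the pointer dangles.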

About myself: I have been programming for more than 20 years. I learnt Pascal as my first language, but over the years I have done projects in C++, Java, C#, Delphi/FreePascal, PHP, Perl, Python and JavaScript. So I have actually learnt a lot of new languages over time.
But actually I think that is the problem with Rust. It wants to break free from conventions, and by doing so it clashes with the mindset of veteran programmers.
To overcome this you actually need better documentation and something like a Rust Cookbook that helps with the tasks of day to day work.

CString is not the normal string type. It's for programmers that interact with C. There's only so much you can do to hide the complexity even in a high level language. For example, the Python equivalent is c_char_p for immutable strings and create_string_buffer for mutable strings.

Even with Python hiding some of the FFI magic you still have to use .value or .raw to see their contents in your program.

7 Likes

I would also like to add a note about the community attitude.
If you ask "Why don't newbies want to come in?" you should not take offense when Rust newbies speak about their struggle.
Actually this shows some kind of elite mindset, which does not define an inclusive community.
If you don't like the criticism you should not ask such a question ...
But then you will never find out why Rust is not really a success, and it will always stay a little marginal language like perhaps Assembler.

5 Likes

Hi all,

After reading your posts, and also another discussion on the idea of monetizing crates.io, I asked myself: what does Mozilla say?
Some elements:
  • Rust is based on a community.
  • It must remain as free as possible and not be owned by a small group or a major player.
  • The stable, ..., nightly model allows evolution.

I appreciate that :slight_smile: in contrast to Go, where Google decrees what will be implemented or not.

Vive Rust, Vive l'Open Source

1 Like

That seems a bit harsh.

I'm a newbie here too. I too struggle with a lot of aspects of Rust. Like the emphasis on verbose and unreadable functional programming style. That's before I get anywhere near macros!

Everyone here has been very tolerant of my ignorance and very patient and helpful.

I strongly suspect Rust will do very well with that community behind it.

10 Likes

Macros can be useful and it's worth learning how to use them. That doesn't mean they should be used everywhere.

Macros are interesting here because they're actually less complex in some situations than using type-foo to accomplish the same code reuse.

At least Rust macros are only compile-time macros and can't reach upward in scope. Metaprogramming downward in scope is fine. Metaprogramming upward is a mess.
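As a small illustration of macros being the simpler tool for code reuse, here is a sketch that stamps out one trait impl for several types. The trait `Describe` and the macro `impl_describe` are made up for this example.

```rust
// Hypothetical trait used only for this illustration.
trait Describe {
    fn describe(&self) -> String;
}

// Expands at compile time into one impl per listed type.
// It only generates code "downward": nothing outside the macro is touched.
macro_rules! impl_describe {
    ($($ty:ty),* $(,)?) => {
        $(
            impl Describe for $ty {
                fn describe(&self) -> String {
                    format!("{} = {}", stringify!($ty), self)
                }
            }
        )*
    };
}

impl_describe!(u8, i32, f64);

fn main() {
    println!("{}", 7u8.describe());
    println!("{}", (-3i32).describe());
    println!("{}", 1.5f64.describe());
}
```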

So a productive discussion could ask:

  • Was Rust easy to learn?
  • If not, what makes it hard to learn?
  • What were you missing in order to make progress in Rust?
  • After you learned it, do you use it in production?
  • If not, what are the roadblocks to using it in production?
  • What is needed to overcome these roadblocks in order to use it in production?
4 Likes

On the contrary, the discussion could ask:

  • Was Rust easy to learn?
  • If yes, what made it easy to learn?
  • What has helped you most to progress in Rust?
  • Do you use it in production?
  • If yes, what do you use it for?
  • What are the most important features of Rust that helped you succeed in your business?

(Maybe I'm currently not a Rust pro, but) I completely disagree with that. GAT is very important to me. In fact, I had to abandon Rust for one of my hobby projects and switch to Go because of it.

I have a rule when it comes to third-party component management: the core part of my code should not directly depend on third-party code. The dependency must go through an interface layer (for Rust, that's a trait).

Take the project that I abandoned, for example. The file structure was like:

- src
   |
   +- thirdparty
   |      |
   |      + tokio // Adapter (Trait implementer) for Tokio
   |      + ....
   |
   +- application
   |      |
   |      + application.rs
   |      + network_io_traits.rs // Including all the traits which tokio adapter will have to implement
   |      + ...
   |
   +- main.rs

With that rule applied, none of my code under the /application folder should mention anything outside of it (except for std::*, of course).

Usually this plays along very well through heavy use of trait associated types, because with associated types you don't have to name concrete types in your traits.
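A rough sketch of that layering, not the actual code from the abandoned project. All names here (`Connector`, `Connection`, `greet`, `MemoryConnector`, `MemoryConnection`) are made up, and the in-memory adapter stands in for where a Tokio adapter would go.

```rust
use std::io;

// What application/network_io_traits.rs might declare (hypothetical names).
trait Connection {
    fn send(&mut self, data: &[u8]) -> io::Result<usize>;
}

trait Connector {
    // The adapter picks the concrete type; application code never names it.
    type Conn: Connection;
    fn connect(&self, addr: &str) -> io::Result<Self::Conn>;
}

// Application code: generic over the trait, no third-party types mentioned.
fn greet<C: Connector>(connector: &C, addr: &str) -> io::Result<usize> {
    let mut conn = connector.connect(addr)?;
    conn.send(b"hello")
}

// Stand-in adapter that would live under thirdparty/ (hypothetical).
struct MemoryConnector;
struct MemoryConnection;

impl Connection for MemoryConnection {
    fn send(&mut self, data: &[u8]) -> io::Result<usize> {
        Ok(data.len()) // pretend every byte was written
    }
}

impl Connector for MemoryConnector {
    type Conn = MemoryConnection;
    fn connect(&self, _addr: &str) -> io::Result<Self::Conn> {
        Ok(MemoryConnection)
    }
}

fn main() {
    println!("{:?}", greet(&MemoryConnector, "example:1"));
}
```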

However, things can quickly turn into a nightmare as soon as lifetimes come into play, as currently you cannot declare a lifetime on an associated type. That closes off many possibilities in terms of lifetime and structural management.
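For readers who haven't hit this wall, here is a sketch of the kind of trait that needs a lifetime on the associated type. It uses generic associated types, which have since been stabilized (Rust 1.65), so it won't compile on the compilers this post was written against. `LendingIterator`, `WindowsMut` and `bump_windows` are names made up for the example.

```rust
// The associated type carries its own lifetime: each item may borrow
// from the iterator itself, which plain associated types cannot express.
trait LendingIterator {
    type Item<'a>
    where
        Self: 'a;
    fn next<'a>(&'a mut self) -> Option<Self::Item<'a>>;
}

// Hypothetical example type: overlapping mutable windows over a buffer.
struct WindowsMut<'buf> {
    buf: &'buf mut [u8],
    pos: usize,
    size: usize,
}

impl<'buf> LendingIterator for WindowsMut<'buf> {
    type Item<'a> = &'a mut [u8] where Self: 'a;

    fn next<'a>(&'a mut self) -> Option<Self::Item<'a>> {
        let start = self.pos;
        let end = start.checked_add(self.size)?;
        if end > self.buf.len() {
            return None;
        }
        self.pos += 1;
        Some(&mut self.buf[start..end])
    }
}

// Hypothetical usage: bump the first byte of every window, count the windows.
fn bump_windows(data: &mut [u8], size: usize) -> usize {
    let mut windows = WindowsMut { buf: data, pos: 0, size };
    let mut count = 0;
    while let Some(window) = windows.next() {
        if let Some(first) = window.first_mut() {
            *first += 1;
        }
        count += 1;
    }
    count
}

fn main() {
    let mut data = [0u8; 4];
    let count = bump_windows(&mut data, 2);
    println!("{} windows, data = {:?}", count, data);
}
```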

Also, async traits and impl Trait in the return position of trait fns are too nice to be missed.

So, I'd say writing Rust without GAT is just like eating scrambled eggs without the egg yolk: something is left to be desired.

2 Likes

Regarding the most obvious limitation on growth not noted above: first consider this clue from a comparison of Actix+Rust and one of the much more popular frameworks, like Laravel. Actix: total downloads approximately 630,000. Amazing! Yea Actix+Rust! Laravel PHP: 10,000,000 downloads. That's great, Larval, what amazing things PHP can do, despite being dozens of times worse in most metrics. From my perspective, why the beep would anyone even bother with Larva? Steady and adequate and a commercial success, I'll give you it is, but yeesh, what a grubby sleepwalker compared to Rust's new hotness.

Maybe because they already do: sunk costs.

People are slow to observe. And then they are slow to decide. Once they have decided they are even slower to change their minds and try anything better. Writing new code is slow. Mixing old & new code is slow. Even if Rust grows 70% yearly for the next decade, it might still take almost 6 years for Rust+Actix to "catch up" to Laravel+PHP.
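The arithmetic behind that estimate can be sketched as below. It uses the download figures quoted above and assumes, as a simplification, that Laravel's total stays flat at 10,000,000 while Actix compounds at 70% per year. The function name `years_to_reach` is made up.

```rust
// Hypothetical helper: whole years of compound growth needed for `start`
// to reach `target`. Returns None if it can never get there.
fn years_to_reach(start: f64, target: f64, yearly_growth: f64) -> Option<u32> {
    if start >= target {
        return Some(0);
    }
    if start <= 0.0 || yearly_growth <= 0.0 {
        return None; // no growth means no catching up
    }
    let mut total = start;
    let mut years = 0;
    while total < target {
        total *= 1.0 + yearly_growth;
        years += 1;
    }
    Some(years)
}

fn main() {
    // 630,000 * 1.7^5 is still under 10,000,000; 630,000 * 1.7^6 is over.
    println!("{:?}", years_to_reach(630_000.0, 10_000_000.0, 0.70));
}
```

If Laravel keeps growing in the meantime, the gap takes correspondingly longer to close.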