Common newbie mistakes or bad practices

Another hang-over from writing a lot of C is using the wrong integer type and getting frustrated because you need to cast to/from usize all the time because those integers were actually used as indices.

12 Likes

I actually think std should support indexing slices and arrays with any unsigned integral primitive type (and I'm not alone, this has been mooted repeatedly on IRLO), so I sympathize with the beginners on this one :‌P

10 Likes

I would extend this to using args() or vars() vs. their non-panicking counterparts, and it's endemic in the wider community, not just beginners. Implicitly encouraged, even.

Others:

  • Using unsafe to just do what you would have in C because you "know it works"
  • Using getters and setters within the crate
  • Expecting dyn Trait to sub-type or be dynamically typed
11 Likes
  • Traits prefixed with I. Types and traits shouldn't be easily confused in sensible code; they don't need a marked naming convention.
  • Matched trait-struct "pairs". Often combined with the above: impl IFoo for Foo. It won't work out the way you want. (Also you probably shouldn't write this in C# either but w/e)
  • Pervasive use of getters and setters. I haven't seen too much of this recently.
  • Manually uniquefying all identifiers (lib_module_struct_method) instead of just using built-in namespaces (lib::module::Struct::method). It's what they're there for!
  • Using #[path] ever.
12 Likes

I get the sense that newcomers to Rust tend to overuse traits in general, when you can get by most of the time defining only ADTs (and using other people's traits). Maybe this is the imprint of object-oriented languages where classes can be both containers for data and abstract interfaces? (Makes me a little apprehensive about the "field in trait" proposals you sometimes see floated.)

7 Likes

7 posts were split to a new topic: Path, OsStr, and supporting non-UTF-8 paths/inputs

Using byte indexing into UTF-8 strings without awareness of char boundaries etc. and the resulting panics.

12 Likes

I think I've only used the #[path] attribute once in legitimate code, and that was because I wanted to share some type definitions between a crate and its helper proc macro without creating a third "common" crate that both depend on.

That's a good point. Although I find the OsStr versions not overly useful because you need to turn it into a String if you want to use it for anything other than a file name.

So your options are to either ignore the non-UTF8 OsStr, panic, bail with an error, or silently corrupt the input by substituting �. Not ideal.

1 Like
  • Measuring performance of unoptimized/debug builds.
  • Using an empty Vec<u8> as the buffer argument when calling Read::read.
  • Calling Read::read or Write::write when read_exact or write_all is needed. (Clippy has a lint for this.)
  • Unnecessarily converting String into something like Vec<char> (for use cases that could be rewritten to avoid char indices).
  • Trying to do an early return from a function without using the return keyword, by just removing a semicolon. (Fortunately, this usually causes a compiler error.)
16 Likes

Putting IO resources inside mutexes.

5 Likes

Question... Why is the example where you initialized a vec with Vec::new() and the populating later a bad thing?

Thanks for posting this, I really like discussions like this. Helps with better understanding the language and becoming a better programmer.

1 Like

I guess it's more of a style thing, but here's an example.

Say you are writing a NLP application and have a dictionary containing all the possible words you can handle. This is one way you could create the dictionary:

let mut dict = Dictionary::new();
// read the file and populate some internal HashMap or Vec
dict.load_from_file("./words.txt")?; 

This means my Dictionary now has two (hidden) states - empty and populated.

All downstream code that uses the Dictionary will assume it's been populated already and write code accordingly. This may include doing things like indexing into the dictionary with dict["word"] which may panic if "word" isn't there.

Now you've opened yourself up to a situation where passing an empty dictionary to code that expects a populated dictionary may trigger a panic.

But that's completely unnecessary. Just make sure the Dictionary is usable immediately after constructing it instead of populating it after the fact.

let dict = Dictionary::from_file("./words.txt")?;

impl Dictionary {
  fn from_file(filename: impl AsRef<Path>) -> Result<Self, Error> {
    let text = std::fs::read_to_string(filename)?;
    let mut words = Vec::new();
    for line in text.lines() {
      words.push(line);
    }
    Ok(Dictionary { words })
  }
}

Internally the Dictionary::from_file() might create an empty Vec and populate it incrementally, but it won't be stored in the Dictionary's words field yet so there is no assumption that it is populated and useful.

18 Likes

Oh wow, I had never considered what you said about two hidden states. `Interesting...Well, now I know. I cant necessarily pull an example right off the top of my head at the moment, but I do know I have been in situations before where I felt initializing with Vec::new() was indeed necessary, so I guess now the question is, where in my code did i put myself in that situation...

I think it also depends on your background or coding style.

FP languages are often completely immutable so you'll fall into this pattern naturally. It's kinda hard to create a half-initialized thing and populate it later when you aren't allowed to mutate anything.

On the other hand, OO languages are much happier to let you initialize an object after it has been constructed, especially because object references can be null by default and they have no qualms about mutability... You could argue this contributes to why OO languages have a propensity for crashing due to an unexpected NullPointerException.

Don't take this all as gospel, though. It's just a way of coding, and there are plenty of places where it's the wrong way to code (e.g. because it unnecessarily constrains your code or information just isn't available until a later part of the program).

4 Likes

Also

  • Attempting self-referencial structs
  • Long-lived reference-based structs generally
6 Likes

Some additional ones I have seen:

  • In a CLI program, intermingling I/O, argument parsing, etc. with actual domain logic. Domain logic should have its own module or even crate for clean re-usability.
  • Again, w.r.t. CLI, parsing arguments manually, or even trying to configure clap manually, instead of using structopt to create a strongly-typed input/config struct.
  • Thinking of error handling as an "afterthought" or as some additional annoyance instead of designing with Result and ?-bubbling upfront.
  • Overusing slice indexing when iterators would be cleaner/faster, or the converse, overusing iterators where indexing would be easier on borrowck.
8 Likes
  • Path::to_str().unwrap() instead of Path::display() to put a path into an error message
  • Trait over-use: implementing From for something which could have been a named factory function; introducing a trait where a bunch of inherent methods would do.
  • Trait miss-use: implementing TryFrom<&str> instead of FromStr.
  • Unnecessary .unwraps: if opt.is_some() { opt.unwrap() }.
  • Generic bloat: pub fn do_something(path: impl AsRef<Path>) { 200 line function to monomorphise in every crate }
  • Cyclic dependencies between crates (when a leaf crate have a dev-dependency on the root crate)
  • Somewhat arbitrary splitting of code into crates in general.
  • Error management. Rust has the tools to implement the best possible error management of any language, but there's no pit of success there. One stable state is a giant enum which combines errors from different subsystems, has an Other(String) variant just in case, and which is used for basically anything.
14 Likes

Can you elaborate on why this might be a mistake/bad practice?

I'm guessing it might be because now you've got multiple pieces of code all trying to write to the same thing instead of going through a common abstraction/mediator?

1 Like

@Michael-F-Bryan The context is something I see people attempt rather often and then ask questions about in the Tokio discord. They are writing e.g. a chat server, and then they define a map that looks something like this:

HashMap<UserId, TcpStream>

but then they run into trouble with this because they want to be able to read from every IO resource at the same time, and they wrap it in a mutex. This pretty much always ends up falling apart. It is common enough that I wrote the article Actors with Tokio so I could link people making this mistake to an article explaining what they should do instead. (I.e. they should instead spawn an actor per connection, then put actor handles in the hash map.)

To be fair, there are some valid reasons to put IO resources in mutexes. For example, you could put a single writer into the mutex and have threads take turns writing stuff to it. Database pools also do it.

Perhaps a better description of the mistake is "putting IO resources in collections" or "trying to have a single async task manage a variable number of IO resources".

7 Likes

An addendum to this: relying on some IDE plugin's mangling of the error message rather than the actual rustc output.

rant

I have every respect for the people who develop the various IDE tools, and this is not meant to bash on IDEs or people who use them. But this mistake happens daily. I used to think that IDEs were fine in their own way but just never liked them much myself; now, though, I'm increasingly of the opinion that one needs at least a basic understanding of the command line tools before graduating to a pushbutton GUI. This shift in opinion is almost entirely due to the endless stream of people who post screenshots in which rustc's beautiful, colorized error message with ASCII-art arrows pointing at relevant sections of text and helpful suggestions has been stripped of its formatting, reflowed, truncated, and stuffed into a pop-up in what I usually assume is VS Code.

Perhaps for languages that have less helpful error messages - which is most languages - this is not a problem. For people who already know to check the terminal output and not just point at things with the mouse cursor, again, not a problem. But you don't have to spend long in these forums or Discord to realize that for a lot of people the IDE is just getting in the way.

16 Likes