Common newbie mistakes or bad practices

  • Measuring performance of unoptimized/debug builds.
  • Using an empty Vec<u8> as the buffer argument when calling Read::read.
  • Calling Read::read or Write::write when read_exact or write_all is needed. (Clippy has a lint for this.)
  • Unnecessarily converting String into something like Vec<char> (for use cases that could be rewritten to avoid char indices).
  • Trying to do an early return from a function without using the return keyword, by just removing a semicolon. (Fortunately, this usually causes a compiler error.)
16 Likes

Putting IO resources inside mutexes.

5 Likes

Question... Why is the example where you initialized a vec with Vec::new() and the populating later a bad thing?

Thanks for posting this, I really like discussions like this. Helps with better understanding the language and becoming a better programmer.

1 Like

I guess it's more of a style thing, but here's an example.

Say you are writing a NLP application and have a dictionary containing all the possible words you can handle. This is one way you could create the dictionary:

let mut dict = Dictionary::new();
// read the file and populate some internal HashMap or Vec
dict.load_from_file("./words.txt")?; 

This means my Dictionary now has two (hidden) states - empty and populated.

All downstream code that uses the Dictionary will assume it's been populated already and write code accordingly. This may include doing things like indexing into the dictionary with dict["word"] which may panic if "word" isn't there.

Now you've opened yourself up to a situation where passing an empty dictionary to code that expects a populated dictionary may trigger a panic.

But that's completely unnecessary. Just make sure the Dictionary is usable immediately after constructing it instead of populating it after the fact.

let dict = Dictionary::from_file("./words.txt")?;

impl Dictionary {
  fn from_file(filename: impl AsRef<Path>) -> Result<Self, Error> {
    let text = std::fs::read_to_string(filename)?;
    let mut words = Vec::new();
    for line in text.lines() {
      words.push(line);
    }
    Ok(Dictionary { words })
  }
}

Internally the Dictionary::from_file() might create an empty Vec and populate it incrementally, but it won't be stored in the Dictionary's words field yet so there is no assumption that it is populated and useful.

18 Likes

Oh wow, I had never considered what you said about two hidden states. `Interesting...Well, now I know. I cant necessarily pull an example right off the top of my head at the moment, but I do know I have been in situations before where I felt initializing with Vec::new() was indeed necessary, so I guess now the question is, where in my code did i put myself in that situation...

I think it also depends on your background or coding style.

FP languages are often completely immutable so you'll fall into this pattern naturally. It's kinda hard to create a half-initialized thing and populate it later when you aren't allowed to mutate anything.

On the other hand, OO languages are much happier to let you initialize an object after it has been constructed, especially because object references can be null by default and they have no qualms about mutability... You could argue this contributes to why OO languages have a propensity for crashing due to an unexpected NullPointerException.

Don't take this all as gospel, though. It's just a way of coding, and there are plenty of places where it's the wrong way to code (e.g. because it unnecessarily constrains your code or information just isn't available until a later part of the program).

4 Likes

Also

  • Attempting self-referencial structs
  • Long-lived reference-based structs generally
6 Likes

Some additional ones I have seen:

  • In a CLI program, intermingling I/O, argument parsing, etc. with actual domain logic. Domain logic should have its own module or even crate for clean re-usability.
  • Again, w.r.t. CLI, parsing arguments manually, or even trying to configure clap manually, instead of using structopt to create a strongly-typed input/config struct.
  • Thinking of error handling as an "afterthought" or as some additional annoyance instead of designing with Result and ?-bubbling upfront.
  • Overusing slice indexing when iterators would be cleaner/faster, or the converse, overusing iterators where indexing would be easier on borrowck.
8 Likes
  • Path::to_str().unwrap() instead of Path::display() to put a path into an error message
  • Trait over-use: implementing From for something which could have been a named factory function; introducing a trait where a bunch of inherent methods would do.
  • Trait miss-use: implementing TryFrom<&str> instead of FromStr.
  • Unnecessary .unwraps: if opt.is_some() { opt.unwrap() }.
  • Generic bloat: pub fn do_something(path: impl AsRef<Path>) { 200 line function to monomorphise in every crate }
  • Cyclic dependencies between crates (when a leaf crate have a dev-dependency on the root crate)
  • Somewhat arbitrary splitting of code into crates in general.
  • Error management. Rust has the tools to implement the best possible error management of any language, but there's no pit of success there. One stable state is a giant enum which combines errors from different subsystems, has an Other(String) variant just in case, and which is used for basically anything.
14 Likes

Can you elaborate on why this might be a mistake/bad practice?

I'm guessing it might be because now you've got multiple pieces of code all trying to write to the same thing instead of going through a common abstraction/mediator?

1 Like

@Michael-F-Bryan The context is something I see people attempt rather often and then ask questions about in the Tokio discord. They are writing e.g. a chat server, and then they define a map that looks something like this:

HashMap<UserId, TcpStream>

but then they run into trouble with this because they want to be able to read from every IO resource at the same time, and they wrap it in a mutex. This pretty much always ends up falling apart. It is common enough that I wrote the article Actors with Tokio so I could link people making this mistake to an article explaining what they should do instead. (I.e. they should instead spawn an actor per connection, then put actor handles in the hash map.)

To be fair, there are some valid reasons to put IO resources in mutexes. For example, you could put a single writer into the mutex and have threads take turns writing stuff to it. Database pools also do it.

Perhaps a better description of the mistake is "putting IO resources in collections" or "trying to have a single async task manage a variable number of IO resources".

7 Likes

An addendum to this: relying on some IDE plugin's mangling of the error message rather than the actual rustc output.

rant

I have every respect for the people who develop the various IDE tools, and this is not meant to bash on IDEs or people who use them. But this mistake happens daily. I used to think that IDEs were fine in their own way but just never liked them much myself; now, though, I'm increasingly of the opinion that one needs at least a basic understanding of the command line tools before graduating to a pushbutton GUI. This shift in opinion is almost entirely due to the endless stream of people who post screenshots in which rustc's beautiful, colorized error message with ASCII-art arrows pointing at relevant sections of text and helpful suggestions has been stripped of its formatting, reflowed, truncated, and stuffed into a pop-up in what I usually assume is VS Code.

Perhaps for languages that have less helpful error messages - which is most languages - this is not a problem. For people who already know to check the terminal output and not just point at things with the mouse cursor, again, not a problem. But you don't have to spend long in these forums or Discord to realize that for a lot of people the IDE is just getting in the way.

16 Likes

This. I come from a university where the norm for teaching beginner programming classes in C++ is "use this IDE, press the green arrow, and you are good to go". The official advice for depending on external libraries is not to learn the usage of the linker; it is instead a pre-built zip file with an empty project in it that has been configured by the lecturer using the arcane options in the GUI to link against that library, and which you have to unzip again and again when you want to start a new project.

Abstraction is nice, but I don't believe anyone can realistically hope to use any sort of tool without at least a basic understanding of the level of abstraction immediately below it.


As an aside, the resource consumption of modern IDEs just infuriates me to the extreme. The latest version of Xcode for instance is more than 10 Gigabytes, compressed. It also consumes ridiculous amounts of RAM for no good reason. In comparison, I use Vim as a fully-featured IDE, and it was something like 40 Megabytes last time I checked.

12 Likes

On the other hand, when people read rustc output, they frequently seem to skip over the main error: message and look only at the secondary messages — probably because they are more emphasized by the whitespace and ASCII art.

9 Likes

Which is clearly demonstrated by how common we’re getting users in this forum that post excepts from error messages which don’t contain the main error.

7 Likes

Exactly. Perhaps there could be some kind of experiment with adjusting rustc's output to make the main message be as emphasized as the secondary ones, somehow.

2 Likes

I think this would make everyone scream in horror but if the error order could be altered so that the first (often "most immediately actionable") one came last, then it would be read first in most terminals.

4 Likes

This is a bit more advanced, but it's often helpful in larger architectures to distinguish (as the type level) between runtime/recoverable errors and critical/unrecoverable errors. So I guess "having a single error type for everything" may be bad practice

1 Like

Recovering from errors seems like a lot of work at times, I have been leaving onus onto users to try again if something is too complex to recover from. I have been using these little fellows for all my error types(but maybe I lack experience to see benefits of different error types):

enum Language {
    English,
}

enum Message {
    ErrUnexpected,
}

impl Message {
    fn to_string(&self, lang: &Language) -> String {...}
}

enum CustomError {
    Message(Message),
    Messages(HashMap<String, CustomError>),
}

CustomError points to the place error occurs and may include additional steps to resolve it for the user in their own language. Its much nicer to see it in json form in postman.

It's certainly a complex topic but without getting into the weeds too much in this thread, consider a long-running user facing app: for any particular error, do you crash the entire app, or do you throw a message box at the user and keep going? So with this conceptual distinction then, at the type level (that is, this is more about dev ux than user ux): if a function failing means something is fundamentally broken, you a/ should indicate this to the caller somehow and b/ make it almost impossible to ignore that failure. You could panic and document the panic, but that means the caller can't catch that without defining a global handler or doing things with threads. Or you can return a Result<T, CritError>, which is different from a Result<T, NormalError>. Notably you can't ? a crit-fail function within a normal-fail function without explicitly handling that.

That particular pattern is a specialisation of a more general pattern of using typestates.

3 Likes