Need help thinking in types


#1

Hello,

first things first. My name is Saša (this is a male name, i’m from Europe) and this is my first post in this forum. I’m pretty new to Rust. I’ve been casually playing around with Rust for roughly about a year.

A little bit of background where i’m coming from (technically). I’m a software developer (went to university for CS education)working on embedded and systems software (embedded as in 8051 and ARM; systems as in networking and drivers). I’m looking back at about seventeen years of experience. As for programming languages, i’ve used quite some, to name a few:
C, C++, Basic, Python, Objective-C, Perl, Go, C#, Assembly. I’ve been playing around with OCaml, Haskell and Ada as well.

So before asking my question, i would like to name the things i understand and like about Rust:

  • Lifetimes
  • The borrow checker
  • Cargo

I was actually surprised about how fast i got past “fighting the borrow checker”, but with my background, this might be normal.
Now on to the point of this (already too long) post (thanks for reading this far :wink: ).

What i’m going to write now is going to be a bit of tongue in cheek and is just my way of expressing my own incompetence of understanding Rust completely.

So, my question would be: How can i learn to think types?
To be clear, i know the difference between dynamic and static typing, strong and weak or no typing. This is not the problem.

I will make up an example (probably a bad one, but i want to keep it simple and get my point across). Let’s say i have to write a function adding to numbers, let’s say in a range from one to ten. As i’m mostly a C guy, i would say “Well, uint8_t sounds good. At the end of the day it’s just two numbers between one and ten.”.
Let’s now assume i’m looking for a Rust lib that does just that (adding these numbers). As there is only one (for the sake of the argument :wink: ), i download it and look at it, but have no clue what is going on.

Why? Well, because it seems most Rust users seem to be way better at abstracting things than me.

What does that mean? Well instead of using a type that says something along the lines of “positive number from 0 to 255”,
the developer created a type for the numbers along the lines of
"<n-dimensional_Bojackle_Sphere_from_outer_space>::casual_fridays(maybe?)". And when i try to use the function (of course not correctly) the compiler tells me
" error: cannot find DeLorean::Flux_Capacitor::strange_glowing_thinggy in this scope
–> src/main.rs:15:19

As i said, i’m pretty much a C programmer. I think in data structures and keep the types a simple as possible. By the way, the same problems occur every time i have to use boost. The effect of all this is that i like Rust on a technical level, but can not imagine using it in a commercial setting because i cannot build a mental model of code (i.e. the flow and state).
Which, of course, is a shame because it’s not Rust’s fault, but mine.

So if this unstructured wall of text made any sense for you, and if you have an idea how i could learn to think in types instead of data structures, i would appreciate your help.

I will not be able to respond to answers for the next hours, so please do not think i’m a troll.

So thanks to everyone and see you later.

Saša


#2

I think the easiest way to “think in types” is to try and express any assumptions you’d typically make as part of the type system. That way it’s easy for the compiler to statically ensure various constraints and assumptions are upheld.

One way this is done is via the newtype pattern. The canonical example of this is creating Miles and Kilometers newtypes around a float so that it’s impossible to accidentally mess up units (RIP Mars Climate Orbiter). A real life example would be Duration from the standard library. Sure you could always represent time as an integer, but then does that integer represent a number of seconds, microseconds, or nanoseconds?

Another way you can use the compiler and strong type system to your advantage is to make invalid states unrepresentable. For example, a String is a bunch of bytes that must be valid UTF-8. You could just use Vec<u8> as your String type, but then it’s easy for people to add arbitrary bytes to the vector and break code which relies on that constraint. Instead, it’s physically impossible to create an invalid String because the only way of creating a String is via String::new() and adding characters with push() or String::from_utf8() which will return a Result which forces you to deal with the possibility of errors.

Likewise, you can represent ordering constraints (e.g. first you open a file, then you write to it, then you close it) in the type system using a concept called RAII. If you always need to do some cleanup after using a resource then create some sort of “handle” which you access the resource through, then add the cleanup code to the resource’s Drop impl. The canonical examples of this are File and MutexGuard from the standard library. The libloading crate also uses lifetimes when dynamically loading libraries to statically ensure a Symbol can’t be used after the library is unloaded.


The goal isn’t to make things overly complicated and bloated, instead you’re trying to use the compiler to help prevent errors ever happening. A result of this is that in Rust usually if your code compiles then it’s correct. I’ve been using Rust both in my spare time and at work for over a year now and I don’t think I’ve ever needed to touch a debugger or spend hours tracking down subtle bugs. You have no idea how nice that is considering it’s not uncommon to spend hours in a debugger tracking down a segfault when working on C/C++ code!

Sorry for the wall of words! Let me know if any of that doesn’t make sense, but hopefully I’ve added enough links to other resources to point you in the right direction when “thinking in types” and using the type system to prevent bugs from creeping into your code in the first place.


#3

Wow, thanks for the quick and detailed answer. I perfectly understand what you are saying.
As mentioned in my opening post, i’ve been using C++ with Boost for quite some time too, so i do know about all those concepts and what they used for.

I might not have described my problem well enough, so sorry i wasted your time. My problem is creating a mental model of a running program written in Rust (or for that matter, any strongly typed language). I can easily create a mental model of a running program written in C, Objective-C or Go (these are just examples because those languages don’t have as powerful type systems as Rust). If something goes wrong at runtime (talking about the logic flow here, not segfaults), i can pretty easily and roughly spot the culprit. Same goes if i get compiler errors for those languages.

If something goes wrong in Rust at compile or runtime, i’m pretty much lost. Especially at compile time,because sometimes
the types are so deeply nested that i completely loose sight. So if you encounter an error at compile time (or runtime) because of complex typing issues, how do start solving the error in your mind?

I hope this post makes sense, because english is not my primary language. Again, thanks for taking the time.


#4

It might be helpful to walk through an example or two that you encountered.

Rust is particularly disorienting at the beginning because of heavy type inference use - someone familiar with the language will parse things in their head but a beginner will have very little to go by until they get comfortable with the type system, compiler error messages, common stdlib APIs, and so on. But, it gets easier over time as familiarity settles in.


#5

Can you give us an example of where you’ve encountered this in the past? You don’t typically get heavily nested and confusing types in most day-to-day Rust. I think the most nested type you’d typically encounter might look something like Arc<Mutex<Vec<u8>>>.

One exception to this is futures because each consecutive operation essentially wraps the previous future, but impl Trait is doing a lot to fix that.

The languages you mention as having previous experience in are all primarily imperative, whereas Rust has a lot of functional programming aspects to it. Could the issues you’re encountering be due to the different programming paradigms used (e.g. using filter() and map() instead of for and if)?


#6

I’m with the others in thinking that you need somebody to help guide you through specific examples!

General guidance is tough, but I’ll try:


My bit of advice for now is the following: One thing to be wary of is that, often times, an explosion of types in an API is part of a workaround for missing language features… though some might call these “design patterns.” :stuck_out_tongue: It’s useful to recognize these cases so that you can likewise learn not to focus on them too hard when you see them!

Example 1: Iterator adapters. (often a workaround for the lack of existential types)

Let’s write a function that doubles numbers!

fn double(v: Vec<u32>) -> Vec<u32>
{ v.into_iter().map(|x| x * 2).collect() }

Cool! But then one day we find we want to do other things with the numbers after doubling them, such as filtering out multiples of 42. So we change our function to work on iterators instead:

fn double<I>(iter: I) -> ?????
where I: IntoIterator<Item=u32>
{ iter.into_iter().map(|x| x * 2) }

The problem is it’s currently impossible to express the type of iter.map(|x| x*2). The options available are:

  • Changing the output to Box<Iterator<Item=u32>>, which uses dynamic polymorphism and potentially throws away essential type information like Send and 'static.
  • Defining your own type:
    pub struct Double<I>(I);
    
    impl <I:Iterator<Item=u32>> Iterator for Double<I> {
        type Item = u32;
        fn next(&mut self) -> Option<u32> { self.0.next().map(|x| 2 * x) }
    }
    
    pub fn double<I: IntoIterator<Item=u32>>(it: I) -> Double<I::IntoIter>
    { Double(it.into_iter()) }
    
    This is what’s known as an iterator adapter, and is the approach taken in the standard library for functions like cloned. There are cases where iterators have additional methods on them, but in many cases the fact that these adapters exist as distinct types are really just an implementation detail. (Just treat them as any iterator)

Example 2: Type-level integers and type-level programming

nalgebra has 128 types named U0U128 representing positive integers. Why? Presumably these are around to help statically check e.g. dimensions for matrix multiplication and such. (I dunno, I haven’t used the library in ages!)

These are necessary because rust unfortunately still does not have true generic constants.

The important thing to do when you see types like these is to look at their rustdoc and see what traits they implement. Here we see Dim, DimName, and IsNotStaticOne. These traits are probably not implemented for anything else (of course, it’s good to check for yourself), so that can help mentally filter out some of the noise in type signatures.

But wherever there’s type level integers, there will also be type-level programming. Let’s peck around a bit…

…yep, there it is. Here’s the type-level equivalent to n * m. There are three types involved here, Self (that’s n), D (that’s m), and an associated type Output for the result. (the implementation is so short because it delegates to the typenum crate to compute the associated type)

Whenever you see a trait that has virtually no contents other than an associated type (I’m ignoring the mul function on that trait which is trivial and returns a constant), it’s probably used for type-level programming.

Now here’s an important fact: Nobody thinks that type-level programming is easier to reason about than regular programming. Why? Well, for starters, the type system is an untyped programming language! It checks what properties it can, but this is nothing compared to what it can check on actual code.

This comes with an important corollary: Whenever you see type level programming, you should probably stop at looking function signatures and start looking for examples. The author will have provided examples, because deep down inside they’re unhappy that their public API is full of functions with where clauses that are 15 lines long with stuff like <M1 as WithElemType<<M2 as WithElemType<B>>::Output>::Output, and they want you to be able to just see how it is used.


#7

So, i’m back and amazed and the great replies. Thanks to all of you!

@vitalyd:
Yes, this has been my gut feeling. As mentioned, i’m using Rust just for personal projects right now, which means just occasionally. Maybe this contributes to my problem.

@ExpHP and Michael-F-Bryan:
I don’t think my problems stem from the functional style. As mentioned, i’m actually pretty comfortable with Haskell.
Instead of coming up with a new example i will use the examples provided by ExpHP :slightly_smiling_face:.

Let’s start with example 1. Here you wrote two functions, the first one taking a Vec as parameter (for some reason the preview of the forum editor does not show the u32 part), the second one taking an iterator. Why does the second function need an iterator? Why not a Vec again and get the iterator by calling into_iter()? Why do i need to call into_iter() on iter? Is it not of type iterator? I would think (from the first look) that it is because I seems to be of type Iterator created from a item of type u32?

On to example 2, which actually is a perfect example of the problem i encounter. Let’s say i want to use nalgebra.
I try to (wrongly) use a function and get a compiler error because some trait is not implemented. So i I have look at the docs for the trait nalgebra::core::dimension::Dim at http://nalgebra.org/rustdoc/nalgebra/core/dimension/trait.Dim.html.

At this point i’m lost because i have no idea how to parse this, let alone understand what it means:

impl<A: Bit + Any + Debug + Copy + PartialEq + Send, B: Bit + Any + Debug + Copy + PartialEq + Send, C: Bit + Any + Debug + Copy + PartialEq + Send, D: Bit + Any + Debug + Copy + PartialEq + Send, E: Bit + Any + Debug + Copy + PartialEq + Send, F: Bit + Any + Debug + Copy + PartialEq + Send, G: Bit + Any + Debug + Copy + PartialEq + Send> Dim for UInt<UInt<UInt<UInt<UInt<UInt<UInt<UInt<UTerm, B1>, A>, B>, C>, D>, E>, F>, G>
impl<U: Unsigned + DimName, B: Bit + Any + Debug + Copy + PartialEq + Send> Dim for UInt<U, B>

The following questions immediately arise:

  • What does that mean?
  • Do i need it?
  • Do i have to understand it?

Those kind of signatures are a good example of what i mean with “deeply nested” too. So i go on and assume i do not need to understand that and have a look at trait nalgebra::core::dimension::DimMul (http://nalgebra.org/rustdoc/nalgebra/core/dimension/trait.DimMul.html) because i want to do a multiplication. Among the implementors i find

impl<D1: DimName, D2: DimName> DimMul for D1
where
D1::Value: MulD2::Value,
Prod<D1::Value, D2::Value>: NamedDim,
type Output = <Prod<D1::Value, D2::Value> as NamedDim>::Name;

Now i’m lost even more.
So it occurs to me that (generally) the following happens:

I try to use some lib (std or crate), then get a compiler error because of the wrong type. So i go ahead and consult the documentation for the function/trait/struct and see all kinds of different types, just to find out i don’t know anything about those as well. So i go look them up too because i really want to understand how all this works. And again, new types. Rinse and repeat. At some point i give up and just try a totally different approach to my problem because i think “you can’t possibly have to understand all of that, you’re definitely doing it wrong”. Rinse and repeat.

Maybe i simply do not have the mental ability to use Rust. And that’s ok too. I’m just the type of person that needs to understand every aspect of the code in order to create a mental model of it.

You can’t imagine how much this annoys me. I LOVE the borrow checker and lifetimes. And it especially annoys me that most people seem to struggle with the borrow checker in the beginning but not with the type system. So i imagine i’m missing something really important.


#8

Why does the second function need an iterator?

It doesn’t need it, it just lets us use it on more types.

// Ways you can call the old function
let v = vec![0.0, 1.0, 2.0]
double(v);

// Ways you can call the new function
double(v);
double(v.iter().cloned());
double((0..3).map(|x| x as f64));

Why do i need to call into_iter() on iter? Is it not of type iterator? I would think (from the first look) that it is because I seems to be of type Iterator created from a item of type u32?

This one boils down to a best practice. When writing a function that is generic over iterators, it is considered best practice to take an IntoIterator instead of an Iterator. Why?

  • Every Iterator is an IntoIterator. This is thanks to the following impl in the standard library:
    impl<I> IntoIterator for I 
    where
        I: Iterator, 
      type Item = <I as Iterator>::Item;
      type IntoIter = I;
    
    The type IntoIter is the output of into_iter(). Here, we can see that into_iter() on an Iterator returns the same type as the iterator itself (I). In other words, for iterators, it’s a no-op.
  • IntoIterator is implemented by more types:
    • IntoIterator is implemented on both iterators and containers.
    • Iterator is implemented on… well, just iterators.

Yep, I saw that too, and when I first saw it, I just basically blocked it out of my mind. If you ever see such a large impl, you should assume you do not have to understand it. If there is important information contained within something so monstrous, then it ought to be mentioned in plain english somewhere.

However, you are also in luck, since I was peeking around typenum yesterday after writing my example, because I was wondering how it was implemented and wanted to see if I might want to use it myself. As a result, I actually know what this is now: It is an implementation of Dim for every typenum unsigned integer of the form 0b1xxxxxxx, i.e. the numbers 128–255.

Basically, when I was looking around typenum, I saw that the numbers were defined like this:

// ...
type  U0 = UTerm;
type  U1 = UInt<UTerm, B1>;
type  U2 = UInt<UInt<UTerm, B1>, B0>;
type  U3 = UInt<UInt<UTerm, B1>, B1>;
type  U4 = UInt<UInt<UInt<UTerm, B1>, B0>, B0>;
type  U5 = UInt<UInt<UInt<UTerm, B1>, B0>, B1>;
type  U6 = UInt<UInt<UInt<UTerm, B1>, B1>, B0>;
type  U7 = UInt<UInt<UInt<UTerm, B1>, B1>, B1>;
type  U8 = UInt<UInt<UInt<UInt<UTerm, B1>, B0>, B0>, B0>;
type  U9 = UInt<UInt<UInt<UInt<UTerm, B1>, B0>, B0>, B1>;
type U10 = UInt<UInt<UInt<UInt<UTerm, B1>, B0>, B1>, B0>;
type U11 = UInt<UInt<UInt<UInt<UTerm, B1>, B0>, B1>, B1>;
// ...

What this shows is that typenum evidentally uses a binary encoding to represent the integers within the type system. UInt<U, B> is a construction that appends a bit B to an integer U. That monstrous impl above is generic over the values of the last 7 bits of an 8 bit number.

This is a recursive impl that implements Dim for N+1 bit-numbers, assuming that DimName is also implemented for N bits.

  • What’s DimName and how is it different from Dim? Well, since it has no useful documentation I can only assume it’s an “unfortunately public” implementation detail and that I should ignore it.
    • sometimes traits are broken up like this to resolve issues with coherence (the compiler forbidding two generic impls that look like they might overlap), or to prevent infinite recursion in trait bounds (the compiler failing to prove X: Trait when it depends on a proof that X: Trait).
  • Why is this impl here? I guess to support all numbers with 9+ bits.

I guess the best summary of these two impls is, “if you need a number with more than 7 bits, you can use an integer from typenum rather than the integers provided by nalgebra.” But looking around I don’t know if this fact is actually documented anywhere, which is unfortunate.


Addendum:

Those kind of signatures are a good example of what i mean with “deeply nested” too. So i go on and assume i do not need to understand that and have a look at trait nalgebra::core::dimension::DimMul (http://nalgebra.org/rustdoc/nalgebra/core/dimension/trait.DimMul.html) because i want to do a multiplication. Among the implementors i find

While I’m not certain, my best guess as to why nalgebra might have this trait is that it probably has a function somewhere that e.g. flattens an NxM matrix into a N*M length vector. This trait would then be needed in order to write the output type of that function.

These type-level programming constructs are seldom useful for users of a library; they only exist in order to make it possible for the author of the library to write the signatures of some of their functions!


I try to use some lib (std or crate), then get a compiler error because of the wrong type. So i go ahead and consult the documentation for the function/trait/struct and see all kinds of different types, just to find out i don’t know anything about those as well.

I’m going to be honest, this is how it begins for everyone. You’re entering a world that has had a fair amount of time to build up its abstractions and best practices, many of which are not entirely like most other languages, and now you need to play “catch up.” If a type error is just too overwhelming to figure out, you can share the problem snippet on #rust-beginners or #rust on IRC (irc.mozilla.org) and people can help you find out what to look for.


#9

@ExpHP:
Your explanations make perfectly sense. Thank you very much!
I wish i could consult you every time i can’t make sense of some documentation :smile: . As mentioned, i too assumed that i do not need to understand everything when consulting the docs. I hope as time goes by the line between “i don’t need to understand this right now for my use case” and “it would be better to check out what that does” becomes more clear.

It seems i have a lot to learn, and it seems, i will have to get more in touch with the community if i want to learn Rust.
As for now, thanks to everybody taking the time to support this beginner. I will now go back to learning with ExpHP’s recommendations in mind.

Have a nice evening.
Saša


#10

@ExpHP gave an excellent explanation (the projects themselves could potentially use pieces of it in their own docs).

I’ll also just mention that nalgebra and typenum are some of the more type heavy libs that I’ve seen - this is by no means the norm. Libs use generics liberally but type level calculations are a rare thing. They also suffer from lack of language features, notably type level integers, which makes them implement the same stuff for a fixed number of types of different arity. It’s more boilerplate for the author of the lib, yes, but it also makes the output harder to read in docs because all of them surface there (as @ExpHP said, you learn to ignore them for the most part).

The issue with these crates is that even though the authors intend the users to write code a certain way, and if the code compiles you don’t see the monsters beneath the surface, the compiler errors when things do go wrong are not for rust beginners.

I’ll mention a nice blog post that walks you through type level computation: https://beachape.com/blog/2017/03/12/gentle-intro-to-type-level-recursion-in-Rust-from-zero-to-frunk-hlist-sculpting/. You might find it interesting and if you understand the technique there, it’ll help understand other libs using it.