Noob question again: why a vec?

I am working through the Rust book, and I think I understand what a vector is, basically (a kind of array, but it can grow and shrink). But I am not really clear what the use case is. Would I use it in the same way I would use a linked list in C? Or can I do other clever things with a vec I don’t see yet?

4 Likes

Vec<T> is the default answer for storing multiple items of the same type.

It's like List<T> in C#, like ArrayList<T> in Java, like std::vector<T> in C++, like malloc/free in C (but less annoying), like new[]/delete[] in C++-but-not-willingly, etc.

(Arrays are only a good fit for fundamentally-limited things, like if you store RGBA colour in [u8; 4]. If it's not incredibly obvious that a small array is the right choice, you should default to using a vector until you have a good reason to do otherwise.)

11 Likes

If it needs to grow or if you don't know the size at compile time, which in my experience is almost always the case. Array has a very limited usage like @scottmcm pointed out.

5 Likes

Vector stores data in contiguous blocks, which means you can get to any index at constant time, whereas in pure linked lists getting to item N takes N hops over pointers. (I guess some implementations may have an index to make things quicker.)

3 Likes

Why does everything have to be "clever"? Vec is just a contiguous, dynamic array. That's where you want to store your data most of the time.

14 Likes

I find this quite an amazing question. You see I started out programming in BASIC and then assembler then ALGOL. As such my world was all arrays. When I got to C and first heard of Linked Lists I thought "That's neat, what does one do with it?". Quite the opposite of your question. As it happens in decades of programming I don't recall ever using a linked list. Although I have of course dealt with trees and graphs as data structures for various problems. Which are generalisations of the linked list.

Thing is a Vec stores it's elements in consecutive memory locations and they can be accessed with a simple index in constant time. Where as a linked list generally has its nodes splattered all around memory, wherever they happen to be allocated. Accessing the Nth item of a linked list requires following N links so the access time depends on the length of the list. "O(n)" as they like to say. This is slow for large collections.

Also as our processors today rely on cache memory to get any performance traversing an array will be much faster as many elements can fit in our cache whereas with a linked list elements can be anywhere in memory and require cache refilling all the time.

As for doing "clever things" in a Vector I would not dream of performing a Fast Fourier Transform with the data samples in a linked list. It's performance would be miserable. Same applies to many, many other far simpler mathematical operations.

17 Likes

Nothing has to be clever. But I would like to understand cleverness I didn’t see yet. Why would that be wrong?

Because its a predisposition, putting you in a position where obtaining the knowledge seems undoable. Probably yet another manifestation of the impostor syndrome.

And thanks again. I hope I don’t annoy too many people too much with my questions. Your answers help me to learn more and faster than just reading books and trying things.

1 Like

I don’t know about that. I neither feel it is undoable to learn nor that I am an imposter. But I am not anything close to a professional programmer, for me it is 100% hobby, fun. I may not see some things, and I have no problem asking. My only worry is that my questions may be annoying to professionals, but I have no problem with the fact (!) that I am better, much better, at other things. That does by no means imply I think it is too hard.

1 Like

Let's forget about "clever" for a moment. What about something really simple...

Let's say you have a bunch of numbers that you want to store somewhere, and let's say you need to add them up to get a total. If you stored them in a linked list you could get the total with:

    let total: u32 = list.iter().sum();

If you stored them in a Vector you could get the total with:

   let total: u32 = vector.iter().sum();

Looks pretty much the same right?

However the performance of these two is going to be very different. And get more and more different as the size of your data increases. For the reasons I mentioned above.

Bottom line is that it is very rarely that a linked list is very rarely the solution to ones problems.

Do not worry about asking silly questions. All questions are trivial to somebody. Many people here are not professionals or at least not using Rust professionally. We all had to start from somewhere. Well, nowhere, I guess.

3 Likes

Oh, I'm not saying wanting to learn is wrong. I was just surprised by the way you asked the question, as if learning non-clever things were useless.


To the point: Vec is really nothing clever. It's basically the most fundamental data structure there can be. There are no (pleasant or unpleasant) surprises; it's the textbook implementation of a dynamic array with amortized O(1) appending to the end.

3 Likes

Ah, but that is the clever part!

Naively if one wanted to store another element in an array that is already full one might so the obvious thing:

  1. Allocate a new array that is 1 bigger than the existing one.
  2. Copy the content of the old array into the new array.
  3. Deallocate the memory of the old array.
  4. Put the new element at the end of the new array.

One might then observe that the performance of this is terrible and start to think the linked list would have been better. But then one might have an idea that growing the array by more than one at a time may be better. Say 8 new elements or 16 or whatever.

The really smart will apply some mathematics to the problem and conclude that doubling the size of the array every time it fills up is the optimal solution, offering linear time when amortised over growing the array to infinite size.

One day reality bites and oops, I don't have enough space for my data set even though my machine has nearly twice the size in free memory. Damn Vector....

Nothing is simple.

7 Likes

I wouldn't say “nothin”. Vec have push with amortized complexity O(1) and that fact, alone, is enough to reject about half of the candidates during interview.

It touches simultaneously many things: what is complexity, what is amortized complexity, how they are different, and, finally, what is the actual complexity (and possible implementation) of Vec::push.

For you these may be “elementary” and “nothing clever”, but you would be amazed to know how many candidates have no idea about these!

3 Likes

Maybe I formulated wrong. English is not my native language :slightly_smiling_face:

1 Like

Exactly. I was probably programming for a couple of decades, on all kind of projects from tiny embedded systems to desktop CAD systems before I ever heard of "amortized complexity O(1)" Or O(anything) really. That kind of algorithmic analysis was the stuff of my CS major friends not a simple physics grad getting stuff done with software.

Most of the time had I known about it I would have rejected it anyway. I needed to know exactly how much memory my creations were using at every moment, not trust it some funky algorithm grabbing what it wants when it wants.

1 Like

Well… such an attitude isn't an option for modern world. For two reasons:

  1. Life is to short to optimize constants and put then in privileged place on a drum memory
  2. Modern compilers are full of these funky algorithms anyway. You can work with them or against them. Latter approach leads to endless frustration.

Even microcontrollers novadays are large enough to both afford “funky algorithms” and need them to finish tasks in any reasonable time.

Also: most newbies these days don't have trouble trusting funky algorithms. Rather the opposite: they trust them to do “magic”… and they get what they desire: instead of some tech that works incredibly reliably (modern transistors are, probably, the most reliable thing humanity ever made) they get “magic” which works or doesn't work depending on the phase of the moon.

Oh good, I will have lots to disagree with then :slight_smile: ...

Wait a minute... When we reorganise our data so that our programs use our processors cache memory more efficiently for performance reasons we are doing exactly what Mel did in arranging instructions around a drum memory. Different technology same problem. Many people spend what they have of their lives doing that.

Do they? Perhaps the likes of Javascript and Python do. What with their memory management and garbage collection of everything. As far as I know the likes of Rust, C, C++ etc do not. Admittedly Rust may well have all kind of funky magic in its std library but that is the library not the language. I can use them or not.

That is increasingly true I guess. Still not true of any microcontrollers I have around here. What do you mean by "reasonable time" though? Even on the nicest of PC now a days you are still going to be worrying about space and time for many things, processing audio, creating video games, ensuring the user has a slick response.

Don't get me wrong. I'm hardly a newbie but I'm happy to relax in the luxury of funky algorithms as much as possible.

1 Like

Yes. But what about that:

Mel never wrote time-delay loops, either, even when the balky Flexowriter required a delay between output characters to work right. He just located instructions on the drum so each successive one was just past the read head when it was needed; the drum had to execute another complete revolution to find the next instruction. He coined an unforgettable term for this procedure. Although "optimum" is an absolute term, like "unique", it became common verbal practice to make it relative: "not quite optimum" or "less optimum" or "not very optimum". Mel called the maximum time-delay locations the "most pessimum".

Do you know anyone who does something like that? That was common even in the end of last century, but I don't think anyone does that, anymore.

Oh yes, they do. When you are writing code in C or Rust novdays you are writing for the virtual machine, not for the “real hardware”. Attempts to hide something from the compiler and say “I know virtual machine doesn't support something but real hardware works that way thus I can do something” inevitably leads to troubles. Very often hard to detect and debug troubles.

They do that a lot. Except people like to say that compiler “understands them” when it removes the redundancy or “doesn't understand them” when it breaks their oh-so-clever code. Compiler is not capable of doing any understanding and thus couldn't misunderstand anything either. It just applies one simple rule thousands of times while transforming code from something defined as virtual machine to something defined as machine code for real CPU.

Time allotted for the creation of firmware for something. No matter whether it's car or a plane no one would give a programmer a few months to implement that infamous “cheat switch” in a blackjack. And no one would accept funky story about how someone's “subconscious is uncontrollably ethical”. S/he would bring the task to you and would say that you have a week, but if you can send change tomorrow it would be great. And if it doesn't work… well, too bad, maybe it's time to hire somehow who would make it work.

The important thing is to remember that you can not have your cake and eat it, too: either you allow “funky algorithm” to do it's thing xor you exclude it and do things manually.

Trying to cheat and do things behind “funky algorithms” back leads to tears and the more advanced these “funky algorithms” become the harder it becomes to predict how they would react to cheating.

I cannot fault Mel for doing that. If the program has nothing else to do while waiting then Mel's technique saves a few instruction by not having write a busy loop. On machine that had preciously small space for code.

Yes we still do that kind of thing. Waiting for a timer interrupt for example. Which effectively halts the processor just like Mel waiting for the drum to go round.

Or we write a "busy loop", not optimal performance for the over all multi-tasking system perhaps, but sometimes necessary: https://github.com/torvalds/linux/blob/master/arch/alpha/lib/udelay.c

I am well aware of the virtual machine idea of C and Rust. I agree that try to "cheat" the optimiser is a bad idea. Generally it is UB. That is not what I was getting at or suggesting. Rather I see calling library functions as being very different from just running code compiled from the source I wrote.

I can well understand the pressure to get the job done quickly. I have suffered enough of that.

But you were talking about "Even microcontrollers novadays are large enough to both afford “funky algorithms” and need them to finish tasks in any reasonable time."

So what is "reasonable time"? Having worked on many control systems including such things as the Boeing 777 Primary Flight Computers I can report that "reasonable time" is often a "hard real-time" requirement. Often in the 10's to 100's of milliseconds. Sometimes down to microseconds. If the code cannot produce the required result before that deadline it is a failure as sure as dividing by zero. In that world relying on "amortised O(1)" run time is not acceptable. Funky algorithms like Vec are out.

So cars or planes or whatever the boss may want quick development, he also wants stuff that actually works.

Totally agree.