How do tuples and arrays work compared to Lua's tables?

Heyo!

I'm a beginner programmer and am learning Rust. I originally started with Lua, and was able to get a good grasp on tables before deciding to try out Rust.

Some Explanation:

In Chapter 3.2 (Data Types), they explain Tuples and Arrays a little.

When I was learning how tables worked, I found two posts [1] [2] on Stack Overflow that were really interesting and informative about how they work.

To summarize them:

In Lua, everything is by default assigned as nil, and so doesn't exist. As soon as you assign it to something else, it exists as that value.

When you declare a table, it is essentially a sea of infinite nils. As soon as you define something in that table, its key is assigned the value instead of nil. By default the value goes in the array part of the table and gets an index as its key; if you assign a key yourself, it goes in the hash part of the table. So there is no way to delete or remove a variable, only to reassign it to nil.

I realized that my understanding of tables will probably influence how I grasp tuples and arrays. So if there are any differences, they will probably cause me confusion, and I would rather ask sooner than later and stop that if I can.

My main question:

From the understanding of how Lua tables work, what would be a good way to think about how tuples and arrays work in Rust?

There isn't really a type in Rust that maps directly to Lua tables. HashMap is close, but it's homogeneous over the key and value types that it stores. In other words, you cannot have a HashMap that has both string and integer keys, or whose values are strings, integers, and structs of various shapes. Enums let you cheat a little bit if you really need some kind of duck typing, but they do make the map a little harder to work with.
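To make the enum "cheat" a bit more concrete, here is a minimal sketch. The `Value` enum, its variants, and the `describe` helper are made up for illustration and are not from any library; a real program would pick variants that match its own data.

```rust
use std::collections::HashMap;

// Hypothetical enum standing in for "any value a Lua table might hold".
#[derive(Debug, PartialEq)]
enum Value {
    Int(i64),
    Text(String),
    Flag(bool),
}

/// Every use of a `Value` has to say what happens for each variant.
fn describe(v: &Value) -> String {
    match v {
        Value::Int(n) => format!("integer {}", n),
        Value::Text(s) => format!("string {}", s),
        Value::Flag(b) => format!("boolean {}", b),
    }
}

fn main() {
    // One key type (String), one value type (Value), several shapes inside.
    let mut table: HashMap<String, Value> = HashMap::new();
    table.insert("hp".to_string(), Value::Int(100));
    table.insert("name".to_string(), Value::Text("Ferris".to_string()));
    table.insert("alive".to_string(), Value::Flag(true));

    assert_eq!(table.get("hp"), Some(&Value::Int(100)));
    for (key, value) in &table {
        println!("{} -> {}", key, describe(value));
    }
}
```

The extra `match` is the "harder to work with" part: the flexibility is still there, but every reader of the map has to unpack it explicitly.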

Like a Lua table, getting a key that doesn't exist in a HashMap returns None. So, in some sense an empty HashMap is "a sea of infinite Nones".

The book pages you linked to on tuples and arrays are already pretty good and provide a better explanation than I could.

Nothing. Rust arrays and tuples have very little in common with Lua tables.

In Lua, complex data structures are implicitly heap allocated and implicitly follow reference semantics. Tables are no different. In addition, they are arbitrary key-value mappings (since Lua is dynamically-typed) and can grow and shrink.

In contrast, Rust arrays are value types, they don't implicitly heap allocate, and have a fixed element type and size. Tuples are almost the same, except that they can have fields of different types (and so their fields can't be treated all the same like array items, so they aren't, for example, iterable).
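A small sketch of what "value type" and "fixed element type and size" mean in practice. The helper name `with_first_zeroed` is made up for this example.

```rust
/// Takes the array by value. Arrays of `Copy` elements are `Copy` themselves,
/// so the caller's array is left untouched.
fn with_first_zeroed(mut arr: [i32; 3]) -> [i32; 3] {
    arr[0] = 0;
    arr
}

fn main() {
    let original = [1, 2, 3];
    let changed = with_first_zeroed(original);
    assert_eq!(original, [1, 2, 3]); // unlike a Lua table, no shared reference
    assert_eq!(changed, [0, 2, 3]);

    // A tuple can mix field types; fields are accessed by position.
    let pair: (i32, &str) = (7, "seven");
    assert_eq!(pair.0, 7);
    assert_eq!(pair.1, "seven");

    // Arrays can be iterated because every element has the same type.
    let sum: i32 = original.iter().sum();
    assert_eq!(sum, 6);
}
```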


I would say that answer is directly related to what we discussed in the other thread.

Lua tables are jack of all trades, master of none.

They combine properties of a half-dozen different data structures in Rust: tuple, struct, HashMap, Arc and a bit of enum, too. Probably a few more.

Nothing is declared upfront and you can add almost anything to any table. It's a very flexible approach but not a very safe one.

Thus I would recommend imagining Rust data structures as Lua-tables-optimized-for-a-certain-use-case.

You tell the compiler what you want and the compiler keeps track of that.

This is a rectangle and it has height and width? Noted, I will tell you if you try to put depth there, too.

This is an array and it has 10 elements? Noted, I will tell you if you try to access the 11th one.

And so on. Basically, the amalgamation of everything into one uber-datastructure (typical for dynamically-typed languages) is split into many different data structures for different needs.

In particular, one of the most important achievements of Rust is precisely the opposite of what Lua does: instead of an infinite sea of NULLs there are none at all (in unsafe Rust NULL does exist, but that's a separate story)!


Interesting, that makes sense!

Hmmm, I've kinda heard about that, but how does that work? How can you not have nulls?

I think that may be a pun: Rust's equivalent (somewhat loosely) of a null value is None. Rather than allowing any variable or field to be null or undefined as in JavaScript, or nil as in Lua, Rust lets you control which variables and fields can be None. The compiler will, in general, point out places where you forget to handle the possibility of one of those variables being None. In Lua, by contrast, you wouldn't get an "attempt to index a nil value" error message until the program actually hits that case, which might happen only rarely and create a difficult-to-hunt bug.

Controlling which variables can be None is a special case of how Rust uses a rich type system (for a "curly brace language") to let programmers communicate more of their intent to the compiler, as mentioned in topic 85384:

(I'm writing in general, non-technical terms of "Controlling which variables can be None" because I assume that the Book and other beginner-oriented documentation is better at teaching the mechanics of how to exercise that control than I would be.)
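For a rough idea of what that control looks like, here is a minimal sketch. The `User` struct and the `display_name` helper are hypothetical, invented only for this example.

```rust
// Hypothetical struct: `nickname` may be absent, `username` may not.
struct User {
    username: String,
    nickname: Option<String>,
}

/// The `match` has to cover both `Some` and `None`;
/// leaving one arm out is a compile error.
fn display_name(user: &User) -> &str {
    match &user.nickname {
        Some(nick) => nick.as_str(),
        None => user.username.as_str(),
    }
}

fn main() {
    let plain = User { username: "ferris".to_string(), nickname: None };
    let fancy = User {
        username: "ferris".to_string(),
        nickname: Some("crab".to_string()),
    };
    assert_eq!(display_name(&plain), "ferris");
    assert_eq!(display_name(&fancy), "crab");

    // let name: &str = &plain.nickname; // does not compile: an Option<String> is not a string
}
```

Only `nickname` can ever be None here; `username` always holds a String, so code reading it never needs a check.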


Also note that there is no substantial difference between how Lua tables and Rust hash maps handle missing keys. I actually find the "infinite sea of nils" a really bad analogy. It's actively misleading, for one.

An empty table/map is empty, period. There is no infinity involved whatsoever. It's simply that if a key is not found in a table, you will get nil (in Lua) or None (in Rust) for its value, indicating that there is no corresponding value. Hash tables only ever store the keys they do contain (ignoring various deletion and shrinkage strategies), and as such, their size in memory is a constant times the number of keys. (The reciprocal of this constant is called the load factor; you can google it if you are interested in the details. Understanding how a hash table works is good knowledge to have.)

I don't understand why the resource you were reading tried to mix all this up with "infinite", but it really does not register. You don't need any sort of spooky analogy here, because it's a simple rule. Both languages have a concept of a missing value (and Rust can express this as a type, too), and so when you ask for a value that isn't, then you get this placeholder. Simple as that.
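The rule is easy to check with a few lines of Rust. This is only a sketch; `sample_scores` and the key names are made up.

```rust
use std::collections::HashMap;

/// Builds a small map with a single entry (names are illustrative only).
fn sample_scores() -> HashMap<String, i32> {
    let mut scores = HashMap::new();
    scores.insert(String::from("alice"), 10);
    scores
}

fn main() {
    let scores = sample_scores();
    // A stored key yields Some(&value).
    assert_eq!(scores.get("alice"), Some(&10));
    // A key that was never stored yields None, the "missing value" placeholder.
    assert_eq!(scores.get("bob"), None);

    // An empty map holds nothing at all; there is no sea of Nones inside it.
    let empty: HashMap<String, i32> = HashMap::new();
    assert_eq!(empty.len(), 0);
    assert!(empty.is_empty());
    assert_eq!(empty.get("anything"), None);
}
```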


Some differences:

  • In Lua, an empty table allocates memory on the heap (try print{} in Lua). In Rust, an empty Vec or empty HashMap does not allocate memory on the heap (until you push elements into it). This is particularly relevant in Lua because it imposes work on the garbage collection mechanism there.
  • Tuples and arrays in Rust have a fixed number of elements once created, while tables in Lua may always grow. (Rust requires a Vec or HashMap if you want to change the number of elements later.)
  • When Lua tables contain holes (i.e. intermediate nil values for positive integer keys smaller than the maximum used integer key), then certain operations don't work properly anymore; in particular, the result of the length operator # in Lua becomes unspecified. In Rust, you may not arbitrarily set any value to None unless you make the contained type Option<T> instead of T. In that case, None is an ordinary value which doesn't cause the Vec, HashMap, array, or tuple to behave any differently.
  • Lua's tables serve many purposes:
    • named arguments (what would often be a builder pattern in Rust),
    • growable ordered lists of elements of the same type (what would be a Vec<T> in Rust, where T is the type of each element),
    • growable ordered lists of elements of different types (what could be a Vec<Box<dyn Trt>> in Rust, where Trt is a trait each element's type must implement),
    • fixed number of elements of the same type (what would be an array in Rust, though sometimes you would deliberately use a Vec instead),
    • fixed number of elements of different but fixed types (what would be a tuple in Rust),
    • mapping strings to values (what would be a HashMap<String, _>, HashMap<&str, _>, or HashMap<Cow<'_, str>, _> in Rust),
    • mapping tables to values (for which I cannot imagine a Rust equivalent),
    • …
  • …
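Two of the points above in code form, as a rough sketch: a "hole" is just an ordinary None element, and a list of different types can be built from boxed trait objects. The trait name `Trt` follows the list above; `Number`, `Word`, and `describe_all` are made up for the example.

```rust
// Hypothetical trait that every element type implements.
trait Trt {
    fn describe(&self) -> String;
}

struct Number(i32);
struct Word(&'static str);

impl Trt for Number {
    fn describe(&self) -> String {
        format!("number {}", self.0)
    }
}

impl Trt for Word {
    fn describe(&self) -> String {
        format!("word {}", self.0)
    }
}

/// Calls `describe` on every element, whatever its concrete type.
fn describe_all(items: &[Box<dyn Trt>]) -> Vec<String> {
    items.iter().map(|item| item.describe()).collect()
}

fn main() {
    // Holes are plain `None` values; the length stays well defined.
    let mut holes: Vec<Option<i32>> = vec![Some(1), None, Some(3)];
    holes.push(None);
    assert_eq!(holes.len(), 4);

    // Different concrete types behind one trait, in a growable list.
    let mut items: Vec<Box<dyn Trt>> = vec![Box::new(Number(1)), Box::new(Word("two"))];
    items.push(Box::new(Number(3)));
    assert_eq!(describe_all(&items), vec!["number 1", "word two", "number 3"]);
}
```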

Note that you can emulate the behavior of retrieving infinite nils (as in Lua) also in Rust:

fn main() {
    let vec = vec![Some(15), None, Some(30)];
    assert_eq!(vec.get(0).cloned().flatten(), Some(15));
    assert_eq!(vec.get(1).cloned().flatten(), None);
    assert_eq!(vec.get(2).cloned().flatten(), Some(30));
    assert_eq!(vec.get(3).cloned().flatten(), None);
    assert_eq!(vec.get(4).cloned().flatten(), None);
    assert_eq!(vec.get(5).cloned().flatten(), None);
}



Another important difference between Lua's tables and Rust's Vec, for example:

A Lua table has an identity that can be compared:

a = {}
b = {}
c = a
assert(a ~= b)
assert(a == c)

Rust has no such identities. When you compare two Vecs (or arrays, or tuples), this compares the contents:

fn main() {
    let vec1: Vec<i32> = vec![1, 2, 3];
    let vec2: Vec<i32> = vec![1, 2, 3];
    assert_eq!(vec1, vec2); // but writing into `vec1` doesn't change `vec2`!
    let tuple1 = ("Hello", "World");
    let tuple2 = ("Hello", "World");
    assert_eq!(tuple1, tuple2);
}


Compare with Lua:

> a = {1, 2, 3}
> b = {1, 2, 3}
> a == b
false
> a = {"Hello", "World"}
> b = {"Hello", "World"}
> a == b
false

Sometimes you can use a raw pointer to the allocated memory (e.g. using Vec::as_ptr) to ensure identity, but in case of an empty Vec, this produces unpredictable results when using it for comparison (because not all Vecs allocate):

fn main() {
    let vec1a: Vec<i32> = vec![1, 2, 3];
    let vec2a: Vec<i32> = vec![1, 2, 3];
    assert_eq!(vec1a, vec2a);
    assert_ne!(vec1a.as_ptr(), vec2a.as_ptr()); //  works
    let vec1b: Vec<i32> = vec![];
    let vec2b: Vec<i32> = vec![];
    assert_eq!(vec1b, vec2b);
    assert_ne!(vec1b.as_ptr(), vec2b.as_ptr()); // may fail!
}


Thus, I think the usual approach in Rust when identity comparisons are required (i.e. checking whether two values are not just equal in their current content but are the same value) is to store a unique integer as part of the value. In Lua, however, each table has a unique identity which can be compared with other tables (as long as no metatable overrides the behavior of the == operator, in which case you can still use the rawequal function to compare table identity).
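A minimal sketch of the unique-integer idea, assuming a process-wide counter is acceptable. The `Tracked` type, the `NEXT_ID` counter, and the method names are all hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Process-wide counter that hands out ids (an assumption of this sketch).
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

#[derive(Debug)]
struct Tracked {
    id: u64,
    items: Vec<i32>,
}

impl Tracked {
    /// Every call gets a fresh id, even if the contents are the same.
    fn new(items: Vec<i32>) -> Self {
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        Tracked { id, items }
    }

    /// Identity: is this the very same value?
    fn same_identity(&self, other: &Tracked) -> bool {
        self.id == other.id
    }

    /// Equality: do the two values currently hold the same contents?
    fn same_contents(&self, other: &Tracked) -> bool {
        self.items == other.items
    }
}

fn main() {
    let a = Tracked::new(vec![]);
    let b = Tracked::new(vec![]);
    assert!(a.same_contents(&b)); // both empty
    assert!(!a.same_identity(&b)); // but distinct values
    assert!(a.same_identity(&a));
}
```

Unlike the pointer comparison, this keeps working for empty vectors and after the value has been moved.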


The macros assert_eq! and assert_ne! in my examples check that the arguments are equal or not equal, respectively, and panic otherwise.


P.S.: Sorry if my posts were a bit too complex / into details. I think the most important thing to keep in mind in regard to Rust is that Rust's arrays are fixed in size. If you need a growable ordered list of values, you use a Vec. If you need a mapping from values of a certain type A to a value of another type B, then you use a HashMap<A, B>.

That doesn't in fact fail. A non-allocating vector always returns NonNull::<T>::dangling(). I don't know if this is guaranteed, but given that the documentation of .as_ptr() says:

Returns a raw pointer to the vector’s buffer, or a dangling raw pointer valid for zero sized reads if the vector didn’t allocate.

it would not really be feasible to change this in a backward-compatible manner.

It does:

thread 'main' panicked at 'assertion failed: `(left != right)`
  left: `0x4`,
 right: `0x4`', src/main.rs:9:5

You mean it could be guaranteed that it fails?

Oh, wait, you are writing assert_ne!(), not assert_eq!(). Yes, my point is that since the raw pointer is publicly exposed and documented to be the valid dangling address, it should probably be considered breaking to change the pointer value. (My opinion is in no way authoritative, but I would feel very upset if this weren't guaranteed.)


To the point: I believe a better check of identity would be to compare the vectors' addresses, not those of the buffers, i.e., !core::ptr::eq(&v1, &v2). I don't know if that's guaranteed to be true even in the presence of optimizations, though, with the observation that Vec<_> is not a ZST.
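As a sketch of that idea (the helper name `is_same_vec` is made up): compare the addresses of the Vec values themselves instead of their buffers. This only says something about two values that are alive at the same time, and the address of a Vec changes when it is moved, so it is not a stable identity in the Lua sense.

```rust
use std::ptr;

/// True when both references point at the very same `Vec` value.
/// `&Vec<i32>` (rather than `&[i32]`) is deliberate: the address of the
/// `Vec` itself is what gets compared, not the address of its buffer.
fn is_same_vec(a: &Vec<i32>, b: &Vec<i32>) -> bool {
    ptr::eq(a, b)
}

fn main() {
    let v1: Vec<i32> = Vec::new();
    let v2: Vec<i32> = Vec::new();
    assert_eq!(v1, v2); // equal contents (both empty)
    assert!(is_same_vec(&v1, &v1)); // same value
    assert!(!is_same_vec(&v1, &v2)); // two distinct live values
}
```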

It says a dangling raw pointer, not the dangling raw pointer. There exists a valid dangling pointer value (where ZSTs can be read from or written to) for every non-null well-aligned address, which can be explicitly created with ptr::invalid() (since renamed to ptr::without_provenance()). Dangling pointers must be distinguished from pointers to deallocated memory, where even ZSTs cannot be accessed.

Note that NonNull::dangling() just says "Creates a new NonNull that is dangling, but well-aligned."; it doesn't say that it has a value equal to the alignment of the type, nor even that repeated invocations will produce the same value. If we gave that guarantee, we'd say it explicitly. In fact, it explicitly warns that its address could compare equal to the address of an actual value.

So if I understand correctly, you can't really return something that's None, or at least not like you can with null / nil in other languages?

You cannot return None if the return type is a plain reference. But you can wrap the reference in an Option, and then it becomes possible to return None.

But then you can not just dereference it, you have to first check whether it's None or Some!

This may sound like a sophisticated and pointless exercise, but believe me: forgetting to pass an object when one is needed, because you can pass null everywhere and the compiler won't stop you, is one of the most common bugs in most languages out there.
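A small sketch of a function that returns an optional reference. The function name `first_even` is made up for the example.

```rust
/// Returns a reference to the first even number, or `None` if there is none.
fn first_even(xs: &[i32]) -> Option<&i32> {
    xs.iter().find(|x| **x % 2 == 0)
}

fn main() {
    let data = [1, 3, 4, 7];

    // The caller has to unpack the Option before using the reference.
    match first_even(&data) {
        Some(n) => println!("found {}", n),
        None => println!("no even number"),
    }

    assert_eq!(first_even(&data), Some(&4));
    assert_eq!(first_even(&[1, 3, 5]), None);

    // let n: &i32 = first_even(&data); // does not compile: an Option<&i32> is not a &i32
}
```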


If you need to, you return an Option.

The idea is to make that possibility explicit, and to make you handle it.


Oh, I'm really glad I was able to get to Chapter 6 today, I kept seeing Option<T> mentioned here and there but I couldn't really understand what it was since I didn't understand enums. Fortunately they have an entire section on it in Chapter 6.

Thanks guys, you're being really helpful so far in my attempt to piece all this together.


In Lua, all the syntax for accessing structured data - in any structure - is at some level an alias for a single "get" operation, which takes a table and a value (the key) and either returns the value most recently stored in that table for that key, or nil, if no value has been stored in the table for that key. All the syntax for building structures is an alias for a single "put" operation, which takes a table, a key, and a second value (the data), and modifies the table so that subsequent calls to get for that table and that key will return that data.

You can access the raw operation, passing any value you like as the key or data, using Lua's table syntax. "Get" is spelled tab[key], and "put" is spelled tab[key] = value. Syntax like tab.username is translated to tab["username"] for you, which means you can use syntax that's more convenient and conventional, but the semantics are governed by the underlying operation.

In Rust, with structured data, there's no generic get or put operation. Instead, each structure defines a family of operations - two per field, one for setting only and exactly that field and one for retrieving the most recent value. The collection of these operations that are valid in any given program is defined by the declaration of the type.

For example, given

struct Signup {
  username: String,
  email: String,
}

then there is an operation that takes a Signup value and returns the username, and an operation that takes a Signup value and a String and modifies the Signup so that future calls to the username-retrieving function return the new value. Ditto for the email field - there's a get operation specific to that field, and a set operation. They're spelled signup.username, signup.username = value, signup.email, and signup.email = value.

This is a surprisingly fundamental difference. For example, it implies that code that tries to access a field which is not declared simply won't compile - the operation it's trying to invoke does not exist. In Lua, the operation does exist, and will set the field you asked for, even if other code operating on that value has no idea that field could have useful information in it. It also means that there's little need in Rust for functions like Lua's pairs(tab), since there's no way for code to generically access fields by key in the first place.

Rust's approach is a common approach shared by a lot of compiled and statically-typed languages. It requires noticeably more work from the programmer in a lot of cases, because every type your program operates on needs to be specified somewhere, whereas in Lua a structure can be defined by use in an ad-hoc way. On the other hand, Rust can tell you if you make a spelling mistake or try to assign to a field that isn't expected to exist; Lua can't, because it has no mechanism to distinguish that from any other get or put to a field in a table. Rust can also tell you if you accidentally try to assign a number or an array to a Signup's string-valued username field; Lua can't, because there's no specification to tell it that that is a mistake in the first place.
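Here is the Signup example as a runnable sketch. The `rename` helper and the sample values are made up; the commented-out lines show the kinds of mistakes the compiler rejects.

```rust
struct Signup {
    username: String,
    email: String,
}

/// Uses the field-specific "set" operation for `username`.
fn rename(signup: &mut Signup, new_name: &str) {
    signup.username = new_name.to_string();
}

fn main() {
    let mut signup = Signup {
        username: "ferris".to_string(),
        email: "ferris@example.com".to_string(),
    };
    assert_eq!(signup.username, "ferris");

    rename(&mut signup, "corro");
    assert_eq!(signup.username, "corro");
    assert_eq!(signup.email, "ferris@example.com");

    // signup.usernmae = String::from("oops"); // does not compile: no such field
    // signup.username = 42;                   // does not compile: expected a String
}
```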

Tuples are equivalent to structs, save that the fields are named .0, .1, and so on instead of .username and .email. I mean that literally - the two categories of type have identical semantics, and differ only in syntax. Tuples are useful when it's not meaningful to give each element of your structure a name, but otherwise they're interchangeable.
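A short side-by-side sketch of the same data as a tuple and as a struct. The `Point` struct and the two helper functions are made up for the example.

```rust
// The same two integers, once with named fields and once with positional ones.
struct Point {
    x: i32,
    y: i32,
}

fn tuple_sum(p: (i32, i32)) -> i32 {
    p.0 + p.1
}

fn struct_sum(p: &Point) -> i32 {
    p.x + p.y
}

fn main() {
    let mut t = (3, 4);
    let mut s = Point { x: 3, y: 4 };

    // Setting a field looks the same apart from the field's name.
    t.0 = 10;
    s.x = 10;

    assert_eq!(tuple_sum(t), 14);
    assert_eq!(struct_sum(&s), 14);

    // let z = t.2; // does not compile: this tuple has no third field
}
```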

Arrays (and slices) are a little more complicated. Unlike a struct or a tuple, an array can only store values of a single type - but the accessor for that data is an operation that takes an array and a usize (the index), and returns a value, while the modifier is an operation that takes an array, an index, and a new value, and modifies the array at that index. The syntax for this will be familiar: arr[idx] retrieves a value, and arr[idx] = val sets it. They're more like Lua tables than structs are, because they do allow modifying values using a key determined at runtime, but they're not identical. For example, accessing or storing a value at an index that's greater than or equal to an array's size will immediately panic your program, whereas Lua will treat it like any other table key and store/retrieve the value. And, unlike Lua, the only valid type for the index is usize, not any arbitrary value you happen to have on hand.
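A sketch of array access with an index chosen at runtime. The helper names `read_at` and `write_at` are made up; `get` is the standard checked accessor that returns an Option instead of panicking.

```rust
/// Checked read: `None` instead of a panic when `idx` is out of bounds.
fn read_at(arr: &[i32; 3], idx: usize) -> Option<i32> {
    arr.get(idx).copied()
}

/// Plain indexing: panics at run time if `idx` is 3 or more.
fn write_at(arr: &mut [i32; 3], idx: usize, val: i32) {
    arr[idx] = val;
}

fn main() {
    let mut arr = [10, 20, 30];

    write_at(&mut arr, 1, 25);
    assert_eq!(arr[1], 25);
    assert_eq!(arr, [10, 25, 30]);

    assert_eq!(read_at(&arr, 2), Some(30));
    assert_eq!(read_at(&arr, 3), None); // out of bounds, but no panic

    // write_at(&mut arr, 3, 99); // would panic: index out of bounds
    // arr["one"] = 1;            // does not compile: the index must be a usize
}
```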

I think the thread went a little astray on why this matters in the process of discussing the data structure most like a Lua table (HashMap and friends). The differences you're asking about, and the ones that make the most practical impact on programming in each language, are less about how you might translate table code to Rust, and more about how the difference in the design of the languages' respective data structure systems impacts programmer experience.


You are free to return None like you would return any other value. None (and Option in general) is not special.

