How to return static str?

Hi guys,

The most complicated topic for all beginner Rustaceans: strings. Can someone give me a hint on how to return a static str? I thought I understood the topic, but now I am struggling again :exploding_head:

The following code gives me an error:

use std::str;

pub fn get_str() -> &'static str {
    let result: [u8; 3] = [97u8, 98u8, 99u8];
    return &str::from_utf8(&result).unwrap()[..]; // error[E0515]: cannot return value referencing local variable `result`
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn returns_abc() { // named differently so it does not shadow super::get_str
        assert_eq!(get_str(), "abc");
    }
}

By the way, I am not sure why I need &'static str. Can I just return str instead? And why is everything so complicated with &str? IMHO, why can't we just have String and str like every other language? String is on the heap, str[size] is on the stack???

It's not exactly complicated. The rule is simple and straightforward: you can only borrow something for at most as long as it lives. (Otherwise you would have a dangling pointer.)

So if you want a &'static str, you have to create it from something that lives for the 'static lifetime. The only reasonable way of achieving that (apart from allocating and leaking heap memory, which I wouldn't recommend at all) is to start with something that is 'static to begin with:

pub fn get_str() -> &'static str {
    static RESULT: [u8; 3] = [97u8, 98u8, 99u8];
    str::from_utf8(&RESULT).unwrap()
}

I don't know why you would do this, though, instead of just writing a string literal (which is automatically a &'static str).
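For comparison, a minimal sketch of the literal version. The function name get_literal is made up for illustration:

```rust
/// Returns a string literal. Literals have type `&'static str`
/// because their bytes are stored in the compiled binary.
pub fn get_literal() -> &'static str {
    "abc"
}
```

No byte array or from_utf8 call is needed in this case.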

Does it mean that the string will live in memory forever? If yes, is there any other way to return a str (known size, immutable)? I guess I can use arrays, but that is so ugly. Every language has a string type.

Static strings are usually references into immutable global variables, and those do indeed live forever. (String constants in the source code compile down to such a global.)

If the string isn't a compile-time constant, you can use the String type instead.
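As a rough sketch of that, assuming the bytes only become known at runtime (get_string is a hypothetical name, not from the original code):

```rust
/// Builds an owned `String` from bytes that are only known at runtime.
pub fn get_string(bytes: &[u8]) -> String {
    // `from_utf8_lossy` replaces invalid UTF-8 sequences with U+FFFD
    // instead of returning an error.
    String::from_utf8_lossy(bytes).into_owned()
}
```

The caller owns the result, so its heap buffer is freed as soon as the String goes out of scope.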

I know, but it is heap-allocated, which is why I went with the static str approach.

"and those do indeed live forever"

I cannot do that. That function is going to generate an ID for a database record. There could be thousands of millions of them over the program's entire lifetime. Keeping them all in memory would be a disaster.

It is sad that Rust doesn't have a simple stack-allocated string. :frowning_face:

How about the smartstring crate? It provides a string that is only heap allocated if the string is long.

Other than that, you could use the [u8; N] type, or wrap it together with a length if you want a max length and never heap allocate.
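A minimal sketch of the "array plus length" idea, assuming a made-up type called StackStr (this is not an existing std or crate type):

```rust
/// Hypothetical fixed-capacity string: a byte buffer plus a length.
/// It never allocates on the heap.
#[derive(Clone, Copy, Debug)]
pub struct StackStr<const N: usize> {
    buf: [u8; N],
    len: usize,
}

impl<const N: usize> StackStr<N> {
    /// Copies `s` into the buffer. Returns `None` if `s` needs more than `N` bytes.
    pub fn new(s: &str) -> Option<Self> {
        let bytes = s.as_bytes();
        if bytes.len() > N {
            return None;
        }
        let mut buf = [0u8; N];
        buf[..bytes.len()].copy_from_slice(bytes);
        Some(Self { buf, len: bytes.len() })
    }

    /// Views the stored bytes as a `&str`.
    pub fn as_str(&self) -> &str {
        // The bytes were copied from a valid `&str`, so this cannot fail.
        std::str::from_utf8(&self.buf[..self.len]).expect("buffer holds valid UTF-8")
    }
}
```

For example, StackStr::<16>::new("abc") yields a value whose as_str() is "abc", while StackStr::<2>::new("abc") yields None because the text does not fit.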

All string literals are embedded in your program's binary, which the OS will map into your program's address space on startup.

However, string literals can only be created at compile time. If the characters of the string are only going to be available at runtime (e.g. when they are read from a database), then you need to use either a heap-allocated String or something which implements the small-string optimisation.

That said... if you have thousands of millions of strings in memory at any one time that sounds like an architecture issue. Normally, you'll either free strings once you are done with them (i.e. by letting the object holding them go out of scope) or you'll design your algorithms so they can process data chunk-by-chunk without needing to keep the world in memory.

Thanks. I am probably going with the second solution, as I was thinking from the beginning.
The first solution is an interesting one, but its smartness is not what I am looking for. I am simply looking for a sized immutable string.

No, I mean that the function is going to generate new IDs (which won't be destroyed because of static). I am not going to pull millions of rows from the db (:-))

If that's the case, you most probably want to use a String.

At the end of the day, the memory for that string must live somewhere. Trying to make &'static str work would arguably be worse for your program's memory consumption because any ID you ever create would stick around for the lifetime of your program (a memory leak!). On the other hand, a String will be automatically cleaned up when it is dropped, allowing its heap memory to be reused for another ID later on.
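To make the difference concrete, here is a small sketch. The names leaked_id and owned_id and the "id-" format are invented for illustration:

```rust
/// Leaks a heap allocation to obtain a `&'static str`.
/// The memory is never freed, so every call permanently grows the process.
pub fn leaked_id(n: u32) -> &'static str {
    Box::leak(format!("id-{}", n).into_boxed_str())
}

/// Returns an owned `String`. Its buffer is freed when the value is dropped.
pub fn owned_id(n: u32) -> String {
    format!("id-{}", n)
}
```

Both produce the same text, but only owned_id gives the memory back once the caller is done with the ID.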

So the final code is as follows (identity_helper.rs):

use uuid::Uuid;

const IDENTITY_DICTIONARY: [u8; 36] = [
    b'0', b'1', b'2', b'3', b'4', b'5', b'6', b'7', b'8', b'9',
    b'a', b'b', b'c', b'd', b'e', b'f', b'g', b'h', b'i', b'j',
    b'k', b'l', b'm', b'n', b'o', b'p', b'q', b'r', b's', b't',
    b'u', b'v', b'w', b'x', b'y', b'z'
];

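pub fn identity() -> [u8; 16] {
    let guid = Uuid::new_v4();
    let rnd = guid.as_bytes();
    let mut result: [u8; 16] = [0; 16];
    // Map each UUID byte onto one of the 36 dictionary characters.
    // Note: 256 is not a multiple of 36, so '0'..'3' are slightly more likely,
    // and bytes 6 and 8 of a v4 UUID carry fixed version/variant bits.
    for i in 0..16 {
        let rnd_idx = rnd[i] % IDENTITY_DICTIONARY.len() as u8;
        result[i] = IDENTITY_DICTIONARY[rnd_idx as usize];
    }
    result
}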

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn t01_length_is_ok() {
        assert_eq!(std::str::from_utf8(&identity()).unwrap().len(), 16);
    }

    #[test]
    fn t02_is_unique() {
        let mut keys = std::collections::HashSet::new();
        for _ in 0..100000 {
            keys.insert(identity());
        }
        assert_eq!(keys.len(), 100000);
    }
}

I was thinking of using char instead of u8 but realized it is 32 bits wide. [char; N] could be useful for Unicode strings (as a replacement for str). I cannot see how str can be used in any situation apart from declaring it as const or static :frowning_face:.

Thanks, guys, for all the help. :+1: :+1: :+1: I wish Rust were better around strings, especially for high-level programming like web servers or applications. Playing with arrays of bytes makes the code hard to read, and there are lots of conversions when it comes to showing those "strings" to users.

There is ArrayString from the arrayvec crate that may do exactly what you want.

Thanks, but I cannot see that it supports UTF/Unicode. If I adopted something like that, I would use it everywhere in the app (including user interaction data). With the byte array approach I can use either [u8], [u16] or [u32] as needed.

It supports unicode.

I don't think so: Rust Playground

The capacity is in terms of UTF-8 bytes: same as your [u8; 16] if you're using UTF-8.

In my understanding (and for code readability) it should be [u16; 10] based on your example (link). If that's true, my example should work.

If you want something whose length is measured in code points as opposed to bytes, then you would need [char; 10]. A u16 is too small, as there are code points larger than a u16 can hold.
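A quick way to check this yourself (fits_in_u16 is a hypothetical helper written for this post):

```rust
/// Returns true if the code point of `c` fits into a single `u16`.
pub fn fits_in_u16(c: char) -> bool {
    u32::from(c) <= u32::from(u16::MAX)
}
```

An accented Latin letter such as '\u{e9}' fits, but an emoji such as '\u{1F600}' does not. That is also why a Rust char is 4 bytes wide.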

All str functions work on the UTF-8 encoding, not UTF-16. The Rust standard library basically follows the "UTF-8 everywhere" philosophy.

Note that this verifies that the number of UTF-8 code units of identity is 16, not that the number of code points is 16. str::len() counts UTF-8 code units.

If you want a user-friendly count of characters, then counting code points is also not what you want. For example, "Cafe\u{301}" (the word "Café" spelled with a combining acute accent) has 5 code points and 6 UTF-8 code units, yet a user sees 4 characters.
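The two counts are easy to compare with a small sketch (byte_and_char_counts is a made-up helper name):

```rust
/// Returns (number of UTF-8 bytes, number of code points) for `s`.
pub fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}
```

With a combining accent, "Cafe\u{301}" gives (6, 5). The precomposed spelling "Caf\u{e9}" gives (5, 4). Counting what a user perceives as characters (grapheme clusters) is not covered by std, as far as I know, and usually needs a crate such as unicode-segmentation.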

You are aware that most other languages also don't, right? Java, C#, JavaScript, and the like heap-allocate their strings; Rust gives you the choice of whether to heap-allocate something. If you want a str on the stack, I'd suggest you use a byte array and make conversions as needed. You could also implement a wrapper for that or check whether any crates already solve that issue.

Now, unless you plan on running your program outside an OS, you should just use the std heap-allocated String. If it's good enough for the rest of us, it's most likely good enough for you.
