Concatenate two `&'static str`

I am trying various ways to concatenate the two string slices t1 and t2 below:

#[derive(Debug)]
struct Thing {
    value:f64,
}
impl Thing {
    fn from_str(i:&'static str) -> Thing {
        Thing {
            value:i.parse::<f64>().unwrap(),
        }
    }
}
fn main(){
    let t1 = "1223";
    let t2 = "554";
    let  t = t1.to_string()+t2;
    let thing = Thing::from_str(t.as_str());
    println!("{:?}", thing);
}

Gives the error:

error[E0597]: `t` does not live long enough
  --> src/main.rs:16:33
   |
16 |     let thing = Thing::from_str(t.as_str());
   |                                 ^---------
   |                                 |
   |                                 borrowed value does not live long enough
   |                                 argument requires that `t` is borrowed for `'static`
17 |     println!("{:?}", thing);
18 | }
   | - `t` dropped here while still borrowed

If I change main to:

fn main(){
    let t1 = "1223";
    let thing = Thing::from_str(t1);
    println!("{:?}", thing);
}

There is no problem.

You don't need the 'static lifetime on the argument of Thing::from_str(), since str::parse() works just fine with any (temporary) lifetime (as long as the parsed representation itself doesn't refer to the input string).

The actual error occurs because t is a String held in a local variable inside main, so it is definitely not 'static; therefore the reference to its buffer, which is effectively what as_str() returns, can't be 'static either.
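Following that suggestion, here is a minimal sketch of the question's program with the 'static bound removed from Thing::from_str and everything else unchanged:

```rust
#[derive(Debug)]
struct Thing {
    value: f64,
}

impl Thing {
    // A plain `&str` (any lifetime) is enough: `parse` copies the number out,
    // so the returned `Thing` does not borrow `i`.
    fn from_str(i: &str) -> Thing {
        Thing {
            value: i.parse::<f64>().unwrap(),
        }
    }
}

fn main() {
    let t1 = "1223";
    let t2 = "554";
    // `t` is a local `String`; lending it out as `&str` is now fine,
    // because `from_str` no longer demands a `'static` borrow.
    let t = t1.to_string() + t2;
    let thing = Thing::from_str(t.as_str());
    println!("{:?}", thing); // prints: Thing { value: 1223554.0 }
}
```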

Yes, I do. Not here, but in the programme this is from.

Can you share some more context?

There is the concat! macro, but its arguments must be string literals, not variables:

let t = concat!("1223", "554");
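If the original from_str(&'static str) signature has to stay, concat! still fits, because its result is a single string literal stored in the binary. A minimal sketch reusing the question's Thing:

```rust
#[derive(Debug)]
struct Thing {
    value: f64,
}

impl Thing {
    // Original signature kept: it still demands a `&'static str`.
    fn from_str(i: &'static str) -> Thing {
        Thing {
            value: i.parse::<f64>().unwrap(),
        }
    }
}

fn main() {
    // `concat!` joins the literals at compile time, so `t` is `&'static str`.
    let t: &'static str = concat!("1223", "554");
    let thing = Thing::from_str(t);
    println!("{:?}", thing);
}
```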

If you want to use the same constant strings in multiple places, you could write macros that expand to them:

macro_rules! t1 { () => { "1223" } }
macro_rules! t2 { () => { "554" } }

let t = concat!(t1!(), t2!());

Or you can use the const-concat crate (GitHub: Vurich/const-concat, "Heinous hackery to concatenate constant strings").

In general though if you want a string without a limited lifetime, it might be better to use String rather than &'static str.

This will probably work.
I am surprised this is not a language construct allowing...

let t1:&'static str = "abc";
let t2:&'static str = "123";
let t3:&'static str = t1 + t2;

...would be very handy! I do this a lot: long static strings with a matching comment string. I need to break them into pieces so the comments line up with the string.

Here is an example of building a tree from a string, with labels for the nodes in the comments:

        let t1:&'static str =
        //    0   1    2  3   4     5  6  7   8   9   10 11 12      13  14      15       
            "Lt( Prod( A 0.5 Prod( 0.1 A Neg( Lt( If( Lt( A 0.8 )Lt 3.0 5.0 )If 33.9 ";
       
        let t2:&'static str =
        //             16  17              18   19  20  21             22
            ")Lt  )Neg 0.5 12.0 )Prod Sum( 10.0 2.0 0.5 1.5 )Sum )Prod 12 )Lt";

In Rust you can create your own language constructs with macros, so, for instance, you can get the following code working:

fn main ()
{
    static_strings! {
        t1 = "1223";
        t2 = "554";
        t = t1!(), t2!();
    }
    let _: &'static str = t;
    println!("{}", t);
}

And with a little more effort one could even make a macro that would accept:

fn main ()
{
    static_strings! {
        t1 = "1223";
        t2 = "554";
        t = t1 + t2;
    }
    let _: &'static str = t;
    println!("{}", t);
}
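One possible definition for the first form is sketched below. The macro body is an assumption, not code from the thread: it works as a "tt muncher" that turns each name = "literal"; entry into a zero-argument macro and turns the final entry into a concat! call, since concat! accepts macro calls that expand to literals.

```rust
// Hypothetical `static_strings!` macro (not from any crate).
macro_rules! static_strings {
    // `name = "literal";` -> define `name!()` expanding to the literal,
    // then process the remaining entries recursively.
    ($name:ident = $lit:literal; $($rest:tt)*) => {
        macro_rules! $name { () => { $lit }; }
        static_strings!($($rest)*);
    };
    // `out = a!(), b!();` -> bind `out` to the compile-time concatenation.
    ($out:ident = $($part:ident!()),+;) => {
        let $out: &'static str = concat!($($part!()),+);
    };
    // Nothing left to process.
    () => {};
}

fn main() {
    static_strings! {
        t1 = "1223";
        t2 = "554";
        t = t1!(), t2!();
    }
    let _: &'static str = t;
    println!("{}", t);
}
```

The recursive, one-entry-at-a-time rules avoid a "local ambiguity" error that a single rule of the form $($name:ident = $lit:literal;)* $out:ident = ... would hit, because both repetitions start with an ident.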
Const evaluation will eventually allow this much more easily. However, it's not just about this, but arbitrary computations at compile time, and it is appropriately complex.

As far as language features go, this can be used for some small optimizations, but in the majority of cases it can be replaced with code that runs at runtime (like your x.to_string() + y).

Once we have real const evaluation, this could be as simple as static COMBINED: String = t1.to_string() + t2; and passing in &COMBINED. That's a long way away now, though.
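In today's Rust, a similar effect is available at runtime with std::sync::OnceLock (stable since Rust 1.70): the String is computed once on first use and stored in a static, so borrowing it yields a &'static str without leaking. A minimal sketch; the combined() helper is hypothetical:

```rust
use std::sync::OnceLock;

// Hypothetical helper: builds the combined string the first time it is
// requested and keeps it in a `static`, so the returned borrow is `'static`.
fn combined() -> &'static str {
    static COMBINED: OnceLock<String> = OnceLock::new();
    COMBINED
        .get_or_init(|| {
            let t1 = "1223";
            let t2 = "554";
            t1.to_string() + t2
        })
        .as_str()
}

fn main() {
    let s: &'static str = combined();
    println!("{}", s);
}
```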


Along those lines, if your code needs &'static str and there's any chance you'll need to compute that string, I'd recommend using a Cow<'static, str> instead. It's an enum that is either a &'static str or a String, so it can store static strings with very low overhead as well as owned strings. Functions can take impl Into<Cow<'static, str>> to accept both &'static str and String, and they just need to call .into() to convert them.

Hopefully your code can be converted to use Cow<'static, str> instead, and to thus be more flexible?
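A minimal sketch of that pattern; the Label type and its text field are hypothetical stand-ins for whatever in your code currently stores a &'static str:

```rust
use std::borrow::Cow;

// Hypothetical type whose text is usually a literal but is sometimes computed.
struct Label {
    text: Cow<'static, str>,
}

impl Label {
    // Accepts `&'static str` (stored borrowed, no allocation) and `String`
    // (stored owned) through the standard `From` impls for `Cow<str>`.
    fn new(text: impl Into<Cow<'static, str>>) -> Label {
        Label { text: text.into() }
    }
}

fn main() {
    let fixed = Label::new("1223");                        // Cow::Borrowed
    let computed = Label::new("1223".to_string() + "554"); // Cow::Owned
    assert!(matches!(fixed.text, Cow::Borrowed(_)));
    assert!(matches!(computed.text, Cow::Owned(_)));
    println!("{} {}", fixed.text, computed.text);
}
```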

To make a &'static str without leaking memory, the only option is to reserve space in the compiled executable binary, put the text data there, and make the references point to that section. Rustc already does so for string literals (and for the concat! macro, which is evaluated long before the type-checking phase), but obviously this can't happen at runtime.

You may ask, then, why other languages allow this. To be specific, it's the GC that allows it. As I mentioned, it's not practical without leaking memory. But in GC-ed languages no code is responsible for freeing the memory: you just keep leaking memory, and our mighty GC will eventually collect the garbage.

Yes. An expression like let ti = "abc" + "123" is entirely compile-time, and the result "abc123" can be stored in the binary. Is that a problem?

The concat! macro is working well for me, which is (I guess) one reason why Rust does not have let t1 = "123"+"abc";: it is not needed. But it does need to be learned! That is my problem, not a deficiency in Rust.

An expression like this is sugar for a function call, and a function may be called with &str or &'static str arguments, but never with string literals as such; that distinction only exists for things like macros. For this reason, whoever bothers to implement std::ops::Add::add() for &'static str must handle exclusively the case where the arguments are not known at compile time, and can only rely on compiler optimizations to produce something better when they are.

Now, what could such a function return when asked to add two static strings? It could, of course, allocate memory, copy both strings into it, leak it, and return a pointer to it as a static string, but that is a rather bad design choice: not only does it leak memory (not much of a problem if it accepts only static strings), it will also raise questions like “why does this function only work on &'static str?” or “why on earth did the authors of std produce something that leaks memory on a simple +?”.

Or, in other words, letting let ti = "abc" + "123" produce what you think it should produce will either make std inconsistent and make it leak memory in some circumstances or it will produce lots of memory leak bugs in applications as &str + &str producing &str will be misused due to its convenience.

On the other hand, macros can accept string literals in place of &'static str without introducing any inconsistencies.

I think it would also be possible for &str + &str to produce a String, but I guess the authors of std thought that this allocation would be rather unexpected.
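For reference, the runtime forms Rust does accept all produce an owned String; "1223" + "554" itself is rejected because there is no Add impl for &str. A minimal sketch with a hypothetical helper, concat_three_ways:

```rust
// Hypothetical helper: builds the same concatenation three ways.
fn concat_three_ways(t1: &str, t2: &str) -> (String, String, String) {
    // `String + &str`: the left operand must already be an owned `String`.
    let a = t1.to_owned() + t2;
    // `format!` writes both values into a new `String`.
    let b = format!("{}{}", t1, t2);
    // `concat` joins a slice of `&str` into a new `String`.
    let c = [t1, t2].concat();
    (a, b, c)
}

fn main() {
    let (a, b, c) = concat_three_ways("1223", "554");
    println!("{} {} {}", a, b, c);
}
```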

The problem is that string concatenation often results in too many unnecessary intermediate allocations. Imagine the Java code below:

String desc = "A: " + this.a + ", B: " + this.b + ", C: " + this.c + ...

In practice, javac "optimizes" such code into a StringBuilder-based version, and a StringBuilder is essentially what String is in Rust. In Rust we decided not to apply such an "optimization" and to let the user decide exactly when they want to allocate.

Well, intermediate allocations aren't the issue, as the first + creates a String, not a &str, and once you're adding to a String, Rust knows how to reuse the allocation.
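The buffer reuse can be observed directly: String + &str consumes the left-hand String and appends into its buffer, growing it only if needed. A minimal sketch with a hypothetical helper that reserves enough capacity up front and then checks the buffer pointer after a chain of + operations:

```rust
// Hypothetical helper: appends `parts` to `first` with `+` and reports
// whether the original heap buffer was kept.
fn chain_reuses_buffer(first: &str, parts: &[&str]) -> (String, bool) {
    let total = first.len() + parts.iter().map(|p| p.len()).sum::<usize>();
    // Reserve room for the whole result so no append has to grow the buffer.
    let mut s = String::with_capacity(total);
    s.push_str(first);
    let before = s.as_ptr();
    for &p in parts {
        // `String + &str` consumes `s` and pushes `p` onto the same buffer.
        s = s + p;
    }
    let same = s.as_ptr() == before;
    (s, same)
}

fn main() {
    let (s, same) = chain_reuses_buffer("abc", &["def", "ghi"]);
    println!("{} (same buffer: {})", s, same);
}
```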

I do not understand that.
If the compiler sees let t1 = "1123"+"aksh";, that is not a function call: the compiler can simply rewrite it as let t1 = "1123aksh"; as part of the earliest parsing phase.

I am not a compiler designer, so perhaps I am missing something but ...

Actually, the concatenation almost certainly occurs after lexing and initial abstract syntax tree (AST) generation, and probably after macro expansion. That is still at an early stage in the compilation chain, long before trait resolution, etc., let alone code generation.

Simple case seems fine, but how about these cases?

let foo = "abc" + "def";
const ABC: &'static str = "abc";
const DEF: &'static str = "def";
let foo = ABC + DEF;
let abc = "abc";
let def = "def";
let foo = abc + def;
fn f(abc: &'static str, def: &'static str) {
    let foo = abc + def;
}
f("abc", "def");
fn f(abc: &str, def: &str) {
    let foo = abc + def;
}
f("abc", "def");

I'd expect none of them to yield different code or to differ in whether they compile. If we special-case something, every user of the language has to learn and remember those special cases; otherwise they'll be surprised by unexpected behavior.

Also, not to mention the fact that we may be embedding large amounts of data in our binary should we try this multiple times:

const FOO: &'static str = "abcdefghijklmnopqrstuvwxyz";
const BAR: &'static str = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const BAZ: &'static str = r#"This is a very long text document! Here is the thing:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin ultricies tincidunt urna, vitae vehicula diam molestie vel. Cras a sem hendrerit, porttitor lacus nec, ultrices massa. Phasellus tincidunt augue non malesuada volutpat. Donec neque lorem, venenatis et venenatis nec, aliquet eu lectus. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Nunc lacinia congue ipsum, et tincidunt ex tristique eu. In eu diam luctus, finibus elit sed, viverra orci. Nam non dui ante. Fusce feugiat efficitur quam non laoreet. Praesent ut fringilla felis. Sed lacinia orci in massa accumsan, eget ultrices lectus pulvinar.

Cras elementum nulla nec maximus gravida. Vivamus commodo posuere libero, vitae euismod lectus. Mauris sit amet fringilla neque. Aliquam commodo laoreet lacus vitae semper. Integer id mi maximus, bibendum urna vitae, fermentum erat. Sed odio eros, dapibus vitae hendrerit in, vehicula id massa. Morbi feugiat eros ut lobortis sodales. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Pellentesque eu metus ut sapien scelerisque pulvinar ac in diam. Donec fringilla, mauris lobortis condimentum condimentum, magna ex porta ex, sit amet consequat sapien velit in nisi. Pellentesque pharetra accumsan odio, quis facilisis elit sollicitudin nec. Ut eleifend, elit at tristique commodo, sapien mauris blandit tortor, at elementum tellus augue sit amet justo. Nullam pharetra accumsan velit, iaculis imperdiet ipsum venenatis sed. Curabitur risus augue, consequat et porttitor in, condimentum ac augue.
"#;
//Over here
let foo = FOO + BAZ;
let bar = BAR + BAZ;
let baz = BAZ + BAZ + bar + foo;

Uh-oh! We have lots of repeated memory statically in our binary, while we could have done that with the runtime memory and put it together piece by piece.

Also, another example that wasn't mentioned by @Hyeonu:

const BAR: &str = "abcdefghijklmnopqrstuvwxyz";
let foo = &BAR[2..5] + &BAR[7..15] + &BAR[..] + &BAR[1..23];

What about slicing? Each of those &BAR[..] expressions is technically &'static str.
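To make the slicing point concrete: the bounds can come from runtime values and the result is still a &'static str, because it borrows from the constant's data in the binary. A minimal sketch; slice_of_bar is a hypothetical helper:

```rust
const BAR: &str = "abcdefghijklmnopqrstuvwxyz";

// Hypothetical helper: the bounds are ordinary runtime values, yet the slice
// still points into `BAR`'s data, so its type is `&'static str`.
fn slice_of_bar(start: usize, end: usize) -> &'static str {
    &BAR[start..end]
}

fn main() {
    // A value only known at runtime (kept small so the slice stays in bounds).
    let n = std::env::args().count() % 20;
    let s: &'static str = slice_of_bar(n, n + 3);
    println!("{}", s);
}
```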

According to the semantics of the language, it must be the same thing as std::ops::Add::add("1123", "aksh"): that is basically the definition of the + operator. Anything else breaks those semantics and, as people said earlier, produces special cases to remember. If that call were to produce the &'static str "1123aksh", it would be possible to move the joining into an earlier phase, but that would only mean that the implementation of + would have to live in different places and still stay consistent. There is not much reason for the compiler developers to take on maintaining that.

And each of those &BAR[..] expressions will be &'static str even if, in place of the numbers, they have variables whose values come from user input. This is why, if you absolutely need a &'static str and not something more sensible, the only logical definition of + for &'static str is a function that allocates, leaks the memory, and returns a reference to it.
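For completeness, the "allocate and leak" option can be written by hand with Box::leak. A minimal sketch; leak_concat is a hypothetical helper, and every call leaks its allocation until the process exits, so it only makes sense for a small, bounded number of calls:

```rust
// Hypothetical helper: concatenates at runtime and leaks the result.
fn leak_concat(a: &str, b: &str) -> &'static str {
    let owned: String = [a, b].concat();
    // `Box::leak` gives up ownership; the data is never freed, so the
    // returned reference can be `'static`.
    Box::leak(owned.into_boxed_str())
}

fn main() {
    let t1 = String::from("1223"); // runtime data, not a literal
    let t: &'static str = leak_concat(&t1, "554");
    println!("{}", t);
}
```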

Note that + is already special-cased for primitive types, as evidenced by the fact that it works in const context. (In fact, the impl of Add::add for primitive integer types is probably self + rhs; I know it is like this for Deref impls.)

However, the proposal to give it special behavior at the lexer phase sounds pretty wild to me for little conceivable benefit (notably, it still couldn't concatenate named consts). I am reminded of C's macros that perform simple token substitutions with no regard for the language grammar or semantics.

(Also, it could change the meaning of macros that currently take an input like "a" + "b", as they would now presumably receive a single token.)
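A small illustration of the const-context point: integer + is accepted in a const item, and concat! covers string literals, while + on two &str is rejected outright.

```rust
// `+` on primitive integers is allowed in a const context...
const SUM: u32 = 40 + 2;
// ...and `concat!` joins string literals at compile time.
const JOINED: &str = concat!("1223", "554");
// By contrast, `const BAD: &str = "1223" + "554";` does not compile:
// there is no `Add` impl for `&str`.

fn main() {
    println!("{} {}", SUM, JOINED); // prints: 42 1223554
}
```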

Addressing this case in particular, would it work to do this?

        let t:&'static str = concat!(
            //0   1    2  3   4     5  6  7   8   9   10 11 12      13  14      15       
            "Lt( Prod( A 0.5 Prod( 0.1 A Neg( Lt( If( Lt( A 0.8 )Lt 3.0 5.0 )If 33.9 ",
            //         16  17              18   19  20  21             22
            ")Lt  )Neg 0.5 12.0 )Prod Sum( 10.0 2.0 0.5 1.5 )Sum )Prod 12 )Lt");