Static mutable String

OK, I'm new. I have to keep a string between calls to a function. The string may change in a call, and the new value is used in the following call. After many attempts to make it simple, I've ended up with this test program:

use lazy_static::lazy_static;
use std::ops::Deref; // brings deref() into scope; __Deref is a hidden lazy_static internal
use std::sync::Mutex;

lazy_static! {
    static ref MY_STRING: Mutex<String> = Mutex::new(String::from("ABCDEF"));
}

fn main() {
    fun();
    fun();
    fun();
}

fn fun() {
    let mut string = MY_STRING.lock().unwrap();
    println!("{}", string);
    if string.deref() == "ABCDEF" {
        *string = "Hello".to_string();
    } else if string.deref() == "Hello" {
        *string = "world".to_string();
    }
}

It works. I know I could use thread_local! with RefCell instead of lazy_static! with Mutex.
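
For comparison, the thread_local! plus RefCell variant mentioned above might look roughly like this. It is only a sketch: fun is changed to return the value it saw instead of printing it, so the behaviour can be checked, and the state is per thread, which matters if fun is ever called from more than one thread.

```rust
use std::cell::RefCell;

thread_local! {
    // One independent copy of the string per thread; no locking involved.
    static MY_STRING: RefCell<String> = RefCell::new(String::from("ABCDEF"));
}

/// Returns the value seen at the start of the call, then advances the state.
fn fun() -> String {
    MY_STRING.with(|cell| {
        let mut string = cell.borrow_mut();
        let seen = (*string).clone();
        if *string == "ABCDEF" {
            *string = "Hello".to_string();
        } else if *string == "Hello" {
            *string = "world".to_string();
        }
        seen
    })
}

fn main() {
    for _ in 0..3 {
        println!("{}", fun());
    }
}
```

borrow_mut panics if the cell is already borrowed, so a reentrant call on the same thread is caught at run time rather than at compile time.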

Is there no common, simple way to maintain a mutable string? In C, for example, I would simply add static to a string declaration, e.g., static char myString[100], and access the static mutable myString as any other character vector.

What you have is already a common, simple way of maintaining global mutable state.

Yeah, and in C you would then spend hours or days debugging race conditions.

Simply declare a string in your main and pass it explicitly to your function. No mutex needed.

Thank you, of course, it makes it simple:

fn main() {
    let mut my_string: String = String::from("ABCDEF");
    fun(&mut my_string);
    fun(&mut my_string);
    fun(&mut my_string);
}

fn fun(instr: &mut String) {
    println!("{}", instr);
    if instr == "ABCDEF" {
        *instr = "Hello".to_string();
    } else if instr == "Hello" {
        *instr = "world".to_string();
    }
}

A minor drawback is that I need quite a few such strings and all are actually local to the function. But this design is simple.

The string I need is local to the function (and the function is not reentrant), so there are never race conditions. But Rust imposes a tricky mechanism to avoid collisions that never happen.

Create a struct that contains all those strings and pass that around.
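
A minimal sketch of that suggestion, assuming two strings for illustration. The names FunState, greeting and previous are made up here and are not from the original program.

```rust
/// Hypothetical bundle of the strings that must survive between calls.
struct FunState {
    greeting: String,
    previous: String,
}

impl FunState {
    fn new() -> Self {
        FunState {
            greeting: "ABCDEF".to_string(),
            previous: String::new(),
        }
    }
}

/// The caller owns the state and lends it out for the duration of one call.
fn fun(state: &mut FunState) {
    // Remember what the string was before this call changed it.
    state.previous = state.greeting.clone();
    if state.greeting == "ABCDEF" {
        state.greeting = "Hello".to_string();
    } else if state.greeting == "Hello" {
        state.greeting = "world".to_string();
    }
}

fn main() {
    let mut state = FunState::new();
    for _ in 0..3 {
        println!("{} (was {:?})", state.greeting, state.previous);
        fun(&mut state);
    }
}
```

Adding another long-lived string then means adding one field, while the signature of fun stays the same.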

You don't actually need lazy_static! if you make the starting value an empty string: Mutex::new and String::new are both const fn (Mutex::new since Rust 1.63), so a plain static works.

pub static MY_STRING: Mutex<String> = Mutex::new(String::new());

fn main() {
    *MY_STRING.lock().unwrap() = String::from("ABCDEF");

    /* ... */
}

Passing the state in a parameter rather than using mutable global variables has advantages even in single-threaded context and is generally recommended.

For example, it makes it easier to test your function on different states independently.
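
To illustrate the testing point, here is a small sketch built on the fun(&mut String) version posted earlier. The helper step is hypothetical and exists only to keep each case on its own fresh state.

```rust
fn fun(instr: &mut String) {
    if *instr == "ABCDEF" {
        *instr = "Hello".to_string();
    } else if *instr == "Hello" {
        *instr = "world".to_string();
    }
}

/// Hypothetical helper: run `fun` once on a fresh copy of `start`.
fn step(start: &str) -> String {
    let mut s = start.to_string();
    fun(&mut s);
    s
}

fn main() {
    // Each case starts from its own state; no call affects another,
    // and the cases can run in any order.
    assert_eq!(step("ABCDEF"), "Hello");
    assert_eq!(step("Hello"), "world");
    assert_eq!(step("world"), "world");
    assert_eq!(step("anything else"), "anything else");
    println!("all cases passed");
}
```

With a global, the same checks would only pass in one particular order, because every call would change the state seen by the next one.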

Thank you. You could even improve it further by using:

if string.is_empty() { *string = "ABCDEF".to_string(); }

inside fun(). Then main() does not need to be aware of string.
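
Put together, this might look like the sketch below. It assumes Rust 1.63 or later (for the const Mutex::new) and moves the static inside fun, which Rust allows and which keeps the name invisible to the rest of the program, much like a static local in C. Returning the seen value instead of printing it is a change made here so the result can be checked.

```rust
use std::sync::Mutex;

/// Returns the value seen at the start of the call, then advances the state.
fn fun() -> String {
    // A `static` declared inside a function is only nameable there.
    static MY_STRING: Mutex<String> = Mutex::new(String::new());

    let mut string = MY_STRING.lock().unwrap();
    if string.is_empty() {
        // First call: set the real starting value.
        *string = "ABCDEF".to_string();
    }
    let seen = (*string).clone();
    if *string == "ABCDEF" {
        *string = "Hello".to_string();
    } else if *string == "Hello" {
        *string = "world".to_string();
    }
    seen
}

fn main() {
    for _ in 0..3 {
        println!("{}", fun());
    }
}
```

One caveat of the is_empty check: it cannot tell "not yet initialised" apart from a legitimately empty string, so it only works if the string is never meant to be empty.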

I think I'll finally implement it using Mutex anyway. Then fun() will be self-contained, with no details spilling over into other modules.

This is yet another excellent reason for making it a local variable, rather than a global.

How about this:

fn main() {
    use foo_module::State;

    let mut state = State::new();
    state.fun();
    state.fun();
    state.fun();
}

mod foo_module {
    pub struct State {
        // Note: this is private to the module.
        string: String,
    }

    impl State {
        pub fn new() -> Self {
            State {
                string: "ABCDEF".into(),
            }
        }

        pub fn fun(&mut self) {
            println!("{}", self.string);
            if self.string == "ABCDEF" {
                self.string = "Hello".into();
            } else if self.string == "Hello" {
                self.string = "world".into();
            }
        }
    }
}

Well, one critical design decision is to keep fun() self-contained. The state, in this case State, is only of interest to fun(). Another detail is that main() and fun() will execute in different threads and are currently kept in different files. The best design I've found so far is using Mutex, with or without lazy_static!, or, alternatively, using RefCell and thread_local!. Using Mutex in a race-free function is heavy overkill, but after the comments from H2CO3 and alice I believe there is no simpler design.

The only way to avoid a runtime locking mechanism (or a thread_local!) is to prove at compile time that the function is not reentrant: the compiler can't just take your word for it. @tczajka's program does this by making fun take a guaranteed-unique &mut reference to the state. Note that a Mutex is only really slow when threads have to wait for it to be unlocked; if it's uncontended, locking takes only a single atomic compare-and-swap and unlocking a single atomic operation.

Okay, this convinces me that the solution with Mutex will be fine. Actually, the overkill in this case is more code complication than execution time loss. This application is almost always inactive. It wakes up once every second, does its job within a couple of milliseconds, and falls asleep again. But I will have to unnecessarily expose 10–20 variables to Mutex, which affects the code, of course.

Even ignoring race conditions, shared mutability can cause several other issues in a language that lets you directly read and write to memory.

Others have explained this much better than I can, so I'll defer to them.

The Rust language designers chose the conservative approach by making people deal with these issues upfront rather than make you debug segfaults or bugs down the track.

Yes, it can be annoying to start with if you are used to languages which don't take this approach, but it pays for itself in reliability and maintainability - all these things add up to Rust's "if it compiles, it works" experience.

You have a very strange definition of self-contained.

This, right here, is a clarion call of a layering violation. The fact that your function is not reentrant means that its proper functioning depends on the caller.

There are many words that can be used to describe such a function: dangerous, fragile, error-prone, a time bomb, and many others. Convenient, maybe.

But self-contained is most definitely not one of them.

A self-contained function is, by definition, a function that doesn't depend on the goodwill of its user. sin is self-contained. cos is self-contained. But println! is not self-contained, and what you are trying to achieve is not a self-contained function either.

Sometimes such functions are the proper way to go (otherwise Rust wouldn't include println! in its standard library), but most of the time they are a bad choice.

So you have a function in a multi-threaded program which is non-reentrant and which uses global state?

Definitely dangerous or, maybe, a time bomb (depending on what you plan to do with it).

It may still be a good design (depending on what exactly you are planning to do), but I would most definitely not plant a landmine by trying to remove the Mutex from such a design.

Why would you need to do that? Just put all the state in one global variable protected with one Mutex.

That's how the Linux kernel worked for years and how CPython still works today.

You can add fine-grained locking later.
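
A rough sketch of the "one global, one Mutex" idea, assuming Rust 1.63 or later for the const Mutex::new. FunState and its fields are invented for illustration; the real program would list its own 10 to 20 values.

```rust
use std::sync::Mutex;

/// Hypothetical bundle of every long-lived value `fun` needs.
struct FunState {
    first: String,
    second: String,
    calls: u32,
}

// A single lock protects all fields, so `fun` locks once per call.
static STATE: Mutex<FunState> = Mutex::new(FunState {
    first: String::new(),
    second: String::new(),
    calls: 0,
});

/// Returns how many times it has been called so far.
fn fun() -> u32 {
    let mut guard = STATE.lock().unwrap();
    // Reborrow as a plain `&mut FunState`; after this line the fields
    // are used like ordinary variables.
    let state = &mut *guard;
    state.calls += 1;
    state.second = std::mem::take(&mut state.first);
    state.first = format!("call {}", state.calls);
    state.calls
}

fn main() {
    fun();
    fun();
    let state = STATE.lock().unwrap();
    println!("{} / {} / {}", state.first, state.second, state.calls);
}
```

Only the first two lines of fun mention the Mutex, so the cost in code is the same whether the struct holds one string or twenty.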

Yes, it does, and I am the caller. I'll call it once every minute, and it uses only a couple of milliseconds to run. There is no reentrancy. And since the only dependency on the caller is being periodically called, I consider this function self-contained. But it's OK with me if your definition of a self-contained function is different.

No, there is no global state. It handles its state internally. It's self-contained.

Yes, I could make a struct with all the long-living strings inside and have a Mutex on this struct. I'm not sure if it makes the code simpler, but I'll have a look at it.

This application already exists in C# for Windows and has been used for a long time without a single problem. The structure is roughly as follows:

MainWindow
{
	main()
	{
		start worker thread every 60s and run fun()
	}
	string variable1, variable2, variable3,...
	fun()
	{
		scan internet
		process data
		send result to UI through message queue
	}
	updateUi()
	{
		wait for message
		update UI
	}
}

Race conditions and Mutex are totally unthinkable. I don't even need static variables. In C#, you can nest functions inside other functions, which simplifies the design.

Well... Rust doesn't buy this line of explanation and for good reason: sooner or later you would forget how to use that code properly and there would be bugs.

Lol. When I was working as a sysadmin I heard that so many times.

Everything worked perfectly until someone's cron job got stuck for some reason (usually a network outage), and then the second one would wake up and unjam the first one. And then I had to go restore stuff from backups.

You have repeated “there is no reentrancy” so many times, but how do you plan to guarantee that this function will never be called twice at the same time?

Rust's solution is a mutex, which is the right way 99% of the time.
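
As a hedged illustration of how a Mutex can answer that question: with try_lock, a call that overlaps a stuck one is detected and skipped instead of racing. STATE and the push('x') body are stand-ins for the real work, and const Mutex::new assumes Rust 1.63 or later.

```rust
use std::sync::{Mutex, TryLockError};

static STATE: Mutex<String> = Mutex::new(String::new());

/// Returns `None` when another call is still holding the state.
fn fun() -> Option<String> {
    let mut state = match STATE.try_lock() {
        Ok(guard) => guard,
        // Someone else is inside `fun` right now: skip this run.
        Err(TryLockError::WouldBlock) => return None,
        // A previous holder panicked; keep using the data anyway.
        Err(TryLockError::Poisoned(e)) => e.into_inner(),
    };
    state.push('x'); // stand-in for the real work
    Some((*state).clone())
}

fn main() {
    println!("{:?}", fun());
    // Simulate an earlier call that is stuck while holding the lock.
    let held = STATE.lock().unwrap();
    println!("{:?}", fun());
    drop(held);
    println!("{:?}", fun());
}
```

Whether to skip, wait (lock) or report an error on overlap is a design choice; the point is that the overlap becomes visible to the program.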

It doesn't matter if it has global state or not. It does have static state, though, and that is the problem.

The problem is not that your state is global, but that it's not local. If your function didn't use static state, it would be reentrant. If it's not reentrant, then it uses static state and has to provide some mechanism to make that safe.

Why couldn't it continue to exist as C# for Windows code then?

It's generally a bad idea to try to write C# in Rust, JavaScript in Rust, or even Haskell in Rust.

Rust is built around some fundamental ideas, and one of them is ownership: shared mutable state is an inevitable evil; it cannot be eliminated entirely, but it must be used as sparingly as possible.

You may like that idea or hate it, but you will find very few volunteers who would be willing to rip out one of the few cornerstones Rust is built upon for the dubious ability to write C# in Rust.

Yes. But it's always about trade-offs. When you simplify the initial design you make it more flexible but then it's harder to find and fix all bugs.

Rust just picks a different point on that scale: it's often easier to start a project in Python than in Rust, but harder to finish it (where finished means all bugs are closed). C# is somewhere in the middle.

From what I can see, race conditions are more or less guaranteed if the “scan internet” or “process data” stage gets jammed for some reason.

And all your variables here are de facto static.

In my solution, the contents of State are private, so they are only visible to fun. The fact that there is some state being managed by fun() isn't really a secret to the caller, otherwise why would they call the function multiple times?

Perhaps the naming is confusing: I used a generic State because I had no idea what your code was doing. What if you rename State to something specific, like InternetCrawler? Doesn't that make sense then? It makes sense to create an instance of an InternetCrawler if you're going to crawl the internet periodically (or whatever the code really is doing).

I don't really see the relevance of this. In your original code it was main() that called fun(). Even if that isn't how the real program works, some function is calling fun() periodically, and that function can own the state.
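
A sketch of what "the periodic caller owns the state" could look like for the worker-thread layout described earlier. Everything here is an assumption for illustration: the InternetCrawler name follows the renaming suggested above, spawn_worker is invented, the toy string logic stands in for the real scan, and the period is shortened for the demo.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the real state.
struct InternetCrawler {
    string: String,
}

impl InternetCrawler {
    fn new() -> Self {
        InternetCrawler { string: "ABCDEF".to_string() }
    }

    /// One periodic run; returns the message destined for the UI.
    fn fun(&mut self) -> String {
        let result = self.string.clone();
        if self.string == "ABCDEF" {
            self.string = "Hello".to_string();
        } else if self.string == "Hello" {
            self.string = "world".to_string();
        }
        result
    }
}

/// Spawns the worker. It owns the crawler, so no Mutex or static is needed,
/// and a loop on one thread cannot overlap with itself.
fn spawn_worker(runs: usize, period: Duration) -> mpsc::Receiver<String> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut crawler = InternetCrawler::new();
        for _ in 0..runs {
            if tx.send(crawler.fun()).is_err() {
                break; // the receiver is gone, stop working
            }
            thread::sleep(period);
        }
    });
    rx
}

fn main() {
    let rx = spawn_worker(3, Duration::from_millis(10));
    // The UI side: wait for messages until the worker finishes.
    for message in rx {
        println!("{}", message);
    }
}
```

The state never leaves the worker thread; only the results cross over through the channel, which mirrors the message queue in the C# outline.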
