Stateful service, with a web interface to cheat in

I haven't written here for a long time; sorry if I include some history, but I would like to explain the problem as well as possible.

In the past, when I got frustrated with other languages (mainly GC pauses, but also null reference exceptions, data races, etc.), I decided to give Rust a try.
I purchased the Rust book, read it twice, then downloaded the book from RIP Tutorials to get some idea of which crates are being used, and then I started to work.
I've written 3 projects in Rust, namely Am1, sesman, and fscomm (whatever); all of them shared the same success story: 2 weeks of development, going into production immediately, running for weeks without downtime, with low memory usage.

These 8 weeks (3 × 2 = 8, more or less; unfortunately history repeats) were stressful, but management was happy with the results.

But these programs share a common denominator: maintaining central state (which is not of static size), together with network sockets, child processes, etc., while allowing some control from the outside, like listening on HTTP, logging via UDP, and raising events.

These requirements were very hard to satisfy given Rust's strictness about reference types, Sync and Send, and globals. This made it very hard to add new features without restructuring the program, and to my pain, sesman went back to the old nodejs/GC version, fscomm went to the Go language (causing mysterious bugs from time to time, but faster due to green threads), and Am1 is also about to become outdated due to its feature backlog.

Looking back, trying to rescue Am1, I understand how CLI applications can be streamlined in Rust, and I understand how stateless web services can be developed; but how would such a stateful application be developed in Rust?
For example, how would one implement Microsoft Word/Excel in Rust? What are the structs that automatically synchronize their field access called?
I must be missing some terminology or paradigm here; otherwise, would Pres. Biden have mentioned it as the 1st-choice replacement for C/C++?

My impression of Rust was that the kind people who created it also thought about the issues it creates and provided solutions (Arc, mpsc, traits), so what is the solution for these kinds of issues called?

I would like to restructure these programs in a way that gives me the option to use any type in any component, pull all 3 programs back to their Rust versions, and prove Rust a win for maintainability.

Thanks

When you want to communicate or share data between multiple threads, you basically have two options: (I) shared memory or (II) message passing. From your prose I deduce you are looking for the former.

When you need to deal with shared memory—in any language—you have the basic requirement that you can have either 1 writer XOR N readers. If you have two threads writing at the same time, you get UB. If you have one thread reading while another thread writes, you get UB. To avoid these race conditions, you restrict access to the shared memory region through a synchronization primitive (which is the term you are looking for, I think). There are quite a few synchronization primitives available in Rust, like std::sync, parking_lot, or tokio::sync.
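For example, a minimal sketch of the shared-memory flavor with std::sync (the u64 counter here is just a stand-in for whatever state you actually share):

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Arc gives shared ownership across threads; Mutex enforces the
    // 1-writer-XOR-N-readers rule (for a Mutex: one thread at a time, period).
    let state = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let state = Arc::clone(&state);
            thread::spawn(move || {
                // lock() blocks until no other thread holds the lock.
                *state.lock().unwrap() += 1;
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(*state.lock().unwrap(), 4);
}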

I was forced to use shared memory because of the socket handles

parking_lot is interesting; I was mainly relying on std::sync::Mutex.

My complication was where to place the mutexes: around the whole state? Around every individual field? And what about deeply nested fields?

I wonder whether there is a way to hide all the syncing logic within a class (or struct impl), and then use that class from every other instance reference without having to worry about proper synchronization; the class's instance methods should block, or defer the changes to be applied later in the background.

If this is not idiomatic, what would be the proper approach?

I wonder whether there is a way to hide all the syncing logic within a class (or struct impl), and then use that class from every other instance reference without having to worry about proper synchronization; the class's instance methods should block, or defer the changes to be applied later in the background.

It's good to structure things so as to reduce the amount of synchronization code that you have to write, because that leads to less boilerplate and less chance of bugs. But exactly how to do that depends on the actual kinds of state changes that occur and how they have to synchronize. For example, if you just “defer the changes later” then you may have to be able to resolve conflicts between multiple changes, later.
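To illustrate the "defer the changes" style you described, here is a minimal sketch (all names are made up): one background thread owns the state, and the handle's methods only enqueue changes, so callers never lock anything. Note that reading the state back would need a request/response channel, which is exactly where the conflict-resolution questions start to appear.

use std::sync::mpsc::{self, Sender};
use std::thread;

// The kinds of deferred changes the handle can enqueue.
enum Change {
    Add(String),
    Clear,
}

#[derive(Clone)]
pub struct StateHandle {
    tx: Sender<Change>,
}

impl StateHandle {
    pub fn spawn() -> Self {
        let (tx, rx) = mpsc::channel();
        thread::spawn(move || {
            // The state lives in exactly one thread; no Mutex anywhere.
            let mut items: Vec<String> = Vec::new();
            // Changes are applied in arrival order, which is one simple
            // way of "resolving conflicts" between deferred writes.
            for change in rx {
                match change {
                    Change::Add(s) => items.push(s),
                    Change::Clear => items.clear(),
                }
            }
        });
        StateHandle { tx }
    }

    // Never blocks on a lock; the change is applied later, in the background.
    pub fn add(&self, s: String) {
        let _ = self.tx.send(Change::Add(s));
    }

    pub fn clear(&self) {
        let _ = self.tx.send(Change::Clear);
    }
}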

So, tell us more about what your application does — what state it actually shares, and how that state is manipulated — and we may be able to suggest a better pattern.

I was forced to use shared memory because of the socket handles

You mean you have to share state between all of the connection/request-handlers for each socket, right? Not that you're sharing the sockets?

When using shared memory and locks (Mutex, RwLock, etc) there is no single pattern that works for hiding the complexity of synchronization in all situations. It very much depends on how the state is used.

Can you share a specific situation where shared state was used, and you had problems with getting it to work, or maintaining it and adding new features? It's probably best to discuss one scenario at a time.

However, as you are probably aware, it will normally be faster to implement a program or a new feature in golang or nodejs than in Rust, especially if you are doing something you haven't tried before. Rust is slower to write, but the result is safer and faster.

I will mention 3 issues I remember.

  1. Keeping a VecDec<String> (or a kind of peekable queue) of all errors generated. Every error that happens in the class should be pushed there; then the web server should be able to view n lines (like tail -n n), or clear n lines. I've used an alias Amvs for Arc<Mutex<Vector<String>>>, and had to unlock and clone within the web-server code.

Maybe an RWMutex would be more ideal for performance, so two web clients could read together.
I've also attempted to use MP for the writing part (for errors generated from outside), but since this required a blocking loop within the impl, I tried to enclose it in a thread; then I had issues with moving, so I felt it's not the right direction.

  2. Keeping an array of tuples of UDPsocket and expiration time; on every major event, notifying all listeners; and having some internal GC that periodically cleans up the expired peers (I wish this could be event-based, but I wasn't able to figure it out).

  3. Enclosing a running child process in the class, and writing to its stdin from the web server. I also wanted it to signal on exit, but the only way I found was running it blocking in a thread and passing a message on completion through a channel that delivers the event to the code that needs it (a rough sketch of this follows below).

I know that I've thrown a bunch of separate problems out here, but I have a feeling that I was on the wrong structure, and the problems/corner cases were symptoms.
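For item 3, a rough, simplified sketch of the shape I ended up with (using cat as a stand-in child process):

use std::io::Write;
use std::process::{Command, Stdio};
use std::sync::mpsc;
use std::thread;

fn main() -> std::io::Result<()> {
    let mut child = Command::new("cat").stdin(Stdio::piped()).spawn()?;

    // Take the stdin handle out, so the web server can keep writing to it.
    let mut stdin = child.stdin.take().unwrap();
    stdin.write_all(b"hello\n")?;
    drop(stdin); // closing stdin lets this particular child exit

    // The only way I found: block on wait() in a thread, and pass a
    // message through a channel to the code that needs the exit event.
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let status = child.wait();
        let _ = tx.send(status);
    });

    println!("child exited: {:?}", rx.recv().unwrap());
    Ok(())
}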

This reply is just about item 1.

You must mean VecDeque, not VecDec or Vector, right?

I'm still not clear about what the problem was, as I can't think of any serious problems with using the Amvs. Using a mutex for those three ops (add, view, clear) seems very straightforward.

Please correct me where I'm making incorrect assumptions.

  • I assume the three ops are not so frequent that there is a lot of contention on the mutex.
  • I assume the view op just copies out the N errors into a new Vec, and only holds the mutex while doing that.

Why did you need to clone the Amvs to implement those three ops?

By MP do you mean an mpsc queue? If so, what was the motivation for using it? Was it because the frequency of errors was very high, so you were seeing contention on the mutex?

Aside: I assume you were using a fixed capacity VecDeque to limit memory usage. Did you also need rate limiting for the case where a single error is occurring very frequently (a common type of problem in production), since this would push all other errors out of the queue?
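For concreteness, a minimal sketch of the three ops behind a single Mutex; the capacity constant and the function names are just my assumptions:

use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

const CAP: usize = 1000; // assumed fixed capacity

type Amvs = Arc<Mutex<VecDeque<String>>>;

// add: push, dropping the oldest entry when full.
fn add(errors: &Amvs, msg: String) {
    let mut q = errors.lock().unwrap();
    if q.len() == CAP {
        q.pop_front();
    }
    q.push_back(msg);
}

// view: copy out the last n entries; the lock is held only while copying.
fn view(errors: &Amvs, n: usize) -> Vec<String> {
    let q = errors.lock().unwrap();
    let start = q.len().saturating_sub(n);
    q.iter().skip(start).cloned().collect()
}

// clear: drop the oldest n entries.
fn clear(errors: &Amvs, n: usize) {
    let mut q = errors.lock().unwrap();
    for _ in 0..n.min(q.len()) {
        q.pop_front();
    }
}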

Yes, VecDeque (unfortunately that is how it registered in my mind; in vim the LSP would hint me, but NodeBB is still missing an LSP).

I understand from your answer that the solution is to make sure that the Mutex is unlocked most of the time, and so keep the chance of hitting a locked Mutex to a minimum.

Given that I was waiving consistency for the fastest response time, I would like to buffer and defer the new data, and always read only the data that had already settled up to some recent point in time.

Also, since I do not have control over the error count, I didn't want to build a program that has a possibility of Mutex contention, thus halting the web server.

In fscomm, for example, I held a cache of the latest config in memory. The config was constantly read by various threads, and it was very IO-expensive to generate. I was using basic memoization, but I didn't like the fact that once I started to update the new config I couldn't read the old one anymore. We could solve this by keeping 2 copies of the config and updating one while reading from the other, but in Rust it would be hard to reason about, since the ConfigMgr struct itself would have to be Arc'd/Mutexed for Rust to allow cross-thread pointer access:

struct ConfigMgr {
    config1 RWMutex<Config>
    config2 RWMutex<Config>
    use_config RWMutex<Int>
}

In the old dramatic languages we would keep as many mutexes as we want as siblings in the struct, and use them however it makes sense (or however it doesn't make sense, when being tired). Rust is therefore much better, but I wonder what the alternative is in case Mutex enclosures are affecting wait time. Is there a way to mark a struct as Send + Sync by manual decision? What would be the fire escape when the elevator is not working?

With clone I was referring to cloning a reference to the Arc; my issue wasn't performance but abstraction. I would like to keep the state in a static lifetime without worrying about references. lazy_static had its own limitations; my other option would be to declare my struct Send, and use the methods directly.

This sounds like a fear that is probably unfounded. Pushing to a VecDeque with a fixed capacity (and therefore no allocation) is very cheap. With the addition of rate limiting, which is needed for the reasons I mentioned, I doubt this could be a problem. But if you think an mpsc queue is warranted and you want to explore the problems you were having, please show the code with errors and we can discuss that.

If the ConfigMgr is accessed via a Mutex, there is no point to using Mutexes for its fields or to using two configs. The entire structure can only be accessed by one thread at a time due to the outer Mutex.

Edit: If you want to create a new config while readers are concurrently accessing the old config, I strongly recommend the arc-swap crate for that. Using it for config information is a specific use case for the crate.
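A minimal sketch of what that looks like (the Config field is made up; arc-swap's load gives readers a cheap snapshot while store publishes a new config atomically):

use std::sync::Arc;
use arc_swap::ArcSwap;

#[derive(Clone, Debug)]
struct Config {
    timeout_ms: u64, // hypothetical field
}

fn main() {
    let config = ArcSwap::from_pointee(Config { timeout_ms: 500 });

    // Readers get a cheap snapshot; they keep seeing the old Arc
    // even while a writer swaps in a new one.
    let snapshot = config.load();
    println!("timeout = {}", snapshot.timeout_ms);

    // The writer builds the new config off to the side, then publishes
    // it atomically. No reader is ever blocked.
    config.store(Arc::new(Config { timeout_ms: 1000 }));
}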

Enclosing the protected data in the Mutex (vs two separate fields at the same level in the struct, as with some other languages) does not impact performance at all. It is just a way to ensure that a Mutex is always used for the enclosed data and not for something else, allowing the compiler to check that access to that data by multiple threads is safe.

It is true that Rust does not make it convenient to have mutable statics, and this is intentional because mutable statics are inherently unsafe. So all you can do -- assuming a mutable static is the best solution to your problem -- is to become more familiar with how this is done in Rust and accept the inconvenience. There is no simpler way to do it. Note that in recent Rust versions you can use OnceLock in the std library rather than the lazy_static crate, although it is not any easier to use.
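For example, a sketch of a mutable static with OnceLock (the VecDeque here stands in for whatever the global state is):

use std::collections::VecDeque;
use std::sync::{Mutex, OnceLock};

// A global that is initialized once and mutated only through the Mutex.
static ERRORS: OnceLock<Mutex<VecDeque<String>>> = OnceLock::new();

fn errors() -> &'static Mutex<VecDeque<String>> {
    ERRORS.get_or_init(|| Mutex::new(VecDeque::new()))
}

fn main() {
    errors().lock().unwrap().push_back("boom".to_string());
    println!("{} errors", errors().lock().unwrap().len());
}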

I took 2 days to rethink what you've said, and I think it can be summarized as correct: that anti-pattern isn't possible or recommended in Rust, and also that, since the Rust compiler keeps you on track anyway, it is OK to give up on readability and accept repetitive work for the sake of handing control to the compiler.

One thing I am still wondering about, and I believe there is some official answer to it, given the double-mutexed struct above:

struct ConfigMgr {
    config1 RWMutex<Config>
    config2 RWMutex<Config>
    use_config RWMutex<Int>
}

fn read(&self) {
    let locked = self.use_config.read().unwrap();
    return if locked == 1 {
        self.config1.read.lock().unwrap();
    } else {
        self.config2.read.lock().unwrap();
    }
}

fn write(&self) {
    let locked = self.use_config.read().unwrap();
    return if locked == 1 {
        self.config2.write.lock();
    } else {
        self.config1.write.lock();
    }
}

// by some timer 10 times a second
fn toggle(&self) {
    let locked = self.use_config.read().unwrap();
    if locked == 1 {
        self.config2.write.lock().unwrap() = self.config1.read.unwrap().clone();
        // unlock read
        self.use_config.write().unwrap() = 2;
        self.use_config.
    } else {
        self.config1.write.lock().unwrap() = self.config2.read.unwrap().clone();
        // unlock read
        self.use_config.write().unwrap() = 1;
    }
}

The above struct is perfectly thread safe. I would like to tell the Rust compiler that it should trust the synchronization in the impl, and let me pass these structs between threads without having to lock mutexes or enclose the whole struct in a Mutex.

This approach will suffer very few read/write clashes, at the cost of consistency, and is preferable to Mutex enclosures in many situations. I see disallowing any access to any field because of one field as analogous to collective punishment (for example, locking all file systems because one file system is being mutated).

Another solution would be to have an UnfairMutex, where every lock acquisition could be given a priority, so we could give writes low priority in an abstract way; we could also add a deadline after which even low priorities become equal priority, like ZFS does with write delays.

To discuss the code you posted, it needs to be valid Rust code and formatted. I have attempted to convert it to valid code and posted it below. In the future please do this, and make sure the code at least compiles, unless of course you're asking about a compiler error.

use std::sync::{RwLock, RwLockReadGuard, RwLockWriteGuard};

#[derive(Copy, Clone)]
struct Config();

struct ConfigMgr {
    config1: RwLock<Config>,
    config2: RwLock<Config>,
    use_config: RwLock<u8>,
}
impl ConfigMgr {
    fn read(&self) -> RwLockReadGuard<'_, Config> {
        let locked = self.use_config.read().unwrap();
        return if *locked == 1 {
            self.config1.read().unwrap()
        } else {
            self.config2.read().unwrap()
        };
    }

    fn write(&self) -> RwLockWriteGuard<'_, Config> {
        let locked = self.use_config.read().unwrap();
        return if *locked == 1 {
            self.config2.write().unwrap()
        } else {
            self.config1.write().unwrap()
        };
    }

    //by some timer 10 times a second
    fn toggle(&self) {
        let locked = self.use_config.read().unwrap();
        if *locked == 1 {
            *self.config2.write().unwrap() =
                self.config1.read().unwrap().clone();
            //unlock read
            *self.use_config.write().unwrap() = 2;
            // incomplete: self.use_config.
        } else {
            *self.config1.write().unwrap() =
                self.config2.read().unwrap().clone();
            //unlock read
            *self.use_config.write().unwrap() = 1;
        }
    }
}

If this is what you intended, I believe it has logic errors and also that this approach won't accomplish what you want.

As a first step, please confirm that the above code is what you intended to write. I tried to change it as little as possible to get it to compile.

Apart from the question of whether the approach with the three RwLocks works and is accomplishing what you want, you can avoid the outer Mutex by just removing it. You can share an Arc<ConfigMgr> between threads and tasks.

The Send and Sync traits are automatically implemented for ConfigMgr because it is "entirely composed of types that are" Send and Sync, as described in the book here and here.
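Concretely, building on the definitions above (a sketch):

use std::sync::{Arc, RwLock};
use std::thread;

fn main() {
    let mgr = Arc::new(ConfigMgr {
        config1: RwLock::new(Config()),
        config2: RwLock::new(Config()),
        use_config: RwLock::new(1),
    });

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let mgr = Arc::clone(&mgr);
            // ConfigMgr is Sync, so a shared reference (via the Arc)
            // can cross threads; no outer Mutex is needed.
            thread::spawn(move || {
                let _cfg = mgr.read(); // the methods do their own locking
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
}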


Sorry for pasting pseudo-pseudocode, I will be careful next time.

Thanks for quoting the feature, that Sync is implicitly implemented on structs that are entirely composed of Sync types.

I found it in the book; after repeating the chapters before and after it, it now makes sense.

In the actual code my struct had a lot of fields, and I never attempted to wrap every individual field in a mutex; basically I missed the fact that Rust has support for a mutex per field. I was thinking that sync == enclosure == consolidation.

I later found some general discussions, not related to Rust, about the pros and cons of one mutex per object vs. one mutex per field. In my question I was assuming that in Rust I am forced to use a mutex per object.

Thanks for the reference.

Sorry for the late reply. I drafted a reply immediately, but I am glad I didn't submit it then, and eventually understood it correctly.


This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.