Can we write a lock async without unsafe?

Hi! I was thinking to try write some async code for postgres using pgrx, how PG/PGRX request the element to be copy, I'll be using an array in this example.

In async code, everything can be called in any time, but I have some issues with the normal lock functions.

We have a array with a struct, the struct can need change it self:


struct Foo;

impl Foo {
        fn read_something(&self) {
                todo!()
        }
        fn change_something(&mut self) {
                todo!()
        }
}

struct Arr([Option<Foo>; 10]);

impl Arr {
        fn try_add_element(&mut self, foo: Foo) -> Result<(), Box<dyn std::error::Error>> {
                todo!()
        }
        fn try_rm_element(&mut self, foo: Foo) -> Result<(), Box<dyn std::error::Error>> {
                todo!()
        }
        fn read_element(&self, id: usize) -> Result<&Foo, Box<dyn std::error::Error>> {
                todo!()
        }
        fn mut_element(&mut self, id: usize) -> Result<&mut Foo, Box<dyn std::error::Error>> {
                todo!()
        }
}

The basic design is that thing, we have a box that stores Foo, and then we can request different things inside.

I'm writting this as a test, I have never done this and why not use the normal locks?

Imagine, a section already requested Foo1, now we want to add a new Foo, in normal apps this would request a exclusive access to Arr, but if we only want to add something we does not need a full exclusive, we only a instance where add a new element is exclusive.

So, while we want to add a new Foo, we should be able to keep working normally with all elements without block everything for that operation, obvs if we want to remove we should request a complete exclusive access.

Split the read, add and remove elements in Arr allow to most inside elements keep working without lock everything every single time.

The issue? I can't found how to design this without unsafe code... when I go and look on other crates and lib, is like internally all with unsafe code, so I was asking myself if there is a way to develop async mechanism without unsafe code! maybe this always needed them, mutex seems to also uses unsafe code.

I'm giving a lot of context, to just to avoid misunderstandings :slight_smile:

As other notes, I have read but I still do not get very well how the traits Sync and Send works not how interacts across threads, I also tried to look if there was a simple of example for example for a Mutex but I could not find.

The general name for a data structure that allows this kind of access is a concurrent data structure.

Writing a concurrent data structure from scratch often involves unsafe code, because the borrow checker cannot tell whether your inter-thread synchronization strategy for avoiding conflicting accesses is correct. This is not universally true; for example, you could write a simple sharded map (like a very simple version of dashmap) with just the data structure

struct MyMap<K, V> {
    shards: [Mutex<HashMap<K, V>>; 16],
}

by following some fixed rule about which keys go in which shards. Obviously this has some disadvantages, but it would increase concurrency without introducing any new unsafe code.

If you decide to write a concurrent data structure using unsafe — I would recommend not doing so until you have some more experience in nearby things, because it is a very tricky topic — then follow these principles:

  • Write thorough tests for the unsafe code, and run them under Miri.
  • Don’t mix your application logic with your unsafe code. Make the concurrent data structure a generic type even if you only use it with one set of types. This way, you're less likely to introduce a safety bug because the boundary between the unsafe module and the rest of the program is more clearly defined.

All Rust programs that do anything at all[1] call some unsafe code somewhere. Do not judge a program by whether it calls any code containing unsafe {...}; judge it by the unsafe code it contains. The thing to be extremely careful about is about writing new unsafe code.


  1. well, except for a library that contains very simple functions and is loaded and called by a non-Rust program ↩︎

3 Likes

thx for all the info! is there any docs about how to start writing concurrent data structure? I can't found much.

Usually you would find a concurrent data structure that has already been invented that suits your needs, then look for a Rust library containing an implementation of that data structure. It's a tricky topic which you should definitely not tackle from scratch if it's not your primary goal.

If you want to learn anyway, a place to start might be Rust Atomics and Locks: Low-Level Concurrency in Practice by Mara Bos. That’s not about data structures but it is about the concurrency primitives that get used in implementations of concurrent data structures, and it would give you a place to start in understanding what is possible.

If you just mean you want to start using a concurrent data structure in your application, well, I already mentioned dashmap; it's a good place to start if your goals are just to increase the throughput of your application by allowing some simultaneous accesses (not to guarantee simultaneous access to any two keys).

1 Like

Thx! I'll be loocking on that!

In usual circumstances use a normal Rust crate would be enough, the issue here is a mix on Postgres, which needs to the struct to be shared be Copy.... a mix of that + how to allow access the keys without lock the entire system, in a db there will be requests from all places to access/modify data in values of the object (thinking of it as a hashmap).

Would be idea find a crate that exists and match this requirements.

That sounds like what you need is not adding a concurrent data structure at all (a database is built on concurrent data structures); what you need is to make use of database transactions to ensure that only consistent changes are made to the database.

In Postgres you can make extensions, and some extensions can have internal dependencies to make results, one transaction where it updates two rows, extensions can need the value of both values to be sure will not make a wrong result.

The extension can also, depends on its nature lock different things on different tables, when we need this details the extension must use a shared component that checks this behaviors and validate all will be right.

Also some extensions can change the result depending on the order, so execute first A to B, is different from B to A, in this scenario you want a lock where PG needs to wait to finish one to then process otherone.

I think if PG has already a feature to handle this things, can be easier than write some code here, but some locks can be complex and PG would not support them.