Distributed, "pairwise transactional", HashMap<u128, u128> in Rust?

anon80458984 · March 16, 2022, 11:38pm

We define 'pairwise transactional' for a HashMap<u128, u128> as follows:


pub struct Delta {
  key: u128,
  old_value: u128,
  new_value: u128
}

impl HashMap<u128, u128> {
  pub fn pairwise_transaction(&mut self, d1: Delta, d2: Delta ) -> Result<...> {
    if self.get(d1.key) != Some(d1.old_value) || self.get(d2.key) != Some(d2.old_value) {
    fail;
    }

    self.insert(d1.key, d1.new_value);
    self.insert(d2.key, d2.new_value);
  }

  pub fn read_keys(keys: Vec<u128>) -> Vec<u128> {
    input arg u128 is key; output u128 is value
  }

}

Here, "transactional" refers to: for any read of keys, it returns a consistent view, i.e. not halfway through a update. In the above case, the Rust borrow checker ensures that when a write occurs, we have a &mut self, which means there can not be another thread doing a read (needs a &self).

Question: Is there a easy way to make this work in a 'distributed' manner (i.e. the HashMap maintanied by multiple Rust programs running on multiple machines), or is this something that requires using databases for ?

RedDocMD · March 17, 2022, 4:59am

I am not sure what to make of this statement. Rust is a programming language. You can implement anything with it, if you know how to. You can make a distributed, transactional database in Rust.

scottmcm · March 17, 2022, 5:25am

The general answer to this question is "No, because of the CAP theorem - Wikipedia".

That hasn't stopped people from trying, but I don't know a single one that has worked well. See, for example, .NET Remoting, which is on the list of .NET Framework technologies unavailable on .NET 6+ | Microsoft Learn (also Breaking change: Remoting APIs are obsolete - .NET | Microsoft Learn ).

Hiding the distributed-ness of something is also mistake #3 in this article from way back in 2000: Three Wrong Ideas From Computer Science – Joel on Software

anon80458984 · March 17, 2022, 7:21am

@RedDocMD @scottmcm : There is an unintended confusion here. Let me rephrase this as:

I know this problem can be solved, i.e. just run distributed Postgres / Cockroachdb on a cluster.

My question is: is there a light weight way do to this (perhaps as a rust crate) or does solving this require a distributed transactional database.

2e71828 · March 17, 2022, 7:32am

Your problem is underspecified to be able to answer this question. What do you expect to happen, for instance, if node A attempts to insert (1,2) concurrently with node B inserting (1,3)? What behavior do you need in the face of a network partition? How much communication latency will exist between the nodes (typical and worst-case)? What read and write loads do you expect? Etc...

anon80458984 · March 17, 2022, 7:37am

If by 'insert', you mean

A does pairwise_transaction with keys 1, 2
B does pairwise_transaction with keys 1, 3

then I expect one to succeed, and one to fail. Which is fine, the client has the responsibility to retry.

Assume similar setup / expectations as a Postgres cluster running within a single AWS datacenter. (But hopefully higher throughput / possibly lower latency, as we are dealing with a 'more restricted' problem.)

2e71828 · March 17, 2022, 8:02am

The simplest solution here is to have a master process somewhere that serializes all updates, and either responds to queries or broadcasts accepted changes; that can be either Postgres or something bespoke. I doubt that this is what you have in mind, as most people wouldn't describe this as "distributed."

Dealing with distributed algorithms is generally a pain, so I try to avoid them whenever I can. What benefit do you hope to gain by using something "distributed" instead of having a single point of truth?

anon80458984 · March 17, 2022, 8:44am

100% agree.

Throughput, latency. A bunch of XY-problems / reductions ended up converging on the above primitive -- which is fairly well known, it is basically Double compare-and-swap - Wikipedia .

So the question can also be rephrased as: is there a rust crate for doing high performance double-compare-and-swap over multiple machines.

alice · March 17, 2022, 9:00am

No, there is not.

anon80458984 · March 17, 2022, 9:57am

Is this so easy that it is not worth putting in a crate, or so messy that no one has figured out a nice way to do it ?

2e71828 · March 17, 2022, 11:07am

The latter. There are a lot of questions that will have answers specific to the particular application, like:

How do you identify and find the cluster of machines?
How many connections should each machine maintain, and to which others?
How do processes join and leave the cluster gracefully?
What happens when a process stops responding? (This could indicate either a node failure or a communication failure, with no way to tell between them)

Because of this, solving the problem in a general-purpose library is significantly more complicated than for a single application.

alice · March 17, 2022, 11:09am

Well, I don't actually think anyone has tried, but even then, I don't think there's any known good way to do it.

anon80458984 · March 18, 2022, 9:09am

(post deleted by author)

Topic		Replies	Views
Distributed double compare and swap	15	1001	July 31, 2022
Rust, distributed hash table	8	1663	September 18, 2021
DistHashMap<[u8; 32], [u8; 1024]>?	1	566	March 15, 2022
Rust resource on distributed primitives?	0	305	March 4, 2023
Tutorial on nano-discord / distributed pub-sub in Rust?	5	1152	April 14, 2021

Distributed, "pairwise transactional", HashMap<u128, u128> in Rust?

Related topics