Distributed double compare and swap

EDIT: Question has been rewritten for clarity. Sorry for confusion.

We have N machines (their id is specified by a u16).

Each machine maintains a HashMap<u64, u128> where key = u64, value = u128. Not being in the hashmap implies the value is 0.

Assume all the machines are in the same data center.

Now, I want to do a double-compare-and-swap:

input:

(k0_machine, k0_idx): (u16, u64)
(k0_old_value, k0_new_value): (u128, u128)

(k1_machine, k1_idx): (u16, u64)
(k1_old_value, k1_new_vaule): (u128, u128)

and the behavior is:

atomically {
  if (k0 has k0_old_value && k1 has k1_old_value) {
    update both
    Ok(())
  } else {
    Err("outdated values")
  }
}

Simple enough right? What is the best way to do this ?

===

Somewhat of a followup to Distributed, "pairwise transactional", HashMap<u128, u128> in Rust?

If the data doesn’t fit into a single atomic, I believe you might need a lock.

I agree that locks may be involved. However, the part that concerns me is the two CAS being on different machines. That's the nasty case.

Oh, I may have misunderstanding your question then, I didn't fully pick up on this being about distributed stuff.

Re-reading your question, I don't fully understand which machine has which values, and which is doing the update.

What is the expected behaviour for collisions? If k2 wants to engage k1 to update (k1_machine, k1_idx) while k0 is also engaged with k1 in updating (k1_machine, k1_idx) the system responds by ... what? Rejecting both transactions? Allowing a dirty commit? Allowing one commit and rejecting the other?

Transactions are allowed to fail. So if two transactions collide:

both try to write = incorrect behavior
exactly one goes through = ok
both gets rejected = eh, not great for throughput

I rewrote the question based on your feedback. Is this clearer ?

This notion of "which machine is doing which update" -- I think we get best throughput with "optimistic concurrency", which in my limited understanding is: everyone tries to write, and if we get collision: rollback / resolve.

It's not clear to me what are your requirements and things you can rely on, but looks like Byzantine fault - Wikipedia . Fundamentally there is no way to to have atomic updates between multiple networked nodes in the presence of possibility of connectivity failures, etc. without some pre-arranged scheme.

Generally you need to do some quorum/leader election like Raft (if machines are known beforehand), or probabilistic PoW-blockchain-like elections.

1 Like

If we look at the physical world, we can have:

  1. Alice buys something from the restaurant.
  2. Bob buys something from the office store.
  3. Cindy and Dan exchanges BTC for ETH.

and all of this happens concurrently, without a quorum / leader

In the physical world, transactions are not atomic. Alice gets to eat her meal before paying. The office store cashier could grab the product off the checkout counter after Bob has paid.

The consistency in these cases comes not from atomicity but observation over time: each party literally walking away from a transaction takes some time to do that and has the ability to go back and discuss revising it, and as seconds pass and they don't do that, both parties become increasingly confident that they are both satisfied with the transaction. And if there is actually not satisfaction but one party leaves (/refuses to interact further) then it becomes a separate refund, a matter for the courts, or a write-off.

7 Likes

There’s been a lot of thinking about this kind of failure mode in finance/business circles, especially regarding larger contracts. If you want to dive into the literature, the search term is counterparty risk.


In this case, for a large transaction, Cindy and Dan would usually use a trusted escrow agent to perform the exchange:

  1. Cindy sends BTC to escrow
  2. Dan sends ETH to Cindy
  3. Escrow sends BTC to Dan

If (2) doesn’t happen, escrow returns Cindy her BTC and nobody is left hanging.

There’s always the possibility that the escrow agent will run off with the BTC, which is why you need a reputable escrow agent: You want the loss of that reputation (in future fees) to greatly exceed the gain from stealing this one transaction.

Do you want the maps to be synchronized so that everyone has the same map? Or should each machine have a separate map?

What if one of the machines dies?

No, every machine stores a different map.

Transactions involving that machine gets rejected. Transactions not involving that machine continue to run.

What about the internet? We have billions of people, interacting with millions of websites; and there is no notion of physical proximity / walking away. And people can run transactions, despite there not being a centralized leader that approves every transaction.

You wrote “If we look at the physical world…”. I claim that, in those cases, atomicity is not the means by which we accomplish transactions — so they are not relevant examples to the problem of atomicity in distributed database transactions.

3 Likes

My goal is to find a counter example to this:

  1. My first attempt was to bring up how in the physical world, people can do transactions without a global quorum / leader.

  2. You pointed out a difference between the physical world and the digital world.

  3. I then constructed an example of a digital world where people can do transactions without a global quorum / leader.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.