The core ledger system lost committed transactions by choosing not to write them...

Mleekko · on Feb 5, 2022

Yeah, the issue is serious but at the same time: "This problem occurred only in cases where every node was killed at roughly the same time". And there are 100 nodes on the network. (and it is fixed now)

NelsonMinar · on Feb 5, 2022

Well, Radix says it is fixed now. That's the gist of their response to this report, "we fixed a lot of things and no one's tested the new code but trust us!"

Mleekko · on Feb 5, 2022

nope, if you read it one more time, for this particular issue Jepsen confirms the fix.

Too · on Feb 6, 2022

Isn't this a rather common configuration for a distributed DB? Given one node dies before flush, you trust that the remaining nodes in the system will not die at the same time and live long enough to flush to their respective disks. It's a gamble, yes, but depending on your environment the risk can be smaller than the benefits. For a blockchain ledger, you might want to choose the safer corner of CAP in that equation.

zbentley · on Feb 7, 2022

> Isn't this a rather common configuration for a distributed DB?

In my experience it is not. Rather, aggressive fsyncing/O_DIRECT usage are common. The rationale for this is usually partition risk: better to durably log a write before propagating it than to potentially fail in propagating it and then be left in a position of having to either reactively fsync or hope that automatic flush-to-disk will persist your unexpectedly-sole possession of that update.
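
To make the "durably log before propagating" ordering concrete, here is a minimal Python sketch. It is not how Radix or any particular database does it; the names `durable_append`, `commit_then_propagate`, and `send_to_peers` are hypothetical, and it assumes a single append-only log file on a POSIX-like system. It also skips details a real system would need, such as fsyncing the parent directory after creating the file, or using O_DIRECT.

```python
import os


def durable_append(path: str, record: bytes) -> None:
    """Append one record to the log and force it to stable storage."""
    # O_APPEND makes every write land at the current end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        data = memoryview(record + b"\n")
        # os.write may write fewer bytes than requested, so loop until done.
        while data:
            written = os.write(fd, data)
            data = data[written:]
        # Ask the OS to flush this file's data to the device before returning.
        os.fsync(fd)
    finally:
        os.close(fd)


def commit_then_propagate(path: str, record: bytes, send_to_peers) -> None:
    """Persist locally first, then hand the record to the (hypothetical) network layer."""
    durable_append(path, record)
    # Only reached if the write and fsync above succeeded without raising.
    send_to_peers(record)
```

With this ordering, if propagation fails or the network partitions right after the local write, the node that accepted the update already has it on disk, rather than relying on a later automatic flush.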