
That's interesting. I have a separate cluster set up without archiving because some databases don't need it. It's so simple to run multiple clusters side by side on the same host that I don't think compartmentalised WALs would be easier to manage.


Yeah, but I want cheap in-memory joins between the WAL-isolated datasets. I.e. "multi-world" MVCC concurrency, where a TX is locking in a separate min_xid for dataset A, B, C, etc. for the lifetime of the TX — but without this being a big distributed-systems vector-clocks problem, because it's all happening in a single process.
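To make the "separate min_xid per dataset" idea concrete, here's a rough sketch of what such a per-dataset snapshot might look like. All the names (`MultiWorldSnapshot`, `takeSnapshot`, `visible`) are mine, not from any real engine, and the visibility check is deliberately simplified (it ignores in-progress and aborted txs):

```typescript
// Hypothetical sketch: a "multi-world" MVCC snapshot holds an
// independent visibility horizon (xmin) per WAL-isolated dataset,
// instead of one global xmin for the whole cluster.
type Xid = number;

interface MultiWorldSnapshot {
  // dataset name -> oldest transaction id this snapshot can see past
  xmin: Map<string, Xid>;
}

// Taking a snapshot pins a min_xid for each dataset the TX touches.
// It all happens in one process, so no vector clocks are needed.
function takeSnapshot(
  current: Map<string, Xid>,
  datasets: string[],
): MultiWorldSnapshot {
  const xmin = new Map<string, Xid>();
  for (const ds of datasets) {
    xmin.set(ds, current.get(ds) ?? 0);
  }
  return { xmin };
}

// Simplified rule: a row version in dataset `ds` is visible iff it
// committed before the snapshot's horizon for that dataset.
function visible(snap: MultiWorldSnapshot, ds: string, rowXid: Xid): boolean {
  const horizon = snap.xmin.get(ds);
  return horizon !== undefined && rowXid < horizon;
}
```

A cross-dataset join under this scheme just evaluates each side's rows against that side's horizon, all inside the same snapshot handle.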

Why? Being able to run a physical replica that loads WAL from multiple primaries that each have independent data you want to work with, for one thing. (Yes, logical replication "solves" this problem — but then you can't offload index building to the primary, which is often half the point of a physical replication setup.)


I'm currently writing my own database from scratch and am at the point where I have a buffer pool and storage layers, and can do scans with these

(no actual query layer obviously, these are raw heap-file scans)

I'm now studying and trying to implement a WAL, transactions, and recovery.

Could you show some pseudocode for what you mean? This sounds interesting and I _almost_ get it, but not entirely.


I don't know about pseudocode, but I think a simplifying analogy would be to consider an embedded DB (like SQLite) or key-value store (like LMDB) library that you interact with through transactions.

In such a system, "single-world MVCC" would be what you'd get by putting everything into one database file, with every change always made within one DB tx against that single file.

"Multi-world MVCC", then, would be what you'd get by opening multiple database files, each of which maintains its own WAL/journal/etc., and then building an application-layer abstraction that lets you open DB txs against multiple open DB files at once, held as a single handle. If you hit a rollback on any constituent tx, the application-layer logic guarantees that the other DB files' respective txs will be told to roll back as well; and when you tell this coordinated tx to commit, it synchronously commits all the constituent DB files' txs before returning.
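A minimal sketch of that application-layer coordinator, with an invented `Tx` interface standing in for a real embedded-DB transaction handle (SQLite, LMDB, ...):

```typescript
// `Tx` stands in for a real embedded-DB transaction handle; the
// names here are invented for illustration.
interface Tx {
  commit(): void;
  rollback(): void;
}

class CoordinatedTx {
  constructor(private txs: Tx[]) {}

  // Synchronously commit every constituent tx before returning.
  // If one of them fails to commit, roll back all the not-yet-
  // committed ones (including the failed one) and rethrow.
  commit(): void {
    const pending = [...this.txs];
    try {
      for (const tx of this.txs) {
        tx.commit();
        pending.shift(); // this tx is durable; don't touch it again
      }
    } catch (err) {
      for (const tx of pending) tx.rollback();
      throw err;
    }
  }

  // A rollback anywhere means rolling back every constituent tx.
  rollback(): void {
    for (const tx of this.txs) tx.rollback();
  }
}
```

Note this is not a true two-phase commit: a crash (or a failure after the first constituent commits) can still leave the files mutually inconsistent, which is one reason the single-file "single world" is the easy default.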

Note that unlike with a single DB file managed through a single readers-writer lock, this kind of system can introduce deadlocks (though DBs with more complex locking systems, like Postgres, already have that possibility).
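The classic mitigation (not specific to this design, just the standard trick) is to acquire the constituent txs in one global order, e.g. sorted by file path, so no two coordinated txs can wait on each other in a cycle. A sketch, where `openTx` is a stand-in for a real embedded-DB "begin transaction" call:

```typescript
// Deadlock avoidance for the multi-file case: every coordinated tx
// opens its constituent DB txs in a single global order (here:
// sorted by file path). A total order on lock acquisition across
// all participants rules out circular waits.
function beginCoordinated<TxHandle>(
  files: string[],
  openTx: (file: string) => TxHandle,
): Map<string, TxHandle> {
  const txs = new Map<string, TxHandle>();
  for (const file of [...files].sort()) {
    txs.set(file, openTx(file));
  }
  return txs;
}
```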


Oh I think I get what you mean.

Something like:

  interface IDatabase {
    wal: IWAL,
    transactionManager: ITransactionManager,
    // etc
  }
And then your application has multiple instances of these "IDatabase" objects, maybe one per physical/logical database file (e.g. a ".sqlite3" file in the case of SQLite).

And at the root of your application, you have something like an:

  interface IDatabaseManager {
    databases: Set<IDatabase>
    transactions: Map<IDatabase, Set<Transaction>>
  }
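Fleshing that out a little (everything beyond the interface fields above is invented; `Transaction` and `IDatabase` are reduced to the minimum needed), the manager could hand back one tx per registered database as a unit:

```typescript
// Sketch of the IDatabaseManager idea: begin one tx per registered
// database and track them, so the caller can commit or roll back
// the whole set together. Method names are hypothetical.
interface Transaction {
  commit(): void;
  rollback(): void;
}

interface IDatabase {
  name: string;
  begin(): Transaction;
}

class DatabaseManager {
  databases = new Set<IDatabase>();
  transactions = new Map<IDatabase, Set<Transaction>>();

  register(db: IDatabase): void {
    this.databases.add(db);
    this.transactions.set(db, new Set());
  }

  // One tx per database, returned as a unit for the application's
  // coordinated commit/rollback logic to drive.
  beginAll(): Map<IDatabase, Transaction> {
    const txs = new Map<IDatabase, Transaction>();
    for (const db of this.databases) {
      const tx = db.begin();
      txs.set(db, tx);
      this.transactions.get(db)!.add(tx);
    }
    return txs;
  }
}
```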



