Somewhat related: I’ve been intrigued by serialisable continuations forever. The idea of machine- and time-independent control flow in a regular programming language could be quite useful, e.g.:
Imagine extremely robust job-queue submission that also lets you move computation through a network. Like Cap'n Proto's promise pipelining, you can resume computation on another computer: rather than paying for round trips, you just move the computation through the network. You ask another computer to do something, and it handles the processing of the result on that machine.
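To make the idea concrete, here's a minimal Java sketch. It assumes you're willing to write the continuation by hand as a state machine, because Java has no built-in serialisable continuations, and the class and method names are made up. The loop's position and locals live in an ordinary `Serializable` object, so a half-finished run can be frozen to bytes on one machine and resumed on another, or much later.

```java
import java.io.*;

// Hypothetical sketch: a computation whose control state is plain data, so a
// suspended run can be written to a job queue or sent to another machine.
class SerializableContinuation {

    // The "continuation": where the loop is (i) plus its live locals (acc).
    static final class SumOfSquares implements Serializable {
        private static final long serialVersionUID = 1L;
        final int n;
        int i = 0;
        long acc = 0;

        SumOfSquares(int n) { this.n = n; }

        // Run at most `budget` iterations, then yield; true once finished.
        boolean resume(int budget) {
            while (budget-- > 0 && i < n) {
                acc += (long) i * i;
                i++;
            }
            return i >= n;
        }
    }

    static byte[] freeze(Serializable k) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(k);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object thaw(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    static long demo() {
        SumOfSquares k = new SumOfSquares(10);
        k.resume(4);                                    // machine A runs i = 0..3, then yields
        byte[] wire = freeze(k);                        // persist it or ship it
        SumOfSquares later = (SumOfSquares) thaw(wire); // machine B, possibly hours later
        while (!later.resume(4)) { }                    // finish the remaining iterations
        return later.acc;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 285 = 0^2 + 1^2 + ... + 9^2
    }
}
```

The hand-written state machine is the price here: a runtime with real serialisable continuations would capture `i` and `acc` from the stack for you.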
I find Temporal [0] interesting for this reason, and recently there was an HN post about Telescript.
I am designing a syntax for async pipelines that resembles a state machine. Ideally it would handle events occurring on different machines too, but for now I'm specifically targeting multithreaded events, with microservices as the goal. [1]
Time independence raises the question of failure recovery (and the associated basic options: at-least-once and at-most-once recovery).
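A toy illustration of the two options, assuming an in-memory ledger standing in for durable state and a boolean standing in for a crash (all names are hypothetical): at-most-once acks before doing the work, while at-least-once does the work before acking and therefore has to deduplicate redeliveries.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the two basic recovery options for a resumed step,
// using an in-memory ledger and a simulated crash.
class DeliverySemantics {

    static final class Ledger {
        long balance = 0;
        final Set<String> handled = new HashSet<>(); // message ids already processed
    }

    // At-most-once: mark the message handled (i.e. ack it) before doing the work.
    // A crash in between loses the effect, and it is never retried.
    static void atMostOnce(Ledger l, String msgId, long amount, boolean crash) {
        if (!l.handled.add(msgId)) return; // seen before: skip
        if (crash) return;                 // died after the ack, before the work
        l.balance += amount;
    }

    // At-least-once: do the work, then ack. A crash before the ack means the
    // queue redelivers, so the effect is deduplicated by message id.
    // Returns true if the message was acked.
    static boolean atLeastOnce(Ledger l, String msgId, long amount, boolean crash) {
        if (l.handled.add(msgId)) {        // first delivery: apply the effect
            l.balance += amount;           // (assumed atomic with the id record)
        }
        return !crash;                     // false = no ack, queue will redeliver
    }

    static long[] demo() {
        Ledger a = new Ledger();
        atMostOnce(a, "m1", 100, true);    // crash: the deposit is lost for good

        Ledger b = new Ledger();
        boolean acked = atLeastOnce(b, "m1", 100, true);         // crash before ack
        while (!acked) acked = atLeastOnce(b, "m1", 100, false); // redelivered, deduplicated
        return new long[] { a.balance, b.balance };
    }

    public static void main(String[] args) {
        long[] r = demo();
        System.out.println(r[0] + " " + r[1]); // 0 100
    }
}
```

The at-least-once variant only gives effectively-once results if the effect and the dedup record are committed atomically; the sketch gets that for free because everything sits in one in-memory object.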
This kind of problem is better suited to solutions like 'closure-persisted lambda functions'. Flink's StateFun comes to mind [1], and also Restate [2]. These come with their own host of problems, though. For example, I hope you have a strategy for distributed tracing, because debugging an isolated continuation will have to be done... in isolation.
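Roughly how the journaling approach behind this kind of system works, as a simplified sketch: each side-effecting step's result is recorded, and after a failure the function re-runs from the top, replaying recorded results instead of redoing the work. The `Context.step` API here is made up for illustration and is not Restate's or StateFun's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of journaled ("durable") execution in the spirit of
// closure-persisted functions. This is NOT the Restate or Flink StateFun API.
class DurableExecution {

    // Stand-in for a durable journal; in a real system it would live in storage.
    static final class Context {
        private final List<Object> journal;
        private int cursor = 0;
        int executed = 0; // steps that actually ran, as opposed to being replayed

        Context(List<Object> journal) { this.journal = journal; }

        // Run `action` once; on re-execution, return the recorded result instead.
        @SuppressWarnings("unchecked")
        <T> T step(Supplier<T> action) {
            if (cursor < journal.size()) {
                return (T) journal.get(cursor++); // replay
            }
            T result = action.get();              // first execution: the side effect
            executed++;
            journal.add(result);                  // record before continuing
            cursor++;
            return result;
        }
    }

    static final class Crash extends RuntimeException {
        private static final long serialVersionUID = 1L;
    }

    // Re-run from the top after a failure; completed steps are replayed, so the
    // code between steps must be deterministic.
    static int workflow(Context ctx, boolean crashAfterTwoSteps) {
        int a = ctx.step(() -> 20);
        int b = ctx.step(() -> 22);
        if (crashAfterTwoSteps) throw new Crash();
        return ctx.step(() -> a + b);
    }

    static int[] demo() {
        List<Object> journal = new ArrayList<>();
        try {
            workflow(new Context(journal), true); // first attempt dies midway
        } catch (Crash expected) {
            // the journal survives the crash
        }
        Context retry = new Context(journal);
        int result = workflow(retry, false);      // replays 2 steps, runs only the 3rd
        return new int[] { result, retry.executed };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1]); // 42 1
    }
}
```

The usual catch is determinism: if the code between steps makes a different decision on replay, the journal and the code no longer line up.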
Then there's the general question of lifetimes. What does GC even mean when your functions are spread, a few kB at a time, over 100k different places in RAM, SSD, and tape? Or do you go à la Rust and try to come up with static guarantees about their lifetimes (and still recover them at least once)?
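One classic answer to the GC question is leases, which is what Java RMI's distributed garbage collector uses: a stored continuation stays alive only while someone keeps renewing it. A minimal sketch, assuming an in-memory store and an explicit clock passed in as milliseconds (the `Store` class and its methods are hypothetical):

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: lease-based reclamation of stored continuations.
// Holders must keep renewing; anything whose lease lapses is swept.
class LeaseGc {

    static final class Store {
        final Map<String, byte[]> blobs = new HashMap<>();     // serialized continuations
        final Map<String, Long> leaseExpiry = new HashMap<>(); // id -> expiry time (ms)
        final long leaseMillis;

        Store(long leaseMillis) { this.leaseMillis = leaseMillis; }

        void put(String id, byte[] state, long now) {
            blobs.put(id, state);
            leaseExpiry.put(id, now + leaseMillis);
        }

        // Called periodically by whoever still holds a reference to `id`.
        void renew(String id, long now) {
            if (blobs.containsKey(id)) leaseExpiry.put(id, now + leaseMillis);
        }

        // Drop every continuation whose lease has expired; returns how many.
        int sweep(long now) {
            int dropped = 0;
            Iterator<Map.Entry<String, Long>> it = leaseExpiry.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Long> e = it.next();
                if (e.getValue() <= now) {
                    blobs.remove(e.getKey());
                    it.remove();
                    dropped++;
                }
            }
            return dropped;
        }
    }

    static Set<String> demo() {
        Store store = new Store(1_000);
        store.put("k1", new byte[16], 0);
        store.put("k2", new byte[16], 0);
        store.renew("k1", 900);  // someone still references k1
        store.sweep(1_500);      // k2's lease ran out at t = 1000
        return new TreeSet<>(store.blobs.keySet());
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [k1]
    }
}
```

The trade-off is that a holder cut off for longer than the lease loses its continuation, so leases give up some safety in exchange for not needing a global view of who references what.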
I believe this is the future, though. Code is often much lighter than data, so why not move it closer to the data? Say a web server routes a query, then makes a call to the DB, filters some data, and sends it back to the client. Why shouldn't the routing code live in the NIC, the filtering in the SSD controller on another server, with the response transferred straight out of device buffers by direct copying?
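A small Java sketch of the "move the code to the data" idea, comparing how many rows cross the wire when a client pulls everything versus when it ships a serialized filter to the node holding the data. `StorageNode`, `ship` and `RemotePredicate` are hypothetical; the one real mechanism relied on is Java's support for serializing lambdas whose target interface is `Serializable`.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch: "data shipping" vs "code shipping" against a storage node.
class ShipCodeToData {

    // A Serializable predicate type, so a lambda of this type can be turned into bytes.
    interface RemotePredicate<T> extends Predicate<T>, Serializable {}

    static final class StorageNode {
        final List<Integer> rows = IntStream.range(0, 10_000).boxed().collect(Collectors.toList());
        int rowsSent = 0; // rows that crossed the (simulated) wire

        // Data shipping: every row goes to the caller, which filters locally.
        List<Integer> scanAll() {
            rowsSent += rows.size();
            return new ArrayList<>(rows);
        }

        // Code shipping: the filter is deserialized and run next to the data.
        @SuppressWarnings("unchecked")
        List<Integer> scanWhere(byte[] shippedFilter) {
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(shippedFilter))) {
                RemotePredicate<Integer> p = (RemotePredicate<Integer>) in.readObject();
                List<Integer> out = rows.stream().filter(p).collect(Collectors.toList());
                rowsSent += out.size(); // only matching rows go back
                return out;
            } catch (IOException | ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    static byte[] ship(RemotePredicate<Integer> p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns {rows moved by data shipping, rows moved by code shipping, remote matches, local matches}.
    static int[] demo() {
        StorageNode node = new StorageNode();
        long localMatches = node.scanAll().stream().filter(x -> x % 1000 == 0).count();
        int dataShipped = node.rowsSent;

        RemotePredicate<Integer> everyThousandth = x -> x % 1000 == 0;
        List<Integer> remoteMatches = node.scanWhere(ship(everyThousandth));
        int codeShipped = node.rowsSent - dataShipped;
        return new int[] { dataShipped, codeShipped, remoteMatches.size(), (int) localMatches };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " rows vs " + r[1] + " rows"); // 10000 rows vs 10 rows
    }
}
```

One caveat: a serialized Java lambda only names its implementing class and method, so the receiving JVM must already have that code on its classpath. Real code shipping also needs a way to move the code itself.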
Of course, Loom's continuations could play a part in this, maybe as the underlying efficient serde mechanism?
There are initiatives to freeze whole JVMs and bring them back into RAM when, for example, there's an HTTP call to be answered [3]. Maybe the path to performance there is to thaw only the continuations and their dependencies, so that anything blocked for more than an hour gets frozen to SSD.
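A sketch of the "thaw only what you need" idea, assuming continuations are already serialized to byte arrays and using a temporary directory as a stand-in for the SSD tier. The one-hour threshold is the >1h figure above; the class and method names are hypothetical, and time is passed in explicitly so the example is deterministic.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch: keep recently active continuations in RAM and spill ones
// that have been blocked longer than a threshold to disk.
class FreezeIdleContinuations {
    static final Duration IDLE_LIMIT = Duration.ofHours(1);

    static final class Entry {
        final byte[] state;
        long lastActiveMillis;
        Entry(byte[] state, long t) { this.state = state; this.lastActiveMillis = t; }
    }

    final Map<String, Entry> hot = new HashMap<>(); // in-RAM tier
    final Path coldDir;                             // stand-in for the SSD tier

    FreezeIdleContinuations(Path coldDir) { this.coldDir = coldDir; }

    void park(String id, byte[] state, long now) { hot.put(id, new Entry(state, now)); }

    // Move every continuation idle for longer than IDLE_LIMIT out of RAM.
    int freezeIdle(long now) {
        int frozen = 0;
        Iterator<Map.Entry<String, Entry>> it = hot.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Entry> e = it.next();
            if (now - e.getValue().lastActiveMillis > IDLE_LIMIT.toMillis()) {
                try {
                    Files.write(coldDir.resolve(e.getKey()), e.getValue().state);
                } catch (IOException ex) {
                    throw new UncheckedIOException(ex);
                }
                it.remove();
                frozen++;
            }
        }
        return frozen;
    }

    // Thaw only the continuation the incoming event needs.
    byte[] thaw(String id, long now) {
        Entry e = hot.get(id);
        if (e != null) { e.lastActiveMillis = now; return e.state; }
        Path p = coldDir.resolve(id);
        try {
            byte[] state = Files.readAllBytes(p);
            Files.delete(p);
            hot.put(id, new Entry(state, now));
            return state;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    static boolean demo() {
        try {
            FreezeIdleContinuations store = new FreezeIdleContinuations(Files.createTempDirectory("frozen"));
            long hour = IDLE_LIMIT.toMillis();
            store.park("order-42", new byte[] {1, 2, 3}, 0);
            store.park("order-43", new byte[] {4}, hour);
            int frozen = store.freezeIdle(hour + 1);        // only order-42 is over the limit
            byte[] back = store.thaw("order-42", hour + 2); // an event arrives: read back from disk
            return frozen == 1 && store.hot.size() == 2 && Arrays.equals(back, new byte[] {1, 2, 3});
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true
    }
}
```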
Maybe in the future AWS will bill web-server Lambda usage at the granularity of SSD IOPS, RAM/L3/L2/L1 occupancy, and NIC buffer usage. As Feynman said, there's plenty of (design) space at the bottom.
At the low level, DMA-over-PCIe technologies like GPUDirect and StorageDirect are a (limited, low-level, frustrating) form of disaggregated computing, if you think of DMA engines as tiny specialized cores (and FPGA people have been using the PCIe bus and crazy 'DMAs' for this kind of thing for a long time now).
On the high end you've got your GPUs and accelerators, and, more recently, all kinds of NIC/CPU/GPU combinations. At the extreme end of the spectrum, some of these also have an ultra-high-bandwidth interconnect (mostly NVLink) that makes the whole concept of disaggregated computing (within one host) more than viable, and exciting.
It's just a big pain right now to program, synchronize, and schedule all these async units and their async transfers, and to do it all in a somewhat portable way.
> There are initiatives to freeze whole JVMs, and bring them back in RAM when there's an HTTP call to be answered for example [3].
There used to be a research JVM from a group called Velare that did exactly this. It was very cool: almost like watching two programs have a conversation with each other, with information passing back and forth. Take an ordinary JDBC call like:
var result = stmt.executeQuery("select from ...");
When the thread blocks on the DB, rather than bringing the data over from the remote machine, the runtime could serialize the thread, send its state over the wire, and resume it on the machine that holds the data.
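A rough sketch of that idea in Java, with the obvious caveat that a real runtime would capture the thread's stack automatically, whereas here the continuation is written by hand as a small state machine (`Task`, `Machine` and the in-memory "databases" are all hypothetical). When the task reaches the query and the row isn't local, it is serialized and resumed on the machine that has it.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: when a task blocks on data another machine owns, freeze it
// and resume it there, instead of pulling the data across the network.
class MigrateToData {

    // Hand-written continuation: `pc` is where to resume; fields hold the locals.
    static final class Task implements Serializable {
        private static final long serialVersionUID = 1L;
        final String customer;
        int pc = 0;
        long balance;
        String report;
        final List<String> trace = new ArrayList<>(); // machine that ran each step
        Task(String customer) { this.customer = customer; }
    }

    static final class Machine {
        final String name;
        final Map<String, Long> localDb;
        Machine(String name, Map<String, Long> localDb) { this.name = name; this.localDb = localDb; }

        // Run until done (returns null) or until the task needs data this machine
        // does not hold (returns the serialized task, ready to send elsewhere).
        byte[] run(Task t) {
            while (true) {
                switch (t.pc) {
                    case 0: // local work before the query
                        t.trace.add(name);
                        t.pc = 1;
                        break;
                    case 1: { // the "executeQuery" step
                        Long row = localDb.get(t.customer);
                        if (row == null) return freeze(t); // block: migrate, pc stays 1
                        t.balance = row;
                        t.trace.add(name);
                        t.pc = 2;
                        break;
                    }
                    case 2: // post-processing runs wherever we ended up
                        t.report = t.customer + ": " + t.balance;
                        t.trace.add(name);
                        t.pc = 3;
                        break;
                    default:
                        return null;
                }
            }
        }
    }

    static byte[] freeze(Task t) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(t);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Task thaw(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Task) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    static Task demo() {
        Machine client = new Machine("client", Collections.emptyMap());
        Machine db = new Machine("db", Collections.singletonMap("alice", 120L));
        byte[] wire = client.run(new Task("alice")); // blocks on the query
        Task moved = thaw(wire);                      // state "sent over the wire"
        db.run(moved);                                // resumes next to the data
        return moved;
    }

    public static void main(String[] args) {
        Task t = demo();
        System.out.println(t.report + " " + t.trace); // alice: 120 [client, db, db]
    }
}
```

Note that the task resumes at the query step (`pc` stays 1 when it migrates), so the lookup is simply retried on the machine that can answer it.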