Apple container is more akin to a replacement for docker or colima (although patterned more like Kata containers where each container is a separate vm as opposed to a bunch of containers in a single vm). It's a promising project (and nice to see Apple employees work to improve containers on macOS).
Hopefully, they can work towards making it (1) more docker api compatible and (2) more composable. I wrote up https://github.com/apple/container/discussions/323 for more details on the limitations therein.
Originally, I planned to build shai to work really well on top of apple container but ultimately gave up because of the packaging issues.
I'm one of the creators of shai. Thanks for the callout!
Interesting to see the work on Yolobox and in this space generally.
The pattern we've seen as agent use grows is being thoughtful about what different agents get access to. One needs to start setting guardrails. Agents will break all kinds of normal boundaries to try to satisfy the user. Sometimes that is useful. Sometimes it's problematic. (For example, most devs have a bunch of credentials in their local env. One wants to be careful about which of those credentials agents can use.)
For rw access to the current directory, shai allows that via `shai -rw .`. For starting as an alternative user, use `shai -u root`.
Shai definitely does have the attitude that you have to opt into access as opposed to allowing by default. One of the things we try to focus on is composability: different contexts likely need different resources, and shai's config is meant to reflect that. The expectation is that .shai/config.yaml is something committed to the repo and shared across developers.
Arrow is all about in-memory, not long-term persistence. Systems can write it to disk but it is the one in-memory representation to rule them all, not storage/disk. Disk has its own requirements and challenges outside the scope of Arrow.
It's more than an implementation detail because we're also targeting interoperability between multiple separate technologies. One of the key things that the article didn't fully cover is that Arrow serves two purposes: high performance processing and interoperability.
A key part of the vision is: two systems can share a representation of data to avoid serialization/deserialization overhead (and potentially copying in a shared memory environment). This is only possible if the in-memory format is also highly efficient for processing. This allows the processing systems (say Pandas and Dremio) to share a representation, both process against it, and then move the data between each other with zero overhead.
If you shared the data representation on the wire and then each application had to transform it to a better structure for processing, you're still paying for a form of ser/deser. By using Arrow for both processing and interoperation you benefit from near-no-cost movement between systems and also a highly efficient representation to process data (including some tools to get you started in the form of the Arrow libraries).
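To make the ser/deser point concrete, here is a minimal sketch using only the Python standard library. It is not the Arrow API: `array` and `memoryview` stand in for a contiguous columnar buffer, and the names `prices`, `total`, and `total_via_json` are hypothetical, chosen just for illustration. The idea it tries to show is that a consumer can process the producer's bytes in place, whereas a wire format forces an encode step and a decode step before any processing happens.

```python
from array import array
import json

# A "column" of 64-bit integers stored contiguously, roughly how a
# columnar in-memory format lays out a fixed-width field.
# (Illustrative stand-in only; this is not an Arrow data structure.)
prices = array("q", [10, 20, 30, 40])

# The producer hands out a view over the same memory instead of
# serializing the values.
shared = memoryview(prices)


def total(column: memoryview) -> int:
    """Consumer processes directly against the shared buffer."""
    return sum(column)


def total_via_json(values) -> int:
    """Contrast case: encode on one side, decode on the other, then process."""
    payload = json.dumps(list(values))  # serialize on the producer side
    decoded = json.loads(payload)       # deserialize on the consumer side
    return sum(decoded)


zero_copy_total = total(shared)
round_trip_total = total_via_json(prices)

# The view aliases the original memory: a write made by the producer
# is visible to the consumer without any transfer step.
prices[0] = 100
updated_total = total(shared)
```

Both paths compute the same answer, but only the first avoids building a second copy of the data in a different shape. That is the property the shared representation is meant to give two separate systems, assuming both can process the format efficiently as-is.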
One quick note to make on this: Kudu is a storage implementation (similar to Parquet in some ways). Arrow isn't about persistence and is actually built to be complementary to both Kudu and Parquet.
Also note: Kudu is a distributed process. Arrow and Parquet are libraries that can be embedded into your existing applications.
You nailed it. The most exciting part about all of this is being able to move between a "data science" context and a "database" context (and back again) without pain or penalty.