sensodine's comments

> it leaves me completely baffled about what s2 does, what problem s2 is trying to solve, or who the intended audience of s2 is

Regarding S2 generally (not just s2-lite): the intent behind it is to turn the core data structure of streaming platforms like Kafka -- the stream -- into a serverless primitive, kinda similar to what object storage did for file storage.

So if you are already working with streaming platforms, S2 gives you a simpler API, bottomless storage (S2 itself uses object storage for durability), and no limit on the number of streams you can create and work with. Streams all have URIs and are directly accessible over REST, with granular access controls.
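
To give a flavor of the "directly accessible over REST" part, here's a minimal sketch -- to be clear, the host, path, and payload shape below are made up for illustration, not S2's actual API (that's in the docs):

    // Hypothetical endpoint and payload -- illustrative only, not S2's real API.
    use reqwest::Client;
    use serde_json::json;

    #[tokio::main]
    async fn main() -> Result<(), reqwest::Error> {
        let client = Client::new();
        // Appending a record to a stream is just an authenticated HTTP call.
        let resp = client
            .post("https://my-basin.example.s2/streams/agent-state/records") // made up
            .bearer_auth("ACCESS_TOKEN")
            .json(&json!({ "records": [{ "body": "hello" }] }))
            .send()
            .await?;
        println!("append status: {}", resp.status());
        Ok(())
    }

Because every stream is a URI, giving a consumer scoped read access to a single stream isn't much different from sharing a link.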

This enables new kinds of patterns for working with streams, beyond the traditional ones people typically reach for streaming platforms for (CDC, ETL pipelines, etc). An agent can have its own stream to serialize state onto. A stream can serve as a durable transport layer -- e.g., reliably delivering a flow of data (tokens from a model, financial ticker data) to a user who can resume from exactly where they left off after a disconnect; there's a sketch of that pattern below. Or streams can act as a durable ingest buffer, collecting data that will eventually land in an OLAP store like ClickHouse.
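
Here's roughly what the resume pattern looks like, with `read_from` as a hypothetical stand-in for the actual client call -- the key point is that records carry sequence numbers, so the reader owns its own cursor:

    // Sketch of the resumable-transport pattern. `read_from` is a hypothetical
    // stand-in for a real client call that fetches records starting at `seq`.
    struct Record { seq_num: u64, body: String }

    fn read_from(seq: u64) -> Result<Vec<Record>, std::io::Error> {
        Ok(vec![Record { seq_num: seq, body: format!("record {seq}") }])
    }

    fn main() {
        let mut next_seq: u64 = 0; // persisted client-side, survives disconnects
        for _ in 0..3 {
            match read_from(next_seq) {
                Ok(batch) => {
                    for rec in batch {
                        println!("{}", rec.body);
                        next_seq = rec.seq_num + 1; // advance only after handling
                    }
                }
                Err(_) => continue, // transient failure: retry from the same cursor
            }
        }
    }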


I'm a huge fan of Asciinema, and this is awesome work.

Regarding the live streaming feature, I hacked a similar thing on top of s2.dev streams[1] (disclaimer: I am a co-founder), which could obviate the need for a relay in this architecture[2]. Naturally, showing off `btop` was the highlight for me as well :-D.

[1]: https://s2.dev/blog/s2-term

[2]: https://docs.asciinema.org/manual/server/streaming/#architec...


(I work at S2.)

> what are some of the notable issues that were find using the DST approach

We've discovered a few distributed deadlocks. And in general it's been incredibly helpful in exercising any parts of the system that involve caches or eventual consistency, as these can be really hard to reason about otherwise.
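
If you haven't seen turmoil, a test looks roughly like this (a toy echo example, not our actual harness) -- hosts, the network, and time are all simulated, so the same seed replays the same schedule every time:

    // Toy turmoil example: deterministic simulation of two hosts and a network.
    use tokio::io::{AsyncReadExt, AsyncWriteExt};
    use turmoil::net::{TcpListener, TcpStream};

    fn main() -> turmoil::Result {
        let mut sim = turmoil::Builder::new().build();

        sim.host("server", || async {
            let listener = TcpListener::bind(("0.0.0.0", 9999)).await?;
            loop {
                let (mut stream, _) = listener.accept().await?;
                let mut buf = [0u8; 5];
                stream.read_exact(&mut buf).await?;
                stream.write_all(&buf).await?; // echo back
            }
        });

        sim.client("client", async {
            let mut stream = TcpStream::connect(("server", 9999)).await?;
            stream.write_all(b"hello").await?;
            let mut buf = [0u8; 5];
            stream.read_exact(&mut buf).await?;
            assert_eq!(&buf, b"hello");
            Ok(())
        });

        // Fault injection (e.g. turmoil::partition) hooks in at this level too.
        sim.run()
    }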

> if a LLM system would be able to help analyze the TRACE logs

Neat idea! For us, the logs typically only get dug into when there is a failure condition for the test as a whole. Oftentimes we'll inject additional logging or state monitoring to better understand what led to the failure (which is easy enough to do given the reproducibility of the failure in the sim). Trace logs are also analyzed in the context of the "meta-test", but that's just looking for identical outputs. (More about that here: https://github.com/tokio-rs/turmoil/issues/19#issuecomment-2... )
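
The meta-test amounts to something like this -- `run_sim` is a hypothetical stand-in for running the full simulation and capturing its trace output -- two seeded runs compared for byte-identical logs:

    // Sketch of the "meta-test": same seed, two runs, identical traces required.
    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // Stand-in: a real harness would run the simulation and capture its logs.
    fn run_sim(seed: u64) -> String {
        format!("trace output for seed {seed}")
    }

    fn digest(trace: &str) -> u64 {
        let mut h = DefaultHasher::new();
        trace.hash(&mut h);
        h.finish()
    }

    fn main() {
        let seed = 42;
        // Any divergence between runs means hidden nondeterminism crept in.
        assert_eq!(digest(&run_sim(seed)), digest(&run_sim(seed)));
    }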


(S2 team member here)

> I suppose the writers could batch a group of records before writing them out as a larger blob, with background processes performing compaction, but it's still an object-backed streaming service, right?

Essentially, yes. Because the chunks we write to object storage (before acknowledging a write) are multi-tenant -- they contain records from different streams -- we can write frequently while still targeting ideal blob sizes, w.r.t. price and performance, for S3 Standard and S3 Express puts respectively.
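
A rough sketch of that batching shape (illustrative only, not our actual code):

    // Multi-tenant chunking sketch: records from many streams accumulate in one
    // buffer, flushed as a single blob at a target size or age.
    use std::time::{Duration, Instant};

    struct PendingRecord { stream_id: String, payload: Vec<u8> }

    struct ChunkBuffer {
        records: Vec<PendingRecord>,
        bytes: usize,
        opened_at: Instant,
        target_bytes: usize, // tuned per storage class (e.g. larger for S3 Standard)
        max_age: Duration,   // bounds how long a write waits before it can be acked
    }

    impl ChunkBuffer {
        fn push(&mut self, rec: PendingRecord) {
            self.bytes += rec.payload.len();
            self.records.push(rec);
        }

        fn should_flush(&self) -> bool {
            self.bytes >= self.target_bytes || self.opened_at.elapsed() >= self.max_age
        }

        fn flush(&mut self) -> Vec<PendingRecord> {
            // One PUT to object storage for the whole multi-tenant chunk; writes
            // are acknowledged to clients only after this blob is durable.
            self.bytes = 0;
            self.opened_at = Instant::now();
            std::mem::take(&mut self.records)
        }
    }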


Wait, data from multiple tenants is stored in the same place? Do you have per-tenant encryption keys, or how else are you ensuring no bugs allow tenants to read each other's data?


(Founder) We will be using authenticated encryption with per-basin (our term for bucket) or per-stream keys, but we don't have this yet. This is noted at https://s2.dev/docs/security#encryption
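
Mechanically, per-stream authenticated encryption could look something like this (illustrative only, not our implementation) using the `aes-gcm` crate, with the stream ID bound in as associated data:

    // AES-256-GCM sketch with a per-stream key (hypothetical, not S2's code).
    use aes_gcm::{
        aead::{Aead, AeadCore, KeyInit, OsRng, Payload},
        Aes256Gcm,
    };

    fn main() {
        let key = Aes256Gcm::generate_key(OsRng); // per-stream key (hypothetical)
        let cipher = Aes256Gcm::new(&key);
        let nonce = Aes256Gcm::generate_nonce(&mut OsRng);

        // Binding the stream ID as associated data ties the ciphertext to its
        // stream identity, on top of the per-stream key separation.
        let ct = cipher
            .encrypt(&nonce, Payload { msg: b"record body", aad: b"stream-42" })
            .unwrap();

        // Decryption fails if the ciphertext, nonce, or stream binding is wrong.
        let pt = cipher
            .decrypt(&nonce, Payload { msg: &ct, aad: b"stream-42" })
            .unwrap();
        assert_eq!(pt, b"record body");
    }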

