Early-stage ARK Cloud API: stateful LLM sessions without repeated input-token costs. Conversation history is kept server-side, so you only send new messages.
Demo and docs: https://ark-labs.cloud/documentation
Looking for feedback on use cases and session controls (machine2machine).
On cookies: we use an HTTP cookie (ark_session_id) purely as an opaque session identifier. The cookie is how the client ties subsequent requests to the same pinned session/worker/GPUs on the provider side, so the provider can keep the model activations/state in GPU memory between calls. It's not magic for the model; it's a routing key that enables true session affinity.
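To make the "opaque routing key" idea concrete, here's a minimal sketch of the client side: the server sets ark_session_id on the first response, and the client just echoes it back on every later request so the load balancer can pin it to the same worker. The cookie value and Set-Cookie attributes below are made-up examples; only the cookie name comes from the post.

```python
from http.cookies import SimpleCookie

# First response from the provider sets the session cookie
# (value "abc123" is a hypothetical example).
set_cookie = "ark_session_id=abc123; Path=/; HttpOnly"
jar = SimpleCookie()
jar.load(set_cookie)

# Every subsequent request echoes the cookie back unchanged.
# The model never sees it; it's only a routing key for session affinity.
cookie_header = f"ark_session_id={jar['ark_session_id'].value}"
print(cookie_header)  # → ark_session_id=abc123
```

In practice an HTTP client with a cookie jar (e.g. a requests.Session in Python) handles this automatically, which is exactly why a plain cookie is a convenient affinity mechanism.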
On “thinking steps” and contamination: good point - naively persisting raw chain-of-thought tokens can degrade outputs. The ARK Labs stateful approach is not a blanket “store everything” policy.
And my criticism targets higher-level provider practices: things like response caching, aggressive prompt-matching / deduplication heuristics, or systems that return previously generated outputs when a new prompt is “similar enough.” Those high-level caches absolutely can produce the behaviour I described: a subtle prompt change that nevertheless gets routed to a cached reply.
The platform is live and we’re collecting data, but early results are very promising: input cost grows linearly with conversation length (instead of quadratically, as with resending full history), latency is lower, and we’re seeing ~80% input-token savings. At the same time we’d love to hear more feedback on whether this approach could be useful in real-world projects.
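A back-of-envelope sketch of where a figure like ~80% can come from (my illustration, not ARK's measurement methodology): a stateless API resends the whole history every turn, so cumulative input tokens grow quadratically, while a stateful session only pays for each new message once.

```python
def stateless_input_tokens(turn_tokens):
    # Each call resends the full conversation so far:
    # cumulative cost is quadratic in the number of turns.
    total, history = 0, 0
    for t in turn_tokens:
        history += t
        total += history
    return total

def stateful_input_tokens(turn_tokens):
    # Only the new message is sent each turn: linear growth.
    return sum(turn_tokens)

turns = [100] * 10          # 10 turns of ~100 input tokens each
a = stateless_input_tokens(turns)  # 100*(1+2+...+10) = 5500
b = stateful_input_tokens(turns)   # 1000
print(1 - b / a)            # → ~0.818, i.e. ~80% input-token savings
```

The exact savings depend on conversation length and message sizes; longer conversations push the ratio even higher.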
And about going against the grain, as you mentioned at the end… well — if startups didn’t think differently from everyone else, what would be the point of being a startup?
I’ve been leaning more toward open-source LLMs lately. They’re not as hyper-optimized for performance, which actually makes them feel more like the old-school OpenAI chats: you could just talk to them. Now you barely finish typing and the model is already force-feeding you an answer. These newer models feel over-tuned and have kind of lost that conversational flow.