Hacker News | arkonrad's comments

Early-stage launch of the ARK Cloud API: stateful LLM sessions without repeated input-token costs. Conversation history is kept server-side, so you only send new messages. Demo and docs: https://ark-labs.cloud/documentation
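A minimal sketch of what "only send new messages" means on the client side. The class and its behavior are illustrative assumptions, not the actual ARK Cloud API: the client tracks which messages the server already holds and ships only the delta each turn.

```python
# Sketch of a stateful-session client (hypothetical, not the real ARK API).
# The server keeps the full transcript, so each request carries only the
# messages added since the last call.
class StatefulClient:
    def __init__(self):
        self.history = []   # full local transcript
        self.synced = 0     # number of messages already stored server-side

    def send(self, message: str) -> list[str]:
        """Append a user message and return the delta a real client would POST."""
        self.history.append({"role": "user", "content": message})
        delta = self.history[self.synced:]   # only unsent messages
        self.synced = len(self.history)
        # In a real client this delta would be sent to the session endpoint;
        # the server prepends its stored history before running inference.
        return [m["content"] for m in delta]

client = StatefulClient()
client.send("Hello")               # first call sends 1 message
payload = client.send("And now?")  # second call again sends 1, not the full 2
```

The point is that payload size stays constant per turn instead of growing with conversation length.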

Looking for feedback on use cases and on session controls for machine-to-machine workflows.


A great map showing how governments and militaries interfere with navigation systems; the regions they target are no coincidence.


On cookies: we use an HTTP cookie (ark_session_id) purely as an opaque session identifier. The cookie is how the client ties subsequent requests to the same pinned session/worker/GPUs on the provider side, so the provider can keep the model activations/state in GPU memory between calls. It isn't magic for the model; it's a routing key that enables true session affinity.
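To make "routing key" concrete, here is a toy sketch of cookie-based session affinity. The worker pool and hashing scheme are my own illustrative assumptions, not ARK's actual infrastructure: the cookie value is just an opaque string that deterministically maps every request in a session to the same GPU worker.

```python
# Toy session-affinity router (illustrative only). The ark_session_id cookie
# value never reaches the model; it only pins requests to one worker so that
# GPU-resident state survives between calls.
import hashlib

WORKERS = ["gpu-worker-0", "gpu-worker-1", "gpu-worker-2"]  # hypothetical pool

def route(session_id: str) -> str:
    """Deterministically map an opaque session id to one worker."""
    h = int(hashlib.sha256(session_id.encode()).hexdigest(), 16)
    return WORKERS[h % len(WORKERS)]

# The same cookie value always lands on the same worker:
worker = route("ark_session_abc123")
```

Real load balancers do this with consistent hashing or sticky-session tables so that adding workers doesn't remap every live session, but the principle is the same.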

On “thinking steps” and contamination: good point. Naively persisting raw chain-of-thought tokens can degrade outputs, and the ARK Labs stateful approach is not a blanket “store everything” policy.

My criticism targets higher-level provider practices: response caching, aggressive prompt-matching / deduplication heuristics, and systems that return previously generated outputs when a new prompt is “similar enough.” Those high-level caches absolutely can produce the behaviour I described: a subtle prompt change that nevertheless gets routed to a cached reply.
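A toy illustration of that failure mode (not any provider's actual code): a similarity-based cache with a fixed threshold serves a stale answer for a prompt whose meaning has flipped, because one changed word barely moves the string-similarity score.

```python
# Naive prompt-similarity cache (illustrative). A "similar enough" match
# returns the OLD output even when the new prompt means something different.
import difflib

cache = {}  # prompt -> previously generated response

def cached_answer(prompt: str, threshold: float = 0.9):
    for old_prompt, response in cache.items():
        if difflib.SequenceMatcher(None, prompt, old_prompt).ratio() >= threshold:
            return response  # dedup hit: serves the stale reply
    return None  # cache miss: would regenerate

cache["List three prime numbers below 10"] = "2, 3, 5"

# One changed word ("below" -> "above") flips the meaning, but the
# similarity ratio stays above the threshold, so the stale answer wins:
stale = cached_answer("List three prime numbers above 10")
```

An exact-match cache would miss here and regenerate; the similarity heuristic silently serves a now-wrong answer.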

The platform has launched and we're still collecting data, but early results are very promising: linear scaling of input cost per turn, lower latency, and roughly 80% input-token savings. We'd love more feedback on whether this approach would be useful in real-world projects.
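Back-of-envelope arithmetic for where savings in that range come from (the turn counts and token sizes below are illustrative assumptions, not measured data): a stateless API resends the whole transcript every turn, so cumulative input tokens grow quadratically with conversation length, while a server-side session sends only the new turn and grows linearly.

```python
# Illustrative comparison: cumulative input tokens over a conversation.
def stateless_input_tokens(turns: int, tokens_per_turn: int) -> int:
    # Turn k resends all k-1 previous turns plus the new one -> sum 1..turns.
    return sum(k * tokens_per_turn for k in range(1, turns + 1))

def stateful_input_tokens(turns: int, tokens_per_turn: int) -> int:
    # Each turn sends only its own tokens; history lives server-side.
    return turns * tokens_per_turn

stateless = stateless_input_tokens(10, 100)  # 100+200+...+1000 = 5500
stateful = stateful_input_tokens(10, 100)    # 10 * 100 = 1000
savings = 1 - stateful / stateless           # ~0.82 for these numbers
```

The savings fraction grows with conversation length, since the stateless total is quadratic; the exact figure in production depends on turn sizes and session length.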

And about going against the grain, as you mentioned at the end: if startups didn't think differently from everyone else, what would be the point of being a startup?


Do you have your own platform to run inference?


I’ve been leaning more toward open-source LLMs lately. They’re not as hyper-optimized for performance, which actually makes them feel more like the old-school OpenAI chats: you could just talk to them. Now you barely finish typing and the model already force-feeds you an answer. The newer models feel over-tuned, and they've lost that conversational flow.


Whisper Large V3 Turbo

