
This is a good talk. Really gets into the details of how things differ from the classical SaaS or consumer product.

I've been doing reliability for most of my career, and have always been able to hide behind, "We're not a bank, if we lose a few requests it doesn't matter". They can't do that. :)

One advantage that they have is that the market closes, so they can do maintenance that takes the whole system down, but when you're running a global consumer product, it's a lot harder to do that without pushback.

So for most of us, our stress is around zero downtime maintenance, and theirs is around never dropping a request when the system is live.


Yeah, I work on systems with reliability requirements like this at a large bank.

There are multiple layers of controls and manual interventions and things, which, while absolutely painful, slow, expensive and shitstorm-conjuring, are ultimately the final authority on some failures.

For example, in payments -- every single settlement or clearing anomaly is looked at by a real human, and rectified/rebooked manually.

So, yeah, the stakes can be really high when you have a couple billion in memory on your server, but -- it's just a system.

And it will fail, and we plan for it to do so.


Not sure what the practical difference is (24/7 vs ~10/5) except for the convenience when planning data migrations if you have regularly planned downtime.

For most code changes, being turned off at night isn't much of an advantage, as the new code will need to go live at some point, and that point is where the risk is. For systems on 24/7 you simply need a copy of your production environment to test on, a.k.a. staging.

The main thing about 24/7 is needing follow-sun SRE and/or out of hours oncall.


There’s a move now towards 24/7 trading. I guess we’ll see how the rigors of the trading environment mesh with zero downtime. I’m sure the rollout will be slow and steady.

I've seen that. I suspect the exchanges will never go for it for this exact reason -- they need downtime for maintenance. But if it does go through, it will be a fun challenge to get 100% uptime!

I've always said that with infinite money we could get 100% uptime, but no one has infinite money. Trading firms are about as close as I can imagine to infinite money though.


How do you think on-chain exchanges do it? Hyperliquid has 16 employees -- not engineers, total employees. It is possible; it just isn't going to be possible for many of the legacy exchanges.

I work with a major one and, being honest, from day one it was obvious they were incompetent. They employ a huge number of engineers and are unable to deliver basic features at any reasonable pace. Not even remotely close to it either (as in: you ask them to do something, they say yes, execs say yes, you get a deadline, the date comes... deployment difficulties, environment not working, the runaround goes on and on forever).

I remember the CEO got on a call with us at the start and was slapping himself on the back saying they had no downtime... because they were able to do maintenance when markets shut (and I have heard very bad things about how that goes). But it is a 24/7 world now; our service is up 24/7 and, of course, this led to massive issues over time due to the very different expectations around delivery/quality. Our execs were impressed, our engineers said this was a bad sign. And, of course, it transpired that they were total amateurs (to be clear, this is one of the biggest exchanges in the world) and were unable to deliver.

To come back to my original statement: there is a company of 16 people total who is, from the point of view of customers, delivering features faster. It is difficult to overstate how insane that is.


From what I've seen, on-chain transaction times are measured in seconds and minutes, not milliseconds. It's a lot easier when you have time to wait to process a queue.

Fastest ones are processing a block every 10ms.

It depends what you mean by easy. Even if you are using a slow chain, you still have to compete for finite block space, you still have to work out how to do risk/matching fast, etc.

With chains built for exchange use, operating them is easier; that is why they don't require thousands of engineers. But the actual technical capability of the system is significantly in excess of tradfi exchanges. For example, the risk function is real-time on-chain as opposed to EOD settlement. This significantly changes the possible feature set. Once you have built it, it is very easy... the question is why big exchanges rely so heavily on EOD processes. The answer is: they are bad at engineering.
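
To make that concrete: a real-time risk engine runs something like a pre-trade margin check on every single order, rather than waiting for an end-of-day batch. A toy sketch (all field names and numbers are made up for illustration, Python):

    def pre_trade_risk_check(account, order, margin_rate=0.10):
        """Reject the order immediately if it would push the account past
        its margin, instead of discovering the breach at end of day."""
        exposure_after = account["exposure"] + order["qty"] * order["px"]
        return account["collateral"] >= exposure_after * margin_rate

    account = {"collateral": 1_000.0, "exposure": 5_000.0}
    ok = pre_trade_risk_check(account, {"qty": 10, "px": 100.0})
    # 6,000 notional * 10% = 600 required margin <= 1,000 collateral -> accept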


The EOD reconciliation (and corresponding inability to settle a position in milliseconds) is a feature - it allows "obvious erroneous trade" roll-back mechanisms, etc.

Very few people want the financial system to be a contractual suicide pact - they want it to be predictable, but when the unpredictable happens - they want the retail and institutional investor to be protected (the HFT players can go beat each other up - no one will really cry about them). And the unpredictable can be anything from a power event taking out multiple exchanges in the NJ triangle (Hurricane Sandy) to a cyber-attack (never happened yet) to a flash crash driven by algorithms from multiple HFTs driving each other nuts (happened at least once).

So, it is not EOD processes as such, but the ability to pause, assess the entire system holistically, and then correct it before it blows up the portfolios of everyone holding a 401k. So even though the exchanges _could_ go to 24/7 trading, I'd be surprised if we just went away from cyclical 24-hour windows of settlement.


Right, but EOD also introduces credit risk on the clearing house/bilateral.

Also, I would say that probabilistic finality is one of the main issues that tradfi has with crypto (which also exists in the case of margined exchanges, for the reasons you mention). Market participants expect trades to be final, the idea that they can be rolled back is extremely unattractive.

The reason you don't need to stop the market in crypto is because you don't have EOD reconciliation. If everything settles immediately and the risk engine can keep up with the market then there is no credit risk (there have also been multiple solutions to this problem in crypto, none of them involved waiting until the end of the day to see what happens when they try to cross everyone). The reason they have market halts is to limit credit risk from the market moving in one direction and winners being unable to recover gains from the losers. It is fair to say crypto DEXes haven't solved this with ADLs, but they start from a better place, and the higher level of competition means that innovation to invent new solutions is actually happening. The reason exchanges have shit tech is because there is no competition.

I feel like your comment is baiting because you surely know what happened at LME with trades getting cancelled because they would have caused LME insiders to lose money. The Hunt brothers caused massive issues for the clearing house, the HK government had to bail out its clearing house... there are massive issues with the current system.


> they want the retail and institutional investor to be protected

That costs money, indirectly


But it does allow these investors to participate in the markets without losing their shirts - and the lack of such liquidity would impact the market more so than the cost of the risk mitigation - which as you completely correctly noted is not free - both in first and second order terms.

Hyperliquid while big in crypto is still small compared to mainstream financial markets.

I don’t think you have made a case for anything yet.


So when the iPhone came out and no one was using it, it wasn't useful? I cannot conceive of a reality where I perceived the current state of things as the only thing that could ever exist. I do not believe anyone could have that amount of self-regard; it is impossible.

HL has been tested up to $8bn/day in volume. The gap in resources is several orders of magnitude so if big exchanges were doing 1000x more volume it wouldn't matter because HL is literally running with a handful of engineers vs thousands. In reality, HL is doing 25-40% of CME, for example.


so let me get this straight Skippy, you're saying you got better performance and reliability than the LMAX disruptor with Multicast that runs inside many big exchanges?

I have a really hard time believing something decentralized will surpass the physical limitations of the speed of light and the low-level assembler and C++ optimizations without any GC

also the fact that hyperliquid's sequencing of orders is opaque and not open source, and that there is indeed latency in the consensus, means I cannot yet believe there is p99 stability in completed transactions


why do you think LMAX disruptor runs inside many big exchanges? have you read some blogs online? LMAX disruptor does not run at "many big exchanges".

Disruptor is written in Java. LMAX itself is a tiny exchange that largely deals with institutions so doesn't have the same message volume as retail facing.

What language do you think validators run? So much stuff that makes absolutely no sense..

Order sequencing is not opaque; that is one of the primary benefits of a DEX over exchanges that aren't decentralized. SOL has issues with sequencing but there are attempts to fix this (Jane Street is one of the places working on this). This is the point I made above about block space, and it isn't relevant for Hyperliquid.


The first 90% of features takes 10% of the time to deliver. You are comparing capital markets infrastructure with deep regulatory obligations and multiple stateful interfaces (OUCH/FIX) to retail-focused matching engines with a very slim stateless protocol surface (REST).

An amusing, moderately expensive solution that might actually work would be to have a weekday system and a weekend system. Think of it as a spare D/R system that you intentionally swap twice a week :)

If done right, it would be a completely separate system. Separate IP addresses and all.


That's effectively time-based request sharding, which seems sensible, but you'd still have to reconcile trades and any open positions (etc.) across the time boundary where one system stops accepting requests and the other one starts. And keep the databases synchronized (i.e. have some system to make sure they're in sync at the changeover time) - or have a few minutes/hours of downtime between weekends and weekdays while you copy the whole production database from one system to another. The devil is in the details!

For what it’s worth, in some financial markets, there is a sort of natural daily cutover time [0] across which you are often not trading quite the same instrument. For example, the settlement date may roll over, etc. And a lot of Very Serious Finance is already built on the idea that most parties do not instantaneously reconcile anything and don’t depend on real-time trade lists.

I really can imagine a system in which the Monday trading system runs all day and then turns off at a predetermined time. Then it has 15 minutes to produce and disseminate a final list of all transactions, after which it becomes completely unavailable and is ready for maintenance. Any subsequent amendment to Monday’s trading would be done out of band. Open orders at the end of Monday do not carry over immediately to Tuesday, although front ends are welcome to recreate them. Everyone would understand that liquidity would be thin for the first few seconds after the system rolls over.

For added fun, Monday and Tuesday could actually be allowed to overlap in a hypothetical trading system, although the market participants might not love this.

[0] which is not the same for all instruments, and holidays mean that not every instrument rolls over meaningfully every day.
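
A minimal sketch of what that daily cutover could look like, just to make the idea concrete (all names and timings are hypothetical, Python):

    import json
    from datetime import datetime, timezone

    class TradingDay:
        """Hypothetical per-day system: accepts trades until cutover, then
        freezes and emits a final, immutable trade list for dissemination."""

        def __init__(self, session_name):
            self.session_name = session_name
            self.trades = []
            self.frozen = False

        def accept(self, trade):
            if self.frozen:
                raise RuntimeError(f"{self.session_name} is closed; trade rejected")
            self.trades.append(trade)

        def cutover(self):
            """Stop accepting trades and produce the final list; anything
            after this point is handled out of band."""
            self.frozen = True
            return json.dumps({
                "session": self.session_name,
                "finalized_at": datetime.now(timezone.utc).isoformat(),
                "trades": self.trades,
            })

    monday = TradingDay("monday")
    monday.accept({"symbol": "ABC", "qty": 100, "px": 10.5})
    final_list = monday.cutover()    # disseminated within the 15-minute window
    tuesday = TradingDay("tuesday")  # open orders do not carry over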


Heh, maybe they'll develop a sudden interest in the old VAX VMS clustering approach? ;)

I hated my time as an SRE. But … can’t it be done with some combination of canaries and blue green deployments and extensive testing? Where when things look good you just swap all the traffic to the good stuff keeping the rollback hot etc etc?

That's how we got 99.99% at Netflix. And it cost a lot of money. But a canary implies that something may go wrong and you have to roll back. The canary is still production traffic, so some transactions would fail, which isn't allowed for this kind of workload.

I imagine you'd have to use shadow execution, where you roll out a full second copy, run every transaction through both, and compare the results. And then, only after a certain time, switch traffic to the new infra and tear down the old.

But you would need a ton of extra hardware (more than double) and a lot of ways to keep data in sync. And of course if you put an LLM or other non-deterministic system in there, that's a whole other can of worms.

Like I said, a fun problem to solve. :)
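
To sketch what I mean by shadow execution (hypothetical names, Python; a real version would compare results asynchronously rather than inline):

    import logging

    def shadow_execute(txn, primary, candidate, mismatches):
        """Run each transaction through the live system and the new copy.
        Only the primary's result is returned; differences are recorded so
        the candidate can be promoted (or fixed) later."""
        primary_result = primary(txn)
        try:
            candidate_result = candidate(txn)
            if candidate_result != primary_result:
                mismatches.append((txn, primary_result, candidate_result))
        except Exception:
            logging.exception("candidate failed on %r", txn)
            mismatches.append((txn, primary_result, "error"))
        return primary_result  # clients only ever see the primary

    # Example: old vs. new pricing logic compared on live traffic.
    mismatches = []
    old_price = lambda t: round(t["qty"] * t["px"], 2)
    new_price = lambda t: round(t["qty"] * t["px"], 4)  # behaves differently
    shadow_execute({"qty": 3, "px": 1.3333}, old_price, new_price, mismatches)
    # Switch traffic only after mismatches stays empty for long enough.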


Folks that keep the lights on 24/7 aka SREs are super heroes that wear capes. Thank you for your service.

I couldn’t do it. I like infra and all but it’s just not my cup of tea. Definitely true that from a trading POV the trade must be executed. It must settle. It must work. Or capital flight will be huge.


There are different kinds of updates that influence options and feasibility. Keep in mind that deep in the heart of an exchange is a single-threaded process, the sequencer. You therefore have three layers: external-facing protocols, the sequencer/matching engine, and internal interfaces.

Internal interfaces are the easiest for blue/green. For external protocols, any change worth its weight changes the protocol and therefore requires participants to change their codebases too. Versioning protocols is an option, but the integration with consumers is much more transparent, and usually you have them test on pre-prod environments, occasionally also requiring attestation and conformance testing (regulated markets).

The sequencer and matching engine are at the core. You could do parallel runs but not blue/green. Theoretically you could abstract the matching engine and keep a barebones sequencer immutable, but this will have performance implications.

So yes, you can do things, but not in a completely transparent way, unless you introduce an “upgrade jitter” to give you a window for transparent upgrades. It’s an interesting domain; I think people will just accept occasional downtimes as a better option than a constant jitter cost.
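
For readers who haven't seen one: the sequencer in question is conceptually tiny -- a single thread that imposes a total order on messages, which is exactly why it resists blue/green. A very rough, purely illustrative sketch (Python):

    import itertools
    import queue

    def run_sequencer(inbox: queue.Queue, outbox: queue.Queue):
        """Barebones sequencer: stamp every inbound message with a
        monotonically increasing sequence number and republish it. Matching,
        risk, and gateways all consume the sequenced stream."""
        seq = itertools.count(1)
        while True:
            msg = inbox.get()
            if msg is None:   # shutdown sentinel
                break
            outbox.put((next(seq), msg))

    # All determinism flows from this single ordering point; running two of
    # them side by side against live order flow means reconciling two orderings.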

Sports betting exchanges have been doing that for a very long time. There is never a good time to take the system down for maintenance - event settlements happen every few minutes, and live games with in-play betting are going on somewhere in the world at any given time.

Makes things damn hard indeed, because you have to truly learn asynchronicity, CQRS and complex live migrations. (Incidentally, engineers who have worked on such systems tend to be over-represented in extreme HA businesses.)


Only the US. Other markets barely have liquidity during the daytime and get most of their liquidity in the opening and closing auctions. Maintenance periods are actually a complication: a few more state transitions for the system, but barely used for maintenance. The only value is for upgrades, which would still be scheduled with the market down and systems up, as participants also need to transition codebases for breaking changes, a test weekend or more is required, etc. These systems are extremely resilient. You most often get an incident not because the system is down but because the latency profile has changed by a few ms.

Not really US only. LMAX is 24/7 and is a UK company, famous on HN for open sourcing their ring buffer.

Crypto trading has been 24/7 since it began.


Ok, I will make it mostly US. Maybe a couple more markets, London and Tokyo? Futures may be a bit broader adoption too. Vast majority won’t move to 24/7. Crypto is a different game for the time being at least. It has its own challenges but also escapes quite a few of traditional exchange complexities.

> there’s a move now towards 24/7 trading.

Isn't the plan more like 23/5, as is already the case for several markets?

I can't see the standard sessions moving from 9:30am-4pm weekdays to 24/7. I take it they'd still leave at least one hour off for technical reasons.

If I'm not mistaken it's the reason several markets are 23/5 and not 24/5: that one hour of downtime is basically for servers/maintenance right? (maybe someone can chime in)

P.S.: I take it technically there's 24/7 trading already, seeing that cryptocurrency exchanges are open 24/7 (I'm not sure, but I think that's the case), but I don't think those do anywhere near the volume of, say, options trading on equities during standard sessions (40 Gbit/s with peaks over 70 Gbit/s for the full options feed).


The 23/5 is not so much for maintenance as to have a defined window for changes to the market to happen.

Every so often a new stock is listed or a stock ticker is changed or a stock is split, etc. There are smaller changes every single day, like to the settlement date of your trade.

It's very convenient to be able to restart all your systems at 5pm, have them all load the updated reference data, and start them again in time for 6pm (or 7pm, or 4am tomorrow...). Even if you trade stocks and options and currencies and futures all over the world, a quirk of the calendar means they're basically all closed between 4 and 5pm Chicago time.

Of course it's possible in principle to build systems where all this is dynamic and you can seamlessly trade with the old configuration at 4:59:59.999 and start trading the new one a millisecond later. But literally everyone has built systems that don't work like this, that rely on being able to chunk the continuous passage of time into discrete days. It would be painful to rearchitect them all now.
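
To illustrate the convenience, a toy version of that nightly bounce (times, window, and function names are hypothetical, Python):

    from datetime import datetime, time
    from zoneinfo import ZoneInfo

    CHICAGO = ZoneInfo("America/Chicago")
    WINDOW_OPEN = time(16, 0)    # 4pm Chicago: essentially everything is closed
    WINDOW_CLOSE = time(17, 0)   # 5pm Chicago: the next trading date begins

    def in_reload_window(now=None):
        """True during the one quiet hour used to restart and reload."""
        now = now or datetime.now(CHICAGO)
        return WINDOW_OPEN <= now.time() < WINDOW_CLOSE

    def nightly_cycle(load_reference_data, restart_services):
        """Stop, load tomorrow's listings/ticker changes/splits/settlement
        dates, and start everything again before the next session opens."""
        if not in_reload_window():
            raise RuntimeError("refusing to bounce systems outside the window")
        restart_services(load_reference_data())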


Crypto exchanges have been doing 24/7 trading for well over a decade. Of course they’ve had their own, uh, issues - but generally, reliability of transactions hasn’t been a major one, for the big exchanges.

Obviously there is a huge trend of "rewrite X in Rust". I understand why, Rust is a huge improvement in safety and speed.

My question is, to people even older than me (and I'm certainly not young), does anyone remember this much enthusiasm about people rewriting C code into (C++/Java/Whatever was new and hot)? Because I don't, but maybe I missed it.


I recall C++ OOP being the new hotness when I started out and C was always contrasted as the old & busted example. Kind of the "Everything-as-an-object will simplify everything" phase. Windows MFC was the new way, then STL.

Java WORA (write once, run anywhere) was definitely a thing when it came out. Java applets came out of the woodwork and were the WASM of their day. Even Cisco ran Java for their router UI for a while, which was painful.

More recently, HN went through a period about 10 years ago where every other article ended in " ... written in Go".

The mantra may not have rhymed with "rewrite X in Y" but the spirit was there.


> every other article ended in " ... written in Go"

What happened to that: is Go no longer considered great / popular?


In the circles where I hang out I think community opinion is that go is _fine_, but python has faster iteration speed for experiments, and rust has better correctness and performance for production, so there's less excitement around it

Kind of the opposite. I was deep in the R world a decade ago and there was a huge trend of replacing Java dependencies with C/C++ ones because the JVM was such a pain to manage. The community eagerly adopted the replacements about as soon as they existed.

There were no good options previously. It was either C or C++. Most of the other languages were either fringe or had a GC, or had a pseudo runtime GC (Swift). The culture of Java and C# and Go didn't really support the type of low level optimizations needed, even though you could technically do system programming if you restrict yourself to a specific subset of language and cut yourself off from most of the standard library and ecosystem. Nim was unstable. OCaml had the same issues as Go and Java and C#. You simply did not have any options until Rust came along. Oberon was an academic trinket. The less said about the various lisps and forths the better.

OS and embedded programming require bare metal support and data structures that can run standalone in the absence of an OS and standard library, and the ecosystem must exist to support such a style of programming.

Currently Rust has over 10,000 crates that would theoretically work just fine in a kernel environment.

https://crates.io/categories/no-std


As a user I actually like Gatekeeper. 95% of the time it's not a problem. The other 5% of the time I have to click a button in my settings to allow unsigned code. But at least it gives me pause to think about the source and whether I really trust it (which is mostly offloaded to Apple the other 95% of the time).

Free business idea: get an Apple developer account and then agree to sign code for other people in exchange for a small piece of their income. I'm surprised that doesn't exist yet (or does it?).


If that isn't already a violation of the developer account ToS, it would be in short order. The dialog is about keeping normal non-technical users (Apple's primary market) from straying away from the App Store where they can collect 30% and analytics. They're not protecting you, they're herding you.

The risk is that eventually you sign someone's malware and all of your customers have the certs that signed their apps revoked.

The BBC's style guide amuses me. The princes must be referred to by their titles, as must Sir David and anyone else with a knighthood.

I understand it's pretty common in the UK, but as an American it's funny to see.


Conversely, I wince every time I'm called 'sir' in an American restaurant or shop.

Oh, I never thought about that. In the UK, is it improper to call someone Sir who isn't a Knight?

Pretty much.

Used outside that frame it’s an insult as in:

You, sir, are a cad and a bounder.


You can't register a .ch domain with fewer than 3 characters. It's showing as available because the tool that checks availability only looks at whether it's registered, not whether it's allowed.

Technically they’re reserved for the cantons, so to register a two-letter .ch domain you first need to register a new Swiss canton!

We were doing multi-AZ and multi-region failover at Netflix all the way back in 2011:

https://netflixtechblog.com/the-netflix-simian-army-16e57fba...


Most of the other regions are fairly stable. Ohio (us-east-2) is a great choice if you're just starting out. Not sure about ca-central-1, but I've never heard anything bad about it.

I have a lot of experience in this area (and some patent applications). For Alexa, the device established a connection back to the server and then kept it open, sending basically HTTP2/SPDY/something like it over the wire after it detected the wake word. This allowed the STT to start processing before you finished talking, so there is only a small delay in processing the last few chunks of your utterance.

The answer came back over the same connection.

In the case of OpenAI, they can't exactly keep a persistent connection open like Alexa does, but they can use HTTP2 from the phone and both iOS and Android will pretty much take care of that connection magically.

The author is absolutely right, a real-time protocol isn't necessary. It's more important to get all the data. The user won't even notice a delay until you get over 500ms. Especially in the age of mobile phones, where most people are used to their real-time human-to-human communications having a delay.

(If you work at OpenAI or Anthropic, give me a shout, I'm happy to get into more details with you)
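
Roughly the shape of it, as a sketch rather than the actual protocol (asyncio stand-ins; audio_frames, stt, and answer are all hypothetical interfaces):

    async def handle_utterance(audio_frames, stt, answer):
        """Server side of the persistent connection: feed each audio frame to
        the speech-to-text engine as it arrives, so that by the time the final
        frame shows up only the tail of the utterance is left to transcribe."""
        async for frame in audio_frames:   # frames start right after the wake word
            stt.feed(frame)                # incremental transcription
        text = stt.finalize()              # only the last few chunks remain
        return await answer(text)          # response goes back on the same connection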


"The author is absolutely right, a real time protocol isn't necessary. It's more important to get all the data. The user won't even notice a delay until you get over 500ms"

Not my experience, running around 6,000 conversations per day with voice, with webrtc + cascading (stt/llm/tts) architecture.

Maybe I misunderstood your comment, but that 500ms is basically the floor of a state-of-the-art voice implementation these days - if you are lucky and don't skimp, and do various expensive things like speculative decoding and reasoning. 450ms on the LLM pass alone. Every ms counts in commercial applications of voice AI. If you add 200ms or 300ms to that, it really degrades the conversation.

We do a lot of voice stuff to support our business, largely with unsophisticated, non-technical users. Last year's attempts, with measured turn-to-turn latencies of around 1200ms-1500ms, led to a lot of user confusion, interruptions, abandoned conversations and generally very unpleasant experiences. We are at around 700ms turn-to-turn now, depending on tool usage needed, and it's approaching an OK experience, rivalling an interaction with an actual human. We are spending quite a lot to shave another 100ms off that. We do expensive, wasteful things such as speculative LLM passes, we do speculative tool executions (do a few LLM inferences as the user speaks, but don't actually execute non-idempotent tool calls before you know that that LLM pass is usable and the user did not say anything important at the tail end of their sentence) just to shave 100-200ms. When someone says 500ms is irrelevant I am sure they are describing some other use case, not human-to-AI voice interactions.

In my experience with voice AI, the problem is not with some occasional dropped WebRTC packets. The real hard problem is with strong background noise, echo, and of course accents. WebRTC with its polished AEC implementations helps quite a lot, at least with echo. I get that the protocol is a major PITA to implement at OpenAI scale, but for anything but hyperscale applications there are lots of good, viable solutions and commercial providers (say, Daily for instance) that make it a non-problem. The real problems to solve are still elsewhere. But boy, add 500ms to my latency budget and you've killed my application.
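
For what it's worth, the speculative part of our pipeline looks roughly like this (heavily simplified sketch; llm, confirm_end_of_turn, and the tool-call objects are hypothetical interfaces):

    async def speculative_turn(partial_transcript, llm, confirm_end_of_turn):
        """Run an LLM pass on the partial transcript while the user is still
        speaking, but hold back anything non-idempotent until the turn is
        confirmed. If the user keeps talking, the draft is simply discarded."""
        draft = await llm.complete(partial_transcript)  # speculative; may be wasted
        if not await confirm_end_of_turn():
            return None                                  # user said more; throw it away
        for call in draft.tool_calls:
            if not call.idempotent:
                await call.execute()   # real side effects only after confirmation
        return draft.reply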


I agree with everything you've said, I must have written it wrong.

What I was saying is the same as you -- the user will tolerate a total delay of 500ms, and then happiness starts to fall off. We had some Alexa utterances at 500ms, the most basic ones, but most took longer.

However, even with HTTP2 and the like, we could get in that range because it was sending data right away, so we were mostly done processing the STT by the time they were done speaking, and we were already working on the answer based on the first part of the utterance.

But I would need to see some really strong evidence to even think about using WebRTC.


Sorry, I misunderstood your comment.

As for webrtc - it was mainly for decent support in browsers and built in AEC. I think we will take another look at this design choice if we run out of ways to further optimize.


I am myself working on something similar, but I have noticed that if I try to pass on early speech from the user to the LLM to reduce latency, the chances of interruptions get even higher. For example, the user may say something like “Yes” followed by a brief pause, leading the speech model to count that as a complete turn, triggering the LLM call. But then the user may add something more, so I have to cancel the previous request so that any irreversible state transitions can be avoided. Now, due to the lower latency (from the speculative calls), I get an even smaller window to actually cancel the response or even to stop the model from speaking.

Detecting end of turn is a whole other issue. You can do the easy thing, which is just assign some number of milliseconds of silence as the end, or you can spend a lot of money asking the model to figure it out based on context.

Humans actually do the second thing, where we not only use our "model" to figure out end of turn, we actually predict what they are going to say based on context and will sometimes answer before they even finish.
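
The "easy thing" is basically a silence timer. A toy sketch (the threshold is arbitrary, Python):

    import time

    class SilenceEndOfTurn:
        """Naive end-of-turn detector: the turn ends after N ms of silence.
        Cheap, but it will happily fire on a thoughtful pause ("Yes ... actually, no")."""

        def __init__(self, silence_ms=700):
            self.silence_ms = silence_ms
            self.last_speech = time.monotonic()

        def on_frame(self, frame_is_speech: bool):
            if frame_is_speech:
                self.last_speech = time.monotonic()

        def turn_ended(self) -> bool:
            return (time.monotonic() - self.last_speech) * 1000 >= self.silence_ms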


This is pretty insightful thank you. Which provider are you guys using? Is it also over the phone or fully web/app based. Do you have any resources you can point me to learn about this?

We use a bunch; at the moment we mainly self host (and use pipecat), use Daily, and a few niche boutique suppliers who built things for us.

There is a great resource for learning this stuff - the CEO of Daily, Kwindla Kramer, hosted a series of 1hr sessions on low latency voice ai. Here:

https://youtube.com/playlist?list=PLzU2zoMTQIHjMPZ-OnpC3ozZs...

Some of this is a bit outdated but most of it is very valuable.

Kwindla posts a lot of extremely useful stuff on X and LinkedIn, incl. working, easily replicable sub-500ms setups.


Beautiful, thanks. We are also looking at this, and another complication is that transcripts can get pretty messy (updates, corrections, etc.).

> The user won't even notice a delay until you get over 500ms

I think a lot of comments are getting so laser focused on the transport delays that they’re forgetting that the LLM pipeline isn’t instant.

The transport delays are additive on top of all of the other delays, which are already high.

Which I assume is why they reached for the lowest latency solution they could, because they need every bit of help they can get to start shrinking that end to end delay across the entire pipeline.

Analogies to human voice delay don’t work because in that case we treat the human as having no delay.


And that was the entire point of my comment. That your transport layer isn't your bottleneck. You can start processing before they finish speaking. Your bottleneck will always be what happens after that.

This makes me sad. My son was just about to be old enough to build his first PC, and was showing interest. I guess I'm going to have to match his savings 1:1 to make it possible now.

There are so many people like you (who only use it for gas) that they are now building Costco gas stations not attached to a warehouse. Massive stations like the ones near the warehouses.
