One of the issues being explored is that although US radar is aged, surface vehicles can be equipped with ASDE-X transponders to be more visible to ATC systems. https://www.faa.gov/air_traffic/technology/asde-x
The vehicle that crashed into the plane did not have one and thus no automated alert was triggered.
LLMs have access to the same tools --- they run on a computer.
The problem here is the basic implementation of LLMs. It is non-deterministic (i.e. probabilistic) which makes it inherently inadequate and unreliable for *a lot* of what people have come to expect from a computer.
You can try to obscure the problem but you can't totally eliminate it without redesigning LLMs. At this time, the only real cure is to verify everything --- which nullifies a lot of the incentive to use LLMs in the first place.
> LLMs have access to the same tools --- they run on a computer.
That doesn't give them access to anything. Tool access is provided either by the harness that runs the model or by downstream software, if it is provided at all, either to specific tools or to common standard interfaces like MCP that allow the user to provide tool definitions for tools external to the harness. Otherwise LLMs have no tools at all.
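To make that concrete, here's a toy sketch of the harness side (all names here --- `TOOLS`, `harness_step`, the message shape --- are hypothetical, not any real API): the model only ever emits a structured request, and the harness decides whether a matching tool exists and runs it. If the harness never wired a tool up, the model has no access, no matter what machine it runs on.

```python
import json

# Hypothetical harness-side tool registry; the model never touches it directly.
TOOLS = {
    "echo": lambda args: args["text"],
}

def harness_step(model_output: str) -> str:
    """Parse a (hypothetical) structured tool-call message and run it if exposed."""
    msg = json.loads(model_output)
    if msg.get("type") != "tool_call":
        # Plain text from the model: no tool involved.
        return msg.get("text", "")
    tool = TOOLS.get(msg["name"])
    if tool is None:
        # The model asked for a tool the harness never exposed;
        # without the harness wiring it up, there is no access at all.
        return f"error: unknown tool {msg['name']}"
    return tool(msg["args"])
```

Standard interfaces like MCP are essentially a protocol for populating that registry from outside the harness; the gatekeeping structure is the same.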
> The problem here is the basic implementation of LLMs. It is non-deterministic (i.e. probabilistic) which makes it inherently inadequate and unreliable for a lot of what people have come to expect from a computer.
LLMs, run with the usual software, are deterministic [ignoring hardware errors and cosmic-ray bit flips, which if considered make all software non-deterministic] (having only pseudorandomness if non-zero temperature is used) but hard to predict. However, because implementations can allow interference from separate queries processed in the same batch, and the end user doesn't know what other queries are in that batch, typical hosted models are non-deterministic when considered from the perspective of the known input being only what is sent by one user.
But your problem is probably actually that the results of untested combinations of configuration and input are not analytically predictable because of complexity, not that they are non-deterministic.
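The determinism point can be shown with a toy decoder (a sketch, not any real inference stack): greedy decoding at temperature 0 always picks the same token for the same logits, and even temperature sampling is only *pseudo*-random --- seed the generator and the run reproduces exactly.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_token(logits, temperature=0.0, rng=None):
    # Temperature 0 -> greedy argmax: the same logits always give the same token.
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Non-zero temperature -> sampling, but only pseudo-random:
    # a seeded generator makes the run exactly reproducible.
    probs = softmax([x / temperature for x in logits])
    r = rng if rng is not None else random
    return r.choices(range(len(logits)), weights=probs)[0]

logits = [2.0, 1.0, 0.5]
assert decode_token(logits) == decode_token(logits) == 0  # greedy is deterministic
a = decode_token(logits, temperature=1.0, rng=random.Random(42))
b = decode_token(logits, temperature=1.0, rng=random.Random(42))
assert a == b  # seeded sampling reproduces exactly
```

What hosted deployments lose is control over the batch context, not determinism of the math itself.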
LLMs absolutely do not have access to the same tools unless they're explicitly given access to them. Running on a computer means nothing.
It sounds like you don't like LLMs! In that case, you may be more interested in our REST API. All the same functions, but designed for edge computing, where dependency bloat is a real issue: https://tinyfn.io/edge
They're building a moat with data. They're building their own datasets of trusted sources, using their own teams of physicians and researchers. They've got hundreds of thousands of physicians asking millions of questions every day. None of the labs have this sort of data coming in or this sort of focus on such a valuable niche.
> They're building their own datasets of trusted sources, using their own teams of physicians and researchers.
Oh so they are not just helping in search but also in curating data.
> They've got hundreds of thousands of physicians asking millions of questions every day. None of the labs have this sort of data coming in or this sort of focus on such a valuable niche
I don't take this too seriously because lots of physicians use ChatGPT already.
Saying they aren't pioneering is very different from saying they aren't a major player in the space. There are only like 5-7 players with a foundational model that they can serve at scale, and xAI is one of them.
This is an interesting read, and while I support being nice to every _thing_ in principle, most of the research into this actually shows that being mean yields better results.
I've read the blurb from previous years about doing one-shots with threats of death, etc. - but I've never seen that for long, many-prompt sessions.
I wonder - if you hired a programmer for a day, trapped them in a cage, and then threatened them, maybe it would be more productive for a while. I mean, if I were writing that book, I could see how they would do great work for a bit.
This looks pretty cool. I keep seeing people (and am myself) using Claude Code for more and more _non-dev_ work. Managing different aspects of life, work, etc. Anthropic has built the best harness right now. Building out the UI makes sense to get genpop adoption.
Yeah, the harness quality matters a lot. We're seeing the same pattern at Gobii - started building browser-native agents and quickly realized most of the interesting workflows aren't "code this feature" but "navigate this nightmare enterprise SaaS and do the thing I actually need done." The gap between what devs use Claude Code for vs. what everyone else needs is mostly just the interface.
how do you think it works today? some guy with binoculars?