That's very true, and I think about it often, because even an everyday task - e.g. "We need a new feature to download reports" - arrives as a ticket, but depending on how much the feature is used, desired, invested in, or marketed, you still face a million small or bigger decisions: how flexible should I make it, what are all the error cases, what's the data, how secure does it need to be.
The point isn't how user stories are written, but realizing that everything is a compromise for practicality - since if I wanted it to be the most secure thing ever, I'd say "Let's just not offer that feature at all".
But now back to cars: I like that some companies have promised, or started, to build cars with real buttons again. In our world, it's when I try to use and support the tools which aren't built on Electron (and slow as hell) but actually care about performance. Open source and decent enough security are already the minimum requirement.
I dunno if Steve was autistic or not, but there's a subtype of people who develop a "special interest" in reading, influencing and manipulating people, and get quite good at it. They study people carefully - each encounter is a new way to find out how people tick, their motivations and levers. They tend to have unusually good memories of specific encounters and conversations. They can quickly and reliably size people up. You need many, many encounters to build this kind of intuition, so they cluster in specific professions that give them lots of practice - cops/detectives, sales, even retail - but you need lots of (ideally non-trivial) interactions to supply the intuition pump.
My main complaint with MCP is that it doesn't compose well with other tools or code. Like, if I want to pull 1000 Jira tickets and do some custom analysis, I can do that with a CLI or API just fine, but not with MCP.
Right, that feels like something you'd do with a script and some API calls.
MCP is more for back-and-forth communication between an agent and an app/service, or for providing tool/API awareness during other tasks. Like, an MCP for Jira would let the AI know it can grab tickets from Jira when needed while working on other things.
I guess it's more like: the MCP isn't for us - it's for the agent to decide when to use.
I just find that e.g. CLI tools scale naturally from tiny use cases (view 1 ticket) to big use cases (view 1000 tickets), and I don't have to have two ways of doing things.
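To make the scaling point concrete, here's a toy sketch in Python: with a plain API/CLI wrapper, the same code path handles 1 ticket or 1000. Everything here (`fetch_ticket`, the fake status logic) is hypothetical, not a real Jira client.

```python
# Sketch of the "one way of doing things" point: the same call path
# handles one ticket or a thousand. All names are hypothetical.

def fetch_ticket(ticket_id):
    # Stand-in for `jira issue view <id>` or a REST GET; returns a dict.
    return {"id": ticket_id, "status": "open" if ticket_id % 2 else "done"}

def analyse(ticket_ids):
    tickets = [fetch_ticket(t) for t in ticket_ids]  # scales: 1 id or 1000 ids
    open_count = sum(t["status"] == "open" for t in tickets)
    return {"total": len(tickets), "open": open_count}

small = analyse([1])          # tiny use case: view 1 ticket
big = analyse(range(1000))    # big use case: same code path
```

With a per-call tool protocol, the big case instead becomes a thousand separate round trips through the agent's context.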
Where I DO see MCPs getting actual use is when the auth story for something (looking at you slack, gmail, etc) is so gimped out that basically, regular people can't access data via CLI in any sane or reasonable way. You have to do an oauth dance involving app approvals that are specifically designed to create a walled garden of "blessed" integrations.
The MCP provider then helpfully pays the integration tax for you (how generous!) while ensuring you can't do inconvenient things like say, bulk exporting your own data.
As far as I can tell, that's the _actual_ sweet spot for MCPs. They're sort of a technology of control, providing you limited access to your own data, without letting you do arbitrary compute.
I understand this can be considered a feature if you're on the other side of the walled garden, or you're interested in certain kinds of enterprise control. As a programmer however I prefer working in open ecosystems where code isn't restricted because it's inconvenient to someone's business model.
The auth angle is pretty interesting here. I spend a fair amount of time helping nontechnical people set up AI workflows in Claude Cowork, and MCP works pretty well for giving them an isolated external system where I can tightly control their workflow guardrails, while also, interestingly, giving them the freedom to treat what IS exposed as a generic API automation tool. Combined with skills, that lets these non-technical people string together Zapier-like workflows in natural language, which is absolutely huge for the agency and autonomy it affords them. So I find it quite interesting for the use case of providing auth-encapsulated API access to systems that would normally require an engineer to unlock. The story of "wrap this REST API into a controlled variant scoped to the end user's use case, and let them complete auth challenges in every which way" has been super useful. Some of my MCP servers go through an OAuth challenge/response; others guide the user to navigate to the system, generate an API key, and paste it into the server on initial connection.
>while ensuring you can't do inconvenient things like say, bulk exporting your own data
I think this is the key; I want my analysts to be able to access the 40% of the database they need to do their job, but not the other 60% that would let them dump the business-secrets part of the DB and start up a business across the street. You can do this to some extent with roles etc., but MCP in some ways is the data firewall as your last line of protection/auth.
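A minimal sketch of that "data firewall" idea: the tool layer, not the DB role, decides what the analyst-facing agent may query. Table names, `run_query`, and the limits are all hypothetical stand-ins.

```python
# The tool exposed to the agent enforces the whitelist and caps bulk export,
# independent of whatever DB roles exist underneath. All names hypothetical.

ALLOWED_TABLES = {"orders", "shipments", "support_tickets"}  # the ~40%

def run_query(sql):
    # Stand-in for a real DB client call.
    return [{"sql": sql}]

def query_tool(table: str, limit: int = 100) -> list:
    """MCP-style tool: refuses anything off the whitelist."""
    if table not in ALLOWED_TABLES:  # checked before any SQL is built
        raise PermissionError(f"table '{table}' is not exposed via this tool")
    limit = min(limit, 1000)  # also blocks bulk-dumping what IS exposed
    return run_query(f"SELECT * FROM {table} LIMIT {limit}")
```

The agent can compose whatever it likes on top of this, but the business-secrets tables simply aren't reachable through the tool surface.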
Give the model a REPL and let it compose MCP calls, either by using the tool calls' structured output, doing string processing, or piping results to a fast, cheap model to produce structured output.
This is the same as a CLI. Bash is nothing but a programming language; you can take the same approach by giving the model JavaScript and having it call and compose MCP tools. If you do that, you can even throw in composing with CLIs as well.
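A toy sketch of that idea (in Python for consistency with the other examples here; all tool names are hypothetical): expose each MCP tool as a plain function inside a code-execution environment, and composition falls out of ordinary code.

```python
# Expose MCP tools as plain functions in a code-execution environment, so
# the model composes them with ordinary code instead of chained tool calls.
# Tool names and behavior are hypothetical stand-ins.

MCP_TOOLS = {
    "list_tickets": lambda project: [f"{project}-{i}" for i in range(3)],
    "get_ticket":   lambda key: {"key": key, "title": f"Title of {key}"},
}

def run_model_code(code: str):
    """Execute model-written code with MCP tools in scope (sandbox this in
    any real deployment!); only `result` returns to the agent's context."""
    scope = dict(MCP_TOOLS)
    exec(code, scope)
    return scope.get("result")

result = run_model_code(
    "result = [get_ticket(k)['title'] for k in list_tickets('PROJ')]"
)
```

The composition (map one tool's output through another) happens entirely inside the execution environment, not in the model's context window.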
You can make it compose by also giving the agent the necessary tools to do so.
I encountered a similar scenario using the Atlassian MCP recently, where someone needed to analyse hundreds of Confluence child pages from the last couple of years, all built from the same starter template. I gave the agent a tool that lets it call any other tool in batch and expose the results for subsequent tools to use as inputs, rather than dumping them straight into the context (e.g. another tool which gives each page to a sub-agent with a structured output schema and a prompt with extraction instructions, or pipes the results into a code execution tool).
It turned what would have been hundreds of individual tool calls filling the context with multiple MBs of raw Confluence pages into a couple of calls returning relevant low-hundreds of KBs of JSON the agent could work with further.
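A rough sketch of that batch-call tool: one tool call fans out to other tools and pipes each result through a reducer, so only compact structured output reaches the agent's context. The tool registry, page contents, and `extract_fields` are all hypothetical.

```python
# One batch tool call replaces hundreds of per-page calls; the raw pages
# never enter the agent's context. All names here are hypothetical.

def get_page(page_id):
    return {"id": page_id, "body": "x" * 50_000}  # stand-in for a raw page

def extract_fields(page):
    # Stand-in for a sub-agent with a structured-output extraction schema.
    return {"id": page["id"], "size": len(page["body"])}

TOOLS = {"get_page": get_page, "extract_fields": extract_fields}

def batch_call(tool_name, inputs, then=None):
    """Run one tool over many inputs; optionally pipe each result into a
    second tool so only the reduced output is returned."""
    results = [TOOLS[tool_name](i) for i in inputs]
    if then:
        results = [TOOLS[then](r) for r in results]
    return results

summaries = batch_call("get_page", range(3), then="extract_fields")
```

The agent sees `summaries` (a few small dicts), not the ~150 KB of raw bodies the intermediate calls produced.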
What it can do is call multiple MCPs, dumping tons of crap into the context and then separately run some analysis on that data.
Composable MCPs would require some sort of external sandbox in which the agent can write small bits of code to transform and filter the results from one MCP to the next.
This is confusing to me. What is composability if not calling a program, getting its output, and feeding it into another program as input? Why does it matter if that output is stored in the LLM's context, or if it's stored in a file, or if it's stored ephemerally?
Maybe I'm misunderstanding the definition of composability, but it sounds like your issue isn't that MCP isn't composable, but that it's wasteful because it adds data from interstitial steps to the context. But there are numerous ways to circumvent this.
For example, it wouldn't be hard to create a tool that just runs an LLM, so when the main LLM convo calls this tool it's effectively a subagent. This subagent can do work, call MCPs, store their responses in its context, and thereby feed that data as input into other MCPs/CLIs, and continue in this way until it's done with its work, then return its final result and disappear. The main LLM will only get the result and its context won't be polluted with intermediary steps.
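A toy sketch of that subagent-as-a-tool pattern: the main conversation makes one tool call, the subagent does the noisy intermediate MCP/CLI work in its own scratch context, and only the final answer comes back. `call_llm` and `fetch_from_mcp` are hypothetical stand-ins, not a real API.

```python
# The subagent's scratch context dies with it; only the return value
# reaches the main conversation. All names are hypothetical.

def call_llm(prompt, context):
    # Stand-in for a real model call; here it just "summarises" the context.
    return f"summary of {len(context)} intermediate results for: {prompt}"

def fetch_from_mcp(i):
    return {"step": i, "payload": "lots of raw data"}  # noisy intermediate output

def research_subagent(task: str) -> str:
    """Tool exposed to the main LLM."""
    scratch = [fetch_from_mcp(i) for i in range(5)]  # pollutes only this scope
    return call_llm(task, scratch)                    # only this string returns

answer = research_subagent("analyse last quarter's tickets")
```

From the main LLM's point of view this is a single tool call with a short string result, regardless of how many MCP round trips happened inside.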
It cannot do "anything" with the tools. Tools are very constrained in that the agent must insert the tool call into its context, and it can only receive the response of the tool directly back into its context.
Tools themselves also cannot be composed in any SOTA models. Composition is not a feature the tool schema supports and they are not trained on it.
Models obviously understand the general concept of function composition, but we don't currently provide the environments in which this is actually possible outside of highly generic tools like Bash or sandboxed execution environments like https://agenttoolprotocol.com/
But in the context of this discussion, Atlassian has a CLI tool, acli. I'm not quite following why that wouldn't have worked here. As a normal CLI you have all the power you need over it, and the LLM could have used it to fetch all the relevant pages and save to disk, sample a couple to determine the regular format, and then write a script to extract out what they needed, right? Maybe I don't understand the use case you're describing.
Not all agents are running in your CLI or even in any CLI, which is why people are arguing past each other all over the topic of MCP.
I implemented this in an agent which runs in the browser (in our internal equivalent of ChatGPT or Claude's web UI), connecting directly to Atlassian MCP.
Hmm, but you can't write a standard MCP (e.g. batch_tool_call) that calls other MCPs because the protocol doesn't give you a way to know what other MCPs are loaded in the runtime with you or any means to call them? Or have I got that wrong?
So I guess you had to modify the agent harness to do this? or I guess you could use... mcp-cli ... ??
The DCs are in VA because it's fed / spook central, which induces a large number of well paid policy / compliance / paperwork / tech jobs etc.
NoVa is literally one of the richest regions on the planet, and it's all sorta tied up in the same thing. Seems unfair to say the DCs "bring nothing"; the whole ecosystem is a manifestation of concentrated defense spending.
My political views are pretty simplistic: whoever initiates violence is the bad guy.
That's problematic here, because neither the US nor Iran are strangers to initiating violence. So a further refinement is necessary, based on a principle that I haven't had to fall back on very often: whoever initiates violence during negotiations is the bad guy.
What are those objectives? Can they be achieved by dropping bombs from the sky?
The US, and the rest of the world, are still waiting to know the answer to the first. Why was this war on a whim needed now? What was so imminent that the ongoing negotiations could not have continued?
The second answer is a definitive no; it never has been, except for that one time at the end of WW2 when no one else had nukes.
Not really. I'm simply noting the characteristics of the vehicle as seen in the videos released by the War Department. Are you a fisherman? Usually when I go fishing I take, ya know, fishing gear with me.
Is joining a military that has constantly been engaged in terror a "political view"? I'm not the one killing innocent people, it's just basic humanity to feel justice when murderers are stopped. I'm neither a Democrat nor Republican, this is not a partisan issue. I would vote for any politician that said they were closing down our bases all over the world and sanctioning Israel.
We're not really calorie constrained anymore, and most humans live in much denser environments than they used to. You would expect the rate of exposure, the rate of mutation/change, and the rate at which new pathogens appear to be higher than in the past.
Consequently, you wouldn't necessarily expect ancestral "defaults" to be optimal for modern environments.
> you wouldn't necessarily expect ancestral "defaults" to be optimal
I like the term ancestral defaults and indeed, we've come a long way since then and our biological and environmental reality is substantially different.
There is this book series Mortal Coil by Emily Suvada which imagines a future where technology has advanced enough to allow one to tweak their genome as easily as we use apps on our phone today. It was a fascinating read.
Not the OP, but evolution will often select for lower energy expenditure, as most animals are calorie constrained, at least during certain times of the year (cf. animals that hibernate to survive low calorie availability).
It seems plausible to me that keeping the immune system on full alert all the time might be calorie intensive. However, I suspect that a more active immune system will likely lead to other complications such as autoimmune disorders, or even something as common as hayfever.
An always-on, full-alert immune system won't have any leeway left to ramp up against the onset of an infection, and considering that pathogens also evolve to bypass the increased alertness, this would probably be catastrophic for the species.
That is, unless, like in bats, it was the result of evolution and natural selection. But bolting it on the way this vaccine does? Yeah, that's going to be pretty bad.
I don't really see why most existing equipment would be usable in this way. When you automate a thing you often have to rethink the entire problem. But more generally, automation is for _repeatable_ things and a lot of research is... not that.
The expensive equipment is usually a small (but crucial!) part of research activity, which involves things like talking to a lot of people, getting permission to do weird or new things, going out into the environment and collecting things in very specific ways, storing and transporting them carefully, observing, etc. Building or modifying existing lab instruments, doing various things with animals that are not co-operative ... and CLEANING. Who does all the cleaning?
There are definitely use cases when you have a specific protocol you want to scale, but I'm also not sure how safe I would feel around an AI with a license to experiment and access to dangerous reagents, high temperatures, etc. Or, god help us, an oligonucleotide synthesizer. Which is definitely going to happen (if it hasn't already).
In some cases that would be the same person that does the most advanced innovative and/or creative work.
The idea behind the fully automated system is that fewer hired hands are needed for efforts that are routine enough. But not zero, you still need one person who can do everything at a minimum, if called upon for mission-critical operation.
In the case of creative work and planning that is out of AI's league, these things always need to be done too, but they are not exactly "routine".
Once most of the tedious routine tasks are well automated, though, the human brain behind the lab can finally relax a bit, with eurekas flowing at the same rate without needing a full 40 or 50 additional hours at the bench, while even more results are generated than they could produce single-handedly.
Which gives them the time to do the cleaning as well; otherwise they would need two humans to serve their one automated system.
Great discussion of colour theory, but I think it skips over the fact that chemistry and economics were also real constraints. Paints that don't fade easily, are chemically inert, durable, etc., and can be produced in enormous volume are not that common. Practical stuff like: how easily does it catch fire, do solvents degrade it. Most important: is it CHEAP?
Also by 1944 there would have been a ready made supply chain due to demand from the navy, which would have picked it for similar reasons and consumed it in enormous volume.
I think, practically, control rooms are chrome oxide green (you get to add as much titanium as you like - that's cheap as dirt too. EDIT: it would have been lead, actually, in the 40s) for much the same reason that barns are red.
Think for a minute about why they worked in shacks.
Also worth noting that Birren was a paid consultant for DuPont - the company that made the paint. From that perspective alone you'd kinda expect he's gonna pick colours they already produce reliably at scale.
He also consulted for the army, coast guard, and navy, so whatever colours you pick for hazard marking in an industrial setting have to be _vaguely_ congruent or you're going to cause accidents.
I'm super confused. The small model "cost field" `rag-api/geometric_lens/cost_field.py` was trained on PASS_TASKS like "Write a function that counts vowels in a string." and FAIL_TASKS like "Write a function that converts a regular expression string to an NFA using Thompson's construction, then converts the NFA to a DFA.".
So it seems like it's a difficulty classifier for task descriptions written in English.
This is then used to score embeddings of Python code, which is a completely different distribution.
Presumably it's going to look at a simple solution, figure out it lands kinda close to simple problems in embedding space and pass it.
But none of this helps you solve harder problems, or distinguish between a simple solution which is wrong, and a more complex solution which is correct.
I think the goal is to have a light heuristic that helps find plausibly useful solutions. They're still going to go through a testing phase as a next step, so this is just a very simple filter to decide what's even worth testing.
> But none of this helps you solve harder problems, or distinguish between a simple solution which is wrong, and a more complex solution which is correct.
It does, because hallucinations and low confidence share characteristics in the embedding vector which the small neural net learns to recognize. And the fact that it continuously learns from the feedback loop is pretty slick.
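For what it's worth, here is a hedged, generic sketch of the kind of "cost field" being debated: a tiny classifier over embedding vectors, updated online from pass/fail feedback. This is a reconstruction of the general pattern under my own assumptions, not the actual `rag-api/geometric_lens/cost_field.py`.

```python
# Tiny online logistic-regression "cost field" over embeddings: score says
# whether a candidate looks worth testing; the testing phase feeds back
# pass/fail labels. Clusters and dimensions are fabricated for illustration.
import numpy as np

rng = np.random.default_rng(0)
DIM = 16

class CostField:
    def __init__(self, dim=DIM, lr=0.1):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr

    def score(self, emb):
        # Probability-like score that a candidate is worth testing.
        return 1.0 / (1.0 + np.exp(-(self.w @ emb + self.b)))

    def update(self, emb, passed):
        # One SGD step on logistic loss, driven by the test-phase feedback.
        err = float(passed) - self.score(emb)
        self.w += self.lr * err * emb
        self.b += self.lr * err

cf = CostField()
pass_embs = rng.normal(+1.0, 0.5, size=(50, DIM))  # fake "passing" cluster
fail_embs = rng.normal(-1.0, 0.5, size=(50, DIM))  # fake "failing" cluster
for p, f in zip(pass_embs, fail_embs):
    cf.update(p, True)
    cf.update(f, False)
```

Note this only works to the extent that pass/fail really do separate in embedding space, which is exactly the point under dispute upthread.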
Having shell is extremely handy for further discovery. SO handy that if they were just gonna patch the bug and lock you out, you would simply not disclose it.
This is what happened. Tesla security received tons of bug reports that required root access to identify, yet they got a vanishingly small number of root vulnerability reports. This policy fixes that misincentive.
You could make a car that's safer than others at 10x the price but what would the demand look like at that price?
Would you pay 2x for your favourite software and forego some of the more complex features to get a version with half the security issues?