If A vibes, and B is overwhelmed with noise, how does B reliably go through it? Using AI for the review necessarily runs into the same problems that recording all of A's actions was trying to solve in the first place, and we'd be stuck in a never-ending cycle.
We could also distribute the task to B, C, D, ... N actors, and assume that each of them would "cover" (i.e. understand) some part of A's output. But this quickly becomes labor-intensive for other reasons, such as coordination and trusting that all the reviewers cover their parts adequately within the given time...
Or we could tell A that this is not a vibe playground and fire them.
Yes, even ferries in Scandinavia can be roads when it makes for a better map… Or not ferries in the Mediterranean when it doesn’t.
Honestly I think this part
> The resulting images bring insights into the ways in which road infrastructure reflect regional, political and geographical situations.
should just be taken as pseudo-profound art fluff. Are you telling me all Greek commercial transport goes straight through non-Schengen instead of just ferrying across?
Yes, that seems to hold for rocks. But that doesn’t shut down the original post’s premise, unless you hold the answer to what can and cannot be banged together to create emergent intelligence.
Extraordinary claims (e.g., that AI is conscious) require extraordinary proof. If I told you my bag of words was sentient, I assume you'd need more proof than just my word.
The fact that LLMs talk intelligently is the extraordinary proof. It would be difficult for me to prove that you're not an LLM, or for you to prove that I'm not an LLM. That's how good they are.
Talking intelligently about arbitrary subjects is an incredibly high bar. Eliza could not do anything even remotely approaching that, and this would have been considered black magic by the general public prior to 2022.
It's incredible how quickly the bar has been raised here.
I think it's meant to be taken more in the abstract. Yes, an LLM can refuse your request, and yes, you can ask it to prepend "have you checked that it already exists?", but it can't directly challenge your super long-range assumptions the way another person can at standup: "this unrelated feature already does something similar, so maybe you can modify it to accomplish both the original goal and yours", or "we have this feature coming up, which will solve your goal". Without proper alignment you're just churning out duplicate code at a faster rate now.
It’s interesting then to ask whether this will behave the same way big orgs do. E.g. once your org is big and settled, anything but the core product and adjacent services becomes impossible, which is why you often see a 50-person company out-innovating a 5k-person company in tech (only to be bought up and dismantled, of course, but that’s beside the point).
Will agents simply dig the trenches deeper towards the direction of the best existing tests, and does it take a human to turn off the agent noise and write code manually for a new, innovative direction?
But the code you’re writing is guard railed by your oversight, the tests you decide on and the type checking.
So whether you write the spec code out by hand or ask an LLM to do it is beside the point if the code is considered a means to an end, which is what the post above yours was getting at.
Tests and type checking are often highway-wide guardrails when the path you want to take is like a tightrope.
Also, the code is not a means to an end. It’s going to be run somewhere, doing stuff someone wants done reliably and precisely. The overall goal was only ever to invest some programmer time and salary in order to free up time for others, not for everyone to start babysitting stuff.
Maybe I was stupid or maybe it just doesn’t hit the same way if you don’t grow up in the US, but I remember not being terribly fascinated by it as a 90s kid. In fact, I found it kind of uncanny that the world felt so… disconnected. I later learned this was called “modernist architecture”.
I still get made-up Python types all the time with Gemini. It's really quite distracting when your codebase is massive, something triggers a type error, and Gemini says
"To solve it you just need to use WrongType[ThisCannotBeUsedHere[Object]]"
and then I spend 15 minutes running in circles, because everything from there on is just a downward spiral, until I shut off the AI noise and just read the docs.
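For the record, the failure is immediate, not subtle. A minimal sketch (the type names below are placeholders mirroring the hallucinated suggestion, not real APIs):

```python
# The suggested annotation names types that exist nowhere in the
# codebase or the stdlib, so Python rejects it with a NameError
# before any type checker even gets involved.
suggestion = "WrongType[ThisCannotBeUsedHere[object]]"

try:
    eval(suggestion)  # illustrative only; never eval untrusted input
except NameError as exc:
    print(f"NameError: {exc}")
```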
Gemini unfortunately sucks at calling tools, including ‘read the docs’ tool… it’s a great model otherwise. I’m sure Hassabis’ team is on it since it’s how the model can ground itself in non-coding contexts, too.