tudelo's comments | Hacker News

The only counter I have to this is that some workflows have test environments; not everything can or should run locally. Sometimes these tests take time, and instead of babysitting the model to write code and run the build+deploy+test manually, you can send it off to work until the kinks are worked out.

Add to that, I have worked on many projects that take more than 20 minutes to fully build and run tests... unfortunately. I'd consider that part of the job of implementing a feature, including reducing the number of cycles I have to take.

After the "green" signal I will manually review or send off some secondary reviews in other models. Is it wasteful? Probably. But its pretty damn fun (as long as I ignore the elephant in the room.)


Yeah, our basic integration test suite takes over 20 minutes to run in CI, likely longer locally, though I never try to run the full suite locally. That doesn't even include PDVs and other continuous testing that runs in the background.

The other day, I wrote a Claude skill to pull logs for failing tests on a PR from CI as a CSV for feeding back into Claude for troubleshooting. It helped with some debugging but was very fraught and needed human guidance to avoid going in strange directions. I could see this "fix the tests" workflow instrumented as overnight churn loops, forbidden from modifying test files, with engineers reviewing in the morning to see whether more tests pass.

Maybe agentic TDD is the future. I have a bit of a nightmare vision of SWEs becoming more like QA in the future, but with much more automation. More engineering positions may become adversarial QA for LLM output. Figure out how to break LLM output before it goes to prod. Prove the vibe coded apps don't scale.

In the exercise I described above, I was just prompt-churning between meetings (having Claude record its work and feeding it to the next prompt, pulling test logs in between attempts), without much time to analyze, while another engineer on my team was analyzing and actually manually troubleshooting the vibe-coded junk I was pushing up. Still, we fixed over 100 failing integration tests in a week for a major refactor using Claude plus some human(s) in the loop. I do believe it got things done faster than we would have finished without AI. I do think the quality is slightly lower than it would have been if we'd had 4 weeks without meetings to build the thing, but the tests do now pass.


Yes, that's fair, but not the case for me. Everything can run locally, and specs run quickly enough to cover the things Claude changes. For everything else, the GitHub CI run is 10-15m and catches any outlier failures, and I'm usually working on more than one thing at a time anyway, so waiting for it doesn't really matter.

The ping pong video you linked is clearly fake. Look at the paddle... anyways...


First off, appreciate you sharing your perspective. I just have a few questions.

> I've gone back to managing the context window in Emacs because I can't be bothered to learn how to deal with another model family that will be thrown out in six months.

Can you expand on what you mean by that? I'm a bit of a noob at LLM-enabled dev work. Do you mean that you kick off new sessions and provide a context that you manage yourself instead of relying on a longer-running session to keep relevant information?

> Unironically learning vim or Emacs and the standard Unix code tools is still the best thing you can do to level up your llm usage.

I appreciate your insight, but I'm failing to understand how exactly knowing these tools improves LLM performance. Is it because you can more precisely direct them via prompts?


LLMs work on text and nothing else. There isn't any magic there: just a limited context window over which the model keeps predicting the next token until it decides it has predicted enough and stops.

All the tooling is there to manage that context for you. It works, to a degree, then stops working. Your intuition is there to decide when it stops working. This intuition gets outdated with each new release of the frontier model and changes in the tooling.

The stateless API with a human deciding what to feed it is much more efficient in both cost and time as long as you're only running a single agent. I've yet to see anyone use multiple agents to generate code successfully (but I have used agent swarms for unstructured knowledge retrieval).

The Unix tools are there for you to programmatically or manually search and edit the code base, and to copy/paste into the context you will send. Outside of Emacs (and possibly vim), with their ability to keep dozens of ephemeral buffers open for massaging tool output, I don't imagine they will be very useful.

Or to quote the SICP lectures: The magic is that there is no magic.
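To make the "stateless API plus human-curated context" idea concrete, here's a minimal sketch. All the names here are mine, not from any real tool; the only real idea is that each request is assembled from scratch out of snippets you chose yourself (e.g. grep output), then sent as one call with no hidden session state.

```python
# Hypothetical sketch: assemble a single stateless prompt from hand-picked
# snippets, respecting a rough context budget. In practice the snippets come
# from grep/ripgrep output or Emacs buffers, and the result is sent as ONE
# request to whatever completion endpoint you use -- no session, no replay.

def build_prompt(task: str, snippets: dict[str, str], budget_chars: int = 12000) -> str:
    """Build one self-contained prompt; drop snippets that exceed the budget."""
    parts = [f"Task: {task}\n"]
    used = len(parts[0])
    for path, text in snippets.items():
        block = f"\n--- {path} ---\n{text}\n"
        if used + len(block) > budget_chars:
            break  # stay inside the context budget; the human picks what matters
        parts.append(block)
        used += len(block)
    return "".join(parts)

prompt = build_prompt(
    "Explain why the retry loop can spin forever",
    {"retry.py": "while True:\n    resp = call()\n    if resp.ok: break"},
)
```

The human does the retrieval; the model only ever sees what you deliberately put in front of it.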


I can't speak for the parent, but I use gptel, and it sounds like they do as well. It has a number of features, but primarily it gives you a chat buffer you can freely edit at any time. That gives you 100% control over the context: you can quickly remove the parts of the conversation where the LLM went off the rails and keep it clean, and replace or compress the context so far any way you like.

While I also use LLMs in other ways, this is my core workflow. I quickly get frustrated when I can't _quickly_ modify the context.

If you have some mastery over your editor, you can just run commands and post relevant output and make suggested changes to get an agent like experience, at a speed not too different from having the agent call tools. But you retain 100% control over the context, and use a tiny fraction of the tokens OpenCode and other agents systems would use.

It's not the only or best way to use LLMs, but I find it incredibly powerful, and it certainly has its place.

A very nice positive effect I've noticed personally is that, as opposed to using agents, I actually retain an understanding of the code automatically. I don't have to go in and review the work afterward; I review and adjust on the fly.


One thing to keep in mind is that the core of an LLM is basically a (non-deterministic) stateless function that takes text as input, and gives text as output.

The chat and session interfaces obscure this, making it look more stateful than it is. But they mainly just send the whole chat so far back to the LLM to get the next response. That's why the context window grows as a chat/session continues. It's also why the answers tend to get worse with longer context windows – you're giving the LLM a lot more to sift through.

You can manage the context window manually instead. You'll potentially lose some efficiencies from prompt caching, but you can also keep your requests much smaller and more relevant, likely spending fewer tokens.
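The "chat is just replay" point above can be sketched in a few lines. This is a toy illustration with made-up names, not any particular vendor's API: every turn re-sends the entire history, so request size is cumulative, and manual trimming is just deleting old turns before the next send.

```python
# Toy model of why a "session" grows: each turn replays the whole history.

history = []

def ask(user_msg, model):
    history.append({"role": "user", "content": user_msg})
    # The whole history -- not just the new message -- goes to the model.
    reply = model(history)
    history.append({"role": "assistant", "content": reply})
    return reply

def trim(keep_last=4):
    """Manual context management: drop stale turns instead of replaying them."""
    del history[:-keep_last]

# A stand-in "model" that just reports how many messages it was sent.
echo = lambda msgs: f"saw {len(msgs)} messages"

ask("first question", echo)   # the model sees 1 message
ask("second question", echo)  # the model sees 3: old turns are replayed
trim(keep_last=2)             # prune by hand; the next request is smaller
```

The prompt-caching caveat applies here: trimming the front of the history invalidates any prefix cache, which is the efficiency trade-off mentioned above.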


I am rather positive that if you were sat down in a room and couldn't leave unless you did some mildly complicated long division, you would succeed. Just because it isn't a natural thing anymore and you have not done the drills in decades doesn't mean the knowledge is completely lost.


If you are concerned that embedding "from first principles" reasoning in widely available LLMs may create future generations that cannot reason that way themselves, then I share your concern. I also think it may be overrated. Plenty of people "do division" without quite understanding how it all works (unfortunately).

And plenty of people will still come along who love to code despite AI excelling at it. In fact, calling out the AI on bad design or errors seems to be the new "code golf".


Probably won't find it remote. I'd say gov contractor / gov jobs could be chill... not sure how a visa would interact with that process, sorry.


I don't know what it is, but at basically every major airport I have struggled to get an Uber/Lyft. I expect at minimum one cancellation...


In many cities this is solved with the "Uber rank" system, where you simply get in the first car in line, give the driver a code, and then it loads up your journey. Fast and avoids any hassles with drivers rejecting your destination.


Oh, they reinvented taxi stands. The code for a pre-programmed destination in the app is actually a nice touch.


Wait, shit, that's amazing. How did they do that? I mean, not how did they write the code to match the rider (obviously the driver should scan the rider's QR code), but how did Uber get laws changed to allow this obvious reimplementation of a taxi stand when it's technically illegal under taxi laws?


Trying to find a specific ride hail driver at the airport seems like a huge waste of time. Just go to the taxi stand.


True, but then you'll be in a taxi.


Same. I assume it depends on the destination.

Person wants to go somewhat far from the airport? That's more time on this single ride and less time pocketing peak-demand money.


Alerting has to be a constant iterative process. Some alerts should be nice-to-know, and some should be "halt what you are doing and investigate". The latter really need to be decided based on how your SLIs/SLAs are defined, and they need to be high-quality indicators. Whenever one of the halt-and-investigate alerts starts to lose signal, it should be downgraded or its thresholds should be raised. Like I said, an iterative process. For a system owned by a team, there should be some occasional semi-formal review of current alerting practices, and when someone on call notices flaky/bad alerting they should spend time tweaking/fixing it so the next person doesn't have the same churn.

There isn't a simple way, but having some tooling to go from alert -> relevant dashboards -> remediation steps can help cut down on the process... it takes a lot of time investment to make these things work in a way that saves time rather than costing more when solving issues. FWIW I think developers need to be deeply involved in this process and basically own it. Static thresholds should usually just be a warning to look at later; you want more service-level indicators. For example, in a streaming system you probably want to know if one of your consumers is stuck or behind by a certain amount, and also if there is any measurable data loss. If you have automated pushes, you probably want alerting on a push that is stale by some amount of time. For RPC-type systems you'd want recurrent health checks that might warn on CPU/etc., but put higher-severity alerting on whether responses are correct and as expected, or not happening at all.

As a solo dev it might be easier just to do the troubleshooting process every time, but as a team grows it becomes a huge time sink and troubleshooting production issues is stressful, so the goal is to make it as easy as possible. Especially if downtime == $$.
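The tiering described above can be sketched as plain data plus a small severity function. Everything here is hypothetical (metric names, thresholds); the point is that the same indicator (consumer lag) maps to different severities, and the thresholds are editable data so the on-call review loop can tune them without touching logic.

```python
# Hypothetical sketch of tiered alerting on a streaming consumer:
# "page" = halt what you're doing; "warn" = nice to know, look later.

from dataclasses import dataclass

@dataclass
class ConsumerStatus:
    name: str
    lag: int            # messages behind the head of the stream
    stalled_secs: int   # seconds since the last committed offset

# Thresholds as data: these are exactly what gets tweaked when an alert
# turns out to be noisy during the periodic alerting review.
WARN_LAG = 10_000
PAGE_LAG = 100_000
PAGE_STALL_SECS = 600

def severity(s: ConsumerStatus) -> str:
    # Stuck consumer or hopelessly behind: high-signal, page someone.
    if s.stalled_secs >= PAGE_STALL_SECS or s.lag >= PAGE_LAG:
        return "page"
    # Falling behind but still moving: downgrade to a warning.
    if s.lag >= WARN_LAG:
        return "warn"
    return "ok"

print(severity(ConsumerStatus("billing", lag=150_000, stalled_secs=30)))  # page
print(severity(ConsumerStatus("billing", lag=20_000, stalled_secs=30)))   # warn
```

Wiring this to the alert -> dashboard -> runbook chain is where the real time investment goes; the classification itself stays simple on purpose.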

I don't have good recommendations for tooling because I have used mostly internal tools but generally this is my experience.


This is an incredibly insightful and helpful comment, thank you. You explain exactly what I thought when writing this post. The phrase that stands out to me is "constant iterative process." It feels like most tools are built to just "fire alerts," but not to facilitate that crucial, human-in-the-loop review and tweaking process you described. A quick follow-up question if you don't mind: do you feel like that "iterative process" of reviewing and tweaking alerts is well-supported by your current tools, or is it a manual, high-effort process that relies entirely on team discipline? (This is the exact problem space I'm exploring. If you're ever open to a brief chat, my DMs are open. No pressure at all, your comment has already been immensely helpful, thanks.)


Jesus. I lived in NYC, and white truffles at the right time were nowhere near as expensive. I can't believe they felt worth it.


https://www.cnbc.com/amp/2018/11/26/why-truffles-are-so-expe... : Italian white truffles … $85,000 for 2 lbs.. two years ago when there was a short supply.

Seems like it’s around 10,000 euros/lb now. When I bought it, it was probably 2,200-2,500 euros per pound. That was almost a decade ago!

To me, after the Italian white truffle experience, black truffles are not worth it anymore. I just want to preserve that one special memory (of many meals) untainted!


Quitting nicotine is absurdly overblown. I'm sure me saying this will annoy somebody, but I think it's much more of a habit than a physical addiction and people hide behind the physical addiction as an excuse. Maybe if you smoke a pack a day it is a different beast but daily smoking is really not hard to kick.


Maybe it is overblown; I can generally go a while without a smoke, and could probably quit with relative ease if I wanted to. Surely it’s no heroin withdrawal and certainly not as bad as alcohol withdrawal, but it’s the quickest I’ve ever come to developing any physical addiction symptoms. The withdrawal is real too, but mostly bearable.


Well, it took me ~four years to comfortably quit nicotine, from wanting to all the way to absolute zero. Weed is considerably easier to turn on and off whenever. I get that we're all weird, but for some people there is a significant difference here.


I wonder how much of that is addiction and how much is just forgetting what something is like. I have 1 cigar a month (never more, occasionally less) and have done so for about 12 years. I suppose some may say that means I'm addicted. Or does it mean I simply enjoy a cigar? I honestly don't know the threshold. When I see a cigar I'm usually reminded of the enjoyment; I guess that is some addiction in and of itself. It calls into question, what is a memory in and of itself -- and if memories and the desire to relive them are just an incarnation of addiction.


IMHO addiction is when you don't feel like you can adequately function without it. For a daily nicotine smoker, being without is a negative consequence that seriously alters mood and efficacy. This is what makes that sort of addiction particularly ravenous, as opposed to your measured once-a-month habit.


From my experience, the only "FAANG" that has tried to get me to do a HackerRank was the one you listed in your last sentence, so you might have hit the nail on the head.

