
Exactly this. Existing code review tools have become insufficient with the increasing volume of code; I would like to see more innovation here.

One idea that comes to mind to make review easier would be to re-create commits following Kent Beck's SB Changes concept - splitting structure changes (tidying/refactoring) from behavior changes (features). The structure changes could then be quickly skimmed (especially with good test coverage), saving the reviewer's focus for the behavior changes.

The challenge is that it is not the same as just committing the hunks in a different order. But maybe a skill with a basic agent loop could work, given the capabilities of today's models.


I experimented with a command for atomic commits a while ago. It explicitly instructed the agent to review the diff and group related changes to produce a commit history where every HEAD state would work correctly. I tried to get it to use `git add -p`, but it never seemed to follow those instructions. Might be time for another go at this with a skill.

I have had success with having the skill create a new branch, move pieces of code there, test them after the move, and then add them.

So commit locally and have it recreate that commit as a sequence of smaller commits on another branch.
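For illustration, a rough sketch of what that recreated sequence could look like in plain git, assuming the finished work sits on a `feature` branch (branch names, paths and commit messages here are made up; the hard part the agent has to do is deciding which hunks belong to which commit):

    git checkout -b feature-split main    # fresh branch to rebuild the history on
    git restore --source=feature -- .     # bring over the finished end state
    git add -p                             # stage only the structure/tidying hunks
    git commit -m "S: extract helper, rename fields (no behavior change)"
    git add -A                             # whatever is left is the behavior change
    git commit -m "B: retry transient network errors"

Running the test suite after each commit then checks that every intermediate state still works.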


The OpenClaw/pi-agent situation seems similar to ollama/llama-cpp, where the former gets all the hype, while the latter is actually the more impressive part.

This is great work, I am looking forward to seeing how it evolves. So far Claude Code seems the best deal despite its bugs, given the generous subscription, but when the market corrects and the prices get closer to API prices, the pay-per-token premium with an optimized experience will probably be a better deal than suffering Claude Code's glitches and paper cuts.

The realization is that, in the end, an agent framework kit that is customizable and can be recursively improved by agents is going to be better than a rigid proprietary client app.


> but when the market corrects and the prices get closer to API prices

I think it’s more likely that the API prices will decrease over time and the CC allowances will only become more generous. We’ve been hearing predictions about LLM price increases for years but I think the unit economics of inference (excluding training) are much better than a lot of people think and there is no shortage of funding for R&D.

I also wouldn’t bet on Claude Code staying the same as it is right now with little glitches. All of the tools are going to improve over time. In my experience the competing tools aren’t bug free either but they get a pass due to underdog status. All of the tools are improving and will continue to do so.


> I think it’s more likely that the API prices will decrease over time and the CC allowances will only become more generous.

I think this is absolutely true. There will likely be caps to stop the people running Ralph loops/GasTown with 20 clients 24/7, but for general use they will probably start to drop the API prices rather than vice-versa.

> We’ve been hearing predictions about LLM price increases for years but I think the unit economics of inference (excluding training) are much better than a lot of people think

Inference is generally accepted to be a very profitable business (outside the HN bubble!).

Claude Code subscriptions are more complicated of course, but I think they probably follow the general pattern of most subscription software - lots of people who hardly use it, and a few who push it so hard that they lose money on them. Capping the usage solves the "losing money" problem.


FWIW, you can use subscriptions with pi. OpenAI has blessed pi, allowing users to use their GPT subscriptions. The same holds for other providers, except Flicker Company.

And I'm personally very happy that Peter's project gets all the hype. The pi repo already gets enough vibesloped PRs from openclaw users as is, and it's still only 1/100th of what the openclaw repository has to suffer through.


Good to know, that makes it even better. I still find Opus 4.5 to be the best model currently. But if the next generation of GPT/Gemini closes the gap, that will cross the inflection point for me and make 3rd-party harnesses viable. Or if they jump ahead, that should put more pressure on the Flicker Company to fix the flicker or relax the subscriptions.

Is this something that OpenAI explicitly approves per project? I have had a hard time understanding what their exact position is.


This is basically identical to the ChatGPT/GPT-3 situation ;) You know OpenAI themselves keep saying "we still don't understand why ChatGPT is so popular... GPT was already available via API for years!"

ChatGPT is quite different from GPT. Using GPT directly to have a nice dialogue simply doesn't work for most intents and purposes. Making it usable for a broad audience took quite some effort, including RLHF, which was not a trivial extension.

This is the first I'm hearing of this pi-agent thing and HOW DO TECH PEOPLE DECIDE TO NAME THINGS?

Seriously. Is the creator not aware that "pi" absolutely invokes the name of another very important thing? sigh.


The creator is very aware. Its original name was "shitty coding agent".

https://shittycodingagent.ai/


then do SCA and backronym it into something acceptable! That's even better lore :)

There's a fair chunk of irony here in that Mario is being both anti-memetic with his naming choices and contrarian in his design decisions, and yet he still finds himself dunked in the muck of popularity as the backbone of OpenClaw.

You mean Service Component Architecture? Do you want to bring down the wrath of IBM!

Good call, he'll have to name it Shitty COdingagent, or "SCO". No one will sue over that name.

ding is a good name for an agent

Developers are the worst at naming things. This is a well known fact.

From the article: "So what's an old guy yelling at Claudes going to do? He's going to write his own coding agent harness and give it a name that's entirely un-Google-able, so there will never be any users. Which means there will also never be any issues on the GitHub issue tracker. How hard can it be?"

And like ollama it will no doubt start to get enshittified.

Only if it enters YC (like Ollama).

This is awesome! I was thinking it would be neat to have something like abduco but on a more reliable foundation, like libghostty-vt.

For my agent management scripts I use zellij, since it is more ergonomic than tmux. Abduco sounded good in principle, but its implementation is too limited. However, zellij is quite huge, on the order of tens of thousands of LOC, and I am using only a small part of it. It looks like zmx might implement just the right amount of features for this use case, so I am going to try it. It is always nice to achieve the same functionality with leaner tools.

Are you also thinking about an alternative for the dvtm part? I wonder if, once libghostty proper gets finished, it would open the possibility to level up textual multiplexing and unlock some cool features with graphical UIs.


I have thought about writing a separate tool that resembles dvtm but I’m not exactly sure how I would build it.

I don’t want to maintain a monster of a project like a terminal multiplexer. Zmx is basically a single file with 1500 LoC and is “production grade” with just a few quirks I haven’t figured out yet.

I would want something of similar scope.

With zmx I created two commands you might be interested in: zmx run and zmx history. Run lets you execute commands inside the PTY and history lets you read from the session history.


Great, thought-provoking article! Indeed, typing commands on the command line feels as primitive as typing code into interactive interpreters (python, irb, etc.). Those are primitive REPLs.

With Lisp REPLs, one types in the IDE/editor with full highlighting, completions and code intelligence, and the code is then sent to the REPL process for evaluation. For example, Clojure has great REPL tooling.

A variation of REPL is the REBL (Read-Eval-Browse Loop) concept, where instead of the output being simply printed as text, it is treated as values that can be visualized and browsed using graphical viewers.

Existing editors can already cover the runbooks use case pretty well. Those can be just markdown files with key bindings to send code blocks to a shell process for evaluation. It works great with instructions in markdown READMEs.
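As a minimal sketch of that idea (the helper name and the "run the Nth fenced block" convention are made up for illustration; a real key binding would pick the block under the cursor):

    # pull the Nth fenced code block out of a markdown runbook and pipe it to a shell
    run_block() {
      local file=$1 n=$2
      awk -v n="$n" '
        /^```/ { if (!in_block) count++; in_block = !in_block; next }
        in_block && count == n { print }
      ' "$file" | bash
    }

    run_block README.md 2   # run the second code block in README.md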

The main missing feature of an editor-centric command-line workflow I can imagine is history search. It could be interesting to see if it would be enough to add shell history as a completion source. Or perhaps have a shell LSP server provide history and other completions that could work across editors?


> It could be interesting to see if it would be enough to add shell history as a completion source.

Atuin runbooks (mentioned in the article) do this! Pretty much anywhere we allow users to start typing a shell command we feed shell history into the editor


> It could be interesting to see if it would be enough to add shell history as a completion source.

Fish shell does this too


Also in the context of LLMs I think model weights themselves could be considered an untrusted input, because who knows what was in the training dataset. Even an innocent looking prompt could potentially trigger a harmful outcome.

In that regard it reminds me of the CAP theorem, which also has three parts. However, in practice partitions in distributed systems are a given, so the choice is just between availability and consistency.

So in the case of the lethal trifecta it is either private data or external communication, but the leg between these two will always carry some risk.


That linear trend line does not seem to fit very well, I say we are looking at the beginning of a hockey stick :)

Stopped dual-booting for games and formatted the partition some time after Windows 7 EOL. Thank you Wine contributors, Valve and lord Gaben.


> It seems to me that LISP will probably be superseded for many purposes by a language that does to LISP what LISP does to machine language. Namely it will be a higher level language than LISP that, like LISP and machine language, can refer to its own programs. (However, a higher level language than LISP might have such a large declarative component that its texts may not correspond to programs. If what replaces the interpreter is smart enough, then the text written by a user will be more like a declarative description of the facts about a goal and the means available for attaining it than a program per se).

Pretty accurate foresight in 1980: in the "Mysteries and other Matters" section, McCarthy predicts declarative textual descriptions replacing Lisp as a higher-level programming language, basically describing today's LLMs and agentic coding.


I don't see the connection to LLMs. With LLMs, you have a highly non-deterministic system that is also quite likely to be incorrect.

It seems like a stretch to say that's what McCarthy was thinking about regarding declarative facts and goals driving a program.


> Pretty accurate foresight in 1980: in the "Mysteries and other Matters" section, McCarthy predicts declarative textual descriptions replacing Lisp as a higher-level programming language, basically describing today's LLMs and agentic coding.

To me, that sounds more like Prolog than agentic coding.


How many people are using LLMs to replace coding in Lisp? What code are these former Lispers producing with LLM Agents?

I understand what you're trying to say, but I don't think LLMs were created as some replacement for Lisp. I don't think they've replaced any programming language, but they do help quite a bit with autogeneration of Python & Javascript in particular.


I've been having a great time generating Common Lisp code with LLMs. eg. https://github.com/atgreen/cl-tuition , https://github.com/atgreen/ctfg , etc.


Pull request incoming to add back your missing README emojis.


LLMs seem better suited to help with the Tower of Babel we've created for ourselves: aws commands, Terraform modules, Java libraries, Javascript/React, obscure shell commands, etc.


The strength of Lisps is in the ability to define DSLs and then concisely express solutions to problems in that domain. Arguably no other programming language was able to exceed or even match that power until now.

The math behind transformers is deterministic, so LLMs could be treated as compilers (putting aside intentionally added temperature and the non-determinism from current internal GPU scheduling). In the future I imagine we could declare a dependency on a model, hash its weights in a lockfile, and the prompt/spec itself will be the code, which corresponds to that insight.
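A minimal sketch of what that might look like, treating the spec as the source artifact and the weights as a pinned dependency (file names and the lockfile layout are made up, not an existing tool):

    # pin the exact model the way a lockfile pins a library version
    sha256sum model.gguf > spec.lock     # record the weights hash
    git add spec.prompt spec.lock        # version the "source" plus its toolchain pin
    # a deterministic "compile" step would then run the pinned model over spec.prompt
    # with temperature 0 and a fixed seed, and emit the generated program as a build artifact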


> the prompt/spec itself will be the code, which corresponds to that insight.

What I've understood from discussions on HN is that LLMs are non-deterministic. Am I right? So the same prompt when executed again could produce a different answer, a different program every time.

That would mean the prompt is not a great "high-level language"; it would get compiled into a different Lisp program depending on the time of the day?


Non-determinism is just a limitation of current implementations, but it is not a fundamental property: https://thinkingmachines.ai/blog/defeating-nondeterminism-in...


That is remarkable foresight. I've had Google Gemini take a Dart program it wrote for me and convert it to TypeScript while adding some additional requirements - so declarative programming and treating code as data.


Although niche, things are pretty lively in the community. Among other things, great progress was made this year on Jank, the native LLVM-based implementation with seamless low-level C++ interop. As part of that work a test suite is being created [0], which now includes runners for all of the major implementations to compare compatibility - the next best thing to a formal specification.

[0] https://github.com/jank-lang/clojure-test-suite


That sounds super cool, let me add another voice of encouragement, please do publish it.


Great work! I was just thinking the other day how an interface like this would be useful; it seems strange that we don't see more UI attempts beyond basic linear chat.

I find the most need for managing context when problem solving. I describe a problem and the LLM gives me 5 possible solutions. From those I immediately see that 2 of them won't be viable, so I can prune the search. Then it is best to explore the others separately, without polluting the context with the non-viable solutions.

I saw this problem solving approach described as Tree-based problem management [0]. Often when solving problems there can be some nested problem which proves to be a blocker and cuts off a whole branch, so it is effective to explore these first. Another cool attempt was thorny.io [1] (I didn't get to try it, and it is now unfortunately defunct), in which you could mark nodes with metadata like pro/con. Higher nodes would aggregate these, which could guide you and prioritize which branch to explore next.

Also graph rendering looks cooler, but outliners seem to be more space efficient. I use Logseq, where I apply this tree-based problem solving, but have to copy the context and response back-and-forth manually. Having an outliner view as an alternative for power users would be neat.

[0] https://wp.josh.com/2018/02/11/idea-dump-2018/#:~:text=Tree-...

[1] https://web.archive.org/web/20240820171443/http://thorny.io/

