
This sounds very promising. Using multiple CC instances (or a mix of CLI agents) across tmux panes has long been part of my workflow: agents can use the tmux-cli [1] skill/tool to delegate to and collaborate with each other, or review/debug/validate each other's work.

This new orchestration feature makes it much more useful since they share a common task list and the main agent coordinates across them.

[1] https://github.com/pchalasani/claude-code-tools?tab=readme-o...


Yeah, I've been using your tools for a while. They've been nice.

I didn't see that, but I do get a lot of stutters (words or syllables repeated 5+ times); not sure if it's a model problem or a post-processing issue in the Handy app.

I'm curious about this too. On my M1 Max MacBook I use the Handy app on macOS with Parakeet V3 and get near-instant transcription. Accuracy is slightly lower than with the slower Whisper models, but that drop is immaterial when talking to CLI coding agents, which is where I find the most use for this.

https://github.com/cjpais/Handy


Since llama.cpp/llama-server recently added support for the Anthropic messages API, running Claude Code with several recent open-weight local models is now very easy. The messy part is figuring out which llama-server flags to use, chat template included. I've collected all of that setup info in my claude-code-tools [1] repo, for Qwen3-Coder-next, Qwen3-30B-A3B, Nemotron-3-Nano, GLM-4.7-Flash, etc.
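Very roughly, the wiring looks like the following; treat it as a sketch, since the model file shown is just a placeholder and the right context-size and chat-template flags are model-specific (the repo below has the per-model details). You typically also need a dummy API key so CC doesn't complain:

    # serve a local model (flags vary per model; --jinja uses the GGUF's built-in chat template)
    llama-server -m Qwen3-30B-A3B.gguf --port 8080 --jinja

    # point Claude Code at the local Anthropic-messages endpoint
    ANTHROPIC_BASE_URL=http://localhost:8080 ANTHROPIC_API_KEY=dummy claude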

Among these, I had lots of trouble getting GLM-4.7-Flash to work (failed tool calls, etc.), and even when it works, it runs at very low tok/s. The Qwen3 variants, on the other hand, perform very well speed-wise. For local work on sensitive documents these models are excellent; for serious coding, not so much.

One caveat missed in most instructions is that you have to set CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 in your ~/.claude/settings.json; otherwise CC's telemetry pings exhaust local ports and cause total network failure.
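For reference, the settings.json entry looks roughly like this (assuming the variable goes under the "env" key, which is where Claude Code reads per-session environment variables from):

    {
      "env": {
        "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
      }
    }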

[1] claude-code-tools local LLM setup: https://github.com/pchalasani/claude-code-tools/blob/main/do...


I absolutely love my UHK [1] split mech keyboard that I ordered from Hungary several years ago. It’s the only one I stuck with after trying some other popular ones. Other than being split, the keyboard layout is standard so it’s easy to adapt to.

It probably helps me avoid RSI. I keep an Apple trackpad between the two halves, so I never use a mouse. And a microphone in the middle as well; you can guess why. I clamshell my MacBook and almost always work on a monitor. Besides ergonomics, the biggest benefit is the on-board programmability; it lets me define custom layers and macros so I can trigger complex window management, app switching, and IDE navigation with simple key combos.

[1] https://uhk.io/


That's exactly how I used to work about 15 years ago, but I found that the Apple trackpad killed my wrists. These days I just have a regular mouse, and simply try to do as much as I can from the keyboard.

I agree the trackpad is not RSI-proof by any means, but for me mousing is worse. With the trackpad in the middle I can use either hand to scroll or click. I also keep that to a minimum and instead rely on keyboard tools like Vimium, plus keyboard shortcuts for scrolling.

Interesting that it doesn't specify what type of oatmeal, e.g. steel-cut vs quick oats. I thought steel-cut was more beneficial.

Rolled oats, according to the paper.

Thanks, I found it after clicking through to the actual Nature paper, where it's a detail buried deep in the text. They really should have mentioned it up front.

An interesting shift I've seen over the past few weeks is that we're starting to refer to bare LLMs themselves as "agents".

Used to be that agent = LLM + scaffold/harness/loop/whatever.


I think some of the distinction here is that the more recent "bare LLMs" have been more purpose-built: augmented with agent-specific RL and in general more fine-tuned for the requirements of "agents", things such as specific reasoning capabilities, tool calling, etc.

These all make the "bare LLMs" better suited to be used within the "agent" harness.

I think the more accurate term would be "agentic LLMs" instead of calling them "agents" outright. As for why it's the case now, it's probably just human laziness and colloquialism.


Yes, the post training is the special sauce.

GPT 5.2 in a simple while loop runs circles around most things right now. It was released barely a month ago and many developers have been on vacation/hibernating/etc. during this time.

I give it 3-4 more weeks before we start to hear about the death of agentic frameworks. Pointing GPT-5+ at a PowerShell or C#/Python REPL is looking way more capable than wiring up a bunch of domain-specific tools. A code-based REPL is the ultimate tool: you only need one, and you can force the model to always call it (100% chance of picking the right tool). The amount of integration work around Process.Start is approximately 10-15 minutes, even if you don't use AI assistance.
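To make the "one REPL tool in a while loop" idea concrete, here's a minimal sketch (not anyone's production code). The tool-call handling follows the OpenAI chat-completions API; the model name and user prompt are placeholders:

    # Minimal sketch of "LLM + one code REPL tool + while loop".
    import json, subprocess
    from openai import OpenAI

    client = OpenAI()
    tools = [{
        "type": "function",
        "function": {
            "name": "run_python",
            "description": "Run a Python snippet and return its stdout/stderr.",
            "parameters": {
                "type": "object",
                "properties": {"code": {"type": "string"}},
                "required": ["code"],
            },
        },
    }]
    messages = [{"role": "user", "content": "Count the .py files under the current directory."}]

    while True:
        msg = client.chat.completions.create(
            model="gpt-5.2", messages=messages, tools=tools   # placeholder model name
        ).choices[0].message
        if not msg.tool_calls:       # the model decides when the task is done
            print(msg.content)
            break
        messages.append(msg)
        for call in msg.tool_calls:  # one tool, so it always "picks the right tool"
            code = json.loads(call.function.arguments)["code"]
            out = subprocess.run(["python", "-c", code], capture_output=True, text=True)
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": out.stdout + out.stderr})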


Yes, this "REPL/CLI is all you need" realization is exactly what's behind the wild success of Claude Code and derivative CLI coding agents.

My definition of agent has always been an LLM with "effectful" tools, run in a loop where the LLM gets to decide when the task is complete. In other words, an LLM with "agency".

This is exactly how I think of it. An agent has three elements: intelligence (LLM), autonomy (loop) and tools to do anything interesting/useful.

I almost thought it was MalBot, which would have been more apt.

Parakeet V3 is near-instant transcription, and the slight accuracy drop relative to the slower/bigger Whisper models is immaterial when talking to AIs that can “read between the lines”.

This is not strictly speech-to-speech, but I quite like it when working with Claude Code or other CLI agents:

STT: Handy [1] (open-source), with Parakeet V3 - stunningly fast, near-instant transcription. The slight accuracy drop relative to bigger models is immaterial when you're talking to an AI. I always ask it to restate back to me what it understood, and it gives back a nicely structured version -- this helps confirm understanding and likely helps the CLI agent stay on track.

TTS: Pocket-TTS [2], just 100M params, with amazing speech quality (English only). I made a voice plugin [3] based on it for Claude Code, so CC can speak short updates whenever it stops. It uses a non-blocking stop hook that calls a headless agent to create a one- or two-sentence summary. It turns out to be surprisingly useful, and it's also fun, since you can customize the speaking style to mirror your vibe, etc.
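Roughly how the stop-hook flow works; this is a sketch, not the plugin's actual code, and it assumes the hook payload carries a "transcript_path" field and that a pocket-tts command is on the PATH (that invocation is a guess; check the pocket-tts docs):

    # Sketch of a non-blocking Stop hook: summarize what just happened, then speak it in the background.
    import json, subprocess, sys

    payload = json.load(sys.stdin)                            # Claude Code sends hook input on stdin
    tail = open(payload["transcript_path"]).read()[-4000:]    # last chunk of the session transcript

    # Headless agent call ("claude -p" runs a one-shot, non-interactive prompt).
    summary = subprocess.run(
        ["claude", "-p", "In one or two sentences, say what was just done:\n" + tail],
        capture_output=True, text=True,
    ).stdout.strip()

    # Fire-and-forget so the hook returns immediately and never blocks CC.
    subprocess.Popen(["pocket-tts", "say", summary])          # hypothetical invocation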

The voice plugin gives commands to control it:

    /voice:speak stop
    /voice:speak azelma (change the voice)
    /voice:speak <your arbitrary prompt to control the style or other aspects>
[1] Handy https://github.com/cjpais/Handy

[2] Pocket-TTS https://github.com/kyutai-labs/pocket-tts

[3] Voice plugin for Claude Code: https://github.com/pchalasani/claude-code-tools?tab=readme-o...


Wow Handy works impressively well! Excellent UX too (on Windows at least).

I've been dabbling with STT quite a bit and built my own tool using Deepgram. But just tried Handy and it's SO FREAKING FAST! Love it.

Yes, especially with Parakeet V3. It's also nicely hackable; I Clauded a couple of PRs to improve the experience, like removing stutters and filler words.


Nice, I'll have to try it out. They should really make a uv-installable CLI tool like Pocket-TTS did. People underestimate just how much more immediately usable something becomes when you can get it simply by doing "uv tool install …".

True that. People, especially developers, underestimate the importance of packaging. Or, in general, making it easier for others to use your product.

So I benchmarked it, and there's really no advantage over Pocket-TTS. There are some tradeoffs, like Kitten not having streaming audio.

Hi, I'm looking for an STT setup that can run on a server via cron, using a small local model (I have a 4-vCPU Threadripper, CPU only, with 20 GB RAM on the server), and that can transcribe from remote audio URLs (preferably; I know local models probably don't have that feature, so I'd have to do something like curl the audio down to memory or /tmp, transcribe, and then remove the file, etc.).

Have any thoughts?


I’ve no thoughts on that unfortunately.


posts like this are why i visit HN daily!!!

thanks for sharing your knowledge; can’t wait to try out your voice plugin


Same!

Feel free to file a gh issue if you have problems with the voice plugin

