Single-file Python agent (zero dependencies) that uses llama.cpp for local inference. Runs on a 2013 Mac Pro with a Xeon E5-1650 v2 and dual FirePro D500 GPUs. Qwen 3B does native tool calling at 15.6 tok/s. Also includes a 3-line patch to fix llama.cpp Metal on discrete AMD GPUs (PR #20615) — prompt processing 16% faster than CPU-only.
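For anyone curious what "zero dependencies" looks like in practice, here is a minimal stdlib-only sketch of talking to a local llama.cpp server. It assumes `llama-server` is running locally with its OpenAI-compatible endpoint; the host, port, and function names are placeholders, not the project's actual code:

```python
import json
import urllib.request

def build_chat_request(prompt, url="http://127.0.0.1:8080/v1/chat/completions"):
    """Build a stdlib-only HTTP request for llama.cpp's OpenAI-compatible server."""
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )

def chat(prompt):
    """POST the prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

The whole agent loop is then just `chat()` in a `while True:` with some tool-dispatch glue around it; no pip installs needed.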
Delivered. Please reconsider now. AI slop cannot build this without a human who has real RISC CPU knowledge.
The Emulator ---------------------------------------------- https://bottube.ai/watch/shFVLBT0kHY
OK, I promised videos; here are two. The LLM had serious issues going from C and Python on x86 to C on MIPS, but it now produces coherent English. Phase two is a chat interface so we can prompt without seeded prompts. Check the code, it's real inference though!
This feels like an AI agent doing its own thing. The screenshot of this working is garbled text (https://github.com/sophiaeagent-beep/n64llm-legend-of-Elya/b...), and I'm skeptical of reasonable generation from a small hard-coded training corpus. The linked devlog on YouTube is quite bizarre too.
819K parameters. Responses are short and sometimes odd. That's expected at this scale with a small training corpus. The achievement is that it runs at all on this hardware.
Context window is 64 tokens. Prompt + response must fit in 64 bytes.
No memory between dialogs. The KV cache resets each conversation.
Byte-level vocabulary. The model generates one ASCII character at a time.
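To make the constraints above concrete, here is a rough Python sketch of what a byte-level vocabulary with a shared 64-token window implies. This is illustrative only; the names are made up, not taken from train_sophia_v5.py:

```python
def encode(text):
    """Byte-level 'tokenization': every ASCII character is its own token."""
    return list(text.encode("ascii"))

def decode(tokens):
    """Inverse of encode: tokens are raw byte values."""
    return bytes(tokens).decode("ascii")

CONTEXT = 64  # prompt + response share this budget

def budget_for_response(prompt):
    """How many tokens the model may generate before the window is full."""
    used = len(encode(prompt))
    if used >= CONTEXT:
        raise ValueError("prompt alone overflows the 64-token window")
    return CONTEXT - used
```

A 20-character prompt leaves only 44 bytes of reply, which is part of why responses are so short.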
Future Directions
These are things we're working toward — not current functionality:
RSP microcode acceleration — the N64's RSP has 8-lane SIMD (VMULF/VMADH); offloading matmul would give an estimated 4–8× speedup over scalar VR4300
Larger model — with the Expansion Pak (8MB total), a 6-layer model fits in RAM
Richer training data — more diverse corpus = more coherent responses
Real cartridge deployment — EverDrive compatibility, real hardware video coming
Why This Is Real
The VR4300 was designed for game physics, not transformer inference. Getting Q8.7 fixed-point attention, FFN, and softmax running stably at 93MHz required:
Custom fixed-point softmax (bit-shift exponential to avoid overflow)
Q8.7 accumulator arithmetic with saturation guards
Soft-float compilation flag for float16 block scale decode
Alignment-safe weight pointer arithmetic for the ROM DFS filesystem
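For readers who want a feel for the arithmetic, here is a small Python model of a Q8.7 softmax with a bit-shift exponential. This is an illustration of the technique, not the code in nano_gpt.c; the constants and names are my own assumptions:

```python
ONE = 128  # 1.0 in Q8.7 (7 fractional bits)

def q87_exp(x):
    """Approximate exp(x) for x <= 0 in Q8.7.
    exp(x) = 2^(x * log2 e): the integer part of the exponent becomes a
    right shift, the fractional part a linear correction (ln 2 ~ 89/128)."""
    e = (-x * 185) >> 7          # -x * log2(e), with 185/128 ~ 1.4427, Q8.7
    ip, fp = e >> 7, e & 0x7F    # integer and fractional parts
    if ip >= 15:
        return 0                 # underflow to zero
    base = ONE >> ip             # 2^(-ip) by bit shift
    return base - ((base * ((fp * 89) >> 7)) >> 7)

def q87_softmax(logits):
    """Softmax over Q8.7 logits. Subtracting the max keeps every exponent
    argument <= 0, so no intermediate value can overflow the accumulator."""
    m = max(logits)
    exps = [q87_exp(v - m) for v in logits]
    s = sum(exps) or 1
    return [(e * ONE) // s for e in exps]
```

For example, `q87_softmax([256, 128, 0])` (logits 2.0, 1.0, 0.0) gives `[88, 30, 8]`: a distribution that sums to slightly under ONE because of truncation, which is an acceptable trade at this scale.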
The inference code is in nano_gpt.c. The training script is train_sophia_v5.py. Build it yourself and verify.
Partially correct. The value is not the game interface right now. It's proof that you can do actual inference with an LLM. The surprise I am developing is a bit bigger than this; I just have to get the LLM outputs right first!
You’re right that the graphics layer is mostly 2D right now. Sprites are hardware-accelerated where it makes sense, and text is written directly to the framebuffer. The UI is intentionally minimal.
The point of this ROM wasn’t the game interface — it was proving real LLM inference running on-device on the N64’s R4300i (93 MHz MIPS, 4MB RDRAM).
Since the original screenshots, we’ve added:
• Direct keyboard input
• Real-time chat loop with the model
• Frame-synchronous generation (1–3 tokens per frame @ 60 FPS)
So it’s now interactive, not just a demo render.
The current focus is correctness and stability of inference. The graphics layer can evolve later.
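The frame-synchronous pacing above can be modeled in a few lines: each frame the decoder is allowed a small token budget, so generation never stalls rendering. This is a generic Python sketch of the scheduling idea; the real loop is C driven by the N64's vertical interrupt:

```python
TOKENS_PER_FRAME = 3  # upper bound from the 1-3 tokens/frame figure

def run_frames(generate_token, total_tokens, tokens_per_frame=TOKENS_PER_FRAME):
    """Drive generation one frame at a time. Returns a list of per-frame
    token batches; len(result) is how many frames the reply took."""
    frames, produced = [], 0
    while produced < total_tokens:
        batch = []
        for _ in range(min(tokens_per_frame, total_tokens - produced)):
            batch.append(generate_token())  # one decode step
            produced += 1
        frames.append(batch)  # the real loop would wait for vblank here
    return frames
```

At 60 FPS and 3 tokens per frame, a 45-byte reply takes 15 frames, i.e. a quarter of a second of wall-clock time.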
Next step is exposing a lightweight SDK layer so N64 devs can hook model calls into 3D scenes or gameplay logic — essentially treating the LLM as a callable subsystem rather than a UI gimmick.
The value isn’t the menu.
It’s that inference is happening on 1996 silicon.
Happy to answer specifics about the pipeline if you’re interested.
We are uploading weights.bin. It's really meant for you to generate your own LLM, but we're uploading it anyway. People are ripping on it, but they didn't check the code themselves. This is a tech demo: it's not about the graphics, it's about the LLM inferring on the hardware, lol.
This is the text-inference issue I was alluding to. We had several hurdles to overcome: (1) LLMs are trained on little-endian machines, while MIPS on the N64 is big-endian; (2) we had Python-to-C porting issues; (3) we had quantization issues. All are being resolved.
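The endianness hurdle is easy to demonstrate: float32 weights exported on an x86 box are little-endian, and reading those bytes naively on big-endian MIPS scrambles every value. A sketch using only Python's `struct` module (illustrative, not the project's actual exporter):

```python
import struct

def export_weights_big_endian(values):
    """Pack float32 weights big-endian so the N64 can read them directly."""
    return struct.pack(f">{len(values)}f", *values)

def read_as(buf, byte_order):
    """Interpret a buffer as float32s with the given byte order ('<' or '>')."""
    return list(struct.unpack(f"{byte_order}{len(buf) // 4}f", buf))

# Pack little-endian (the x86 default), then interpret both ways.
le = struct.pack("<2f", 1.0, -0.5)
assert read_as(le, "<") == [1.0, -0.5]   # correct on x86
assert read_as(le, ">") != [1.0, -0.5]   # garbage when read big-endian
```

Swapping at export time (or in the ROM build step) means the VR4300 side never has to byte-swap at inference time.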
This is a tech demo to honor LoZ, and the code can also be used by N64 devs to add AI-style NPCs in the future. So did we achieve it? Yes: we are the first to do LLM inference on an N64. I am just trying to get you guys a proper video.