Hacker News | past | comments | ask | show | jobs | submit | pushedx's comments

At the beginning of your comment I was wondering whether the "attitude" that was corporate-serving was the anti-ECC stance or the pro-ECC stance (based on the full chunk that you quoted). I'm glad that by the end of the comment you were clearly pro-ECC.

Any workstation where you are getting serious work done should use ECC


Every so often over the past 15 years I've had this exact thought, "The world needs something which is exactly like flash. Not kind of like flash, exactly like flash."

A whole generation of people learned how to create art, games, music, and animations using Flash, and the same kind of tool hasn't existed since.

I think Minecraft and Roblox replaced flash for the new generations.


I agree, I love Flash more than nearly any other piece of software that has ever been on the computer, and it's one of the few things I miss about moving to Linux away from Windows.

I can still run Flash MX 2004 via Wine and play with it there, so it's not "lost media" or anything, but what I want is something as close to Flash as possible that runs on Linux, gets regular updates, and doesn't require me to subscribe to it forever.

I have a perpetual license to ToonBoom (that I bought when they were still doing perpetual licenses), and ToonBoom is very cool software, but it's purely an animation tool. I also have a license to Scirra Construct 2, and that's pretty neat as well, but in my mind that's basically a game engine. Flash was this cool, weird hybrid of a game engine and artistic software that I haven't found a good replacement for.

With Flash, you could make a cartoon without touching ActionScript. It was designed around animation. If you wanted to extend the cartoon you could then add code, piecemeal, and if you wanted to make a full game then you could make a full game. It was great.


> I think Minecraft and Roblox replaced flash for the new generations.

I think this is the biggest shift. Minecraft starting out as a .jar file allowed for reverse engineering, and modding support became a huge part of its "culture". Coupled with artists focusing on spritesheets, you got that same Newgrounds online underground scene, but for the next generation of kids.

Roblox takes a similar place now that Minecraft is ~15 years old and all "grown up". Maybe once Roblox meets a similar fate, this new Flash will be ready to fill its shoes.


As described in the readme of your repo (did you read it?) your agent found the Knuth paper located one directory level above its working directory.

So, you didn't produce a replication in 47 minutes; it just took around 30 minutes for your agent to find that you had the answer in a PDF in a nearby directory.


I wonder how common a problem this will be in the future. The experiment will fail due to improper setup, the human will at best glance over the logs and declare victory, and everyone will just believe it.


Yes, I read it and specifically pointed it out (that's why there are 3 hours of interactive logs). There are 4 other runs pushed now so you can see what actual clean room runs for 5.2 xhigh, 5.3-Codex xhigh, 5.4 xhigh, and Opus 4.6 ultrathink look like: https://github.com/lhl/claudecycles-revisited/blob/main/COMP... as well as the baseline.


Yes, most people (including myself) do not understand how modern LLMs work (especially if we consider the most recent architectural and training improvements).

There's the 3b1b video series, which does a pretty good job, but now we are interfacing with models whose individual layers probably have more parameters than the entire first models we interacted with.

The novel insights that these models can produce are truly shocking, I would guess even for someone who does understand the latest techniques.


I highly recommend Build a Large Language Model (From Scratch) [1] by Sebastian Raschka. It provides a clear explanation of the building blocks used in the first versions of ChatGPT (GPT-2, if I recall correctly). The output of the model is a huge vector of n elements, where n is the number of tokens in the vocabulary. We use that huge vector as a probability distribution to sample the next token given an input sequence (i.e., a prompt). Under the hood, the model has several building blocks like tokenization, skip connections, self-attention, masking, etc. The author does a great job explaining all the concepts. It is very useful for understanding how LLMs work.

[1] https://www.manning.com/books/build-a-large-language-model-f...
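
That sampling step (take the model's output vector of n scores, softmax it into probabilities, draw one token id) can be sketched in a few lines. The toy logits below are made up for illustration; in the real GPT-2 the vocabulary has n = 50,257 tokens:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_next_token(logits, temperature=1.0):
    # Softmax the raw output vector into a probability
    # distribution, then draw one token id from it.
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()          # subtract max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Toy "vocabulary" of 5 tokens; GPT-2's real vocabulary has 50,257.
logits = [2.0, 1.0, 0.1, -1.0, -3.0]
token_id = sample_next_token(logits)
print(token_id)
```

Autoregressive generation is just this in a loop: append the sampled token to the input and run the model again.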


But this is missing exactly the gap which OP seems to have: going from a next-token predictor (a language model in the classical sense) to an instruction-finetuned, RLHF-ed, and "harnessed" tool.


The book has a sequel https://www.manning.com/books/build-a-reasoning-model-from-s...

It will give you an answer to the extent anybody can.


What's the latest novel insight you have encountered?


Not the person you asked, and "novel" is a minefield. What's the last novel anything, in the sense that you can't trace a precursor or reference?

But... I recently had an LLM suggest an approach to negative mold-making that was novel to me. Long story, but basically isolating the gross geometry and using NURBS booleans for that, plus mesh addition/subtraction for details.

I’m sure there’s prior art out there, but that’s true for pretty much everything.


I don't know, that's why I asked, b/c I always see a lot of empty platitudes when it comes to LLM praise, so I'm curious to see whether people can actually back up their claims.

I haven't done any 3D modeling so I'll take your word for it but I can tell you that I am working on a very simple interpreter & bytecode compiler for a subset of Erlang & I have yet to see anything novel or even useful from any of the coding assistants. One might naively think that there is enough literature on interpreters & compilers for coding agents to pretty much accomplish the task in one go but that's not what happens in practice.


It’s taken me a while to get good at using them.

My advice: ask for more than what you think it can do. #1 mistake is failing to give enough context about goals, constraints, priorities.

Don’t ask “complete this one small task”, ask “hey I’m working on this big project, docs are here, source is there, I’m not sure how to do that, come up with a plan”


The specification is linked in another comment in this thread, & you can decide whether it is ambitious enough or not, but what I can tell you is that none of the existing coding agents can complete the task even w/ all the details. If you do try it, you will eventually get something that mostly works on simple tests but fails miserably on slightly more complicated test cases.


Which agents are you using, and are you using them in an agent mode (Codex, Claude Code etc.)?

The difference in quality of output between Claude Sonnet and Claude Opus is around an order of magnitude.

The results that you can get from agent mode vs using a chat bot are around two orders of magnitude.


The workflow is not the issue. You are welcome to try the same challenge yourself if you want. Extra test cases (https://drive.proton.me/urls/6Z6557R2WG#n83c6DP6mDfc) & specification (https://claude.ai/public/artifacts/5581b499-a471-4d58-8e05-1...). I know enough about compilers, bytecode VMs, parsers, & interpreters to know that this is well within the capabilities of any reasonably good software engineer, but the implementations from Gemini 3.1 Pro (high & low) & Claude Opus 4.6 (thinking) have been less than impressive.


sorry, needed to edit this comment to ask the same question as the sibling:

have you run these models in an agent mode that allows for executing the tests, the agent views the output, and iterates on its own for a while? up to an hour or so?

you will get vastly different output if you ask the agent to write 200 of its own test cases, and then have it iterate from there


Possibly a dumb question: but are you running this in claude code, or an ide, or basically what are you using to allow for iteration?


I'm using Google's Antigravity IDE. I initially had it configured to run allowed commands (cargo add|build|check|run, testing shell scripts, performance-profiling shell scripts, etc.) so that it would iterate & fix bugs w/ as little intervention from me as possible, but all it did was burn through the daily allotted tokens, so I switched to more "manual" guidance & made a lot more progress w/o burning through the daily limits.

What I've learned from this experiment is that the hype does not actually live up to the reality. Maybe the next iteration will manage the task better than the current one but it's obvious that basic compiler & bytecode virtual machine design in a language like Rust is still beyond the capabilities of the current coding agents & whoever thinks I'm wrong is welcome to implement the linked specification to see how far they can get by just "vibing".


That's roughly where I'm at too. I have seen people have some more success after having practiced, though. Possibly the actual workflows needed for full auto are still kind of tacit. Smaller green-field projects do already work for me, though.


In my experience a few hundred lines w/ a few crates w/ well-defined scopes & a detailed specification is within current capabilities, e.g. compressing wav files w/ wavelets & arithmetic coding. But it's obvious that a correct parser, compiler, & bytecode VM is still beyond current agents even if the specification is detailed enough to cover basically everything.


Can you clarify a bit more this "two orders of magnitude" claim? In what context? Sure, they have "agency" and can do more than output text, but I would like to see a proper example backing this claim.


Most humans can't force themselves to come up with something novel immediately upon demand.


Completely unrelated to the topic or any of the points I was making so did you get confused & respond to the wrong thread?


There is prior art, so it’s not novel.


Great. Can you point to anything at all that is truly novel, no prior art?


Sliding down handrails on a skateboard.


I use the iOS app daily, and while it's not the prettiest thing in the world, it has nearly every feature of the desktop client, including full scripting support for card contents, which is amazing for things like collapsible elements and media. And, at the end of the day, it's what you learn from using it that matters.


That's part of my problem. I actually don't love how many billion options Anki has, and I'd love something with a more opinionated UI.

(I think the data model underneath Anki is...showing its age (and lack of explicit design) and building that on top of it would not be too easy. I've thought about it a few times.)


On mobile you can edit the front and back template.
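
For reference, reversing a deck mostly boils down to swapping the fields between the front and back card templates (Tools > Manage Note Types > Cards in recent desktop versions). A hypothetical English-to-Hanzi template, with made-up field names, might look something like:

```html
<!-- Front template: prompt with the English keyword -->
{{English}}

<!-- Back template: {{FrontSide}} repeats the front, then reveals the answer -->
{{FrontSide}}
<hr id="answer">
{{Hanzi}}<br>
{{Pinyin}}
```

The `{{FrontSide}}` placeholder and the `<hr id="answer">` divider are standard Anki template conventions; the field names depend on the note type of the deck you downloaded.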

Anki is a platform, not a content creator. I would be thankful that someone shared their hard work of creating a deck that has all of the content to meet your needs.

For what it's worth, there are many "RTK" decks for Japanese that show the English keyword on the front and expect you to write the kanji before flipping.

Maybe search for an "RTH" deck to go along with the book Remembering the Hanzi.

Anki is extremely powerful and feature-rich software, and it seems that you barely scratched the surface before dismissing it.

Edit: it took me 10 seconds to find this deck: https://ankiweb.net/shared/info/1489829777. There's probably a Simplified one as well. Good luck with your studies.


What does RTH stand for? I am guessing "something something To Kanji/Hanzi".

The deck you linked doesn't mention HSK in the description, though. It was maybe a year ago that I searched for HSK1-3 level decks and only got wrong-direction ones.

I am searching again now:

(1) "HSK1 English" -- nothing

(2) "HSK1" -- lots of results, but also tons of results that are not relevant for me, because they are not for English. -- Results like https://ankiweb.net/shared/info/1474834583 -- but that is 1-4 which is too much for me. I either need the 1, 2, 3 levels separate, or 1-3 in one deck. Also finding https://ankiweb.net/shared/info/166845167, which is the wrong way around. Starts right away by asking me "的" instead of something like "(some description here)". When I go to deck options in Anki desktop app, I see "Display Order", but there I cannot select any "English -> Chinese". Tons of options, but none that inverts what the cards show.

One sibling comment informs me that I have to edit a template somewhere. But I don't know where, and in the deck options I don't quickly see any template-editing function.

And this has been my experience with pretty much every HSK deck I found there. It would seem silly that everyone uploads decks in the "wrong" order. Which leads me to think that many people cheat themselves by only doing the recognition part of learning the characters. Why else would so many people upload decks such that whoever downloads them has to invert them first before using them? I also think that many learners are probably not aware of the issue with seeing the character first and then translating to English, compared with doing it the other way around.

So now lets do your "RTH" search:

"HSK1 RTH" -- " Error: Please log in to perform more searches."

I am sorry, I cannot even do a keyword search a couple of times??? I need to log in, requiring an account (!) to even find a suitable deck?! OK, at this point I give up again.


One theory that I've had for a while with regard to the no-liquids policy is that it was somehow introduced by the food vendors on the other side of security, who want you to buy a drink and some food after you pass through.


Noticed it immediately. I get a lot more spam messages per day than I thought that I did.


Same here. Until recently I would get maybe 1-2 spams a month, and I just got 30 in the span of a few days.

They’re the very obvious, very obnoxious kind of spam, and Gmail still correctly sends them to the junk bin, so I wonder if they were shadowbanned before and Google simply decided to make the process more explicit (which I don’t hate on principle).

Either that or my address was scraped from somewhere by a spam bot and the timing is coincidental.


Everybody gets way more spam than they realize because Google doesn't deliver it to the spam label. 99% of it is rejected at SMTP time.


It is easily 29,905 road miles away.

If you put all of your blood vessels end to end they would go to the moon and back.


The flight distance is 2,520 miles; Google puts the road distance at 3,179 mi, a factor of 1.26x, while 29,905 mi would be 11.87x. Road miles are not even close to accounting for it; it's just a bug.
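
A quick sanity check of those ratios (distances taken from the parent comments):

```python
flight_mi = 2520     # great-circle flight distance
road_mi = 3179       # Google's road-route distance
claimed_mi = 29905   # the figure produced by the bug

print(round(road_mi / flight_mi, 2))     # realistic road detour factor: 1.26
print(round(claimed_mi / flight_mi, 2))  # detour factor implied by the claim: 11.87
```

Real road routes rarely exceed the straight-line distance by much more than 20-40%; a factor of nearly 12x over a 2,500-mile crow's-flight distance is not a plausible route.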


> If you put all of your blood vessels end to end they would go to the moon and back.

The moon orbits too quickly to make this practical


Being a little bit clever can lead you to make some pretty bad mistakes.

Yes, the distance according to roads can be different from the distance as the crow flies. No, it cannot realistically be 10x the distance when the crow's distance is 2500 miles.


But what if you count the distance of going in and out of each tiny crack on the road surface?


Look up "fractal dimensions".


I know :)


I question your use of the word “easily”.


You would have to take an implausible number of wrong turns to have a chance of making it anywhere near that long a drive. For an example of a shorter but still entirely unreasonable path: Head straight north all the way to the pole, continue south to the pole, continue north to the latitude of the destination, then head due east until arrival.


A local model will be just as happy to provide a shell command that trashes your local disks as any remote one.

