> They're all slop when the complexity is higher than a mid-tech intermediate engineer though.
This right here. Value prop quickly goes out the window when you're building anything novel or hard. I feel that I'm still spending the same amount of time working on stuff, except that now I'm also spending money on models.
So tired of seeing this trope. Data center energy expenditure is like less than 1% of worldwide energy expenditure[1]. Have you heard of mining? Or agriculture? Or cars/airplanes/ships? It's just factually wrong and alarmist to spread the fake news that AI has any measurable effect on climate change.
Why are you lying? From literally the first paragraph of the CFR article:
> China is the world’s largest source of carbon emissions, and the air quality of many of its major cities fails to meet international health standards.
And even though China emits more carbon annually than the US today, the US and Europe are still ahead in cumulative emissions: https://ourworldindata.org/grapher/cumulative-co2-emissions-.... Cumulative emissions are the carbon that's already in our atmosphere and causing heating today. If you want to apportion "blame" for climate change, then the US is 25% responsible, Europe is 30% responsible, and China is 14% responsible as of 2023. And India is only 3.6% responsible.
China's high emissions today power a manufacturing industry that has made cheap decarbonization via solar and batteries a realistic prospect. That's a much better use of their current emissions compared to what the developed countries do with theirs.
China has a large population and does the dirty work of manufacturing for much of the rest of the entire world.
China has done more for renewable energy solutions than any other country, and their per capita consumption patterns are lower than those of many G20 countries.
In a fair representation of the data, China's high total carbon dioxide output should be assigned to its source: the people across the globe with high personal consumption who have offshored their industry to China.
climate change is a hoax, but it's also disingenuous to pretend that AI delivers even an infinitesimal amount of the value of either agriculture or mining. The global population approaches zero without either of those things, and if you deleted AI, no one would ever notice.
The weirdest thing about this (which is frankly bizarre) is Microsoft emphatically shilling React Native for macOS[1] usage (???). Like, wtf? Why? Not only is it embarrassing for MS to be using another competing company's (Facebook's) UI layer when they're, you know, an operating system company. But they're also pushing it for competing operating systems. What idiotic PM signed off on this? How in the world does Microsoft benefit from promulgating Facebook's technology?
Microsoft decided to become a cloud services company instead of an operating system company. The operating system is merely a convenient means of acquiring new service subscriptions.
Apple is a hardware company and likes to have control over the whole user experience. Microsoft doesn’t care about the experience as long as they can sell subscriptions, and doesn’t care much about hardware either.
Ok, so looking at the commit log[1], I was mostly interested in seeing what the "moonshot ideas" implementations looked like, but basically everything is just hyperparameter tuning. Which is nice, but likely not worth the $$$ spent on the tokens. Am I missing something here?
It would seem wise to modify the autoresearch instructions to first estimate the computational costs rigorously, then sort and compare the proposals for human review, and for each actually executed attempt feed the computational costs back in, perhaps via a LoRA adapter?
i.e., perhaps minimal changes to autoresearch would be enough for cost-effective research to occur.
Yes but at that point you may as well use a proper hyperparameter tuning framework like optuna if all the LLM agent is supposed to do is do hyperparameter tuning.
Does Optuna think abstractly (i.e., use an LLM to interpret the code and come up with insights), or does it just run hyperparameter tuning experiments on user-indicated parameters?
The latter, but it uses fairly optimized approaches to ensure it selects the best candidates.
If you look at the commits, you can see that all it does is set different values for different continuous parameters: the type of thing I trust statistics with a lot more than reasoning. Optuna can make very informed decisions while making lots of different changes at once, slowly converging toward optimal parameters, whereas the LLM seems to be throwing stuff at a wall and seeing what sticks.
What would work best is if the LLM approached things at a higher level, i.e., used Optuna but reasoned about better algorithms and/or data or whatever. What it ends up doing instead is tuning parameters manually, only one or a few at a time, which is extremely inefficient and unlikely to be optimal.
> Yes but at that point you may as well use a proper hyperparameter tuning framework like optuna if all the LLM agent is supposed to do is do hyperparameter tuning.
While the "novelty" of autoresearch is that it may reason symbolically about the computation, analyze the codebase, etc.: a wider (and harder) search space, but with symbolic reasoning.
I genuinely feel disrespected if AI is used to write an article and it's not disclosed in the first paragraph. Disclosing it isn't really that big of a deal, tbh; it's like saying "I took a picture, I didn't paint it."
Which is fine, but please disclose it. Otherwise, like in this case, I'm going to assume the author is a moron that can't write for shit who thinks their readers are morons that can't read for shit.
Agreed. Also, because of the way AI writes, it takes SO LONG to read through (they're trained on blogspam, where the page tells you the author's life story, as well as the bloody history of bread, before telling you how to bake it).
That's why in cases like this I usually ask another AI to make me a short summary with the main points. I wish the human behind the looong article would choose to publish a short summary directly instead.
Please can you explain the evidence that this is generated content.
Yes, the site is new, but other posted articles are 100% consistent with the author wanting a guaranteed level of local inference with large models.
What I read was written by a skeptic who took claims and systematically addressed a number of issues. The debunking was concise and used simple sentences.
The boxes in the pictures were, to my eyes, generated manually using the macOS Preview annotation feature. They are not well aligned. I've used this technique many times to generate overlays. If this were me, I'd get called out. I like nicely spaced and proportioned boxes! NB: iFixit teardowns are misaligned as well, and it bugs me.
People have a distasteful habit of assigning others into boxes, particularly if that box currently has a negative connotation. Boxing is a primary tool of the misguided, bullies, sycophants, censors, and those with an unspoken agenda. Humor: which box applies?
This is not true. The authors claim that w.r.t. training, their method adds negligible overhead for AttnRes with no memory impact (but is way more complicated for Block AttnRes since we need to use pipelining for larger models, hence the O(Ld) & O(Nd) figures, with N ≪ L).
> WAY lower bandwidth requirements for inference.
Also not true. Paper has nothing to do with inference, apart from the benchmarks. If you're looking at the graph about "compute advantage," it's about training compute. They do some interpolation to get to the 1.25x number, basically answering the question "if non-AttnRes architecture were trained, how much compute would it take to get to the same loss as AttnRes?" (The answer being ~20% more compute.) It's an interesting claim, but there's all kinds of weird and unexpected convergence that can happen, so take it with a grain of salt.
I think what they're getting at is that for a given unit of compute, this method achieves 125% performance.
If model A reaches performance level 100 using 100 units of compute using old methods, and you train model B using AttnRes, aiming at performance level 100, it costs you 80 units of compute.
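Using the comment's illustrative numbers (not measurements from the paper), the different phrasings in this thread are views of the same ratio:

```python
# Illustrative numbers from the comment above, not from the paper.
baseline_compute = 100  # old method, to reach a given loss
attnres_compute = 80    # AttnRes, to reach the same loss

advantage = baseline_compute / attnres_compute            # the "1.25x" figure
baseline_extra = advantage - 1                            # baseline needs 25% more compute
attnres_savings = 1 - attnres_compute / baseline_compute  # AttnRes saves 20%

print(advantage, baseline_extra, attnres_savings)
```

Note that a 1.25x advantage means the baseline needs 25% more compute, which is the same fact as AttnRes needing ~20% less; the "~20% more" and "1.25x" figures in the thread are approximately two views of that.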
It probably doesn't map precisely, but that's where people are diverging from the claim - it doesn't explicitly say anything about reduced inference or training time, but that's the implicit value of these sorts of things. Less compute to equivalent performance can be a huge win for platforms at scale as well as for local models.
> I think what they're getting at is that for a given unit of compute, this method achieves 125% performance.
This is not what they're getting at; I explained exactly what they're getting at. I mean, your equivalence of "loss" (what authors actually measured) and "performance" is just bizarre. We use benchmarks to measure performance, and the numbers there were like 1-5% better (apart from the GPQA-Diamond outlier).
Overwhelmingly, no. You may have mistaken this for a lab's reading group, but most people here just skim the README, maybe read the abstract or figures. Expecting them to do more is uh... a bit strange?
But also you can forgive people for equating loss with performance, which are admittedly different but related ideas.
> I don’t have to solve any problems with languages that are elaborate practical jokes
This is just being needlessly dismissive. Esolangs are (and have been) an area of active CS research for decades. I know I'm a bit of an esolang nerd, and while some are jokes, most focus on specific paradigms (e.g. Piet is visual, bf is a Turing tarpit, etc.).
> I think most Python programmers when tasked with “now do it in brainfuck” would fail.
This is untrue. Given internet-level awareness and infinite time, virtually all developers should be able to go from Python to brainfuck (trivially, I might add.) Did you even look at the test sets? It's all pretty basic stuff (palindromes, array traversal, etc.—we aren't using pandas here). I mean, sure, it would take forever and be mega annoying, but manipulating a head and some tape is hardly difficult.
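For what it's worth, the entire machine model (a tape, a data pointer, eight single-character instructions) fits in a short interpreter. A quick sketch in Python, not an optimized or spec-complete one (EOF and cell-width behavior vary between brainfuck implementations):

```python
def bf(code, inp=""):
    tape, ptr, out, i, inp_i = [0] * 30000, 0, [], 0, 0
    # Pre-match brackets so jumps are O(1).
    jumps, stack = {}, []
    for pos, c in enumerate(code):
        if c == "[":
            stack.append(pos)
        elif c == "]":
            start = stack.pop()
            jumps[start], jumps[pos] = pos, start
    while i < len(code):
        c = code[i]
        if c == ">": ptr += 1
        elif c == "<": ptr -= 1
        elif c == "+": tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-": tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ".": out.append(chr(tape[ptr]))
        elif c == ",":
            tape[ptr] = ord(inp[inp_i]) if inp_i < len(inp) else 0
            inp_i += 1
        elif c == "[" and tape[ptr] == 0: i = jumps[i]
        elif c == "]" and tape[ptr] != 0: i = jumps[i]
        i += 1
    return "".join(out)
```

For example, `bf(",[.,]", "hi")` is a one-line cat program that returns `"hi"`. Tedious to program in, sure, but the machine itself is tiny.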
LLMs are trained to be precise (and more specifically: semantically precise), especially in the fine-tuning phase. An LLM just trained on the corpus of full human production would surely sound more "human," but it would also probably be pretty useless. So that's why idioms like "it's not X, it's Y" are a dead giveaway; but really, any structure that tries to "guide" our salience is a dead giveaway. Here's a random paragraph from Knuth's Literate Programming†[1]:
> For example, a system program is often designed to gather statistics about its own operation, but such statistics-gathering is pointless unless someone is actually going to use the results. In order to make the instrumentation code optional, I include the word ‘stat’ just before any special code for statistics, and ‘tats’ just after such code; and I tell WEAVE to regard stat and tats as if they were begin and end. But stat and tats are actually simple macros.
I encourage you to read that paragraph a few times. Even if you have no idea what the context is, you get that there's a point, that there's something else to dig into, that the author might be being a bit cheeky. In other words, you can feel Knuth behind the ink. Philosophers would call this intentionality[2]. LLMs produce the polar opposite of garden path sentences[3] (and, imo, that's why they're so easy to spot).
† I specifically picked something technical to illustrate that even in domains where semantic precision is of utmost importance, human expression is still just that: human.
I fed ChatGPT part of the introductory paragraph of that paper and instructed it to continue. It gave me this:
> In this light, the activity of programming becomes less an act of issuing commands and more an act of communication. The computer is, after all, an obedient but uncomprehending servant; it will execute whatever precise instructions we provide. But our colleagues, our future selves, and the broader community of readers are not so easily satisfied. They demand clarity, intention, and narrative. A program, then, should be structured not merely for execution, but for reading—its logic unfolding in a manner that mirrors the way one might naturally explain the solution to another person.
> This shift in perspective has practical consequences. When we write with exposition in mind, we are compelled to confront ambiguities that might otherwise remain hidden. Vague assumptions must be made explicit; convoluted steps must be reorganized into simpler, more digestible ideas. The discipline of explaining a program often leads to improvements in the program itself, since confusion in the prose is frequently a symptom of confusion in the underlying design.
Fascinating technology. I would not be able to immediately tell this was AI generated. So these models can in some cases produce text that doesn't immediately set off alarm bells. As an avid reader and writer I'm not really sure what to make of it. I don't want to consume AI-generated art or literature because it's completely beside the point, but in the future will we even be able to tell? How do we even know if anyone around us is real? Could they just be sufficiently advanced LLMs, fooling us? Am I the only human in the matrix?
Whether or not one can tell it's AI generated, one can certainly tell it's not Knuth. For one thing, the writing style is very different. Not that there haven't been other great computer scientists who may have written in this style, but it definitely doesn't sound like Knuth (there is no "being a bit cheeky" for sure). But also, the ideas it has produced are simply more of the same; kind of a natural progression / what a typical grad student may write. Knuth always has something new and surprising to say in every paragraph, he wouldn't harp on a theme like this. Also he mixes “levels” between very high and very low, while the paragraphs you quoted stay at a uniform level.
But of course, writing as good as a grad student's (just not the particular delightful idiosyncratic style of a specific person) is still very impressive and amazing, so your concerns are still valid.
Knuth's paper is 100% in the training set, so while your result is decent, it's undoubtedly tainted. But let's look at the output anyway:
> ...the activity of programming becomes less an act of issuing commands and more an act of communication
directly contradicts:
> The computer is, after all, an obedient but uncomprehending servant...
If programming becomes "an act of communication" how can an "uncomprehending servant" make heads or tails of what I'm telling it? And I get that the two aren't exactly contradictory here, but this implied claim would certainly require at least a throwaway sentence.
> When we write with exposition in mind, we are compelled to confront ambiguities that might otherwise remain hidden.
I'm being a bit nitpicky, but this is a non-sequitur; we aren't necessarily required to confront any ambiguities, even when we're trying very hard to be expository. The counter-examples I'm thinking of at the moment are contrived (amnesia, my four-year-old niece trying to tell a story, etc.) but I mainly take issue with the word "compelled."
> its logic unfolding in a manner that mirrors the way one might naturally explain the solution to another person
People explain things in all kinds of weird circuitous ways, so while this (like all AI-generated output) seems interesting prima facie, it's actually kind of a dud when you think about it for more than 5 seconds.
> Vague assumptions must be made explicit; convoluted steps must be reorganized into simpler, more digestible ideas.
and
> ...ambiguities that might otherwise remain hidden...
directly contradicts:
> ...whatever precise instructions we provide
It seems like the computer can somehow encode "ambiguities" and "vague assumptions" as "precise instructions." How, exactly, does that work? (Spoiler: it doesn't, it's gibberish.) On the other hand, if you read Knuth's first few paragraphs, he clearly has a point in mind; I'd even say he's being a bit wordy, but never equivocating. In fact, by the fourth paragraph, he's almost giddy with excitement.
"Poppy turns your contacts into a living garden. Gentle reminders, zero guilt."—bleh, at least write your own taglines :/
This is a neat idea that has been tried about 300 times over, but since it's contingent on already being cognizant of keeping up with relationships, the people who install it aren't going to be the people who need to be using it.
> This is a neat idea that has been tried about 300 times over
Could you share the top 3 attempts that tried it and are better at it? I only know that things like this should exist, but haven't looked any further into this class of things yet.
My idea of what to look into is some kind of CRM for my personal contacts.
Not sure if it's top 3, but I use Monica https://www.monicahq.com/ which does advertise itself as a personal CRM. I certainly underutilize its features but things like birthday reminders + a place for a few notes (where do they live again? who's their partner?) is nice
Well I just have a folder in obsidian and a template for friends. So I can fill in the various fields like address, names of kids/pets, things they like for when I need to buy a present etc.
I recommend keeping it simple. The Obsidian "Bases" feature is a good fit for this if you don't want to go deep w plugins and DIY (which is also viable but has more learning curve and overhead).
Check out https://github.com/Relvio-AI/relvio, it's an open-source personal CRM that connects to Gmail and automatically extracts your contacts. You can add notes, tag people by relationship, set follow-up reminders with overdue tracking, and see who you're losing touch with. LinkedIn import works via CSV.
Free, runs locally on your machine with no cloud or subscription, takes a couple minutes to set up
I mean, I'm a pretty ADHD guy, very in the moment. Although I sporadically invite friends or old colleagues to catch up over lunch, I mean to do it regularly but might forget to invite them for years at a time, so this might be good for me. I could really use an "Anki for lunch": spaced-repetition reminders for the people in my Rolodex, as it were.
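The scheduling core of an "Anki for lunch" really is just a few lines. A hypothetical sketch; the base interval, doubling rule, and cap are invented for illustration and only loosely echo SM-2-style spaced repetition:

```python
from datetime import date, timedelta

def next_reminder(last_met: date, streak: int) -> date:
    """Widen the reminder interval each time you actually meet up;
    a lapsed streak would reset to the 30-day base. All numbers
    here are made up, not from any real app."""
    base = 30                          # days until the first nudge
    interval = base * (2 ** streak)    # 30, 60, 120, ... doubling per success
    return last_met + timedelta(days=min(interval, 365))  # cap at a year

# After a third successful lunch (streak=2), wait ~4 months before nudging.
print(next_reminder(date(2024, 1, 1), streak=2))  # 2024-04-30
```

A real app would also need the reset-on-lapse path and per-person overrides, but the "spacing" idea itself is this small.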
Makes me think of a neat feature possibility: a constraint where having a garden means others can see into it in some sense, just like a real-life garden in your yard. A garden demonstrates to others your care and attention toward it.
Very interesting idea! One of the app's features is no signup and no data associated with you, which would have to change if I want to allow peeking into your friends' gardens to see "anonymous views" of them.