I owned an i9 MBP with a discrete GPU. It absolutely was too thin. The CPU and GPU ran hot, and it throttled like crazy. It would drain the battery while idling on a USB-C dock. Worst laptop I've ever owned.
The M1 Max I replaced it with was the opposite. I don't think I heard the fans for the first month. But it was much larger.
Based on the fanless Air, I strongly suspect an M1 Max in the old chassis would have been totally fine for non-synthetic workloads, and an M1 Pro would probably have been fine in all scenarios.
But I think they overcorrected on the chassis design when they were shipping borderline faulty products, and haven't walked it back yet.
I speculate they gave themselves a lot of thermal engineering margin to bump up TDP with the M-series MBP design (or perhaps they underestimated how good the M-series chips were going to be). The battery being at the TSA limit of 100Wh is quite nice as well. Another benefit is that it now differentiates the "Pro" line from the rest of the laptop lineup quite significantly. For most people the Air has enough power now, and it's plenty thin and light. The Pro line is for "true" pros with actually intense workflows.
I'm a dev and the MBP line is definitely overkill for me. The 15" MBA handles everything I can throw at it.
Coding is a verifiable domain, so I think you actually have it backwards on that first point. We can now synthesize Stack Overflow-sized datasets for an arbitrary new language, and use those to train LLMs to understand it.
It's expensive of course, but if a new language is genuinely better for LLMs to write and understand, that would not be an issue.
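To make "verifiable" concrete, here's the filtering loop in miniature. Everything below is a toy of my own construction: the `candidates` array stands in for model samples, and evaluating arithmetic stands in for running the new language's compiler and test suite.

```typescript
// A synthetic sample: a task, a candidate solution, and a checkable answer.
interface Sample { task: string; solution: string; expected: number; }

// Hypothetical model outputs: some correct, some buggy.
const candidates: Sample[] = [
  { task: "add 2 and 3", solution: "2 + 3", expected: 5 },
  { task: "add 2 and 3", solution: "2 * 3", expected: 5 }, // buggy sample
];

// The verifier executes the candidate and checks the result. This mechanical
// check is exactly what makes coding a "verifiable domain" -- in a real
// pipeline it would be the new language's compiler plus tests.
function verifies(s: Sample): boolean {
  return Function(`return (${s.solution});`)() === s.expected;
}

// Keep only samples that pass verification; these become training data.
const dataset = candidates.filter(verifies);
```

The point being: no humans need to have written about the language first, because correctness is checked by the toolchain rather than by consensus.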
It's all about relative difficulty. It's not trivial to convince LLM vendors to include your pet new language in their internal synthetic datasets, and you can build your own and publish it but it'll be fiddly and expensive.
But compared to the immense amount of effort that goes into convincing a critical mass of humans to learn and write about your new language, and using _that_ material to train an LLM, I think it's fair to say things have gotten easier, not harder.
Don’t know what it is about geek culture that leans so conspiratorial.
Sometimes I play a game; before clicking to read comments I try to come up with what the conspiracies will be. This one was obvious (since I’m familiar with the story).
What a strange thing to say. Not only do people frequently recommend carbonated beverages to each other, the upstream meme is even more off. People recommend operating systems to each other so much that there's an entire subculture known for that exact behavior.
No? I have recommended Freestyle sugar-free soda as a way to replace heavy Coca-Cola consumption. Here in Mexico it's a big problem, and it helped me get out of the addiction. (Add allulose to the soda for sweetness.)
It's a dopamine hit. It's addicting. The medium of the internet seems to add to this where most interactions are conversationally broken, because a thread is a bunch of people airdropping thoughts and never really coming back to back up their arguments or admit something was wrong.
The brain wants things to be simple, so it rewards you for simple solutions that are "better" and totally ignores complexity, nuance, and reality, because those are energetically expensive things to pay attention to.
I think it's naive to think capitalism doesn't lead to dirty tricks. There's tons of PR and stealth marketing out there. The idea that our system is all "honest good guys" doesn't fit the facts.
That's easy. A geek's superpower is his brain, and his identity is being the smartest guy in the room. Belief in conspiracies means you know something that the masses do not, and you were too smart for the man to get one over on you. These beliefs, like all beliefs, are simple acts of ego preservation.
Same as any other conspiratorial thinking: they hold themselves in too high a regard and want to think they’re privy to some secret knowledge that the rubes have missed.
Probably because the most obvious conclusion, "it's exactly as described," is the most boring and uninteresting one, so you make it more interesting by proposing a big conspiracy.
> Don’t know what it is about geek culture that leans so conspiratorial.
It’s much wider. This is why QAnon and contemporary fascism spread. People love a story.
The QAA podcast deep-dives explaining conspiratorial thinking. They started with QAnon and then expanded. The episodes on the Queen of Canada (Romana Didulo) were especially interesting. She’s a dangerous person and so are her followers. Sovereign citizens, too (though they’ve abandoned that term). Think Freemen in Montana in the 90s.
>Don’t know what it is about geek culture that leans so conspiratorial.
The #1 goal one needs to accomplish to render an environment safe for the execution of conspiratorial activity is to inure the occupants of said environment to the possibility of conspiratorial action taking place. A priori dismissal shuts down game-theoretic behavioral modeling in the operational loop, rendering concerted acts of manipulation near invisible. It's why Hanlon's Razor is both a heuristic for organizational productivity and alignment, and one of the greatest foundational psyops of all time: assuming benevolent intent of other actors makes it easier to get things done, but makes it nigh impossible to defend oneself against actual malicious intent.

Geekdom is one of the few niches where most participants routinely value depth-first over breadth-first knowledge. Deep understanding of behavior, of motivated reasoning, and of modeling asymmetry of information with regard to intent quickly makes unconditional assumption of benevolent intent an untenable posture to maintain, at least in big business or other contexts that tend toward near zero-sum.

Is it exhausting? Absolutely. Does it keep you safe from people? Hell yes. Does it make life fun? That depends on the general character of the people you're surrounded by, I suppose.
Well, imo GP is fundamentally misunderstanding TypeScript. It's explicitly a structural language, not a nominal one. Nominal typing goes against the entire design philosophy of TS.
It would have been a super reasonable reply to talk about the history of TypeScript, and why, fundamentally, its types exist to retroactively describe complicated data structures encountered in real-world JavaScript. And why, when TypeScript overstepped that by adding enums (which require code generation and not mere type erasure to compile), it was decided to be a mistake that they won't repeat.
But instead your rebuttal was pointing out that TypeScript can compile OP's example code, which OP presented as valid TypeScript that they disliked. I'm not defending their position, I'm just saying that it didn't appear you had even properly read their comment.
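For what it's worth, both points are easy to demonstrate in a few lines (the `Point`/`Coord` names here are just for illustration):

```typescript
// TypeScript is structural: two types with the same shape are interchangeable,
// regardless of their names. No nominal relationship is needed.
interface Point { x: number; y: number; }
interface Coord { x: number; y: number; }

const p: Point = { x: 1, y: 2 };
const c: Coord = p; // fine: same structure, different name

// Interfaces are fully erased at compile time -- no JS is emitted for them.
// `enum`, by contrast, requires code generation: it compiles to a real
// runtime object, including a reverse mapping for numeric enums.
enum Color { Red, Green }
const v: Color = Color.Red; // Color.Red exists at runtime as the value 0
```

That runtime footprint is exactly the "more than type erasure" property the enum criticism is about.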
Two Americans and ten Chinese are on a lifeboat. The Americans are each eating two sandwiches a day and the Chinese are eating one. Supplies are low. You do the math and note that the Chinese sure are eating a lot of sandwiches.
If he was intentionally reading the notes his wife took of attorney meetings regarding their divorce, you may want to consider the possibility that the DVRO was genuinely sought.
That was my experience when I tried Moonshine against Parakeet v3 via Handy. Moonshine was noticeably slower on my 2018-era Intel i7 PC, and didn't seem as accurate either. I'm glad it exists, and I like the smaller size on disk (and presumably RAM too). But for my purposes with Handy I think I need the extra speed and accuracy Parakeet v3 is giving me.
It is about the parameter numbers if what you care about is edge devices with limited RAM. Beyond a certain size your model just doesn't fit, it doesn't matter how good it is - you still can't run it.
I am not sure what "edge" device you want to run this on, but you can compress Parakeet to under 500MB of RAM/disk with dynamic quants and on-the-fly dequantization (GGUF or CoreML centroid-palettization style), and retain essentially all accuracy.
And just to be clear, 500MB is even enough for a Raspberry Pi. Then your problem is not memory, it's FLOPS. It might run real-time on an RPi 5, since it has around 50 GFLOPS of FP32, i.e. 100 GFLOPS of FP16, so about 20-50 times less than a modern iPhone. I don't think it will be able to keep it real-time, TBF, but close.
Regardless, this model with such a quantization strategy runs at a 10x+ real-time factor even on 6-year-old iPhones (which you can acquire for under $200), and offline at a reasonable speed essentially anywhere.

You get the best of both worlds: the accuracy of a Whisper transformer at the speed and footprint of a small model.
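A back-of-envelope version of the real-time question. The device figures come from the numbers above; the per-second model cost is my own assumption, not a measurement.

```typescript
// Assumed cost to transcribe one second of audio with the quantized model.
// This is a placeholder figure chosen for illustration only.
const modelFlopsPerAudioSecond = 80e9;

// Device budgets: RPi 5 FP16 figure is from the comment above; the iPhone
// figure assumes roughly the middle of the quoted 20-50x gap.
const rpi5Fp16Flops = 100e9;   // ~100 GFLOPS FP16
const iphoneFp16Flops = 3000e9; // ~30x the RPi (assumed)

// Real-time factor: how many seconds of audio processed per wall-clock second.
// rtf > 1 means faster than real time.
const rtf = (deviceFlops: number): number => deviceFlops / modelFlopsPerAudioSecond;
```

Under these assumptions the RPi 5 lands at about 1.25x (borderline, matching the "close but maybe not" guess above) while the iPhone lands comfortably in the tens.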
So I'm kinda new to this whole Parakeet and Moonshine stuff, and I'm able to run Parakeet on a low-end CPU without issues, so I'm curious how much that extra savings on parameters actually translates to in practice.

Oh, and I'm typing this in Handy with just my voice and Parakeet v3, which is absolutely crazy.
Yeah, I've got a 7950x and 64gb memory. My vibe coding setup for Bevy game development is eight Claude Code instances split across a single terminal window. It's magical.
I tried the desktop app and was shocked at the performance. Conversations would take a full second to load, making rapid switching intolerable. Kicking off a new task seems to hang for multiple seconds while, I assume, the process spins up.
I wanted to try a disposable-conversation-per-feature workflow with git worktree integration for an hour to see how it contrasted, but couldn't even make it ten minutes without bailing back to the terminal.
I also think Steam does a great job of hiding it, and the new recommendation page is really great IMO. Other than some generic AAA titles, it introduced me to really great games I enjoyed, based on my play history.
The more content is available, the more curation is important and IMO their algorithm currently does a good job at it.
There are some odd cases like that, but you can always "Ignore" a game and it'll never show up again. That also feeds into Steam's curation for you based on your interests.
There is an issue on their GitHub about flickering that they don't seem to care much about. I think most AI CLIs are using the same React-ish CLI library called Ink, and all are having the same problems. opencode moved to a different library (opentui?) and their client seems to be doing much better. Although I must say I like to run the opencode CLI locally with the web option and connect to it with a web browser. It's very nice. Plus you can code in bed :)
I think part of the issue is that in production deployments, you're batching high enough that you'll be paging in those long tail experts constantly.
Unless you're handling that in some kind of fancy way, you'll be holding up the batch while waiting for host memory, which will kill your throughput.
It makes much more sense for non batched local inference, especially if you can keep the MoE routing stable like you say, but most folks aren't optimising for that.
Ideally, you should rearrange batches so that inference steps that rely on the same experts get batched together, then inferences that would "hold up" a batch simply wait for that one "long tail" expert to be loaded, whereupon they can progress. This might require checkpointing partial inference steps more often, but that ought to be doable.
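A minimal sketch of that rebatching idea. The `Step` shape and the queue are made up for illustration, not any real framework's API:

```typescript
// A pending inference step that has already been routed to one expert.
interface Step { token: number; expertId: number; }

// Group the pending queue by routed expert, so that a cold ("long tail")
// expert is paged in once for its whole group instead of stalling every
// batch it happens to appear in.
function groupByExpert(queue: Step[]): Map<number, Step[]> {
  const groups = new Map<number, Step[]>();
  for (const step of queue) {
    const group = groups.get(step.expertId) ?? [];
    group.push(step);
    groups.set(step.expertId, group);
  }
  return groups;
}
```

Steps sharing an expert then run together, and a rare expert's host-to-device transfer is amortized over its whole group; the cost, as noted above, is checkpointing partial inference state while a group waits for its expert to load.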
I think this is doable for very long tail experts that get swapped in for specialised topics - say, orbital mechanics.
But for experts that light up at, say, 1% frequency per batch, you're doing an awful lot of transfers from DRAM which you amortize over a single token, instead of reads from HBM which you amortize over 32 tokens.
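The amortization gap can be put in rough numbers. Every size and bandwidth below is an illustrative assumption, not a measurement:

```typescript
// Assumed sizes and bandwidths, for illustration only.
const expertBytes = 100e6; // a 100 MB expert
const dramBw = 50e9;       // host DRAM -> device path, bytes/s (assumed)
const hbmBw = 2000e9;      // on-device HBM read, bytes/s (assumed)

// A paged-in expert's transfer is amortized over a single token;
// a resident expert's HBM read is amortized over a 32-token batch.
const pagedCostPerToken = expertBytes / dramBw / 1;    // 2 ms per token
const residentCostPerToken = expertBytes / hbmBw / 32; // ~1.6 us per token
```

Under these assumptions the paged expert is three orders of magnitude more expensive per token, which is why the 1%-frequency case hurts so much.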
I think your analysis is right: this would make sense mostly for the 30B-3A-style models that are mostly for edge/hobbyist use, where context length is precious so nobody is batching.
Given that experts live per layer, I don't think it makes sense to have orbital-mechanics experts, but… I have wondered about swapping out the bottom 10% of layers per topic, given that that is likely where the highest-order concepts live. I've always wondered why people bother with LoRA on all layers, given that the early layers are more likely to be topic-agnostic and focused on more basic pattern assembly (see the recent papers on how LLMs count on a manifold).