
Man, that was such an enjoyable read. I loved your story about the wild server hunt back when it was posted on r/localllama. One thing that is missing from the whole AI "discussion" is this train of thought of how we go from abstract mathematical formulation to an intuitive understanding of the underlying functionality, and you showcased it beautifully in this article. Similar to 3blue1brown, who also did an amazing series on transformers. Kudos!

I totally get it. I have the M4 Air, grabbed it for $700 on sale. I also have an MSI Creator running Linux (Wayland). Performance-wise, the base Air crunches through everything until lots of things are open and the GPU is roaring (encoding or streaming), and with Colima I have a few Incus Linux containers up and running. Battery life is formidable. Nothing comes close.

My Linux laptop (32GB RAM / beefy GPU) barely withstands 40 minutes on battery, but it can handle very demanding tasks, and obviously gaming.

These are two different use cases, but right now, for an ultraportable laptop, the Air is the king, until x86 brings back the efficiency per watt. Even Qualcomm can't compete. That being said, I am a big fan of Apple hardware and not Apple software, so whenever Asahi Linux is ready enough (with good battery life), I am definitely jumping ship.


That beefy GPU is the killer for battery life. There are quite a few PC laptops floating about that get battery life in the 10-16 hour range on lighter workloads (text editors, fast compilers, streaming video, browsing the internet). I'm typing this on one right now. I wish it were running Linux, but I need Windows for work until we get the last of our antiquated .NET platform onto Core.

Sure, it's got integrated graphics so it won't win any gaming awards, but that's what the laptop with the beefy GPU sitting in the corner is for :) That thing pumps out enough heat to not be too pleasant sitting on a lap anyway.


Exactly. Big GPUs are the #1 reason battery doesn’t last on Linux laptops.

Power management in the Linux GPU drivers is not done well. Even when discrete GPUs are not being used, they still draw a lot of power, which apparently isn't the case on Windows, from what I've heard.

I think the best option is a good Linux laptop with an integrated GPU. If you really want to do anything beefy, you can always use an eGPU :)!

Obviously this will never come close in terms of convenience to having an actual M-series MacBook…

Wishing you the best of luck with the .NET migration!


> Kudos for people being willing to change their opinion and update when new evidence comes to light.

> 1. https://cs.stanford.edu/~knuth/chatGPT20.txt

I think that's what makes the Bayesian faction of statistics so appealing. Updating prior beliefs based on new evidence is at the core of the scientific method. Take that, frequentists.
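For what it's worth, the update rule itself is tiny. A toy sketch in Python (the prior and likelihood numbers here are made up purely for illustration):

```python
def bayes_update(prior, likelihood_h, likelihood_not_h):
    """Posterior P(H | E) from the prior P(H) and the likelihoods
    P(E | H) and P(E | not H), via Bayes' rule."""
    evidence = likelihood_h * prior + likelihood_not_h * (1 - prior)
    return likelihood_h * prior / evidence

# Start 50/50 on a hypothesis, then observe evidence that is
# four times more likely if the hypothesis is true.
belief = 0.5
belief = bayes_update(belief, likelihood_h=0.8, likelihood_not_h=0.2)
print(round(belief, 3))  # 0.8
```

Each new observation just feeds the previous posterior back in as the next prior.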


It does not seem fair to say that frequentists do not update their beliefs based on new evidence. That does not accurately capture the difference between Bayesians and frequentists (or anyone else).

What's the difference as you see it?

Everyone updates their belief in hypotheses based on the perceived strength of evidence they observe. That's just science.

Frequentists and Bayesians differ in which sets of statistical tools they prefer for measuring the strength of evidence.


> Everyone updates their belief

Uh oh. How does the frequentist model define "belief" and "updating a belief"?


Are frequentists a group that self-identifies? Don't scientists just use the best tool for the job?

That's the best kind of "benders"

> They’re at this level because the editors have always had low standards.

It's not just Ars Technica. I would go as far as saying it's the vast majority. I work at the biggest alliance of public service media in the EU, and my role requires me to interact with editors. I don't like painting with a broad brush, but I have yet to meet a humble editor. They approach everything with an "I know better than anyone else" attitude. It's probably the "public" aspect of the media, but I would argue it's the editorial aspect too. The rest of the staff are often very nice and down to earth.


> but I have yet to meet a humble editor. They approach everything with an "I know better than anyone else" attitude.

They're like "UX experts" in software. One does UX for software, the other does UX for text. Same attitude problems, from the way you describe it. If the expert in something so subjectively judged is seen to be conceding anything, that might undermine their perceived expertise. Any push back is interpreted as somebody challenging their career.


> Any push back is interpreted as somebody challenging their career.

I mean, yes, this happens quite a bit, especially with egotistical people.

But to play devil's advocate, they do have to deal with a massive fuckload of bullshit asymmetry, where people dumber than rocks spew forth a never-ending stream of stupid crap with the authority of an LLM.


<< They approach everything with a "I know better than anyone else" attitude.

My charitable read is that if one has to interact with the public, one naturally develops an understanding of what is wrong with it.


I think he caught some flak for promoting claudebot at the time and giving it a rave review. Some people are hardliners. His work has always been amazing nonetheless.

That would be more consistent with it making the frontpage and then getting flagged. Just getting ignored? Unlikely except by randomness.

I think the vast majority of users here agree with you (and me!) that karpathy's work is incredible. Complainers are always over-represented in comments, of course.


I totally agree, except that the more we get used to working with the tools, the better and faster things will get. I would argue the field has been evolving fast in the past three years, but now it's showing signs of slowing down. And I think this is good, as it will allow people to catch up and refine the approach to adapt better to the new paradigm of coding.

I think it's worth mentioning that the Achilles' heel of DTs is, in fact, data engineering (more specifically, feature engineering). If one does not spend significant time cleaning and engineering the features, the results will be much worse than those of, say, a "black box" model like a NN. This is the catch. Ironically, NNs can detect such latent features, but it is very difficult to interpret how.

This varies wildly from domain to domain. Highly structured data (time series, photos, audio, etc.) typically has a metric boatload of feature extraction methodology. Neural networks often draw on and exploit that structure (e.g. convolutions). You could even get some pretty good results on manually extracted, neural-network-esque features handed off to a random forest. This heuristic begins to fall off with deep learning, though, which, imo, is a precursor to LLMs and showed that emergent complexity is possible with machine learning.
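A minimal, toy sketch of that hand-off (the signals, features, and threshold are all invented for illustration): extract a couple of summary statistics from a raw time series, then let a trivial one-split stump classify on them:

```python
import statistics

def extract_features(signal):
    """Hand-engineered features: mean and standard deviation
    of the raw time series."""
    return {"mean": statistics.fmean(signal),
            "std": statistics.stdev(signal)}

def stump_classify(features, threshold=1.0):
    """A one-split 'decision tree': noisy signals have high std."""
    return "noisy" if features["std"] > threshold else "flat"

calm  = [5.0, 5.1, 4.9, 5.0, 5.1]   # low variance
jumpy = [1.0, 9.0, 2.0, 8.0, 1.5]   # high variance

print(stump_classify(extract_features(calm)))   # flat
print(stump_classify(extract_features(jumpy)))  # noisy
```

The point is that all the structure-awareness lives in `extract_features`; the downstream model never sees the raw samples.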

But non-structured data? Pretty pointless to hand off to a neural network imo.


Ah! Your comment helped me understand the parent comment so much more. I thought it was more about data hygiene needs.

Yes, a DT on raw pixel values or raw time values will in general be quite terrible.

That said, the convolutional structure is hard-coded in those neural nets; only the weights are learned. It is not that the network discovered on its own that convolutions are a good idea. So NNs too rely on human insight and structuring upon which they can then build.


  > So NNs too rely on human insight and structuring upon which they can then build.
If so, one can repeat it with other basic models as well.

It is possible to train deep decision-tree forests [1]. I believe it should then also be possible to train deep convolutional decision-tree forests.

[1] https://arxiv.org/pdf/1702.08835


That has not been my experience, though, apart from the standard data hygiene that one has to maintain for any ML exercise.

Normalising the numeric features to a common range has been adequate. This too is strictly not necessary for DTs, but pretty much mandatory for linear models. (DTs are very robust to scale differences among features, linear models quite vulnerable to the same.)
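A quick sketch of why that normalization is optional for DTs but matters for linear models (column values invented for illustration): min-max scaling preserves the ordering of values, which is all a DT split cares about, while equalizing the magnitudes that a linear model's weights are sensitive to:

```python
def minmax_scale(values, lo=0.0, hi=1.0):
    """Rescale a numeric column to [lo, hi]. This preserves the
    ordering of values (all a DT split uses) while equalizing
    magnitudes across columns (what linear models care about)."""
    vmin, vmax = min(values), max(values)
    span = (vmax - vmin) or 1.0   # guard against constant columns
    return [lo + (v - vmin) * (hi - lo) / span for v in values]

incomes = [20_000, 50_000, 120_000]   # large-scale column
ages    = [25, 40, 60]                # small-scale column

print(minmax_scale(incomes))  # [0.0, 0.3, 1.0]
print(minmax_scale(ages))     # [0.0, ~0.43, 1.0]
```

Before scaling, the income column would dominate any weighted sum simply because its numbers are bigger; after scaling, both columns contribute on equal footing, while every DT split threshold maps to an equivalent one in the new range.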

One can think of each tree path from root to leaf as a data driven formulation/synthesis of a higher level feature built out of logical conjunctions ('AND' operation).

These auto-synthesized / discovered features are then ORed at the top. DTs are good at capturing multi-feature interactions that single layer linear models can't.
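That reading can be made concrete. A toy sketch (the tree encoding and feature names are invented for illustration): represent a small tree as nested tuples and enumerate each root-to-leaf path as an AND of its split tests:

```python
# A node is (feature, threshold, left_subtree, right_subtree);
# a leaf is just the predicted class label.
tree = ("age", 30,
        ("income", 40_000, "reject", "accept"),
        "accept")

def extract_rules(node, path=()):
    """Each root-to-leaf path is a conjunction (AND) of tests;
    the tree as a whole ORs the resulting rules per class."""
    if isinstance(node, str):                       # leaf
        return [(" AND ".join(path) or "always", node)]
    feat, thr, left, right = node
    return (extract_rules(left,  path + (f"{feat} <= {thr}",)) +
            extract_rules(right, path + (f"{feat} > {thr}",)))

for rule, label in extract_rules(tree):
    print(f"IF {rule} THEN {label}")
```

Running this prints one conjunction per leaf, e.g. `IF age <= 30 AND income > 40000 THEN accept`; collecting the rules that share a label gives the OR-of-ANDs view described above.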

NNs certainly synthesize higher-level features, but what does not get highlighted enough is that the learning-theory-motivated AdaBoost algorithm and its derivatives do that too.

Their modus operandi is "BYOWC, bring your own weak classifier, I will reweigh the data in such a way that your unmodified classifier will discover a higher level feature on its own. Later you can combine these higher features linearly, by voting, or by averaging".

I personally favor differentiable models over trees, but have to give credit where it's due, DTs work great.

What leaves me intellectually unsatisfied about decision trees is that the space of trees cannot be searched in any nice principled way.

Or to describe in terms of feature discovery, in DT, there's no notion of progressing smoothly towards better high level features that track the end goal at every step of this synthesis (in fact greedy hill climbing hurts performance in the case of decision trees). DTs use a combinatorial search over the feature space partitioning, essentially by trial and error.

Neural nets have a smooth way of devising these higher level features informed by the end goal. Roll infinitesimally in the direction of steepest progress -- that's so much more satisfying than trial and error.
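That "roll infinitesimally" picture is just gradient descent; a minimal sketch on a toy one-parameter loss (the loss function is invented for illustration):

```python
def gradient_descent(grad, w0=0.0, lr=0.1, steps=100):
    """Roll downhill: repeatedly take a small step against the
    gradient, so every step makes smooth local progress instead
    of a combinatorial trial-and-error search."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Toy loss L(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_star = gradient_descent(lambda w: 2 * (w - 3))
print(round(w_star, 4))  # 3.0
```

A DT learner has no analogue of `grad`: swapping one split for another changes the partition discontinuously, so there is no direction of steepest progress to follow.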

As for classification performance, where DTs have struggled is in cases where columns are sparse (low entropy to begin with).

Another weakness of DTs is the difficulty of achieving high runtime throughput, unless the DT is compiled into machine code. Walking a runtime tree structure with not-so-predictable branching doesn't run very fast on our standard machines. Compared to DNNs, though, this is a laughable complaint; DT throughput is still one or two orders of magnitude higher.
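A toy sketch of that compilation idea (the tuple tree encoding is invented for illustration): generate nested if/else source from a tree and `exec` it, so prediction runs as plain branching code instead of pointer-chasing through node objects:

```python
def compile_tree(node, name="predict"):
    """Emit a Python function of nested if/else statements from a
    tree given as (feature, threshold, left, right) with string
    leaves, then exec the generated source into a real function."""
    def emit(node, indent):
        pad = "    " * indent
        if isinstance(node, str):                      # leaf
            return f"{pad}return {node!r}\n"
        feat, thr, left, right = node
        return (f"{pad}if row[{feat!r}] <= {thr}:\n"
                + emit(left, indent + 1)
                + f"{pad}else:\n"
                + emit(right, indent + 1))
    src = f"def {name}(row):\n" + emit(node, 1)
    namespace = {}
    exec(src, namespace)       # compile the generated source
    return namespace[name], src

tree = ("age", 30, ("income", 40_000, "reject", "accept"), "accept")
predict_fn, source = compile_tree(tree)
print(predict_fn({"age": 25, "income": 50_000}))  # accept
```

Real systems emit C or machine code rather than Python, but the shape of the transformation is the same: the tree's structure becomes control flow.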


  > Ironically, NN can detect such latent features
Like they detect the magazine logo in images of horses?

While you're right that it's nothing new given a trail of info, here they didn't need to do classical feature engineering, just a pure LLM (agentic) flow. But yes, given how much information is self-exposed online, I am not surprised this is made easier with LLMs. The interesting application, though, is identifying users with multiple usernames on HN or Reddit.

Reminder that open source is always open for contributions

