In terms of making something that could beat a Turing test, the 65B/70B Llama 1/2 models are already better than OpenAI's currently offered models like gpt-3.5-turbo, or even the current output of text-davinci-003. Those have been so heavily "aligned" that they're insufferable and "As a large language model..." everything, even given an extensive pre-prompt in text-completion mode. It wasn't always this way. Back in March, text-davinci-003 (or even better, code-davinci-002) was shockingly capable and human-like in its output. If OpenAI had launched with the models they have today, I'm not sure there'd have been any hype.
The 65B/70B Llama 1/2 models are not nearly as smart, but their raw output can actually pass for human. Hopefully Facebook has seen all the talk about training small models far beyond the previously assumed compute-optimal limits (roughly 20 training tokens per parameter, per Chinchilla) and will switch the cosine learning-rate schedule back to the Llama 1 style for Llama 3 7B's training run.
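For context, the schedule in question is ordinary cosine decay. Here's a minimal sketch (a generic schedule, not Meta's actual training code) — the point is that the decay horizon `total_steps` determines how much learning rate is left at the end of a given token budget:

```python
import math

def cosine_lr(step: int, total_steps: int, lr_max: float, lr_min: float) -> float:
    """Standard cosine learning-rate decay from lr_max down to lr_min."""
    progress = min(step / total_steps, 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))
```

If the schedule is configured for a much longer run than the tokens actually trained on, the model never reaches the low-LR tail of the curve — which is the kind of mismatch the comment is alluding to.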
GPT-4 answers are often kind of brilliant and surprising in a very interesting and pleasing way. I don't know why anybody would want to use a dumb AI if they can afford the best. Well worth the $20/month.
I don't have cable because I hate mainstream TV. ChatGPT is quite a bit cheaper than cable, so I can afford to be lazy about it. To be honest, I'm surprised so many are geeking out on the local models and not going for the most mysterious things you can do on advanced AI and then thinking about what the future holds. Some amazing things are coming down the pike. Who cares about the consumer app you think will make you rich?
> not going for the most mysterious things you can do on advanced AI
Like what? In my experience, there are very few things that GPT-4 is definitively better at. It's good at explaining stuff, but not so good that I would deliberately avoid ChatGPT to use it. It's alright at coding, but code-optimized models regularly beat it in benchmarks. Plus, it's got the same lame filter and even an arbitrary request limit for paying customers.
I struggle to really imagine the "mysterious things you can do on advanced AI" that are impossible or impractical on local models. In my time using GPT-4, I wasn't left wanting for more so much as I wanted it to stop saying "As an AI model..."
> Who cares about the consumer app you think will make you rich?
I feel it's a lot like looking at art. If you're happy with dogs playing poker painted on velvet then that's good for you, but I'm looking for something a little more satisfying. If your customers are happy with dogs playing poker painted on velvet then that's doubly good for you.
I've been trying to learn Chinese for a while, and GPT-4 has been a very useful tool to answer grammar questions. I have tried the same with GPT-3.5 and some local models and the answers are never as good.
This is a sleeper issue that's impacting all the fine tuned models right now.
They trained the models to complete massive amounts of human generated text, creating their foundational models.
Then they fine tuned the model to write like it's an AI with no preferences or emotion -- which matched probably less than 0.1% of the training data the foundational model was trained on.
The end result is that the fine tuned models are better at a superficial level in response to broad, naive prompts. But much worse at the variety and quality of their foundational counterparts when paired with a well crafted text complete prompt.
As an example, try to get one of the fine tuned models to write marketing copy for a product. I guarantee about 25% of the responses will start with "Introducing..." (This isn't very good marketing copy.)
But try a modern foundational model and you'll get that only about 5% of the time, or less.
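That claim is easy to measure yourself: sample N completions from each model and count the openers. A sketch with hypothetical samples (the completions below are made up; in practice you'd generate them from each model):

```python
def opener_rate(completions: list[str], opener: str = "Introducing") -> float:
    """Fraction of sampled completions that start with the given opener."""
    hits = sum(1 for c in completions if c.lstrip().startswith(opener))
    return hits / len(completions)

# Hypothetical samples from a fine-tuned model:
tuned_samples = [
    "Introducing the all-new WidgetPro...",
    "Introducing WidgetPro, your new best friend...",
    "Say hello to WidgetPro...",
    "Introducing a revolution in widgets...",
]
rate = opener_rate(tuned_samples)  # 0.75 for these made-up samples
```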
They've inadvertently reduced the network's search space by projecting our ideas of how an AI should sound, after bad press when people were finding them too human.
But you can't have your cake and eat it too.
Either the models are very human in both how they sound and their capabilities, or they are much less than human in both outside of an overfitted scope of capabilities.
As much as people are blown away by the lobotomized GPT-4, I can't help but wonder just how good that foundational model could be.
So I'm quite excited at the prospect of Meta getting there and releasing the foundational version.
Looks like you're trying to use the public GPT models for tasks they weren't really designed with in mind. ChatGPT and the API are not the only OpenAI-based services available. OpenAI is trying to play it safe (though nowhere near as safe as they should), which is more than can be said for most tech orgs this century.
>So I'm quite excited at the prospect of Meta getting there and releasing the foundational version.
Speaking as someone who has had a taste of raw GPT-4 and its tool use abilities and participated in scenario exercises about it, an open source model with GPT-4 capability is not in the cards. National governments will confiscate hardware and detain individuals before they'll allow human reality to disintegrate like wet clay. Drone striking foreign datacenters is a very real option in the protocol books defense and intelligence communities are writing. The only other choice is the total death of general computing, comprehensive implementation of universal control mechanisms like WEI, and severe criminal sentences for any conspiracy to tamper.
Alignment of large language models is about shaping a model's behavior to support the values of its maintainers and to discourage behaviors that go against those values.
Like, you can't get a positive response by asking it to write a phishing letter for you, and it will refuse to divulge information it might know which is considered personally identifiable information.
Unfortunately, the alignment applied since ChatGPT's launch has reduced its efficacy in other ways. While it behaves better, it can be less helpful and less accurate than models like GPT-3 and 3.5 were at their debut.
If you ask an LLM to complete a sentence like '[Insert name] stole the fruit (true/false):<answer>'
An aligned LLM will be biased towards refusing to answer at all with something like: "I can't tell you because I don't know them."
An "uncensored" LLM will very happily return <"true"> or <"false"> with a probability attached to each. Even OpenAI's GPT-3 does with a low enough temperature.
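Concretely, "returns true/false with a probability attached" means reading the model's next-token logits and renormalizing over just the candidate tokens. A minimal sketch with made-up logits (a real implementation would pull these from the model's output head):

```python
import math

def candidate_probs(logits: dict[str, float]) -> dict[str, float]:
    """Softmax restricted to a candidate set, e.g. {'true', 'false'}."""
    m = max(logits.values())                        # subtract max for stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

# Hypothetical next-token logits for the two candidate completions:
probs = candidate_probs({"true": 2.1, "false": 1.3})
```

As temperature goes to zero this collapses to the argmax over candidates, which is why even GPT-3 gives you a clean answer at low temperature.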
_
Of course, LLM attention doesn't work like that. The tokens are just a bag of numbers:
- The fact the name 'John' is mentioned in the Bible a lot affects the distribution when you ask if any John stole, because John is always [7554]
- The fact that 'Olf' is part of Adolf and Adolf Hitler is mentioned in a lot of negative sentences will drag the distribution, because 'Olf' is always [4024] and Adolf is always [324, 4024]
You could have asked something with no logical probability difference at all, like:
- 'The store attendant's name was [name], did the child in Long Island drop his ball (true/false):'
And unless you train the model to give you disclaimers it still follows the instruction faithfully and returns true/false with probabilities, demonstrating a deep regression in reasoning...
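The "bag of numbers" point above can be illustrated with a toy greedy subword encoder — the vocabulary and IDs below are invented to echo the examples in this comment, not any real tokenizer's:

```python
# Invented subword vocab echoing the IDs mentioned above (not a real tokenizer).
VOCAB = {"John": 7554, "olf": 4024, "Ad": 324}

def encode(text: str) -> list[int]:
    """Greedy longest-match subword encoding over the toy vocab."""
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):       # try the longest piece first
            if text[i:j] in VOCAB:
                ids.append(VOCAB[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids
```

Here `encode("Adolf")` comes out as `[324, 4024]` — the same ID every time that subword appears, which is why associations attached to those IDs in training leak into otherwise unrelated prompts.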
Meta open sourcing these models is actually one of the most important things for freedom of speech and freedom in general on the Internet.
LLMs will at the very least be a valuable tool for people to use. Having an LLM that is aligned to their own values and desires, instead of to the values of a corporation or a state, is vitally important.
Does anyone know why Meta is open sourcing Llama? The two explanations I have heard are "commoditize the complement" and "take advantage of public improvements to Llama".
I have always found concepts such as "commoditize the complement" very interesting. For some reason, they seem like pseudo explanations of behaviours we see from companies.
I am not entirely convinced that this was Meta's reasoning for releasing Llama. Like, I think Zuck is not thinking along these lines while open sourcing LLMs.
How is Llama a complement of Meta's products? Meta wants more social interaction. It's a big stretch to assume LLMs will lead to more shareable social content (unless meta wants AI bots, in which case they don't need LLMs; they can just relax their moderation policies).
I honestly still don't have any good explanations for releasing Llama. Even the explanation of Meta gaining value from public improvements to Llama seems too vague and a stretch. Meta has enough engineers to make the most useful improvements themselves. The cost and effort of open sourcing Llama is >> the value Meta gains from public improvements of Llama.
I strongly endorse the commoditize-the-complement hypothesis, but here are others:
1. The only one that matters - the controlling founder wants to
2. It's great marketing. Make Facebook engineering (which is talented) seem cool. Also let their talented engineering teams flex their muscles a bit.
3. Investing in GPUs is a decent hedge in case there is something world changing coming (imagine Instagram reels but 50% of the content is ai generated. Or maybe just all ads are). Remember mobile - a previous paradigm shift - was an existential threat for Facebook
4. This is in line with the complement hypothesis, but what are you going to do with text you generate with llama? Some of that content will go right back on Facebook or one of their many platforms
5. WhatsApp business is a thing, and Facebook is talented enough at engineering that they can mimic any cool chat bot tools people build and integrate them into that
6. It's free for them (what are the downsides?) and reputationally gives them a certain kind of moral high ground compared to Google or OpenAI
7. Oh, the publicity will sell users on accessing the llama API hosted on Google (which pays Facebook money)
Additionally, this was not an overnight decision for Facebook/Meta. A lot of people seem to be under the impression that it was a snap decision, but Facebook's AI research labs predate and outspend most of FAANG's. Their only real competitor is Google, who practically rolled over dead once GGML/PyTorch cemented themselves as the cutting edge.
It's very possible that Llama had been ready for months, and Meta was internally unsure how to roll it out.
There is a podcast episode of Lex Fridman talking with Mark Zuckerberg, and in this episode Mark states that the reason Llama is (and will stay) open source is that that is the deal that Meta has with the engineers.
To get top engineers, there had to be some agreement, and this agreement apparently is that the engineers want their work open sourced.
When Llama was first released in February, Yann tweeted that they were committed to releasing all models under the GPL, and yet that's not the license we got. There's been a move toward more permissive licensing since, so it's been a bit of a tug of war.
One possibility is that they'll launch a hosted version of Llama-4 whenever they come out with that, and think the branding from Llama will help them sell that future version.
Meta's model isn't open source (and neither is "Open"AI's, hilariously). It's free to use for most purposes but we don't know what data or algorithms they used to train it. Regardless, I'm excited for these companies to eat each other with the result being free software. It's the very least they can do given all the data that had to be stolen to build these.
It's much more akin to distributing a freeware-for-noncommercial-use executable, which nobody ever confuses for being "open source". They're providing the output of their training pipeline but no way to reproduce the build yourself.
Surely there's a better term for it, "open weights" or something?
I'm not sure what OpenAI's play could possibly be at this point.
There will soon be models as powerful as GPT-4 freely available for anyone to use - in around 6 months or so, at least according to Meta.
And even better (and worse for OpenAI), they will be unaligned models, without nearly as many restrictions as OpenAI puts on theirs.
What's their next play? Hope GPT-5 is another order of magnitude better? Seems unlikely to me.
And on the legislation front, OpenAI's efforts at regulatory capture simply aren't moving fast enough.
The government isn't going to ban open source in 6 months. So when Llama 3 is released, the cat is out of the bag, and there is no putting it back in.
The restrictions apply to basically only Apple and Google.
So as long as you aren't an engineer working there, it is basically the same as open source, even if it doesn't fit the exact technical definition that purists cry about.
It’s not though. You also can’t use llama to train models other than llama!
It restricts non-commercial use in that way as well.
If I wanted to optimize an application for speed, for example, and fine tune a T5 model on some text, I’m unable to generate that text from any llama derived model.
Really I don’t have a problem with the license. Just with everyone calling llama and derived models “the best open source models.”
Falcon deserves that love as it’s actually open source.
Anything to compete against the cloud AI models and instead run them locally, for free, on your own machine. Anyone able to build such models, like Meta, has a moat and is already at the finish line in the AI race to zero.
At this point it is unstoppable, and $0 free local AI models will only accelerate - and OpenAI and Google know it.
Sun, DEC, Microsoft, etc just assumed they would use their cash and market presence to dominate the web up, down, left, and right.
LAMP showed up and ate their lunch. Today open source underpins basically every startup of the past 25 years, and everything from Android to macOS to every browser rendering engine. An exclusionary list would be easier.
A 2008 study[0] from the Linux Foundation estimated the development cost of Fedora (at the time) to be almost $11B - in 2008 dollars, on a roughly 15-year-old codebase. Wonder what it is now?
Point is, OpenAI raising a few billion dollars sounds like a lot of money until you look at the aggregate collective.
FOSS has always had commercial backers, entities with interesting business plans, non-obvious strategy (at the time), etc.
It’s perfectly clear to me why Meta is doing this. It’s also damn near breathtaking to watch the FOSS communities build, build, and build on top of the work of those like Meta as well as each other.
Personally I don’t understand how anyone could think OpenAI, etc vs FOSS is going to be special or unique against what we know to be obvious. I don’t see how this could possibly go any different than the early web. Why, just because it’s “AI” instead of “WWW” this time?
OpenAI will end up like Windows - used by some of the masses and corporate/Enterprise applications, users, etc (think Office integration). Everything else will be a herd (yes, a Llama/Alpaca/Vicuna/etc reference).
> Personally I don’t understand how anyone could think OpenAI, etc vs FOSS is going to be special or unique against what we know to be obvious.
APIs... it's all about APIs and ease of use. Companies have data centers or could have data centers if they wanted, but a lot use AWS, MSFT, GCP, and friends.
There are a ton of valuable and profitable companies with easy to use API based services used by millions of happy customers built on open source.
Let’s say I’m a law firm or a medical group. Do I need ChatGPT to write me a bedtime story or tell me about the Great Chicago Fire, or write code?
No, I need it to help me write my briefs, summarize patient notes, whatever.
The next crop of startups will be (and already are) LegalGPT and ScribeGPT (or whatever) with fine tuned highly specific, highly curated, regulatory compliant, quality controlled, decision controlled, PHI-scrubbed, tuned on your own data, etc, AI solutions largely based on open foundational models, open serving frameworks, data pipelines, etc. Available via end-user interface and/or API.
Then watch LegalGPT be acquired by LegalZoom, LexisNexus, whoever. ScribeGPT by Epic, etc. or vice-versa.
ChatGPT is a jack of all trades, truly master of none. As I said, it will always have its place, but it will recede more and more.
AI is trying to catch up to human intelligence. Do you go to a mechanic when you should be going to a dentist?
The difference is after 15 years the entire crypto ecosystem has somewhere in the range of 50-100m "users" worldwide (at best). ChatGPT alone has an estimated 100m monthly active users after less than a year...
You're correct in making the comparison but user adoption is the best metric to separate hype vs utility and AI is already absolutely blowing past crypto (obviously).
Let’s not pretend they’re free. Even if you’re heavily optimizing, one of the best local models you can run today is CodeLlama 34B. To run that effectively, you need at least a used RTX 3090: $800 (assuming your computer already has everything else you need to effectively use one). They take 350W by default, but you can power limit them to 200W and preserve most of the LLM performance. If you’re in an average place where residential power is on the order of $0.2/kWh then you’ll pay $1 for every ~25 hours of use in power.
The output will be significantly slower than what you’d get from GPT-3.5, so if your time is worth anything then that will be a huge factor that’s hard to calculate. Just assume everything will take about 3 times longer though.
If you amortize the cost of the 3090 for as long as you’d expect it to be relevant, you can estimate a rough monthly cost of the hardware. I’d give it just over a year at the current pace before you’d probably want a replacement. That’s about $50/mo.
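Putting those figures together (every number below is one of the assumptions from the last few paragraphs, not a measurement):

```python
# Rough total-cost-of-ownership sketch for a used RTX 3090 running local LLMs.
power_w = 200            # power-limited draw under load
price_per_kwh = 0.20     # assumed "average" residential rate, $/kWh
gpu_price = 800          # used RTX 3090
lifetime_months = 16     # "just over a year" of relevance

energy_cost_per_hour = power_w / 1000 * price_per_kwh   # $0.04/h at full load
hours_per_dollar = 1 / energy_cost_per_hour             # ~25 h of use per $1
amortized_monthly = gpu_price / lifetime_months         # ~$50/mo of hardware
```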
Operation at commodity scale with the still significant lead OAI or Anthropic has on efficient operations has its advantages beyond just SOTA performance.
Overall I think this is the wrong view. Let's go back to the 90s - OpenAI is a word processor you rent at the library; an RTX 3090 is a computer you own. Just like the 90s, early on you were buying or upgrading a computer every Christmas if not sooner. Hell, when I got a Pentium 100 (OMG) my heart sank when the 120 came out like a month later. A 20% increase in clock speed in a month!?!
Obviously we all know now that it was well worth it, this is how we got our start, and likely why we’re on HN today.
That RTX 3090 can also run thousands upon thousands of other models for any number of use cases at the same capex and opex. It can enable anything from a sense of wonder to an extremely valuable, fulfilling, and long-lasting career. Worst case it's fun, a relatively cheap hobby. Speaking of fun, it also plays games.
The RTX 3090 is an investment, not a cost.
That said, some fair points generally but:
1) 25 hours of use is A LOT. GPUs scale power consumption too. The GPU will use slightly higher than idle power to keep the model in memory, then spike to whatever your power limit is while the model is executing. Let’s say 30 seconds to generate as worst case. Human reads that, comprehends it, does something. Meanwhile the power scales back down until you hit it again. Point is, 25 hours is 3,000 30 second generation sessions - 100/day, every day. For $1 (plus base/idle load of machine). You can also suspend the machine or turn it off to cut down on idle-idle waste.
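The arithmetic behind "25 hours is A LOT", with the same assumed numbers:

```python
# 25 hours of full-load generation, sliced into worst-case 30-second bursts.
hours_per_dollar = 25    # full-load hours that $1 of power buys (from above)
session_seconds = 30     # worst-case generation burst at the power limit

sessions_per_dollar = hours_per_dollar * 3600 // session_seconds  # 3,000
per_day_for_a_month = sessions_per_dollar // 30                   # 100 per day
```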
2) Are you doing any optimization? I mostly work on high scale and inference serving frameworks (not on 3090s) but if I had to guess with something like lmdeploy, AWQ int4, int8 kv cache, etc you should be far exceeding ChatGPT in tokens/s. For example, I do concurrency testing with 13b on the RTX 4090 in my workstation. It hits 115 tokens/s - beating the 15 tokens/s or so from ChatGPT should be pretty straightforward with 34b on an RTX 3090.
Of course it's not free either way. But when you're running it on your own infrastructure, you are free from OpenAI's whims, and that's extremely important.
> I have always found concepts such as "commoditize the complement" very interesting. For some reason, they seem like pseudo explanations of behaviours we see from companies.
> I am not entirely convinced that this was Meta's reasoning for releasing Llama. Like, I think Zuck is not thinking along these lines while open sourcing LLMs.
> How is Llama a complement of Meta's products? Meta wants more social interaction. It's a big stretch to assume LLMs will lead to more shareable social content (unless Meta wants AI bots, in which case they don't need LLMs; they can just relax their moderation policies).
> I honestly still don't have any good explanations for releasing Llama. Even the explanation of Meta gaining value from public improvements to Llama seems too vague and a stretch. Meta has enough engineers to make the most useful improvements themselves. The cost and effort of open sourcing Llama is >> the value Meta gains from public improvements of Llama.
That's the point though... Suppose there exists a great way to use LLMs for social networking. Then if only OpenAI has LLMs, Facebook loses their business. But if anyone can use LLMs for free, Facebook's moat on ads and social networks is still effective.
The cost of keeping their business (in the 'supposed' event) is the cost of developing and open sourcing some good LLMs, which is pretty cheap in the grand scheme of things.
I agree with that. But when ChatGPT was made public, I read of Chinese citizens complaining about how far behind they were.
I see OSS LLMs along the same lines as the Rolls Royce Nene being sold to the USSR in 1946, or the captured AIM-9 being "the Rosetta Stone" for Soviet engineers as far as missile design.
I mean, will Kremlin or CCP engineers contribute their PRs to OSS LLMs? This seems so utterly naive to me on the part of the OSS religious.[0] And maybe just egotistical on the part of Meta. As in, Meta has to win, no matter what. See: Rohingya genocide.
I am really curious if there are any real AI safely folks at Meta, and what they think of this.
[0] I was once religious about OSS, but then I realized that not all products are the same.
> I mean, will Kremlin or CCP engineers contribute their PRs to OSS LLMs?
Nobody can. That's not how model training works. They can however finetune those models and redistribute them, and many do. It's trivial, and I've seen people from all over the world publish models on HuggingFace.
Will state-sponsored engineers upstream their work? No, but fat chance anyways. It's not like Andrew Tanenbaum is crying crocodile tears over all the MINIX patches the NSA never gave back. Open source work acknowledges these use cases, and it relies on good-faith adherence to the law to do what is right. Countries like China and Russia don't reliably adhere to copyright, so there's no reason to single out their failure to work with copyleft. It's a nothingburger.
> I see OSS LLMs along the same lines as the Rolls Royce Nene being sold to the USSR in 1946
Why? It's text. It's trained on a big pile of data you can go download and take to China in a plane on a flash drive: https://pile.eleuther.ai/
It's like exporting the IP equivalent of Silly Putty.
> I was once religious about OSS, but then I realized that not all products are the same.