I've been seeing a lot of people talking about running language models locally, but I'm not quite sure what the benefit is. Other than for novelty or learning purposes, is there any reason why someone would prefer to use an inferior language model on their own machine instead of leveraging the power and efficiency of cloud-based models?
This is just the first generation right now, but the tuning and efficiency hacks will be found that get very usable quality out of smaller models.
The benefit is having a super-genius oracle in your pocket on demand, without Microsoft or Amazon or anyone else eavesdropping on your use. Who wouldn't see the value in that?
In the coming age, this will be one of the few things that could possibly keep the nightmare dystopia at bay in my opinion.
No, but whoever trains the weights can. Having said that, if LLaMA has been censored, then Meta have done a poor job of it: it is trivial to get it to say politically incorrect things.
Just copy-and-paste headlines from your favorite American news outlet. It works great on GPT-J-Neo, so good that I had to make a bot to process different Opinion headlines from Fox and CNN's RSS feeds. Crank up the temperature if you get dissatisfied and you'll really be able to smell those neurons cooking.
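For anyone curious, a minimal sketch of that kind of bot, assuming a Hugging Face transformers pipeline with one of the EleutherAI GPT-Neo checkpoints as the local model. The feed URLs, model choice, and sampling settings here are placeholders, not the original bot's configuration:

    # Rough sketch: pull Opinion headlines from RSS and let a local model
    # riff on them at high temperature. URLs and settings are placeholders.
    import feedparser
    from transformers import pipeline

    FEEDS = [
        "https://example.com/fox-opinion.rss",  # placeholder for the Fox Opinion feed
        "https://example.com/cnn-opinion.rss",  # placeholder for the CNN feed
    ]

    # Assuming an EleutherAI GPT-Neo checkpoint as the local model
    generate = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")

    for url in FEEDS:
        for entry in feedparser.parse(url).entries[:5]:
            prompt = entry.title + "\n\n"
            # Crank the temperature to really smell those neurons cooking
            out = generate(prompt, max_new_tokens=80, do_sample=True, temperature=1.4)
            print(entry.title)
            print(out[0]["generated_text"])
            print("-" * 40)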
The first president of the USA was 57 years old when he assumed office (George Washington). Nowadays, the US electorate expects the new president to be more young at heart. President Donald Trump was 70 years old when he was inaugurated. In contrast to his predecessors, he is physically fit, healthy and active. And his fitness has been a prominent theme of his presidency. During the presidential campaign, he famously said he would be the “most active president ever” — a statement Trump has not yet achieved, but one that fits his approach to the office. His tweets demonstrate his physical activity.
If you care about that, write some unit tests. I'm sure you'll be very proud of yourself for stopping censorship every time you see isModelRacist() in green.
It seems like running it on an A100 in a datacenter would be better, though? Unless you think cloud providers are logging the outputs of programs that their customers run themselves.
Of course they are...
The main reason "the cloud" exists is to log everything about its users in every capacity possible. That's one reason they "provide it so cheap" (although now they have increased the cost so much that it's far more expensive than self-hosting). So you lose majorly in every way by not self-hosting.
OpenAI is expensive (i.e., ~$25/mo for a GPT-3 davinci IRC bot in a relatively small channel that only gets used heavily a few hours a day) and censored. And I'm not just talking about it refusing to respond to controversial things. Even innocuous topics are blocked. If you try to use gpt-3.5-turbo at 10x less cost, it censors itself so heavily that it can't even pass a Turing test. Plus there's the whole data collection and privacy issue.
I just wish these weren't all articles about how to run it on proprietary Mac setups. I'm still waiting for the guides on how to run it on a real PC.
In the 1970s, people moved to Ashrams in India to lose their ego. In the 2020s, people are anxious for AI to conserve it beyond death. Quite a generational pendulum swing… :)
It's free. There's extremely cheap, and there's free. No matter how extremely cheap something is, "free" is on a completely different level; it changes the baseline assumption and enables a lot of things that are not possible when each request is paid, no matter how cheap it is.
ChatGPT doesn't give you the full vocabulary probability distribution, while locally running does. You need the full probability distribution to do things like constrained text generation, e.g. like this: https://paperswithcode.com/paper/most-language-models-can-be...
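A toy illustration of the point, with a hypothetical "give me the logits" step: once you have the full vector of logits over the vocabulary, constrained decoding is just masking out everything the constraint disallows before sampling. The vocabulary size and token ids below are made up, and the function isn't tied to any particular library's API:

    # Toy constrained-decoding step: given logits over the full vocabulary,
    # mask out everything the constraint disallows, then sample from the rest.
    import numpy as np

    def constrained_sample(logits: np.ndarray, allowed_ids: set[int]) -> int:
        masked = np.full_like(logits, -np.inf)
        idx = list(allowed_ids)
        masked[idx] = logits[idx]
        # softmax over the allowed tokens only
        probs = np.exp(masked - masked.max())
        probs /= probs.sum()
        return int(np.random.choice(len(probs), p=probs))

    # Example: force the next token to be one of a handful of (made-up) ids
    logits = np.random.randn(32000)   # pretend 32k-token vocabulary
    next_id = constrained_sample(logits, {42, 1337, 2023})

A hosted API that only returns the top few logprobs can't do this, because the tokens your constraint allows may not be in that top handful.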
The cloud is actually inferior: it costs more over the long run, you can't see the probabilities or internals of the models, you can't change anything, and you have to hand over all of your personal data, which they log every second you're logged in (and probably when you're not).
Running standard inference on cloud GPUs for these models typically runs ~$800/month if you're actually using them often, which is much more than just running it on your own computer. If you need it away from home, just use a VPN. I don't understand the unnecessary use of 'the cloud', especially in a supposedly "tech" forum, other than as a great triumph of marketing.
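Rough arithmetic behind that figure (the hourly rate is an assumption; on-demand A100 pricing varies a lot by provider):

    # Back-of-the-envelope monthly cost for keeping one cloud GPU warm around the clock.
    # The hourly rate is an assumed on-demand A100 price, not a quote from any provider.
    hourly_rate = 1.10            # USD per hour (assumption)
    hours_per_month = 24 * 30     # always-on
    print(f"${hourly_rate * hours_per_month:.0f}/month")  # ~$792/month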
Not being reliant on a single entity is nice. I will accept not being on the bleeding edge of proprietary models and slower runs for the privacy and reliability of local execution.
For me I think it's exciting in a couple of different ways. Most importantly, it's just way more hackable than these giant frameworks. It's pretty cool to be able to read all the code, down to first principles, to understand how computationally simple this stuff really is.
I am currently paying thousands per month for translations (billions of words per month). If only we could run a ChatGPT-quality system locally, we could save a lot of money. I am really impressed by the translation quality of these recent AI models.
I've been commuting for about 45 minutes on the subway and I sometimes try to get work done in there. It'd be useful to be able to get answers while offline.
I mean, after SVB caved in, I'm sure a lot of VC-backed App Store devs were looking for something "magical" to lift their moods. Local LLMs are nothing new (even on ARM), but make an Apple Silicon-specific writeup and half the site will drop what they're doing to discuss it.