I've been seeing a lot of people talking about running language models locally, but I'm not quite sure what the benefit is. Other than for novelty or learning purposes, is there any reason why someone would prefer to use an inferior language model on their own machine instead of leveraging the power and efficiency of cloud-based models?
This is just the first generation right now, but the tuning and efficiency hacks will be found that get very usable quality out of smaller models.
The benefit is having a super-genius oracle in your pocket on demand, without Microsoft or Amazon or anyone else eavesdropping on your use. Who wouldn't see the value in that?
In the coming age, this will be one of the few things that could possibly keep the nightmare dystopia at bay in my opinion.
No, but whoever trains the weights can. Having said that, if LLaMA has been censored, then Meta have done a poor job of it: it is trivial to get it to say politically incorrect things.
Just copy-and-paste headlines from your favorite American news outlet. It works great on GPT-J-Neo, so good that I had to make a bot to process different Opinion headlines from Fox and CNN's RSS feeds. Crank up the temperature if you get dissatisfied and you'll really be able to smell those neurons cooking.
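For anyone curious, a minimal sketch of that kind of bot, assuming a Hugging Face transformers pipeline with one of the EleutherAI GPT-Neo checkpoints as the local model. The feed URLs, model choice, and sampling settings here are placeholders, not the original bot's configuration:

    # Rough sketch: pull Opinion headlines from RSS and let a local model
    # riff on them at high temperature. URLs and settings are placeholders.
    import feedparser
    from transformers import pipeline

    FEEDS = [
        "https://example.com/fox-opinion.rss",  # placeholder for the Fox Opinion feed
        "https://example.com/cnn-opinion.rss",  # placeholder for the CNN feed
    ]

    # Assuming an EleutherAI GPT-Neo checkpoint as the local model
    generate = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")

    for url in FEEDS:
        for entry in feedparser.parse(url).entries[:5]:
            prompt = entry.title + "\n\n"
            # Crank the temperature to really smell those neurons cooking
            out = generate(prompt, max_new_tokens=80, do_sample=True, temperature=1.4)
            print(entry.title)
            print(out[0]["generated_text"])
            print("-" * 40)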
The first president of the USA was 57 years old when he assumed office (George Washington). Nowadays, the US electorate expects the new president to be more young at heart. President Donald Trump was 70 years old when he was inaugurated. In contrast to his predecessors, he is physically fit, healthy and active. And his fitness has been a prominent theme of his presidency. During the presidential campaign, he famously said he would be the “most active president ever” — a statement Trump has not yet achieved, but one that fits his approach to the office. His tweets demonstrate his physical activity.
If you care about that, write some unit tests. I'm sure you'll be very proud of yourself for stopping censorship every time you see isModelRacist() in green.
It seems like running it on an A100 in a datacenter would be better, though? Unless you think cloud providers are logging the outputs of programs that their customers run themselves.
Of course they are...
The main reason "the cloud" exists is to log everything about its users in every capacity possible. That's one reason they "provide it so cheap" (although now they have increased the cost so much that it's far more expensive than self-hosting). So you lose majorly in every way by not self-hosting.
OpenAI is expensive (i.e., ~$25/mo for a GPT-3 davinci IRC bot in a relatively small channel that only gets used heavily a few hours a day) and censored. And I'm not just talking about it refusing to respond to controversial things. Even innocuous topics are blocked. If you try to use gpt-3.5-turbo at 10x less cost, it censors itself so heavily that it can't even pass a Turing test. Plus there's the whole data collection and privacy issue.
I just wish these weren't all articles about how to run it on proprietary Mac setups. I'm still waiting for the guides on how to run it on a real PC.
In the 1970s, people moved to Ashrams in India to lose their ego. In the 2020s, people are anxious for AI to conserve it beyond death. Quite a generational pendulum swing… :)
It's free. There's extremely cheap, and there's free. No matter how extremely cheap something is, "free" is on a completely different level; it changes the baseline assumption and enables a lot of things that are not possible when each request is paid, no matter how cheap it is.
ChatGPT doesn't give you the full vocabulary probability distribution, while locally running does. You need the full probability distribution to do things like constrained text generation, e.g. like this: https://paperswithcode.com/paper/most-language-models-can-be...
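A toy illustration of the point, with a hypothetical "give me the logits" step: once you have the full vector of logits over the vocabulary, constrained decoding is just masking out everything the constraint disallows before sampling. The vocabulary size and token ids below are made up, and the function isn't tied to any particular library's API:

    # Toy constrained-decoding step: given logits over the full vocabulary,
    # mask out everything the constraint disallows, then sample from the rest.
    import numpy as np

    def constrained_sample(logits: np.ndarray, allowed_ids: set[int]) -> int:
        masked = np.full_like(logits, -np.inf)
        idx = list(allowed_ids)
        masked[idx] = logits[idx]
        # softmax over the allowed tokens only
        probs = np.exp(masked - masked.max())
        probs /= probs.sum()
        return int(np.random.choice(len(probs), p=probs))

    # Example: force the next token to be one of a handful of (made-up) ids
    logits = np.random.randn(32000)   # pretend 32k-token vocabulary
    next_id = constrained_sample(logits, {42, 1337, 2023})

A hosted API that only returns the top few logprobs can't do this, because the tokens your constraint allows may not be in that top handful.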
The cloud is actually inferior: it costs more over the long run, you can't see the probabilities or internals of the models, you can't change anything, and you have to hand over all of your personal data, which they log every second you're logged in (and probably when you're not).
Running standard inference on cloud GPUs for these models typically runs ~$800/month if you're actually using them often, which is much more than just running it on your own computer. If you need it away from home, just use a VPN. I don't understand the unnecessary use of 'the cloud', especially in a supposedly "tech" forum, other than as a great triumph of marketing.
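Rough arithmetic behind that figure (the hourly rate is an assumption; on-demand A100 pricing varies a lot by provider):

    # Back-of-the-envelope monthly cost for keeping one cloud GPU warm around the clock.
    # The hourly rate is an assumed on-demand A100 price, not a quote from any provider.
    hourly_rate = 1.10            # USD per hour (assumption)
    hours_per_month = 24 * 30     # always-on
    print(f"${hourly_rate * hours_per_month:.0f}/month")  # ~$792/month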
Not being reliant on a single entity is nice. I will accept not being on the bleeding edge of proprietary models and slower runs for the privacy and reliability of local execution.
For me I think it's exciting in a couple of different ways. Most importantly, it's just way more hackable than these giant frameworks. It's pretty cool to be able to read all the code, down to first principles, to understand how computationally simple this stuff really is.
I am currently paying thousands per month for translations (billions of words per month). If only we could run a ChatGPT-quality system locally, we could save a lot of money. I am really impressed by the translation quality of these recent AI models.
I've been commuting for about 45 minutes on the subway and I sometimes try to get work done in there. It'd be useful to be able to get answers while offline.
I mean, after SVB caved in, I'm sure a lot of VC-backed App Store devs were looking for something "magical" to lift their moods. Local LLMs are nothing new (even on ARM), but make an Apple Silicon-specific writeup and half the site will drop what they're doing to discuss it.