Ask HN: Is anyone doing anything cool with tiny language models?

Evidlo · 2025-01-22T00:00:46 1737504046

I have ollama responding to SMS spam texts. I told it to feign interest in whatever the spammer is selling/buying. Each number gets its own persona, like a millennial gymbro or 19th century British gentleman.

http://files.widloski.com/image10%20(1).png

http://files.widloski.com/image11.png

celestialcheese · 2025-01-22T00:31:20 1737505880

Given the source, I'm skeptical it's not just a troll, but found this explanation [0] plausible as to why those vague spam texts exist. If true, this trolling helps the spammers warm those phone numbers up.

0 - https://x.com/nikitabier/status/1867029883387580571

stogot · 2025-01-22T01:16:20 1737508580

Why does STOP work here?

inerte · 2025-01-22T01:22:32 1737508952

Carriers and SMS service providers (like Twilio) obey that, no matter what service is behind it.

There are stories of people replying STOP to spam, then never getting a legit SMS because the number was re-used by another service. That's because it's being blocked between the spammer and the phone.

celestialcheese · 2025-01-22T01:21:39 1737508899

https://x.com/nikitabier/status/1867069169256308766

Again, no clue if this is true, but it seems plausible.

RVuRnvbM2e · 2025-01-22T00:04:18 1737504258

This is fantastic. How have you hooked up a mobile number to the llm?

Evidlo · 2025-01-22T00:15:11 1737504911

Android app that forwards to a Python service on remote workstation over MQTT. I can make a Show HN if people are interested.

dkga · 2025-01-22T00:59:54 1737507594

Yes, I'd be interested in that!

deadbabe · 2025-01-22T00:31:34 1737505894

I’d love to see that. Could you simulate iMessage?

great_psy · 2025-01-22T01:33:56 1737509636

Yes it’s possible, but it’s not something you can easily scale.

I had a similar project a few years back that used OSX automations and Shortcuts and Python to send a message every day to a friend. It required you to be signed in to iMessage on your MacBook.

That was a send operation; reading replies is not something I implemented, but I know there is a file somewhere that holds a history of your recent iMessages. So you would have to parse it on file update and that should give you the read operation so you can have a conversation.

Very doable in a few hours unless something dramatic changed with how the Messages app works within the last few years.

Evidlo · 2025-01-22T00:55:05 1737507305

If you mean hook this into iMessage, I don't know. I'm willing to bet it's way harder though because Apple

spiritplumber · 2025-01-22T00:11:37 1737504697

For something similar with FB chat, I use Selenium and run it on the same box that the llm is running on. Using multiple personalities is really cool though. I should update mine likewise!

zx8080 · 2025-01-22T00:12:35 1737504755

Cool! Have you considered the risk of unintentionally (and, for a while, unknowingly) subscribing to some paid SMS service, and how do you mitigate it?

Evidlo · 2025-01-22T00:18:24 1737505104

I have to whitelist a conversation before the LLM can respond.
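A minimal sketch of how the whitelist gate and the per-number personas could fit together. Everything here is hypothetical (the persona list, `pick_persona`, `build_prompt`, the example phone number); the actual app's code is not shown in this thread, and the prompt would still need to be sent to the model separately.

```python
import hashlib
from typing import Optional, Set

# Hypothetical persona list; the thread only names these two examples.
PERSONAS = [
    "a millennial gymbro",
    "a 19th century British gentleman",
]


def pick_persona(number: str) -> str:
    """Map a phone number to a stable persona by hashing the number."""
    digest = hashlib.sha256(number.encode("utf-8")).digest()
    return PERSONAS[digest[0] % len(PERSONAS)]


def build_prompt(number: str, message: str, whitelist: Set[str]) -> Optional[str]:
    """Return an LLM prompt, or None if the conversation is not whitelisted."""
    if number not in whitelist:
        return None  # never auto-reply to a conversation nobody approved
    persona = pick_persona(number)
    return (
        f"You are {persona}. Feign interest in whatever the sender is "
        f"selling or buying.\n\nIncoming SMS: {message}\nReply:"
    )
```

Hashing the number keeps the persona consistent across a conversation without storing any state, and returning `None` for unknown numbers means a premium-rate or subscription sender never gets an automatic reply.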

blackeyeblitzar · 2025-01-22T01:46:21 1737510381

You realize this is going to cause carriers to allow the number to send more spam, because it looks like engagement. The best thing to do is to report the offending message to 7726 (SPAM) so the carrier can take action. You can also file complaints at the FTC and FCC websites, but that takes a bit more effort.

thecosmicfrog · 2025-01-22T00:41:44 1737506504

Please tell me you have a blog/archive of these somewhere. This was such a joy to read!

antonok · 2025-01-21T23:57:18 1737503838

I've been using Llama models to identify cookie notices on websites, for the purpose of adding filter rules to block them in EasyList Cookie. Otherwise, this is normally done by, essentially, manual volunteer reporting.

Most cookie notices turn out to be pretty similar, HTML/CSS-wise, and then you can grab their `innerText` and filter out false positives with a small LLM. I've found the 3B models have decent performance on this task, given enough prompt engineering. They do fall apart slightly around edge cases like less common languages or combined cookie notice + age restriction banners. 7B has a negligible false-positive rate without much extra cost. Either way these things are really fast and it's amazing to see reports streaming in during a crawl with no human effort required.

Code is at https://github.com/brave/cookiemonster. You can see the prompt at https://github.com/brave/cookiemonster/blob/main/src/text-cl....

bazmattaz · 2025-01-22T00:03:08 1737504188

This is so cool thanks for sharing. I can imagine it’s not technically possible (yet?) but it would be cool if this could simply be run as a browser extension rather than running a docker container

antonok · 2025-01-22T00:05:48 1737504348

I did actually make a rough proof-of-concept of this! One of my long-term visions is to have it running natively in-browser, and able to automatically fix site issues caused by adblocking whenever they happen.

The PoC is a bit outdated but it's here: https://github.com/brave/cookiemonster/tree/webext

throwup238 · 2025-01-22T03:53:55 1737518035

It should be possible using native messaging [1] which can call out to an external binary. The 1password extensions use that to communicate with the password manager binary.

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/Web...

binarysneaker · 2025-01-22T00:03:00 1737504180

Maybe it could also send automated petitions to the EU to undo cookie consent legislation, and reverse some of the enshittification.

sebastiennight · 2025-01-22T04:08:17 1737518897

To me this take is like smokers complaining that the evil government is forcing the good tobacco companies to degrade the experience by adding pictures of cancer patients on cigarette packs.

antonok · 2025-01-22T00:09:07 1737504547

Ha, I'm not sure the EU is prepared to handle the deluge of petitions that would ensue.

On a more serious note, this must be the first time we can quantitatively measure the impact of cookie consent legislation across the web, so maybe there's something to be explored there.

K0balt · 2025-01-22T00:43:30 1737506610

I think there is real potential here, for smart browsing. Have the llm get the page, replace all the ads with kittens, find non-paywall versions if possible and needed, spoof fingerprint data, detect and highlight AI generated drivel, etc. The site would have no way of knowing that it wasn’t touching eyeballs. We might be able to rake back a bit of the web this way.

antonok · 2025-01-22T00:58:30 1737507510

You probably wouldn't want to run this in real-time on every site as it'll significantly increase the load on your browser, but as long as it's possible to generate adblock filter rules, the fixes can scale to a pretty large audience.

K0balt · 2025-01-22T02:30:26 1737513026

I was thinking running it in my home lab server as a proxy, but yeah, scaling it to the browser would require some pretty strong hardware. Still, maybe in a couple of years it could be mainstream.

behohippy · 2025-01-21T20:57:37 1737493057

I have a mini PC with an n100 CPU connected to a small 7" monitor sitting on my desk, under the regular PC. I have llama 3b (q4) generating endless stories in different genres and styles. It's fun to glance over at it and read whatever it's in the middle of making. I gave llama.cpp one CPU core and it generates slow enough to just read at a normal pace, and the CPU fans don't go nuts. Totally not productive or really useful but I like it.

ipython · 2025-01-21T22:46:32 1737499592

That's neat. I just tried something similar:

    FORTUNE=$(fortune) && echo "$FORTUNE" && printf 'Convert the following output of the Unix `fortune` command into a small screenplay in the style of Shakespeare:\n\n%s\n' "$FORTUNE" | ollama run phi4

Uehreka · 2025-01-21T21:20:48 1737494448

Do you find that it actually generates varied and diverse stories? Or does it just fall into the same 3 grooves?

Last week I tried to get an LLM (one of the recent Llama models running through Groq, it was 70B I believe) to produce randomly generated prompts in a variety of styles and it kept producing cyberpunk scifi stuff. When I told it to stop doing cyberpunk scifi stuff it went completely to wild west.

o11c · 2025-01-21T21:35:42 1737495342

You should not ever expect an LLM to actually do what you want without handholding, and randomness in particular is one of the places it fails badly. This is probably fundamental.

That said, this is also not helped by the fact that all of the default interfaces lack many essential features, so you have to build the interface yourself. Neither "clear the context on every attempt" nor "reuse the context repeatedly" will give good results, but having one context producing just one-line summaries, then fresh contexts expanding each one will do slightly less badly.

(If you actually want the LLM to do something useful, there are many more things that need to be added beyond this)
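A rough sketch of the "one context for one-line summaries, fresh contexts for expansion" idea. The `generate` callable is a hypothetical stand-in for whatever sends a single prompt to a model with an empty context; the prompt wording is also just an assumption.

```python
from typing import Callable, List


def two_stage_stories(generate: Callable[[str], str], n: int) -> List[str]:
    """Ask for n one-line premises in one call, then expand each separately.

    `generate` is assumed to be stateless: every call starts a fresh context.
    """
    outline = generate(
        f"List {n} one-line story premises, one per line, "
        "each in a different genre."
    )
    # Keep non-empty lines only, and never more than n of them.
    premises = [line.strip() for line in outline.splitlines() if line.strip()][:n]
    # Each expansion sees only its own premise, not the other stories.
    return [
        generate(f"Write a short story based on this premise:\n{premise}")
        for premise in premises
    ]
```

Because the premises are produced together, the model can see what it has already proposed and avoid repeating itself, while each expansion stays uncontaminated by the other stories.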

dotancohen · 2025-01-21T22:50:29 1737499829

Sounds to me like you might want to reduce the Top P - that will prevent the really unlikely next tokens from ever being selected, while still providing nice randomness in the remaining next tokens so you continue to get diverse stories.
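For anyone unfamiliar with Top P (nucleus sampling), here is a small pure-Python sketch of the idea. The toy token probabilities are made up, and real inference engines do this over the full vocabulary with their own implementations.

```python
import random
from typing import Dict


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the smallest set of most-likely tokens whose total mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break  # everything less likely than this is dropped
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}  # renormalize to sum to 1


def sample(probs: Dict[str, float], top_p: float, rng=random) -> str:
    """Draw one token from the truncated, renormalized distribution."""
    filtered = top_p_filter(probs, top_p)
    tokens = list(filtered)
    weights = [filtered[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

Lowering `top_p` shrinks the candidate set, so very unlikely tokens can never be picked, but the sampling among the surviving tokens is still random.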

coder543 · 2025-01-22T02:33:13 1737513193

Someone mentioned generating millions of (very short) stories with an LLM a few weeks ago: https://news.ycombinator.com/item?id=42577644

They linked to an interactive explorer that nicely shows the diversity of the dataset, and the HF repo links to the GitHub repo that has the code that generated the stories: https://github.com/lennart-finke/simple_stories_generate

So, it seems there are ways to get varied stories.

janalsncm · 2025-01-21T22:56:07 1737500167

Generate a list of 5000 possible topics you’d like it to talk about. Randomly pick one and inject that into your prompt.
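Something like this, as a sketch. The short `TOPICS` list is a hypothetical stand-in for the 5000 generated topics, and the prompt wording is only an example.

```python
import random
from typing import List

# Hypothetical stand-in; in practice this would be the pre-generated topic list.
TOPICS = [
    "a lighthouse keeper",
    "a chess tournament",
    "a lost recipe",
    "a train delayed by snow",
]


def build_story_prompt(topics: List[str], rng: random.Random) -> str:
    """Pick the topic outside the model, then inject it into the prompt."""
    topic = rng.choice(topics)
    return f"Write a short story about {topic}."
```

The randomness comes from the program's own random number generator rather than from the model, which sidesteps the model's tendency to fall into the same few grooves.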

Dansvidania · 2025-01-21T21:05:19 1737493519

this sounds pretty cool, do you have any video/media of it?

droideqa · 2025-01-22T01:53:47 1737510827

That's awesome!

keeganpoppen · 2025-01-21T22:44:19 1737499459

oh wow that is actually such a brilliant little use case-- really cuts to the core of the real "magic" of ai: that it can just keep running continuously. it never gets tired, and never gets tired of thinking.

bithavoc · 2025-01-21T21:11:56 1737493916

this is so cool, any chance you post a video?

nozzlegear · 2025-01-21T23:30:18 1737502218

I have a small fish script I use to prompt a model to generate three commit messages based off of my current git diff. I'm still playing around with which model comes up with the best messages, but usually I only use it to give me some ideas when my brain isn't working. All the models accomplish that task pretty well.

Here's the script: https://github.com/nozzlegear/dotfiles/blob/master/fish-func...

And for this change [1] it generated these messages:

    1. `fix: change from printf to echo for handling git diff input`
    
    2. `refactor: update codeblock syntax in commit message generator`
    
    3. `style: improve readability by adjusting prompt formatting`

[1] https://github.com/nozzlegear/dotfiles/commit/0db65054524d0d...

mentos · 2025-01-22T03:53:17 1737517997

Awesome need to make one for naming variables too haha

flippyhead · 2025-01-21T22:12:50 1737497570

I have a tiny device that listens to conversations between two people or more and constantly tries to declare a "winner"

mkaic · 2025-01-22T00:27:08 1737505628

This reminds me of the antics of streamer DougDoug, who often uses LLM APIs to live-summarize, analyze, or interact with his (often multi-thousand-strong) Twitch chat. Most recently I saw him do a GeoGuessr stream where he had ChatGPT assume the role of a detective who must comb through the thousands of chat messages for clues about where the chat thinks the location is, then synthesizes the clamor into a final guess. Aside from constantly being trolled by people spamming nothing but "Kyoto, Japan" in chat, it occasionally demonstrated a pretty effective incarnation of "the wisdom of the crowd" and was strikingly accurate at times.

eddd-ddde · 2025-01-21T23:41:02 1737502862

I love that there's not even a vague idea of the winner "metric" in your explanation. Like it's just, _the_ winner.

oa335 · 2025-01-21T22:18:10 1737497890

This made me actually laugh out loud. Can you share more details on hardware and models used?

jjcm · 2025-01-21T22:37:56 1737499076

Are you raising a funding round? I'm bought in. This is hilarious.

econ · 2025-01-21T22:34:50 1737498890

This is a product I want

hn8726 · 2025-01-21T23:17:31 1737501451

What approach/stack would you recommend for listening to an ongoing conversation, transcribing it and passing through llm? I had some use cases in mind but I'm not very familiar with AI frameworks and tools

prakashn27 · 2025-01-22T04:14:44 1737519284

wifey always wins. ;)

pseudosavant · 2025-01-21T22:17:26 1737497846

I'd love to hear more about the hardware behind this project. I've had concepts for tech requiring a mic on me at all times for various reasons. Always tricky to have enough power in a reasonable DIY form factor.

nejsjsjsbsb · 2025-01-22T02:28:43 1737512923

All computation on device?

amelius · 2025-01-21T22:36:52 1737499012

You can use the model to generate winning speeches also.

RhysU · 2025-01-21T20:37:07 1737491827

"Comedy Writing With Small Generative Models" by Jamie Brew (Strange Loop 2023)

https://m.youtube.com/watch?v=M2o4f_2L0No

Spend the 45 minutes watching this talk. It is a delight. If you are unsure, wait until the speaker picks up the guitar.

100k · 2025-01-21T20:40:26 1737492026

Seconded! This was my favorite talk at Strange Loop (including my own).

azhenley · 2025-01-21T20:50:42 1737492642

Microsoft published a paper on their FLAME model (60M parameters) for Excel formula repair/completion which outperformed much larger models (>100B parameters).

https://arxiv.org/abs/2301.13779

coder543 · 2025-01-22T03:31:10 1737516670

That paper is from over a year ago, and it compared against codex-davinci... which was basically GPT-3, from what I understand. Saying >100B makes it sound a lot more impressive than it is in today's context... 100B models today are a lot more capable. The researchers also compared against a couple of other ancient(/irrelevant today), small models that don't give me much insight.

FLAME seems like a fun little model, and 60M is truly tiny compared to other LLMs, but I have no idea how good it is in today's context, and it doesn't seem like they ever released it.

andai · 2025-01-21T21:30:28 1737495028

This is wild. They claim it was trained exclusively on Excel formulas, but then they mention retrieval? Is it understanding the connection between English and formulas? Or am I misunderstanding retrieval in this context?

Edit: No, the retrieval is Formula-Formula; neither the model nor (I believe) the tokenizer handles English.

3abiton · 2025-01-21T22:05:43 1737497143

But I feel we're going back full circle. These small models are not generalist, thus not really LLMs at least in terms of objective. Recently there has been a rise of "specialized" models that provide lots of value, but that's not why we were sold on LLMs.

Suppafly · 2025-01-21T23:01:34 1737500494

Specialized models work much better still for most stuff. Really we need an LLM to understand the input and then hand it off to a specialized model that actually provides good results.

colechristensen · 2025-01-21T22:16:06 1737497766

But that's the thing, I don't need my ML model to be able to write me a sonnet about the history of beets, especially if I want to run it at home for specific tasks like as a programming assistant.

I'm fine with and prefer specialist models in most cases.

zeroCalories · 2025-01-21T23:44:10 1737503050

I would love a model that knows SQL really well so I don't need to remember all the small details of the language. Beyond that, I don't see why the transformer architecture can't be applied to any problem that needs to predict sequences.

dr_kiszonka · 2025-01-22T01:02:48 1737507768

The trick is to find such problems with enough training data and some market potential. I am terrible at it.

janalsncm · 2025-01-21T23:12:03 1737501123

I think playing word games about what really counts as an LLM is a losing battle. It has become a marketing term, mostly. It’s better to have a functionalist point of view of “what can this thing do”.

barrenko · 2025-01-21T21:14:18 1737494058

This is really cool. Is this already in Excel?

simonjgreen · 2025-01-21T22:33:53 1737498833

Micro Wake Word is a library and set of on device models for ESPs to wake on a spoken wake word. https://github.com/kahrendt/microWakeWord

Recently deployed in Home Assistant's fully local-capable Alexa replacement. https://www.home-assistant.io/voice_control/about_wake_word/

jwitthuhn · 2025-01-22T01:45:27 1737510327

I've made a tiny ~1m parameter model that can generate random Magic the Gathering cards that is largely based on Karpathy's nanogpt with a few more features added on top.

I don't have a pre-trained model to share but you can make one yourself from the git repo, assuming you have an apple silicon mac.

https://github.com/jlwitthuhn/TCGGPT

sidravi1 · 2025-01-22T03:04:47 1737515087

We fine-tuned a Gemma 2B to identify urgent messages sent by new and expecting mothers on a government-run maternal health helpline.

https://idinsight.github.io/tech-blog/blog/enhancing_materna...

proxygeek · 2025-01-22T04:19:11 1737519551

Such a fun thread, but this is the kind of application that perks up my attention!

Very cool!

deet · 2025-01-21T21:28:44 1737494924

We (avy.ai) are using models in that range to analyze computer activity on-device, in a privacy sensitive way, to help knowledge workers as they go about their day.

The local models do things ranging from cleaning up OCR, to summarizing meetings, to estimating the user's current goals and activity, to predicting search terms, to predicting queries and actions that, if run, would help the user accomplish their current task.

The capabilities of these tiny models have really surged recently. Even small vision models are becoming useful, especially if fine tuned.

mettamage · 2025-01-21T20:17:43 1737490663

I simply use it to de-anonymize code that I typed in via Claude

Maybe should write a plugin for it (open source):

1. Put in all your work related questions in the plugin, an LLM will make it as an abstract question for you to preview and send it

2. And then get the answer with all the data back

E.g. df["cookie_company_name"] becomes df["a"] and back
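The round trip could be sketched like this. Note the swap: this uses a deterministic find/replace mapping rather than an LLM, and the function names and the `col_N` placeholder scheme are hypothetical. It also assumes the placeholders do not already occur in the code.

```python
import re
from typing import Dict, Iterable, Tuple


def anonymize(code: str, secret_names: Iterable[str]) -> Tuple[str, Dict[str, str]]:
    """Replace each sensitive name with a neutral placeholder; return the mapping."""
    # Longest names first, so a name that contains another one wins the match.
    ordered = sorted(set(secret_names), key=len, reverse=True)
    mapping = {name: f"col_{i}" for i, name in enumerate(ordered)}
    if not mapping:
        return code, {}
    pattern = re.compile("|".join(re.escape(name) for name in mapping))
    return pattern.sub(lambda m: mapping[m.group(0)], code), mapping


def deanonymize(text: str, mapping: Dict[str, str]) -> str:
    """Put the original names back into the answer that came from the big model."""
    reverse = {placeholder: name for name, placeholder in mapping.items()}
    if not reverse:
        return text
    # Longest placeholders first, so col_10 is not mistaken for col_1.
    ordered = sorted(reverse, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in ordered))
    return pattern.sub(lambda m: reverse[m.group(0)], text)
```

The mapping never leaves the local machine; only the anonymized text is sent out, and the answer is translated back afterwards.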

sitkack · 2025-01-21T22:09:06 1737497346

So you are using a local small model to remove identifying information and make the question generic, which is then sent to a larger model? Is that understanding correct?

I think this would have some additional benefits of not confusing the larger model with facts it doesn't need to know about. By erasing information, you can allow its attention heads to focus on the pieces that matter.

Requires further study.

sundarurfriend · 2025-01-22T03:51:12 1737517872

You're using it to anonymize your code, not de-anonymize someone's code. I was confused by your comment until I read the replies and realized that's what you meant to say.

politelemon · 2025-01-21T20:24:08 1737491048

Could you recommend a tiny language model I could try out locally?

mettamage · 2025-01-21T20:40:29 1737492029

Llama 3.2 has about 3.2b parameters. I have to admit, I use bigger ones like phi-4 (14.7b) and Llama 3.3 (70.6b) but I think Llama 3.2 could do de-anonymization and anonymization of code

RicoElectrico · 2025-01-21T21:21:27 1737494487

Llama 3.2 punches way above its weight. For general "language manipulation" tasks it's good enough - and it can be used on a CPU with acceptable speed.

seunosewa · 2025-01-21T22:17:05 1737497825

How many tokens/s?

OxfordOutlander · 2025-01-21T20:52:48 1737492768

+1 this idea. I do the same. Just do it locally using ollama, also using 3.2 3b

sauwan · 2025-01-21T22:19:29 1737497969

Are you using the model to create a key-value pair to find/replace and then reverse to de-anonymize, or are you using its outputs directly? If the latter, is it fast enough and reliable enough?

gpm · 2025-01-22T03:34:17 1737516857

I made a shell alias to translate things from French to English, does that count?

    function trans
        llm "Translate \"$argv\" from French to English please"
    end

Llama 3.2:3b is a fine French-English dictionary IMHO.

ata_aman · 2025-01-21T23:08:27 1737500907

I have it running on a Raspberry Pi 5 for offline chat and RAG. I wrote this open-source code for it: https://github.com/persys-ai/persys

It also does RAG on apps there, like the music player, contacts app and to-do app. I can ask it to recommend similar artists to listen to based on my music library for example or ask it to quiz me on my PDF papers.

nejsjsjsbsb · 2025-01-22T02:34:41 1737513281

Does https://github.com/persys-ai/persys-server run on the rpi?

Is that design 3d printable? Or is that for paid users only.

ata_aman · 2025-01-22T03:41:58 1737517318

I can publish it no problem. I’ll create a new repo with instructions for the hardware with CAD files.

Designing a new one for the NVIDIA Orin Nano Super so it might take a few days.

guywithahat · 2025-01-22T03:54:12 1737518052

I've been working on a self-hosted, low-latency service for small LLMs. It's basically exactly what I would have wanted when I started my previous startup. The goal is for real time applications, where even the network time to access a fast LLM like groq is an issue.

I haven't benchmarked it yet but I'd be happy to hear opinions on it. It's written in C++ (specifically not python), and is designed to be a self-contained microservice based around llama.cpp.

https://github.com/thansen0/fastllm.cpp

psyklic · 2025-01-21T20:05:52 1737489952

JetBrains' local single-line autocomplete model is 0.1B (w/ 1536-token context, ~170 lines of code): https://blog.jetbrains.com/blog/2024/04/04/full-line-code-co...

For context, GPT-2-small is 0.124B params (w/ 1024-token context).

pseudosavant · 2025-01-21T22:22:40 1737498160

I wonder how big that model is in RAM/disk. I use LLMs for FFMPEG all the time, and I was thinking about training a model on just the FFMPEG CLI arguments. If it was small enough, it could be a package for FFMPEG. e.g. `ffmpeg llm "Convert this MP4 into the latest royalty-free codecs in an MKV."`

h0l0cube · 2025-01-21T23:10:49 1737501049

Please submit a blog post to HN when you're done. I'd be curious to know the most minimal LLM setup needed to get consistently sane output for FFMPEG parameters.

jedbrooke · 2025-01-21T22:33:01 1737498781

the jetbrains models are about 70MB zipped on disk (one model per language)

binary132 · 2025-01-22T00:29:53 1737505793

That’s a great idea, but I feel like it might be hard to get it to be correct enough

maujim · 2025-01-21T23:35:25 1737502525

from a few days ago: https://news.ycombinator.com/item?id=42706637

smaddox · 2025-01-21T22:11:27 1737497487

You can train that size of a model on ~1 billion tokens in ~3 minutes on a rented 8xH100 80GB node (~$9/hr on Lambda Labs, RunPod io, etc.) using the NanoGPT speed run repo: https://github.com/KellerJordan/modded-nanogpt

For that short of a run, you'll spend more time waiting for the node to come up, downloading the dataset, and compiling the model, though.

WithinReason · 2025-01-21T20:46:58 1737492418

That size is on the edge of something you can train at home

vineyardmike · 2025-01-21T21:31:57 1737495117

If you have modern hardware, you can absolutely train that at home. Or very affordably on a cloud service.

I’ve seen a number of “DIY GPT-2” tutorials that target this sweet spot. You won’t get amazing results unless you want to leave a personal computer running for a number of hours/days and you have solid data to train on locally, but fine-tuning should be in the realm of a normal hobbyist's patience.

nottorp · 2025-01-21T21:59:18 1737496758

Hmm is there anything reasonably ready made* for this spot? Training and querying an LLM locally on an existing codebase?

* I don't mind compiling it myself but i'd rather not write it.

Sohcahtoa82 · 2025-01-21T23:50:15 1737503415

Not even on the edge. That's something you could train on a 2 GB GPU.

The general guidance I've used is that to train a model, you need an amount of RAM (or VRAM) equal to 8x the number of parameters, so a 0.125B model would need 1 GB of RAM to train.
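The arithmetic behind that rule of thumb, as a quick sketch. It reads "8x the number of parameters" as 8 bytes per parameter, which is an interpretation of the guidance above rather than a measured figure; real usage also depends on batch size, optimizer, and precision.

```python
def training_memory_gb(num_params: float, bytes_per_param: float = 8.0) -> float:
    """Estimate training memory in GB (10^9 bytes) from the parameter count."""
    return num_params * bytes_per_param / 1e9


# 0.125B parameters at 8 bytes each
print(training_memory_gb(0.125e9))  # 1.0
```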

staticautomatic · 2025-01-21T23:04:42 1737500682

Is that why their tab completion is so bad now?

linsomniac · 2025-01-22T02:58:39 1737514719

I have this idea that a tiny LM would be good at canonicalizing entered real estate addresses. We currently buy a data set and software from Experian, but it feels like something an LM might be very good at. There are lots of weirdnesses in address entry that regexes have a hard time with. We know the bulk of addresses a user might be entering, unless it's a totally new property, so we should be able to train it on that.

JLCarveth · 2025-01-22T01:22:25 1737508945

I used a small (3b, I think) model plus tesseract.js to perform OCR on an image of a nutritional facts table and output structured JSON.

eb0la · 2025-01-21T21:04:36 1737493476

We're using small language models to detect prompt injection. Not too cool, but at least we can publish some AI-related stuff on the internet without a huge bill.

sitkack · 2025-01-21T22:10:18 1737497418

What kind of prompt injection attacks do you filter out? Have you tested with a prompt tuning framework?

A4ET8a8uTh0_v2 · 2025-01-21T21:18:32 1737494312

Kinda? All local so very much personal, non-business use. I made Ollama talk in specific persona styles with the idea of speaking like Spider Jerusalem, when I feel like retaining some level of privacy by avoiding phrases I would normally use. Uncensored llama just rewrites my post with a specific persona's 'voice'. Works amusingly well for that purpose.

deivid · 2025-01-21T23:14:11 1737501251

Not sure it qualifies, but I've started building an Android app that wraps bergamot[0] (the firefox translation models) to have on-device translation without reliance on google.

Bergamot is already used inside firefox, but I wanted translation also outside the browser.

[0]: bergamot https://github.com/browsermt/bergamot-translator

cwmoore · 2025-01-21T23:37:05 1737502625

I'm playing with the idea of identifying logical fallacies stated by live broadcasters.

genewitch · 2025-01-22T01:33:21 1737509601

I have several rhetoric and logic books of the sort you might use for training or whatever, and one of my best friends got a doctorate in a tangential field, and may have materials and insights.

We actually just threw a relationship curative app online in 17 hours around Thanksgiving, so they "owe" me, as it were.

I'm one of those people that can do anything practical with tech and the like, but I have no imagination for it - so when someone mentions something that I think would be beneficial for my fellow humans I get this immense desire to at least cheer on if not ask to help.

JayStavis · 2025-01-22T04:07:25 1737518845

Automation to identify logical/rhetorical fallacies is a long held dream of mine, would love to follow along with this project if it picks up somehow

spiritplumber · 2025-01-22T00:17:40 1737505060

That's fantastic and I'd love to help

cwmoore · 2025-01-22T00:28:34 1737505714

So far not much beyond this list of targets to identify https://en.wikipedia.org/wiki/List_of_fallacies

petesergeant · 2025-01-22T02:19:08 1737512348

I'll be very positively impressed if you make this work; I spend all day every day for work trying to make more capable models perform basic reasoning, and often failing :-P

iamnotagenius · 2025-01-21T20:42:38 1737492158

No, but I use llama 3.2 1b and qwen2.5 1.5b as bash one-liner generators, always running in the console.

andai · 2025-01-21T21:30:59 1737495059

Could you elaborate?

XMasterrrr · 2025-01-21T22:39:18 1737499158

I think I know what he means. I use AI Chat. I load Qwen2.5-1.5B-Instruct with llama.cpp server, fully offloaded to the CPU, and then I config AI Chat to connect to the llama.cpp endpoint.

Check out the demo they have below

https://github.com/sigoden/aichat#shell-assistant

XMasterrrr · 2025-01-21T22:39:43 1737499183

What's your workflow like? I use AI Chat. I load Qwen2.5-1.5B-Instruct with llama.cpp server, fully offloaded to the CPU, and then I config AI Chat to connect to the llama.cpp endpoint.

mritchie712 · 2025-01-21T21:51:22 1737496282

I used local LLMs via Ollama for generating H1's / marketing copy.

1. Create several different personas

2. Generate a ton of variation using a high temperature

3. Compare the variations head-to-head using the LLM to get a win / loss ratio

The best ones can be quite good.
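The head-to-head step could be sketched as a round-robin. The `judge` callable is hypothetical; in practice it would ask the LLM which of two headlines is better and return the winner. This computes a win rate (wins divided by games played), which is one way to turn the win / loss counts into a ranking.

```python
from itertools import combinations
from typing import Callable, Dict, List


def rank_by_win_rate(
    candidates: List[str], judge: Callable[[str, str], str]
) -> Dict[str, float]:
    """Compare every pair once and return candidates sorted by win rate.

    Candidates are assumed to be unique, and `judge(a, b)` must return a or b.
    """
    wins = {c: 0 for c in candidates}
    games = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        winner = judge(a, b)
        if winner not in (a, b):
            raise ValueError("judge must return one of the two candidates")
        wins[winner] += 1
        games[a] += 1
        games[b] += 1
    rates = {c: (wins[c] / games[c] if games[c] else 0.0) for c in candidates}
    return dict(sorted(rates.items(), key=lambda kv: kv[1], reverse=True))
```

A full round-robin needs n*(n-1)/2 judge calls, so with a ton of variations it may be worth sampling pairs instead of comparing all of them.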

0 - https://www.definite.app/blog/overkillm

UltraSane · 2025-01-22T02:43:20 1737513800

clever name!

jftuga · 2025-01-22T02:48:15 1737514095

I'm using ollama, llama3.2 3b, and python to shorten news article titles to 10 words or less. I have a 3 column web site with a list of news articles in the middle column. Some of the titles are too long for this format, but the shorter titles appear OK.

arionhardison · 2025-01-21T20:48:22 1737492502

I am, in a way, by using EHR/EMR data for fine tuning so agents can query each other for medical records in a HIPAA compliant manner.

jmward01 · 2025-01-22T00:05:58 1737504358

I think I am. At least I think I'm building things that will enable much smaller models: https://github.com/jmward01/lmplay/wiki/Sacrificial-Training

spiritplumber · 2025-01-22T00:10:17 1737504617

My husband and I made a stock market analysis thing that gets it right about 55% of the time, so better than a coin toss. The problem is that it keeps making unethical suggestions, so we're not using it to trade stock. Does anyone have any idea what we can do with that?

dkga · 2025-01-22T01:03:36 1737507816

Suggestion: calculate the out-of-sample Sharpe ratio[0] of the suggestions over a reasonable period to gauge how good the model would actually perform in terms of return compared to risks. It is better than vanilla accuracy or related metrics. Source: I'm a financial economist.

[0]: https://en.wikipedia.org/wiki/Sharpe_ratio
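A minimal sketch of the calculation, assuming you already have a list of per-period out-of-sample returns. The function name, the default of 252 trading days per year, and the per-period risk-free rate parameter are all assumptions for illustration.

```python
import math
import statistics
from typing import Sequence


def sharpe_ratio(
    returns: Sequence[float],
    risk_free: float = 0.0,
    periods_per_year: int = 252,
) -> float:
    """Annualized Sharpe ratio: mean excess return over its standard deviation.

    `returns` and `risk_free` are per-period (for example daily) rates.
    """
    excess = [r - risk_free for r in returns]
    stdev = statistics.stdev(excess)  # sample standard deviation, needs 2+ points
    if stdev == 0:
        raise ValueError("returns have zero volatility; Sharpe ratio is undefined")
    return statistics.mean(excess) / stdev * math.sqrt(periods_per_year)
```

Unlike a hit rate, this penalizes a strategy whose wins are small and whose losses are large or erratic, so a 55% accuracy figure can still map to a poor ratio.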

spiritplumber · 2025-01-22T02:50:57 1737514257

thank you! that's exactly the sort of thing I don't know.

Etheryte · 2025-01-22T00:35:24 1737506124

Have you backtested this in times when markets were not constantly green? Nearly any strategy is good in the good times.

spiritplumber · 2025-01-22T02:50:29 1737514229

yep. the 55% is over a few years.

bobbygoodlatte · 2025-01-22T00:19:58 1737505198

I'm curious what sort of unethical suggestions it's coming up with haha

spiritplumber · 2025-01-22T02:50:14 1737514214

so far, mostly buying companies owned/run by horrible people.

bongodongobob · 2025-01-22T01:33:55 1737509635

You can literally flip coins and get better than 50% success in a bull market. Just buy index funds and spend your time on something that isn't trying to beat entropy. You won't be able to.

spiritplumber · 2025-01-22T04:15:26 1737519326

INSUFFICIENT DATA FOR A MEANINGFUL ANSWER.

danbmil99 · 2025-01-22T00:06:17 1737504377

Using llama 3.2 as an interface to a robot. If you can get the latency down, it works wonderfully

mentos · 2025-01-22T03:56:41 1737518201

Would love to see this applied to a FPS bot in unreal engine.

codazoda · 2025-01-22T01:08:53 1737508133

I had an LLM create a playlist for me.

I’m tired of the bad playlists I get from algorithms, so I made a specific playlist with a Llama 2 model based on several songs I like. I started with 50, removed any I didn’t like, and added more to fill in the spaces. The small models were pretty good at this. Now I have a decent fixed playlist. It does get “tired” after a few weeks and I need to add more to it. I’ve never been able to do this myself with more than a dozen songs.

petesergeant · 2025-01-22T02:18:36 1737512316

Interesting! I've sadly found more capable models to really fail on music recommendations for me.

dh1011 · 2025-01-22T03:04:58 1737515098

I copied all the text from this post and used an LLM to generate a list of all the ideas. I do the same for other similar HN posts.

lordswork · 2025-01-22T03:08:02 1737515282

well, what are the ideas?

HexDecOctBin · 2025-01-22T01:51:15 1737510675

Are there any experiments with small models that do paraphrasing? I tried using some off-the-shelf models, but it didn't go well.

I was thinking of hooking them in RPGs with text-based dialogue, so that a character will say something slightly different every time you speak to them.

itskarad · 2025-01-22T01:34:00 1737509640

I'm using ollama for parsing and categorizing scraped jobs for a local job board dashboard I check every day.

kianN · 2025-01-22T00:32:39 1737505959

I don’t know if this counts as tiny but I use llama 3B in prod for summarization (kinda).

Its effective context window is pretty small but I have a much more robust statistical model that handles thematic extraction. The llm is essentially just rewriting ~5-10 sentences into a single paragraph.

I’ve found the less you need the language model to actually do, the less the size/quality of the model actually matters.

thetrash · 2025-01-21T23:41:24 1737502884

I programmed my own version of Tic Tac Toe in Godot, using a Llama 3B as the AI opponent. Not for workflow, but figuring out how to beat it is entertaining during moments of boredom.

spiritplumber · 2025-01-22T00:17:01 1737505021

Number of players: zero

U.S. FIRST STRIKE WINNER: NONE

USSR FIRST STRIKE WINNER: NONE

NATO / WARSAW PACT WINNER: NONE

FAR EAST STRATEGY WINNER: NONE

US USSR ESCALATION WINNER: NONE

MIDDLE EAST WAR WINNER: NONE

USSR CHINA ATTACK WINNER: NONE

INDIA PAKISTAN WAR WINNER: NONE

MEDITERRANEAN WAR WINNER: NONE

HONGKONG VARIANT WINNER: NONE

Strange game. The only winning move is not to play

juancroldan · 2025-01-22T00:02:46 1737504166

I'm making an agent that takes decompiled code and tries to understand the methods and replace variables and function names one at a time.

kristopolous · 2025-01-21T22:55:09 1737500109

I'm working on using them for agentic voice commands of a limited scope.

My needs are narrow and limited but I want a bit of flexibility.

jothflee · 2025-01-22T00:31:42 1737505902

when i feel like casually listening to something, instead of netflix/hulu/whatever, i'll run a ~3b model (qwen 2.5 or llama 3.2) and generate an audio stream of water cooler office gossip. (when it is up, it runs here: https://water-cooler.jothflee.com).

some of the situations get pretty wild, for the office :)

jftuga · 2025-01-22T02:54:30 1737514470

What prompt are you using for this?

ignoramous · 2025-01-21T21:51:10 1737496270

We're prototyping a text firewall (for Android) with Gemma2 2B (which limits us to English), though DeepSeek's R1 variants now look pretty promising [0]: Depending on the content, we rewrite the text or quarantine it from your view. Of course this is easy (for English) in the sense that the core logic is all LLMs [1], but the integration points (on Android) are not so straightforward for anything other than SMS. [2]

A more difficult problem we foresee is to turn it into a real-time (online) firewall (for calls, for example).

[0] https://chat.deepseek.com/a/chat/s/d5aeeda1-fefe-4fc6-8c90-2...

[1] MediaPipe in particular makes it simple to prototype around Gemma2 on Android: https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inf...

[2] Intend to open source it once we get it working for anything other than SMSes

Havoc · 2025-01-21T20:44:02 1737492242

Pretty sure they are mostly used as fine tuning targets, rather than as-is.

dcl · 2025-01-21T21:50:35 1737496235

But for what purposes?