mirekrusin's comments

Kind of ironic, as Claude Code keeps showing this "get $50 for a referral" nag when coding on the $200/mo plan, so fucking annoying. Hypocrites.

IMO an optional referral bonus is less invasive than inline ads that could potentially change the quality of my response.

If we invented Excel's transpose in 2026, we'd put ads on it. Hey, someone has to pay for all that innovation!

Yes, but you still don't want to see crap like that nagging you every time you use the tool, especially when paying $200/mo for it.

PS: I wish there was a way to make this progress indicator just a blinking circle like in other places, without the "Nagging...", "Flipping burgers...", "Fishing..." nonsense and the unaligned logo-like animation - it's infantile, in poor taste, distracting and annoying. It was fun for the first 5 minutes; it isn't anymore. It's shitty now, make it stop, please.


All models are multimodal now, not just text; in that sense they are not "just linguistic".

Regarding conscious vs non-conscious processes:

Inference is actually a non-conscious process, because nothing is observed by the model.

Autoregression is a conscious process, because the model observes its own output, i.e. it has self-referential access.

I.e. models use both, and early/mid layers perform highly abstracted non-conscious processing.


The multimodality of most current popular models is quite limited (mostly text is used to improve capacity in vision tasks, but the reverse is not true, except in some special cases). I made this point below at https://hackernews.hn/item?id=46939091

Otherwise, I don't understand the way you are using "conscious" and "unconscious" here.

My main point about conscious reasoning is that when we introspect to try to understand our thinking, we tend to see e.g. linguistic, imagistic, tactile, and various sensory processes / representations. Some people focus only on the linguistic parts and downplay e.g. imagery (the "wordcels vs. shape rotators" meme), but in either case it is a common mistake to think the most important parts of thinking must always necessarily be (1) linguistic, or (2) clearly related to what appears during introspection.


All modern models process images internally within their own neural network; they don't delegate to some other/OCR model. Image data flows through the same paths as text, so what do you mean by "quite limited" here?

Your first comment was referring to the unconscious; now you don't mention it.

Regarding "conscious and linguistic" which you seem to be touching on now, taking aside multimodality - text itself is way richer for llms than for humans. Trivial example may be ie. mermaid diagram which describes some complex topology, svg which describes some complex vector graphic or complex program or web application - all are textual but to understand and create them model must operate in non linguistic domains.

Even pure text-to-text models have the ability to operate in domains other than the linguistic, but current models are not text-to-text only anyway; they can ingest images directly as well.


I was obviously talking about conscious and unconscious processes in humans; you are attempting to transport these concepts to LLMs, which is not philosophically sound or coherent, generally.

Everything you said about how data flows in these multimodal models is not true in general (see https://huggingface.co/blog/vlms-2025), and unless you happen to work for OpenAI or other frontier AI companies, you don't know for sure how they are corralling data either.

Companies will of course engage in marketing and claim e.g. ChatGPT is a single "model", but architecturally and in practice this at least is known not to be accurate. The modalities and backbones in general remain quite separate, both architecturally and in terms of pre-training approaches. You are talking at a high level of abstraction that suggests education from blog posts by non-experts: read the papers on how these multimodal architectures are actually trained, developed, and connected, and you'll see the multimodality is still very limited.

Also, and most importantly, the integration of modalities is primarily of the form:

    use (single) image annotations to improve image description, processing, and generation, i.e. "linking words to single images"

and not of the form

    use the implied spatial logic and relations from series of images and/or video to inform and improve linguistic outputs

I.e. most multimodal work uses linguistic models to represent or describe images linguistically, in the hope that the linguistic parts do the majority of the thinking and processing. There is not much work using the image or video representations to do the thinking; i.e. you "convert away" from most modalities into language, do the work with token representations, and then maybe go back to images.

But there isn't much work on using visuospatial world models or representations for the actual thinking (though there is some very cutting-edge work here, e.g. Sam-3D https://ai.meta.com/blog/sam-3d/ and V-JEPA-2 https://ai.meta.com/research/vjepa/). Precisely because this is cutting edge, even at frontier AI companies, most of the LLM behavior you see is likely driven largely by what was learned from language, not from images or other modalities. So LLMs are indeed still mostly constrained by their linguistic core.


Not only that: every optimization gain makes it more attractive and creates even more demand, i.e. effective energy usage will not decrease or stay the same - it will increase.

It's like with electric cars - making them more efficient doesn't mean less electricity will be used, but more: as they become more attractive, more people will switch to electric cars.


In a local setup you still usually want to split the machine that runs inference from the client that uses it; there are often non-trivial resources involved, like Chromium, compilation, databases etc., that you don't want to pollute the inference machine with.

Anthropic settled with the authors of the stolen work for $1.5B; this case is closed, isn't it?

It's not approved yet, I think.

"OpenAI"

What are you talking about? Qwen3-Coder-Next supports 256k context. Did you mean to say that you don't have enough memory to run it locally yourself?

Yes!

I tried to go as far as 32k on the context window, but beyond that it won't be usable on my laptop (Ryzen AI 365, 32GB RAM and 6GB of VRAM).


You need a minimum of e.g. 2x 24GB GPUs for this model (you need 46GB minimum).
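For reference, here is the kind of back-of-envelope math behind a number like that; the parameter count, quant width and KV-cache size below are my assumptions for illustration, not official specs for this model:

    # Back-of-envelope VRAM estimate; every number is an assumption, not a spec.
    def vram_gb(params_b, bits_per_weight, kv_cache_gb, overhead_gb=2.0):
        weights_gb = params_b * bits_per_weight / 8  # 1B params at 8 bits ~ 1 GB
        return weights_gb + kv_cache_gb + overhead_gb

    # Assuming roughly 80B params at a ~4-bit quant with a few GB of KV cache:
    print(vram_gb(80, 4, 4))  # ~46 GB -> more than a single 24 GB card

Anything that spills past one 24GB card is why people end up with dual GPUs or unified memory.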

Exactly, many people seem to not understand that the frontmatter's description field needs to be a longer "when?" instead of a shorter "what" - it's the only entry point into the skill.
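A minimal sketch of the difference, assuming the usual SKILL.md YAML frontmatter; the skill name and wording are made up:

    ---
    name: pdf-extraction
    # "what"-style description (weak - doesn't tell the model when to load it):
    #   description: Extracts text and tables from PDF files.
    # "when"-style description (better - this is what gets matched against the request):
    description: >
      Use when the user asks to read, summarize, or pull data out of a PDF
      or scanned document that can't be pasted as plain text.
    ---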

Who cares? If you don't like it, you can fine tune.

I think a lot of people care. Most decidedly not you.

I think people care about open weights so they can use them locally, including fine-tuning like unalignment.

There are of course people who, when you give them something for free that cost millions of dollars to build, will complain and tell the world exactly what they're entitled to.


Is the censoring applied after post-training, or is it applied at the dataset level?

It takes download time + 1 minute to test speed yourself, and you can try different quants. It's hard to write down a table because it depends on your system, e.g. RAM clock etc., if you spill out of the GPU.

I guess it would make sense to have something like the max context size/quants that fit fully on common configs: single GPU, dual GPUs, unified RAM on Mac, etc.


Testing speed is easy, yes; I'm mostly wondering about the quality difference between Q6 and Q8_K_XL, for example.

I haven't done the benchmarks yet (I plan to), but the results should be similar to our post on DeepSeek-V3.1 Dynamic GGUFs: https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs
