
I clicked expecting a single full multimodal LLM made by merging multiple existing models into one like the title suggests (which sounds very interesting), and I found... a library which is an LLM router/calls a bunch of LLM web APIs and exposes that under a unified/easy to use interface?

With all due respect, sorry, but this title is very misleading. I'd expect "build an LLM" to mean, well, actually building an LLM, and while it's a very nice library it's definitely not what the title suggests.



You know, I think the word "multimodal" is being used badly here. It's multi-model, not multimodal, which certainly suggests a completely different thing.


It's a framework that uses the best part of each LLM, e.g. multimodal support from Gemini, tool calling from gpt-4o, and reasoning from o3-mini, by chaining them dynamically. From a user perspective, there is no model selection or routing: just write the prompt or upload a file and it works, so it feels like you're working with a single LLM, but under the hood it does all this work to get you the best output :) Sorry if you felt it's misleading, but I hope you give it a shot!
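For readers trying to picture what "chaining them dynamically" could look like, here is a rough sketch. It is not the library's actual code: `Request`, `plan_chain`, `run_chain`, the routing rules, and the default backend are all hypothetical, and the backends are plain callables standing in for real API clients.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Request:
    prompt: str
    has_file: bool = False         # e.g. the user uploaded an image or PDF
    needs_tools: bool = False      # the task requires a function/tool call
    needs_reasoning: bool = False  # the task benefits from a reasoning model


def plan_chain(req: Request) -> list[str]:
    """Pick which backends to call, in order. Illustrative rules only."""
    chain = []
    if req.has_file:
        chain.append("gemini")    # multimodal step
    if req.needs_tools:
        chain.append("gpt-4o")    # tool-calling step
    if req.needs_reasoning:
        chain.append("o3-mini")   # reasoning step
    return chain or ["gpt-4o"]    # hypothetical default for plain prompts


def run_chain(req: Request, backends: dict[str, Callable[[str], str]]) -> str:
    """Feed each backend's output into the next one in the planned chain."""
    text = req.prompt
    for name in plan_chain(req):
        text = backends[name](text)
    return text
```

Note that every model here is still a separate model behind a separate call; the "single LLM" feel comes entirely from hiding `plan_chain` from the user.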


The problem with that phrasing is that there is actual model merging, where you merge the weights. So people reading the title might (and apparently do) expect that, less so an LLM router.
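To make the contrast concrete, the simplest form of weight merging is linear interpolation between two checkpoints that share the same architecture. The sketch below is a toy under that assumption: weights are plain dicts of float lists rather than real tensors, and `merge_weights` and `alpha` are names made up for illustration. Real merging methods are more involved.

```python
def merge_weights(a, b, alpha=0.5):
    """Linearly interpolate two weight dicts: alpha * a + (1 - alpha) * b.

    Assumes both models have identical parameter names and shapes.
    """
    if a.keys() != b.keys():
        raise ValueError("models must have the same parameter names")
    merged = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            alpha * x + (1 - alpha) * y for x, y in zip(a[name], b[name])
        ]
    return merged
```

The result is one set of weights, i.e. one model, with no routing between separate models at inference time.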


Makes sense, but the problem is that you're using words that already have specific meanings in the space, all related to creating one model with multiple functionalities. "Merging" means merging models into one model. "Multimodal" means one LLM that handles multiple modalities. The term you want is probably agent or framework or chain or something. Basically, what you describe is a system that feels like you're only working with one model. What your title says is that you actually engineered just one model, which is a distinct technical challenge.


I 100% agree, this simulates multimodal input and automatically handles the rest, along with model selection, by using a variety of techniques. It doesn't do this natively at the model level.


You are still not getting it. The use of the word multimodal does nothing good for your software. It is an LLM router. I get it that your software does support some multimodal LLMs, but that is incidental.

Secondly, the use of the word "merging" is also grossly misleading. You are not merging LLMs, only routing requests.


And a similar product already exists: Langdock.



