mpmisko · on Aug 17, 2024

EBMs show up all over the place; apparently even your classifier is an EBM :) (https://arxiv.org/abs/1912.03263).
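To make the "your classifier is an EBM" reading concrete, here is a minimal sketch in plain Python. It assumes the usual softmax classifier and treats the logit for class y as a negative energy, E(x, y) = -f(x)[y], so the marginal energy is E(x) = -logsumexp(f(x)). The logit values and function names are made up for illustration.

```python
import math

def logsumexp(values):
    # Numerically stable log(sum(exp(v))).
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def softmax(logits):
    lse = logsumexp(logits)
    return [math.exp(v - lse) for v in logits]

def joint_energy(logits, y):
    # E(x, y) = -f(x)[y]: the logit of class y is a negative energy.
    return -logits[y]

def marginal_energy(logits):
    # E(x) = -logsumexp_y f(x)[y]: energy of the input alone.
    return -logsumexp(logits)

# Hypothetical logits f(x) for one input x and three classes.
logits = [2.0, 0.5, -1.0]

probs = softmax(logits)

# p(y | x) = exp(-E(x, y)) / exp(-E(x)) gives back the same softmax.
probs_from_energy = [
    math.exp(-joint_energy(logits, y) + marginal_energy(logits))
    for y in range(len(logits))
]
```

Both lists agree up to floating-point error, which is the whole point: the classifier is unchanged, only the interpretation of its logits differs.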

uoaei · on Aug 17, 2024

You can take many equivalent perspectives on learning systems, but mostly it reduces to "messing with denominators in Bayes' rule". This is no different.

EBMs aren't used much today because you first have to fit the joint model, then fix some of the inputs, then fit the remaining inputs in a second optimization step. That's just too much compute for today's workloads compared to feedforward NNs.
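A toy sketch of that second step, assuming the joint energy has already been fitted: fix x, then run gradient descent on y to find a low-energy completion. The quadratic energy here is a hypothetical stand-in chosen so the answer is easy to check by hand; a real EBM would use a learned network, and every prediction would still cost an inner optimization loop like this one.

```python
def energy(x, y):
    # Hypothetical, already-"fitted" joint energy: lowest when y == 2 * x.
    return (y - 2.0 * x) ** 2

def grad_energy_wrt_y(x, y):
    # Analytic gradient of the energy with respect to y.
    return 2.0 * (y - 2.0 * x)

def infer_y(x, y_init=0.0, lr=0.1, steps=200):
    # Second optimization step: x stays fixed, y is descended on the energy.
    y = y_init
    for _ in range(steps):
        y -= lr * grad_energy_wrt_y(x, y)
    return y

y_hat = infer_y(x=1.5)  # converges towards 3.0, the energy minimum for x = 1.5
```

A feedforward network would produce its prediction in a single forward pass, which is the compute gap being described.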

mpmisko · on April 29, 2024

GPT-based search engines usually first retrieve context from some sort of database and then have the LLM summarize it. This is what people refer to as RAG these days: https://blogs.nvidia.com/blog/what-is-retrieval-augmented-ge....

Some of these GPT engines maintain their own vector DB to do semantic search, others are directly hooked into Bing / Google. So pubmedisearch.com would be one component of a GPT-based engine. We actually have a GPT-based engine here: https://medisearch.io/.
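A rough sketch of that retrieve-then-summarize flow, under heavy assumptions: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for a vector DB or a web search backend, and the final LLM call is left out. All names and documents are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a real embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, k=2):
    # Retrieval step: rank documents by similarity to the query, keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, context):
    # Generation step input: the retrieved context is placed ahead of the question.
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

docs = [
    "aspirin reduces fever and mild pain",
    "postgres is a relational database",
    "ibuprofen treats pain and inflammation",
]
top = retrieve("what treats pain", docs, k=2)
prompt = build_prompt("what treats pain", top)
```

In this picture a search service is the `retrieve` component, and the GPT-based engine is whatever sends `prompt` to the LLM.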

mpmisko · on April 28, 2024

Lots of annoying edge cases as you can imagine, nothing particularly glamorous.

mpmisko · on April 28, 2024

Done! Let me know if you have other feedback.

mpmisko · on April 27, 2024

Thanks! Looks quite relevant

mpmisko · on April 26, 2024

Training for multiple epochs is a bit like that :)

mpmisko · on April 26, 2024

We use Pinecone and it is not ideal; we are looking at https://turbopuffer.com/ now. They look quite promising :)

yumraj · on April 27, 2024

Did you compare Pinecone against pgvector with Postgres? Self-hosted, of course.

cchance · on April 27, 2024

Isn't it funny how the best choice somehow always comes back to Postgres in the end XD (for most)

yumraj · on April 27, 2024

Yes, that’s where I am these days. I don’t even think of venturing outside of Postgres, except for things like Redis, where there are mature and established options for specific use cases.

mpmisko · on April 28, 2024

Will definitely check pgvector, thanks for the pointer.

cchance · on April 27, 2024

What kind of dimensions did you use? Did you keep them relatively low to keep costs down?

mpmisko · on April 26, 2024

1. We cover all the articles on PMC. The exact cost is hard to estimate because we did a lot of iterations.

2. We do weight those ... it is a lot of trial and error and you have to have good & exhaustive benchmarks.

mpmisko · on April 26, 2024

Glad you like it! I did this as a mini-project within our startup MediSearch (https://medisearch.io/), and the search pipeline is custom-tuned for the problem.