
Try this; it's for running LLMs that won't fit in the GPU: https://github.com/FMInference/FlexGen


Currently it looks like that only supports Facebook's OPT and Galactica models, though they do appear to plan to add support for more.



