
Try this; it's for running LLMs that won't fit in the GPU: https://github.com/FMInference/FlexGen


Currently it looks like that only supports Facebook's OPT and Galactica models, though they do appear to plan to add support for more.



