Why do people host LLMs at home, when training your own model on data from the internet will never be anywhere near as efficient as sending a paid prompt to some high-quality official model?
inb4 privacy concerns or a proof of concept
That is off the table; I want someone to prove their LLM can be as insightful and accurate as a paid one. I don’t care about anything other than the quality of the generated answers.
Poor internet connection or no internet at all, network latency too high for their needs, or a specific fine-tuned LLM?
Of course, the main reason is privacy. My company hosts its own GPT-4 chatbot and forbids us from using public ones. But I suppose there are other legitimate use cases for hosting your own LLM.
Afaik most people who seriously want a local LLM start off from a pre-trained one from the internet…?
Because you don’t train your self-hosted LLM.
As a result you only pay for the electricity of computing your tokens (your request). This can be especially reasonable if the same machine also does local game streaming and/or transcoding, and thus already has the hardware to host an LLM. Unless you have rather unreasonable means, your local LLM will be much more limited in parameters (size), and will not be as good as other, much larger models.
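To put a rough number on that electricity cost, here is a back-of-envelope sketch; the wattage, generation speed, and electricity price are purely illustrative assumptions you would swap for your own setup:

```python
# Back-of-envelope electricity cost for local inference.
# All three numbers below are illustrative assumptions, not measurements.
watts = 350              # assumed power draw of the machine while generating
tokens_per_second = 30   # assumed generation speed
price_per_kwh = 0.15     # assumed electricity price in $/kWh

kwh_per_token = (watts / 1000) / 3600 / tokens_per_second
cost_per_million = kwh_per_token * 1_000_000 * price_per_kwh
print(f"~${cost_per_million:.2f} of electricity per million generated tokens")
```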
Privacy, ethics, and personal interest are usually the largest drivers, from what I can tell.
Heh, you shouldn’t be paying for LLMs. Gemini 2.5 Pro is free, and so are a bunch of great API models. ChatGPT kinda sucks these days (depending on the content).
I have technical reasons for running local models (instant cached responses, constrained grammar, logprob output, finetuning), and I can help you set that up if you want, but TBH I’m not going to write a long technical proof of why that’s advantageous unless you really want to try all of this yourself.
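Just as a taste of what I mean by constrained grammar and logprob output, here is a minimal sketch using llama-cpp-python; the model path is a placeholder and the grammar and parameters are made up for the example:

```python
from llama_cpp import Llama, LlamaGrammar

# Placeholder path: point this at whatever GGUF model you actually run locally.
llm = Llama(model_path="./models/your-model.gguf", n_ctx=2048)

# Constrain the model so it can only ever emit "yes" or "no" (GBNF grammar).
grammar = LlamaGrammar.from_string('root ::= "yes" | "no"')

out = llm(
    "Is 17 a prime number? Answer yes or no: ",
    max_tokens=4,
    temperature=0.0,
    grammar=grammar,   # constrained decoding
    logprobs=5,        # return top-5 token logprobs alongside the text
)

print(out["choices"][0]["text"])      # the constrained answer
print(out["choices"][0]["logprobs"])  # per-token logprobs, usable as a rough confidence signal
```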
What you ask for is impossible to prove, let alone test, without a rigorous definition of “insightful” or “accurate”.