In search for a new self-hosted LLM

Tanka@lemmy.ml · 20 days ago

Jozzo@lemmy.world · 20 days ago

I find Qwen3.5 is the best at tool calling and agent use; otherwise, Gemma4 is a very solid all-rounder and should be the first one you try. Tbh gpt-oss is still good to this day. Are you running into any problems with it?

Tanka@lemmy.ml · 20 days ago

No problems per se. I just realized I hadn't checked for updates in a long time.

jacksilver@lemmy.world · 19 days ago

You’re probably aware, but updating the model periodically is a good idea simply because things change over time.

A model from two years ago was trained on data that is at least two years old, so any changes in technology, code, or world events since then won’t be reflected in it.

zorflieg@lemmy.world · 19 days ago

Gemma4 e4b at quant8 will fit in 12 GB and is good.
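If you want to sanity-check "will it fit" claims yourself, a rough rule is: weight memory ≈ parameter count × bits per weight ÷ 8. The sketch below assumes a hypothetical 8-billion-parameter model (not Gemma4's actual size) and a made-up flat 20% overhead for KV cache and runtime; real usage depends on context length and the inference engine, so treat it as a ballpark only.

```python
# Rough estimate of memory needed to hold an LLM's weights.
# Assumption: weights dominate; KV cache, activations and runtime overhead
# are approximated by a flat, hypothetical overhead factor.

def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    # billions of parameters * bytes per parameter = gigabytes
    return params_billion * bytes_per_weight


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead: float = 1.2) -> bool:
    """True if weights plus an assumed overhead factor fit in vram_gb."""
    return weight_memory_gb(params_billion, bits_per_weight) * overhead <= vram_gb


if __name__ == "__main__":
    # Hypothetical 8B-parameter model at different quantization levels.
    for bits in (16, 8, 4):
        need = weight_memory_gb(8, bits)
        ok = fits_in_vram(8, bits, 12)
        print(f"{bits}-bit: ~{need:.1f} GB of weights, fits in 12 GB: {ok}")
```

With these assumptions, 8-bit weights for an 8B model come to about 8 GB, which leaves some headroom on a 12 GB card, while the 16-bit version would not fit.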

carzian@lemmy.ml · 20 days ago

I’m in the same boat. You’ll get better responses if you post your machine specs.

Matt@lemmy.ml · 19 days ago

Qwen is pretty good. Also try LFM models.

cron@feddit.org · 20 days ago

The latest open-weights model from Google might be a good fit for you. The 26B model works pretty well on my machine, though performance isn’t great (6 tokens per second, CPU only).

SuspciousCarrot78@lemmy.world · 19 days ago

What sort of coding, and what sort of automation tasks? The latter is an easier ticket to fill than the former, though I might have an idea for you on the coding end if that’s a must.

jaschen306@sh.itjust.works · 20 days ago

I’m running Gemma4 26B MoE for most of my agent calls. I use glm5:cloud for my development agent because the 26B model struggles when the context window gets too big.

sompreno@lemmy.zip · 20 days ago

What are your computer specs?

Tanka@lemmy.ml · 20 days ago

I did just update my post with the specs. Maybe it takes a while to federate?

sompreno@lemmy.zip · 20 days ago

I must not have refreshed. Ignore my comment.

Evotech@lemmy.world · 18 days ago

I’d use a Chinese model. Qwen3.5 Claude 4.6 distilled abliterated is what I use.

theunknownmuncher@lemmy.world · 20 days ago

How much VRAM?