Local vs cloud LLMs, mid-2026

I keep one frontier model in the cloud and one open model running on my Mac. Both earn their keep. That is the whole post in one sentence, but if you are trying to decide what to run where in mid-2026, the interesting part is which jobs go to which model, and why.

This is the honest version. Not the hype version where local models have caught up, and not the cynical version where they are still toys. Somewhere between those two, which is where most useful tools live.

Where local LLMs are in June 2026

The open ecosystem has had a very good twelve months. Llama 4 landed with a serious mixture-of-experts architecture and weights you can actually run. Qwen 3 has become my default for anything multilingual or code-shaped. DeepSeek keeps shipping models that punch absurdly above their parameter count, especially on reasoning. Ollama and LM Studio have made the runtime side boring in the best way. You download a model, it works, you move on.

For what I think of as grep-replacement-grade tasks, local models are genuinely great now. Summarising a long document, classifying a pile of inputs into a handful of buckets, drafting the first version of something I will rewrite anyway, extracting structured fields from messy text. The small and medium open models handle that all day long, on my laptop, with no API bill and no round trip to a data centre.

Where they still trail the frontier is the stuff you would expect. Complex multi-step reasoning. Tool use that has to stay coherent over a dozen calls. Novel architectural thinking where the model has to hold a lot of constraints in its head at once. Cloud models like Claude 4.7 and GPT-5 are still meaningfully better at those jobs, and pretending otherwise is how you end up with a refactor that compiles but does not actually work.

Four real use cases

Rather than wave my hands, here are four jobs I do most weeks, and where I have settled on running each one.

1. Drafting commit messages from git log. Local wins.

I pipe a diff and the recent log into a small local model and ask for a conventional commit message. It is fast, it is free, the diff never leaves my machine, and the quality is fine. Commit messages are a constrained format with a clear template. There is no reasoning to do, just summarisation and pattern matching. This is exactly the shape of task where local models shine, and there is a real privacy upside to not sending every diff I write to a third party.
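A minimal sketch of that pipe, assuming Ollama is serving on its default local port and that a small model is already pulled. The model name `qwen3`, the 12,000-character cap, the prompt wording, and the helper names are all placeholders for illustration, not a record of the exact setup described here.

```python
import json
import subprocess
import urllib.request

# Assumptions: Ollama's default local endpoint, and a model you have pulled.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen3"  # placeholder; substitute whatever small model you run

MAX_DIFF_CHARS = 12_000  # arbitrary cap to keep the prompt small


def git(*args: str) -> str:
    """Run a git command in the current repo and return its stdout."""
    return subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    ).stdout


def build_prompt(diff: str, recent_log: str) -> str:
    """Assemble the instruction, the recent history, and a truncated diff."""
    if len(diff) > MAX_DIFF_CHARS:
        diff = diff[:MAX_DIFF_CHARS] + "\n[diff truncated]"
    return (
        "Write one conventional commit message (type(scope): summary) "
        "for the staged diff below. Match the style of the recent log. "
        "Reply with the message only.\n\n"
        f"Recent log:\n{recent_log}\n\nStaged diff:\n{diff}\n"
    )


def draft_commit_message() -> str:
    """Send the prompt to the local model and return its reply."""
    prompt = build_prompt(git("diff", "--staged"), git("log", "--oneline", "-10"))
    body = json.dumps({"model": MODEL, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"].strip()
```

Calling `print(draft_commit_message())` from inside a repository with staged changes prints the draft. The request goes to localhost only, which is the privacy point above.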

2. Refactoring a five-file feature. Cloud wins.

This one needs to hold the shape of the whole change in its head, follow imports across files, notice when a rename in one place breaks a call site in another, and propose a coherent end state. Frontier cloud models do this well. Open models get there for small refactors, but the moment the blast radius grows, they start losing track. I do not enjoy spending an hour cleaning up a half-finished refactor to save thirty cents in API costs, so this one goes to the cloud every time.

3. Classifying support tickets. Local wins.

I help run a product that gets a steady stream of tickets. Bucketing them into categories such as billing, bug, feature request, and integration help is a textbook classification job. A small local model running inside the VPC does this cheaply, quickly, and crucially without sending customer data to an external API. The accuracy is competitive with what I used to get from a frontier cloud call, and the cost is a rounding error. This is one of those jobs where the right answer in 2024 was probably the cloud and the right answer in 2026 is definitely local.
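The part of this job that is easy to get wrong is not the model call but what happens to its reply. Here is a rough sketch of the two pieces around the call: a prompt that pins the model to the four buckets, and a parser that refuses to guess. The `needs review` fallback and the function names are assumptions for illustration; the model call itself is left out.

```python
CATEGORIES = ("billing", "bug", "feature request", "integration help")
FALLBACK = "needs review"  # hypothetical bucket for replies we cannot parse


def build_ticket_prompt(ticket_text: str) -> str:
    """Ask for exactly one label from the fixed list."""
    options = ", ".join(CATEGORIES)
    return (
        f"Classify the support ticket into exactly one of: {options}.\n"
        "Reply with the category name only.\n\n"
        f"Ticket:\n{ticket_text}\n"
    )


def parse_label(model_reply: str) -> str:
    """Map a free-text reply onto one known category, or the fallback.

    A reply that names no category, or more than one, goes to the
    fallback bucket instead of being forced into a guess.
    """
    reply = model_reply.strip().lower()
    matches = [category for category in CATEGORIES if category in reply]
    return matches[0] if len(matches) == 1 else FALLBACK
```

Sending ambiguous replies to a human queue keeps a small model's occasional rambling from silently misfiling tickets.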

4. Pair-programming on novel architecture. Cloud wins.

When I am designing something I have never built before, I want the smartest model I can get. I am paying for breadth of training data, depth of reasoning, and the ability to push back when my first idea is wrong. Frontier cloud models are still the right call here. I will happily spend a few dollars on a session that saves me a week of going down the wrong path.
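The four cases above condense into a small routing table. This is a sketch only: the task names are made up, and two behaviours are assumptions of mine rather than anything stated above, namely that sensitive data forces the local model and that unknown tasks default to the cloud.

```python
from enum import Enum


class Target(Enum):
    LOCAL = "local"
    CLOUD = "cloud"


# Hypothetical task names mirroring the four use cases.
ROUTES = {
    "commit_message": Target.LOCAL,
    "multi_file_refactor": Target.CLOUD,
    "ticket_classification": Target.LOCAL,
    "architecture_pairing": Target.CLOUD,
}


def route(task: str, contains_sensitive_data: bool = False) -> Target:
    """Pick where a task runs.

    Assumption: data that must not leave the machine overrides the table.
    Assumption: tasks not in the table go to the stronger cloud model.
    """
    if contains_sensitive_data:
        return Target.LOCAL
    return ROUTES.get(task, Target.CLOUD)
```

The table is the whole design: the decision is made per task, not per project or per team.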

The hardware story

One thing that has quietly changed is how little hardware you need. A 32GB Mac, the kind plenty of working developers already own, runs the small and medium open models comfortably. I am not talking about a tortured experience where you wait forty seconds per token. I mean usable, interactive, conversational speeds, on a laptop, on battery.
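The back-of-envelope arithmetic behind that claim is short: weight memory is roughly parameter count times bits per weight, divided by eight. The sketch below applies it. The parameter counts are illustrative sizes, not specific models, and the 50% headroom figure is an assumption that leaves room for the context cache, the OS, and everything else running on the laptop.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(
    params_billion: float,
    bits_per_weight: float,
    ram_gb: float,
    headroom_fraction: float = 0.5,  # assumed share of RAM given to weights
) -> bool:
    """True if the weights fit inside the assumed share of system memory."""
    return weight_memory_gb(params_billion, bits_per_weight) <= ram_gb * headroom_fraction


print(weight_memory_gb(8, 4))   # 4.0  -> an 8B model at 4-bit
print(weight_memory_gb(32, 4))  # 16.0 -> a 32B model at 4-bit
print(fits(70, 4, ram_gb=32))   # False: 35 GB of weights on a 32GB machine
```

Real memory use is higher than the weights alone, so treat these numbers as a floor rather than a budget.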

You do not need a GPU rig anymore. You do not need to build a server with three used 3090s in it. Apple Silicon plus a sane runtime like Ollama or LM Studio gets you most of the way there for personal use, and a single workstation with a decent GPU handles team-scale workloads for the kind of tasks I described above.

This matters because it changes the calculus. When local meant building and maintaining hardware, the bar to choose it was high. When local means a tool you already own does the job, the bar drops a lot.

The underrated reason: privacy

The part of this conversation I think gets the least attention is privacy. Not in the abstract civil-liberties sense, although that matters too, but in the very concrete sense of which data your company is comfortable sending to which vendor.

Customer support tickets, internal documents, draft contracts, source code with proprietary logic, medical notes, anything covered by an NDA: all of these have an awkward relationship with cloud LLMs. The vendors are mostly trustworthy and have reasonable terms, but legal teams still have to do work to make it okay, and some categories of data are just easier to keep on-prem entirely.

Local models close that gap. The data never leaves the machine. There is no vendor to audit, no terms to negotiate, no breach surface to worry about beyond the one you already had. For a non-trivial slice of real-world LLM use, that is the entire reason to choose local, and the quality being good enough is the bonus.

Not local vs cloud

If you came here looking for a side to pick, I do not have one for you. The framing of local versus cloud was always a bit of a false binary, and in June 2026 it is more obviously the wrong question. The right question is which model is the right tool for this specific task, and the honest answer is sometimes the one on your laptop and sometimes the one in someone else's data centre.

I keep one frontier model in the cloud and one open model running on my Mac. They do different jobs. They are both earning their keep. That is the setup I would recommend to anyone serious about getting work done with LLMs right now.
