Local models now handle 90% of what I used Claude Pro for, but the other 10% is…

The topic Local models now handle 90% of what I used Claude Pro for, but the other 10% is… is currently the subject of lively discussion — readers and analysts are keeping a close eye on developments.

This is taking place in a dynamic environment: companies’ decisions and competitors’ reactions can quickly change the picture.

Running a local LLM and paying for Claude Pro at the same time isn’t exactly the most economical setup. But there’s a version of using AI tools where you’re not really trying to optimize or pick a winner, you’re just using whatever’s open in front of you for whatever the task is. That’s roughly where my local LLM and Claude sit now. Both are open most days, both get reached for constantly, and the lines between which one handles what have gotten blurry, so comparisons are starting to feel inevitable.

Local models have come a long way, and that’s not a hot take anymore. And the more time I spend in my LM Studio setup, the more I notice how much of what I used to default to Claude for is now perfectly handled locally. Not all of it though… So this isn’t actually about me canceling my Claude subscription. It’s more about figuring out what local now does well enough that paid AI doesn’t need to be in the conversation, and what’s still genuinely worth $20 a month.

The thing that makes local models feel so static out of the box is that every session is a blank slate. The model doesn’t know who you are, what you do, or that you’ve already told it twice to skip something it seems to keep assuming you want to hear. Cloud AI tools work around this with features like Claude Projects and Gemini Gems, which load your background and preferences into every chat automatically. Of course, many also have global memory. Local runners don’t have a direct equivalent.

But you can fake it pretty convincingly with what I’ve been calling context journals – a system prompt with your behavioral rules, plus an uploaded document with deeper background. I wrote a piece on the setup if you want the walkthrough, but the short version is that it’s not real memory and not as scalable. What it does do, however, is make local behave like a tool that’s met you before, which is most of what people are really paying cloud for.

The actual tasks aren’t really the conversation here. Regardless of what I’m doing – summarising, document chat, research – my local Qwen 3.5 9B handles all of this at a quality level that doesn’t have me reaching for Claude out of necessity, but more so just familiarity and comfort. If anything, I have more fine-grain control over its responses with the parameters, which you don’t get with cloud AI. So the bigger picture isn’t whether local can do the work, it’s everything that surrounds the work being easier.

The most obvious one is that the usage cap is my GPU, not a five-hour timer. Even on Pro I’ve hit Claude’s reset on heavy days, and there’s a kind of background calculation that goes with it – is this prompt worth burning Opus, or should I drop to Sonnet, or do I save the good stuff for later, etc. Local just lets you do your thing without countdowns.

Privacy is the other one that I won’t belabour here because it’s been covered to death already, but the everyday version of it does matter. Not every prompt contains sensitive information, but the point is that I no longer have to think about whether it does.

For genuinely hard reasoning work, Opus 4.7 is still in a tier of its own. Released April 16, 2026, it’s currently Anthropic’s most capable model and the one I reach for when I need the model to actually push back on something, not just agree with me. Local models tend to be more compliant by default – fine for most tasks, bad for sense-checking or half-formed arguments.

To me, design is the other big one, and this is where not just cloud, but Claude specifically, won me over. Claude has inline visuals, Artifacts, and now Claude Design (a workspace built specifically for prototypes, slides, and UI mockups, powered by Opus 4.7). Then there are the creative tool connectors which let Claude work inside Figma, Adobe Creative Cloud, Affinity, and Canva. Local models can analyse a screenshot of a design and give you feedback, sure, but that’s about it. Claude can sit inside the tools and do work in them, with full context and reasoning.

The last one is multi-constraint instruction following. Claude can hold a stack of format rules, tone rules, voice rules, and a specific angle all at once without dropping any of it. Local handles a few constraints fine, but the more you pile on, the more it starts losing track of the earlier ones. Definitely something you could live with if local is important to you, but it’s a nice-to-have in any cloud model.

Local AI has gotten genuinely capable, and a lot of what used to make cloud feel essential isn’t really essential anymore. But there’s still a layer of work where Claude does things nothing on my machine can touch, and that’s the part that justifies keeping Pro around. Both stay in the rotation, just with the lines drawn in the sand a little clearer than they used to be.