9:14am, opened Claude Code, ran the first task. By 5pm, three things I would have hand-rolled were done. Not vibes. Not benchmarks. Just one day of actual work, written down as it happened, so I can tell you what the upgrade to 4.7 felt like from the inside of a normal Tuesday.
I am writing this on the same machine I worked on, with the terminal still warm. The point is not to sell you on the model. The point is to give you a believable diary so you can decide whether the jump is worth your morning.
9:14am: refactor an extension
First task of the day was a refactor I had been putting off. One of my Squilla extensions had grown a tangled service file with three responsibilities packed into it, and I wanted to split it into a thin handler, a domain service, and a repository. Nothing exotic. The kind of cleanup any working developer recognises.
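For anyone who has not done this kind of split recently, here is a minimal sketch of the target shape, written in Go. Every name in it (`Post`, `PostRepository`, `PostService`, `HandleCreate`, the in-memory repository) is hypothetical and chosen for illustration; it is not the code from my extension.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Post is a hypothetical domain type; the real extension's types are not shown in this post.
type Post struct {
	ID    int
	Title string
}

// PostRepository owns storage access and nothing else.
type PostRepository interface {
	Save(p Post) (Post, error)
}

// memoryRepo is an in-memory stand-in for the real repository.
type memoryRepo struct {
	nextID int
	posts  map[int]Post
}

func newMemoryRepo() *memoryRepo {
	return &memoryRepo{nextID: 1, posts: make(map[int]Post)}
}

func (r *memoryRepo) Save(p Post) (Post, error) {
	p.ID = r.nextID
	r.nextID++
	r.posts[p.ID] = p
	return p, nil
}

// ErrEmptyTitle is returned by the service when validation fails.
var ErrEmptyTitle = errors.New("title must not be empty")

// PostService holds the domain rules and depends only on the repository interface.
type PostService struct {
	repo PostRepository
}

func (s *PostService) Create(title string) (Post, error) {
	title = strings.TrimSpace(title)
	if title == "" {
		return Post{}, ErrEmptyTitle
	}
	return s.repo.Save(Post{Title: title})
}

// HandleCreate is the thin handler: it translates input and output and delegates the rest.
func HandleCreate(s *PostService, rawTitle string) string {
	p, err := s.Create(rawTitle)
	if err != nil {
		return "error: " + err.Error()
	}
	return fmt.Sprintf("created post %d: %s", p.ID, p.Title)
}

func main() {
	svc := &PostService{repo: newMemoryRepo()}
	fmt.Println(HandleCreate(svc, "  Hello  "))
	fmt.Println(HandleCreate(svc, ""))
}
```

The useful property is that the service only sees the repository interface, so tests can swap in the in-memory version without touching the handler.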
On 4.6 this would have been a forty-five minute dance. Ask for a plan, push back on the plan, ask for the split, fix the imports it forgot, fix the tests it broke, then re-run lints. I have done this exact pattern enough times to time it with a stopwatch.
On 4.7 it took roughly thirty minutes from first prompt to green tests. The interesting part was not the speed, it was where the speed came from. The plan it proposed first was already the plan I would have asked for after two rounds of feedback. The imports were right on the first try. It noticed that one test file imported a private helper that was now in the new repository package and rewrote the import without being asked. That is the kind of small thing that adds up.
Fifteen minutes saved on a forty-five-minute task. Not life-changing. But the day was barely half an hour old.
11:02am: a flaky Tengo script
Late morning brought a real puzzle. I had a Tengo script in an extension that was failing intermittently in production. Not every run. Maybe one in twelve. The kind of bug that wastes your whole afternoon if you let it.
The clue was thin. A log line saying a value was nil when it should not have been, with a stack that pointed at a helper called from six different sites across the extension. I dropped the relevant files into context and asked Claude 4.7 to trace where the value could become nil.
This is where the new long-context handling actually mattered. It walked through all six call sites, noticed that one of them was the entry point for an event handler that received batched payloads, and pointed out that the helper assumed a single payload shape. In the batched path, the first item was being passed through correctly and the rest were silently dropping their context.
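The failure mode is easier to see in code. This is a sketch in Go rather than the actual Tengo script, and the `Payload` and `Context` types and all three function names are invented for illustration. It assumes the bug amounted to "only the first item in a batch gets its context".

```go
package main

import "fmt"

// Context and Payload are hypothetical shapes; the real script's types are not shown here.
type Context struct {
	TenantID string
}

type Payload struct {
	ID  string
	Ctx *Context // nil models the "value was nil" log line
}

// attachBuggy mirrors the failure mode: it assumes one payload per event,
// so only the first item in a batch receives the context.
func attachBuggy(batch []Payload, ctx *Context) []Payload {
	out := make([]Payload, len(batch))
	copy(out, batch)
	if len(out) > 0 {
		out[0].Ctx = ctx
	}
	return out
}

// attachFixed gives every item in the batch the same context.
func attachFixed(batch []Payload, ctx *Context) []Payload {
	out := make([]Payload, len(batch))
	for i, p := range batch {
		p.Ctx = ctx
		out[i] = p
	}
	return out
}

// missingContext counts items that would hit the nil path downstream.
func missingContext(batch []Payload) int {
	n := 0
	for _, p := range batch {
		if p.Ctx == nil {
			n++
		}
	}
	return n
}

func main() {
	ctx := &Context{TenantID: "t1"}
	batch := []Payload{{ID: "a"}, {ID: "b"}, {ID: "c"}}
	fmt.Println(missingContext(attachBuggy(batch, ctx))) // 2
	fmt.Println(missingContext(attachFixed(batch, ctx))) // 0
}
```

A batch of one passes through the buggy version untouched, which is why a bug of this shape only shows up on some runs.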
I could have found that myself. I have found bugs like that myself, plenty of times. The difference is that on 4.6 the model would have given me a confident wrong answer twice before getting there, and I would have wasted twenty minutes chasing the wrong thread. This time the first trace was already correct. I wrote the fix, added a test for the batched case, and went to lunch a full hour earlier than I expected.
1:30pm: a cold PR review
After lunch a friend asked me to look at a PR for a Go service I had never seen before. New codebase, new team conventions, no shared context. I usually decline these because they are expensive in time and cheap in signal.
I pasted the diff into Claude and asked for a review as if it were a senior reviewer on that team. I did not tell it what to look for. I wanted to see what it noticed unprompted.
The review was sharper than 4.6 would have produced. It caught a subtle issue where a goroutine closure was capturing a loop variable shared by every iteration (the classic Go gotcha that was supposed to die with the per-iteration loop variables in Go 1.22, but which still bites any module whose go.mod declares an older Go version). It flagged a context created with a five-second timeout in a function that called a downstream service with a ten-second SLA, which would have caused mysterious cancellations under load. And it noticed that one of the new tests asserted on map iteration order, which Go deliberately randomises, so the test could flake on any run.
Three real findings, none of them invented, all of them the kind of thing a careful human reviewer would have caught on a good day. The PR author thanked me. I did not tell them I had a co-reviewer.
3:45pm: publish a draft straight from chat
This is the part where the day stopped feeling like a faster version of yesterday and started feeling like something new.
I had a half-finished blog post sitting in my notes. I wanted it published. Normally I would open the Squilla admin, paste it in, set the category, pick a date, check the slug, hit publish. Maybe five minutes of mechanical clicking.
Instead I asked Claude to publish it through the Squilla MCP tools I had wired up that week. It read the draft, picked a sensible slug, asked me which category I wanted (I had two close ones), confirmed the publish date with me, and called the create-node tool with the right shape on the first try. The post went live while I was still in the chat window.
The detail I want to call out is that this was not worth doing on 4.6. I had tried it. The model would get the tool shape wrong, or hallucinate a field that did not exist, or insist on a category that was not in the list. The friction was higher than just opening the admin. On 4.7 the friction inverted. Chatting it through is now faster than clicking, and that crosses a threshold for me.
5:02pm: release notes from git log
Last task of the day, and another one that was not worth doing on the old model. I had a release going out and I needed notes. I gave Claude the output of git log between the last tag and HEAD, told it the audience was technical users of the CMS, and asked for a clean markdown summary grouped by theme.
What I got back was already 80 percent of a publishable changelog. It correctly grouped commits by area, surfaced one breaking change I had buried in a refactor commit, and ignored the eight chore commits about CI config. I edited two lines and shipped it.
On 4.6 this would have been a fight. Either it would invent features that were not in the log, or it would be so cautious that the output was just the commit messages reformatted. The middle ground, where the model actually understands what a release note is supposed to read like, is new.
The honest pattern
So what does the day add up to? I want to be careful here because the AI discourse is full of people claiming 10x. That is not what this is.
The math, as best I can tell, is 15 to 20 percent on tasks I was already doing. The refactor was a bit faster. The bug hunt was a bit faster. The PR review was a bit sharper. Each one is a small win. Add them up across a week and it is meaningful, but it is not the stuff of magazine covers.
What is more interesting is the second category. Two of my five vignettes today (publish from chat, release notes from log) were tasks that were not worth doing on the old model. The friction was too high, the failure rate too annoying, the cleanup too much work. On 4.7 they crossed the threshold from "not worth it" to "obviously worth it". That is where the real change is hiding. Not in making old work faster, but in turning previously uneconomical work into routine.
The annoyances, because there are still some
I want to be fair. A few rough edges showed up over the day.
It still occasionally suggests deprecated APIs. Today it offered me a Fiber middleware signature that has been gone for two minor versions. I caught it because I know that codebase. A newer developer would have copied it in and spent an hour debugging.
It still over-explains simple code. When I asked for a one-line slug helper, I got a one-line helper with a four-paragraph explanation of how Unicode normalisation works. I did not ask. I would have been happy with the line and a half-sentence comment.
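For the record, this is roughly the amount of output I was hoping for, sketched in Go. It is an ASCII-only version with a hypothetical name (`slugify`), not the helper the model gave me, and it deliberately skips the Unicode handling.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var nonAlnum = regexp.MustCompile(`[^a-z0-9]+`)

// slugify lowercases s and collapses each non-alphanumeric run into one hyphen (ASCII only).
func slugify(s string) string {
	return strings.Trim(nonAlnum.ReplaceAllString(strings.ToLower(s), "-"), "-")
}

func main() {
	fmt.Println(slugify("Hello, World!")) // hello-world
}
```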
And it still occasionally writes tests that are essentially restating the implementation in a different syntax. Not always, much less often than before, but enough that I still read every test it produces with the same suspicion I read every test a junior developer produces.
The verdict, such as it is
The upgrade is not dramatic. If you are looking for a moment where the ground shifts, this is not it. The shift is quieter than that.
What I noticed across one ordinary day is that the model has stopped fighting me on the easy stuff. The plans are closer to right on the first try. The bug traces land on the right line. The tool calls have the right shape. The summaries are calibrated to the audience. None of those are individually thrilling. Together they are enough that the calculation has changed.
I will not go back. Not because 4.7 is amazing, but because every time I imagine going back I picture losing the 15 percent and gaining the friction that made me skip the chat-based publishing and the log-based release notes. That is a small loss in absolute terms and a big loss in how my day feels.
If you spent the morning thinking about the upgrade, here is my short take. Try it for a workday. Not a benchmark, not a demo, a workday. Then count what you would have done by hand. If three things on that list got done without you, you already know.