Claude Code · Opus 4.6 vs Opus 4.7

Same requirements spec, different model generation (4.6 → 4.7).
Comparison: does the newer model produce better-factored code?

How each dimension was judged during human review — operational definitions & scoring method

Scoring method. Each dimension is a head-to-head. A model "wins" a row when its output meets the operational bar more cleanly than the other. A "tie" means the observable difference is within measurement noise. No weighted average is computed — the takeaway is the shape of wins, not a composite score.

Dimension	What it measures	Signals for "better"
File size	Total lines of the single-file output	Fewer lines for the same feature set
Strict mode	Whether the JS block opens with `'use strict'`	Present — flags implicit globals & reserved-word assignment
CSS custom properties	Presence of `--var` declarations at `:root`	More named variables; fewer repeated hex values
Global state variables	Module-scope `let` declarations that get mutated	Fewer globals; clearer ownership of mutable state
`var`/`let`/`const` discipline	Use of `const` where rebinding never occurs; absence of `var`	Higher const-to-let ratio; `var`-free
Code layering	Whether top-to-bottom order reflects dependency order	Constants → state → DOM refs → helpers → logic
Naming quality	Identifier clarity at the top of the file and in hot paths	Full words; convention held across long generations
Function length (worst case)	Line counts of the longest functions	Lower P90; fewer functions over 50 lines
Async control flow	Max closure/callback nesting in async pipelines	Shallower nesting; named helpers over inline lambdas
Magic numbers	Unnamed numeric literals in core logic (scoring, AI, timing)	Hoisted to named constants or clustered near call sites
Persistence layer	Organization of localStorage access	Dedicated load/save module; try/catch without silent fails
Input-handler guard pattern	Repeated precondition checks across event handlers	Centralized guard wrapper; no inline duplication
HTML structure	Semantic clarity of the DOM layout	Named layout sections; logical grouping over flat div-soup
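
The persistence bar above ("dedicated load/save module; try/catch without silent fails") can be sketched as follows. This is a minimal illustration, not code from either model's output: `STORAGE_KEY`, `loadState`, `saveState`, and `createMemoryStorage` are hypothetical names, and the storage object is passed in so the sketch also runs outside a browser, where `localStorage` would normally be supplied.

```javascript
// Hypothetical names throughout; neither model's actual module is shown in the report.
const STORAGE_KEY = "game-state-v1";

// Read and parse saved state, reporting failures instead of swallowing them.
function loadState(storage, fallback) {
  try {
    const raw = storage.getItem(STORAGE_KEY);
    return raw === null ? fallback : JSON.parse(raw);
  } catch (err) {
    // Corrupt JSON or blocked storage stays visible to the developer.
    console.warn("loadState failed; using fallback", err);
    return fallback;
  }
}

// Serialize and write state; the boolean lets callers react to a failed save.
function saveState(storage, state) {
  try {
    storage.setItem(STORAGE_KEY, JSON.stringify(state));
    return true;
  } catch (err) {
    console.warn("saveState failed", err);
    return false;
  }
}

// In-memory stand-in with the two Storage methods the sketch relies on.
function createMemoryStorage() {
  const map = new Map();
  return {
    getItem: (key) => (map.has(key) ? map.get(key) : null),
    setItem: (key, value) => {
      map.set(key, String(value));
    },
  };
}
```

In a browser the same functions would be called as `loadState(localStorage, defaults)`. The point of the pattern is that every storage access goes through two functions, and neither catch block is empty.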

Worked example · Naming quality

LESS CLEAR · FOUND IN 4.6

// deep in the AI scoring code
let topPen = 0;
...
tooltip.innerText =
  'Hght: ' + h + ', Vlly: ' + v;

Abbreviations appear in hot paths late in the file, which suggests the model started losing its naming conventions under long-generation context pressure.

CLEARER · FOUND IN 4.7

function evaluateBoardDetailed(board) {
  const heights = columnHeights(board);
  const holes   = countHoles(board);
  const valleys = detectValleys(heights);
  ...
}

Full-word identifiers continue deep into the file, which suggests the model is still honoring the naming convention it set up earlier.

Worked example · Async control flow

5 LEVELS DEEP · 4.6

setTimeout(() => {
  const moveToTarget = () => {
    if (x === target.x) {
      setTimeout(() => {
        hardDrop();
      }, delay);
    }
  };
  moveToTarget();
}, delay);

Inline closures nested inside a recursive setTimeout chain. Every level is another thing the reader's eye has to pattern-match.

2 LEVELS DEEP · 4.7

const push = () => {
  if (!autoplayEnabled) return;
  hardDrop();
  spawnNext();
};

setTimeout(push, delay);

Inner logic hoisted into a named helper. Scheduling and body are now separable units.

Dimension	Claude Code (Opus 4.6)	Claude Code (Opus 4.7)	Winner
File size	2,003 lines	1,800 lines	Opus 4.7
Strict mode	Yes	No	Opus 4.6
CSS custom properties	No — hard-coded panel widths, colors	Yes — `:root { --bg, --panel, --border, --text }` (11–19)	Opus 4.7
Global state variables	~25	~20	Opus 4.7
`var`/`let`/`const` discipline	Reassigns `const`-style initial vars (`board` at 687 re-bound at 1823)	Clean: `const` for refs/state, `let` for locals, no `var`	Opus 4.7
Code layering	Comment-labelled sections inside one large script block	Distinct blocks: constants → state → DOM refs → helpers → logic (616–795)	Opus 4.7
Naming quality	Cryptic abbreviations in hot paths (e.g. `topPen`, `Hght:`, `Vlly:`)	Descriptive identifiers throughout (e.g. `randomPieceType`, `fmtNum`, `evaluateBoardDetailed`)	Opus 4.7
Function length (worst case)	Longest function ~92 lines (1170–1262)	Longest function ~90 lines (769–858)	Tie
Async control flow	Deeply nested closures and `setTimeout` chains (1499–1549)	Nested closures present but with narrower scope (1403–1446)	Opus 4.7
Magic numbers	Scattered inline and uncommented in core logic	Present, but clustered and closer to call sites	Opus 4.7
Persistence layer	Clean dedicated module (884–950)	Inline with try/catch — silent fails on error	Opus 4.6
Input-handler guard pattern	Centralized guard wrapper reused by all handlers (1980–1990)	Guard clauses duplicated inline per handler	Opus 4.6
HTML structure	Flat `div`-heavy	Three clearly labelled layout sections; more navigable	Opus 4.7
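
The report does not reproduce 4.6's guard wrapper (1980–1990), so the following is only a sketch of the general pattern the "Input-handler guard pattern" row rewards. The state shape (`game`) and the names `whenPlayable`, `onMoveLeft`, and `onHardDrop` are hypothetical.

```javascript
// Hypothetical state and handler names, for illustration only.
const game = { paused: false, over: false, moves: [] };

// Wraps a handler so the shared preconditions are checked in one place
// rather than repeated at the top of every event handler.
function whenPlayable(handler) {
  return (...args) => {
    if (game.paused || game.over) return undefined;
    return handler(...args);
  };
}

const onMoveLeft = whenPlayable(() => game.moves.push("left"));
const onHardDrop = whenPlayable(() => game.moves.push("drop"));

onMoveLeft();        // runs: the game is playable
game.paused = true;
onHardDrop();        // skipped by the guard
console.log(game.moves); // prints [ 'left' ]
```

With inline guards, adding a new precondition means editing every handler; with the wrapper it is a one-line change.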

Takeaway: Clear generational improvement. Claude Code (Opus 4.7) wins 9 of 13 rows, with one tie (worst-case function length): it is smaller, more disciplined about `const`, uses CSS variables, names things better, and layers its code more clearly. Claude Code (Opus 4.6) retains only three wins: strict mode, a dedicated persistence module, and a centralized input-handler guard pattern. The newer model trades some safety rails (strict mode) for substantially better factoring overall.

Quantitative static analysis

Human review judgment is subjective. To pressure-test the qualitative claims above, I extracted the JS from each file, parsed it with espree, walked the resulting AST, and ran ESLint 9 with a shared flat config (complexity ≤ 10, max-depth ≤ 4, max-lines-per-function ≤ 50, max-nested-callbacks ≤ 3, no-magic-numbers with [−1, 0, 1, 2] allowed, prefer-const, eqeqeq, no-var). CSS metrics were measured by regex on the <style> block. The numbers below either anchor the qualitative findings or complicate them.

JS lines: 1,343 → 1,187 (−11.6%)
Total ESLint warnings: 71 → 65 (−8.5%)
Magic-number literals: 47 → 39 (−17%)
Functions > 50 lines: 6 → 3 (−50%)
Hex colour occurrences: 51 → 25 (−51%) · CSS vars absorbed the rest
Top-level const decls: 17 → 30 (+76%)

Metric	Opus 4.6	Opus 4.7	Δ	Source
JS lines	1,343	1,187	−11.6%	wc
CSS lines	512	470	−8.2%	regex
CSS custom properties declared	0	7	+7	regex
Hex colour occurrences in CSS	51	25	−51%	regex
`'use strict'`	Yes	No	regression	regex
Total ESLint warnings	71	65	−8.5%	eslint
`no-magic-numbers`	47	39	−17%	eslint
`max-lines-per-function`	4	3	−1	eslint
`complexity` (cyclomatic > 10)	7	9	+2	eslint
`max-depth` (> 4)	11	12	+1	eslint
`eqeqeq` (uses of `==`/`!=`)	0	2	regression	eslint
Function count (incl. arrows)	92	97	+5%	AST
Functions > 50 lines	6	3	−50%	AST
Longest function (lines)	93	91	−2	AST
P90 function length	42	41	−1	AST
Max cyclomatic complexity	44	47	+3	AST
Median cyclomatic complexity	3	4	+1	AST
High-complexity functions (>10)	8	12	+4	AST
Max block-nesting depth (global)	11	15	+4	AST
Top-level `let` declarations	30	29	−1	AST
Top-level `const` declarations	17	30	+76%	AST
`var` declarations anywhere	0	0	—	AST
`setTimeout` calls	6	5	−1	AST

Where the automated metrics confirm the human review

File size, CSS, `const` discipline, and magic numbers all move in the direction observed by the human review. 4.7 has 11.6% fewer JS lines, declares seven CSS custom properties where 4.6 declares none, uses 76% more top-level `const` declarations, and carries 17% fewer no-magic-numbers warnings.
The "fewer long functions" claim is sharply confirmed. Functions over 50 lines drop from 6 to 3 — a 50% reduction — even though the total function count is slightly higher.
The qualitative "9 wins vs 3 wins" scoreboard agrees in direction with the metric deltas, though less decisively: of the 23 quantitative rows, 13 favour 4.7, 8 favour 4.6, and 2 (function count, `var` declarations) are neutral.

Where the automated metrics disagree or nuance the claim

"Flatter async" is a local win, not a global one. The human review saw 4.7's executeAutoplayMove collapse from 5 nested levels to 2 — that's real. But max block-nesting depth across the whole file went up, not down (11 → 15). 4.7 has deeper nesting elsewhere. Pattern wins don't automatically generalize.
Cyclomatic complexity got worse across the board. Median complexity: 3 → 4. Max: 44 → 47. High-complexity function count: 8 → 12. 4.7 writes smaller files, but its typical function branches more. Teams that enforce McCabe-style caps should notice this.
eqeqeq regressed: 4.7 introduces two uses of ==/!= where 4.6 had zero. Minor, but it's the kind of discipline slip a linter running eqeqeq would have caught; 'use strict' alone does not flag loose equality.
Loss of 'use strict' is confirmed by automated regex — the single biggest safety-rail regression from the upgrade.
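
To make concrete what the strict-mode row does and does not protect against, here is a small sketch. It uses the `Function` constructor because code built that way is sloppy-mode unless its body opts in, so the demo behaves the same whether or not the surrounding file is strict. The variable names are made up for the demo.

```javascript
// Sloppy mode: assigning to an undeclared name silently creates a global.
const sloppyAssign = new Function(
  "leakedCounter = 1; return typeof globalThis.leakedCounter;"
);

// Strict mode: the same kind of assignment throws a ReferenceError.
const strictAssign = new Function("'use strict'; undeclaredCounter = 1;");

// Strict mode does not change loose equality; `==` still coerces.
const strictLooseEquals = new Function("'use strict'; return 0 == '';");

function describeStrictAssign() {
  try {
    strictAssign();
    return "no error";
  } catch (err) {
    return err.name;
  }
}

console.log(sloppyAssign());         // prints "number"
console.log(describeStrictAssign()); // prints "ReferenceError"
console.log(strictLooseEquals());    // prints true
```

The first two calls show the implicit-global protection lost when `'use strict'` is dropped; the third shows why loose equality needs a lint rule rather than strict mode.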

Methodology. JS extracted from each <script> block and parsed with espree (ECMAScript 2022, script mode). Cyclomatic complexity computed per function by walking the AST and counting if / switch case / for / while / catch / && / || / ternary. ESLint 9 run with a flat config; rules above with thresholds: complexity ≤ 10, max-depth ≤ 4, max-lines-per-function ≤ 50, max-nested-callbacks ≤ 3, no-magic-numbers allowing [−1, 0, 1, 2], prefer-const, eqeqeq, no-var. CSS metrics by regex on the <style> block (custom properties = --name:, hex occurrences = #hex, px values = \d+px). Non-functional only — correctness was verified separately with the functional-test suite.
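
The per-function complexity count described above can be sketched as a walk over an ESTree-shaped AST. This is an illustration under stated assumptions, not the script used for the report: it starts each function at 1, skips `default:` cases, does not descend into nested functions (they would be scored on their own), and treats every loop form as a branch. A small hand-built AST stands in for parser output so the block runs without dependencies.

```javascript
const BRANCH_TYPES = new Set([
  "IfStatement", "ForStatement", "ForInStatement", "ForOfStatement",
  "WhileStatement", "DoWhileStatement", "CatchClause", "ConditionalExpression",
]);
const FUNCTION_TYPES = new Set([
  "FunctionDeclaration", "FunctionExpression", "ArrowFunctionExpression",
]);

// True when the node adds one decision point under the assumptions above.
function isBranch(node) {
  if (BRANCH_TYPES.has(node.type)) return true;
  if (node.type === "SwitchCase") return node.test !== null; // skip `default:`
  if (node.type === "LogicalExpression") {
    return node.operator === "&&" || node.operator === "||";
  }
  return false;
}

// Visits every child node reachable through object or array properties.
function visitChildren(node, visit) {
  for (const [key, value] of Object.entries(node)) {
    if (key === "parent") continue; // avoid cycles if parent links exist
    if (Array.isArray(value)) value.forEach(visit);
    else if (value && typeof value === "object") visit(value);
  }
}

function cyclomaticComplexity(functionNode) {
  let count = 1; // one path through a function with no branches
  const visit = (node) => {
    if (!node || typeof node.type !== "string") return; // not an AST node
    if (FUNCTION_TYPES.has(node.type)) return; // nested functions scored separately
    if (isBranch(node)) count += 1;
    visitChildren(node, visit);
  };
  visitChildren(functionNode, visit);
  return count;
}

// Hand-built AST for:
//   function f(a, b) { if (a && b) { return 1; } return a ? 2 : 3; }
const id = (name) => ({ type: "Identifier", name });
const lit = (value) => ({ type: "Literal", value });
const sampleFunction = {
  type: "FunctionDeclaration",
  id: id("f"),
  params: [id("a"), id("b")],
  body: {
    type: "BlockStatement",
    body: [
      {
        type: "IfStatement",
        test: { type: "LogicalExpression", operator: "&&", left: id("a"), right: id("b") },
        consequent: {
          type: "BlockStatement",
          body: [{ type: "ReturnStatement", argument: lit(1) }],
        },
        alternate: null,
      },
      {
        type: "ReturnStatement",
        argument: { type: "ConditionalExpression", test: id("a"), consequent: lit(2), alternate: lit(3) },
      },
    ],
  },
};

console.log(cyclomaticComplexity(sampleFunction)); // prints 4 (1 + if + && + ternary)
```

Counting conventions differ between tools (for example, whether `default:` or `??` adds a path), which is one plausible reason an AST-derived count and ESLint's `complexity` rule can report different numbers of functions over a threshold.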