Same requirements spec, different model generation (4.6 → 4.7).
Comparison: does the newer model produce better-factored code?
Scoring method. Each dimension is a head-to-head. A model "wins" a row when its output meets the operational bar more cleanly than the other. A "tie" means the observable difference is within measurement noise. No weighted average is computed — the takeaway is the shape of wins, not a composite score.
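The scoring rule is simple enough to sketch in a few lines. The example below tallies the Winner column as transcribed from the comparison table later in this write-up. The `rows` array and the `tallyWins` helper are illustrative names, not part of either generated file.

```javascript
// Winner per dimension, transcribed from the comparison table in this write-up.
const rows = [
  ["File size", "Opus 4.7"],
  ["Strict mode", "Opus 4.6"],
  ["CSS custom properties", "Opus 4.7"],
  ["Global state variables", "Opus 4.7"],
  ["var/let/const discipline", "Opus 4.7"],
  ["Code layering", "Opus 4.7"],
  ["Naming quality", "Opus 4.7"],
  ["Function length (worst case)", "Tie"],
  ["Async control flow", "Opus 4.7"],
  ["Magic numbers", "Opus 4.7"],
  ["Persistence layer", "Opus 4.6"],
  ["Input-handler guard pattern", "Opus 4.6"],
  ["HTML structure", "Opus 4.7"],
];

// Count rows per outcome. No weighting is applied, matching the
// "shape of wins, not a composite score" framing.
function tallyWins(table) {
  const counts = {};
  for (const [, winner] of table) {
    counts[winner] = (counts[winner] || 0) + 1;
  }
  return counts;
}

const tally = tallyWins(rows);
console.log(tally);
```

Keeping the tally unweighted means a single large regression (such as strict mode) counts the same as a small win, which is why the row-by-row notes matter more than the totals.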
| Dimension | What it measures | Signals for "better" |
|---|---|---|
| File size | Total lines of the single-file output | Fewer lines for the same feature set |
| Strict mode | Whether the JS block opens with 'use strict' | Present — flags implicit globals & reserved-word assignment |
| CSS custom properties | Presence of --var declarations at :root | More named variables; fewer repeated hex values |
| Global state variables | Module-scope let declarations that get mutated | Fewer globals; clearer ownership of mutable state |
| var/let/const discipline | Use of const where rebinding never occurs; absence of var | Higher const-to-let ratio; var-free |
| Code layering | Whether top-to-bottom order reflects dependency order | Constants → state → DOM refs → helpers → logic |
| Naming quality | Identifier clarity at the top of the file and in hot paths | Full words; convention held across long generations |
| Function length (worst case) | Line counts of the longest functions | Lower P90; fewer functions over 50 lines |
| Async control flow | Max closure/callback nesting in async pipelines | Shallower nesting; named helpers over inline lambdas |
| Magic numbers | Unnamed numeric literals in core logic (scoring, AI, timing) | Hoisted to named constants or clustered near call sites |
| Persistence layer | Organization of localStorage access | Dedicated load/save module; try/catch without silent fails |
| Input-handler guard pattern | Repeated precondition checks across event handlers | Centralized guard wrapper; no inline duplication |
| HTML structure | Semantic clarity of the DOM layout | Named layout sections; logical grouping over flat div-soup |

    // deep in the AI scoring code
    let topPen = 0;
    ...
    tooltip.innerText =
      'Hght: ' + h + ', Vlly: ' + v;

Opus 4.6: abbreviations appear in hot paths late in the file. This suggests the model started losing its naming conventions under long-generation context pressure.

    function evaluateBoardDetailed(board) {
      const heights = columnHeights(board);
      const holes = countHoles(board);
      const valleys = detectValleys(heights);
      ...
    }

Opus 4.7: full-word identifiers continue deep into the file. The model is still honoring the naming convention it set up earlier.

    setTimeout(() => {
      const moveToTarget = () => {
        if (x === target.x) {
          setTimeout(() => {
            hardDrop();
          }, delay);
        }
      };
      moveToTarget();
    }, delay);

Opus 4.6: inline closures nested inside a recursive setTimeout chain. Every level is another thing the reader's eye has to pattern-match.

    const push = () => {
      if (!autoplayEnabled) return;
      hardDrop();
      spawnNext();
    };
    setTimeout(push, delay);

Opus 4.7: inner logic hoisted into a named helper. Scheduling and body are now separable units.

| Dimension | Claude Code (Opus 4.6) | Claude Code (Opus 4.7) | Winner |
|---|---|---|---|
| File size | 2,003 lines | 1,800 lines | Opus 4.7 |
| Strict mode | Yes | No | Opus 4.6 |
| CSS custom properties | No; hard-coded panel widths, colors | Yes; :root { --bg, --panel, --border, --text } (11–19) | Opus 4.7 |
| Global state variables | ~25 | ~20 | Opus 4.7 |
| var/let/const discipline | Reassigns const-style initial vars (board at 687 re-bound at 1823) | Clean: const for refs/state, let for locals, no var | Opus 4.7 |
| Code layering | Comment-labelled sections inside one large script block | Distinct blocks: constants → state → DOM refs → helpers → logic (616–795) | Opus 4.7 |
| Naming quality | Cryptic abbreviations in hot paths (e.g. topPen, Hght:, Vlly:) | Descriptive identifiers throughout (e.g. randomPieceType, fmtNum, evaluateBoardDetailed) | Opus 4.7 |
| Function length (worst case) | Longest function ~92 lines (1170–1262) | Longest function ~90 lines (769–858) | Tie |
| Async control flow | Deeply nested closures and setTimeout chains (1499–1549) | Nested closures present but with narrower scope (1403–1446) | Opus 4.7 |
| Magic numbers | Scattered inline and uncommented in core logic | Present, but clustered and closer to call sites | Opus 4.7 |
| Persistence layer | Clean dedicated module (884–950) | Inline with try/catch — silent fails on error | Opus 4.6 |
| Input-handler guard pattern | Centralized guard wrapper reused by all handlers (1980–1990) | Guard clauses duplicated inline per handler | Opus 4.6 |
| HTML structure | Flat div-heavy | Three clearly labelled layout sections; more navigable | Opus 4.7 |
Takeaway: Clear generational improvement. Claude Code (Opus 4.7) wins 9 of 13 rows: smaller output, stricter const/let discipline, CSS custom properties, better naming, clearer layering. Claude Code (Opus 4.6) retains only three wins: strict mode, a dedicated persistence module, and a centralized input-handler guard pattern. The remaining row, worst-case function length, is a tie. The newer model trades some safety rails (strict mode) for substantially better factoring.
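For readers unfamiliar with the centralized guard pattern credited to Opus 4.6, here is a minimal sketch of the idea. It is not the code from either generated file: the `state` fields, the `guarded` wrapper, and the handler names are all hypothetical and chosen only for illustration.

```javascript
// Hypothetical game state; field names are illustrative, not from either file.
const state = { paused: false, gameOver: false };

// Wrap a handler so the shared precondition is checked in exactly one place.
function guarded(handler, canAct = () => !state.paused && !state.gameOver) {
  return (...args) => {
    if (!canAct()) return undefined; // precondition failed: ignore the input
    return handler(...args);
  };
}

// Handlers stay free of duplicated "if (paused || gameOver) return" lines.
const log = [];
const onMoveLeft = guarded(() => log.push("left"));
const onRotate = guarded(() => log.push("rotate"));
```

The inline alternative repeats the same precondition at the top of every handler, so a new precondition (say, an animation lock) has to be added in each one by hand.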
Human review judgment is subjective. To pressure-test the qualitative claims above, I extracted the JS from each file, walked the AST with espree, and ran eslint 9 with a shared flat config (complexity ≤ 10, max-depth ≤ 4, max-lines-per-function ≤ 50, no-magic-numbers with [−1, 0, 1, 2] allowed, prefer-const, eqeqeq). CSS metrics were measured by regex on the <style> block. The numbers below either anchor the qualitative findings — or complicate them.
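A flat config along these lines would express the thresholds described. This is a sketch, not the config actually used. The rule names and limits follow the text (the methodology note at the end also lists max-nested-callbacks and no-var, so they are included here). The "warn" severity, the `files` glob, and the CommonJS export form are assumptions.

```javascript
// eslint.config.js (CommonJS form): a sketch, not the config actually used.
const config = [
  {
    files: ["**/*.js"], // assumed glob for the extracted script files
    languageOptions: { ecmaVersion: 2022, sourceType: "script" },
    rules: {
      // Severity "warn" is an assumption; the write-up reports "warnings".
      complexity: ["warn", 10],
      "max-depth": ["warn", 4],
      "max-lines-per-function": ["warn", { max: 50 }],
      "max-nested-callbacks": ["warn", 3],
      "no-magic-numbers": ["warn", { ignore: [-1, 0, 1, 2] }],
      "prefer-const": "warn",
      eqeqeq: "warn",
      "no-var": "warn",
    },
  },
];

// Export only where a CommonJS module object exists.
if (typeof module !== "undefined") module.exports = config;
```

Sharing one config between both files is what makes the warning counts comparable; a per-file config would let thresholds drift.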
| Metric | Opus 4.6 | Opus 4.7 | Δ | Source |
|---|---|---|---|---|
| JS lines | 1,343 | 1,187 | −11.6% | wc |
| CSS lines | 512 | 470 | −8.2% | regex |
| CSS custom properties declared | 0 | 7 | +7 | regex |
| Hex colour occurrences in CSS | 51 | 25 | −51% | regex |
| 'use strict' | Yes | No | regression | regex |
| Total ESLint warnings | 71 | 65 | −8.5% | eslint |
| no-magic-numbers | 47 | 39 | −17% | eslint |
| max-lines-per-function | 4 | 3 | −1 | eslint |
| complexity (cyclomatic > 10) | 7 | 9 | +2 | eslint |
| max-depth (> 4) | 11 | 12 | +1 | eslint |
| eqeqeq (uses of ==/!=) | 0 | 2 | regression | eslint |
| Function count (incl. arrows) | 92 | 97 | +5% | AST |
| Functions > 50 lines | 6 | 3 | −50% | AST |
| Longest function (lines) | 93 | 91 | −2 | AST |
| P90 function length | 42 | 41 | −1 | AST |
| Max cyclomatic complexity | 44 | 47 | +3 | AST |
| Median cyclomatic complexity | 3 | 4 | +1 | AST |
| High-complexity functions (>10) | 8 | 12 | +4 | AST |
| Max block-nesting depth (global) | 11 | 15 | +4 | AST |
| Top-level let declarations | 30 | 29 | −1 | AST |
| Top-level const declarations | 17 | 30 | +76% | AST |
| var declarations anywhere | 0 | 0 | 0 | AST |
| setTimeout calls | 6 | 5 | −1 | AST |
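
The regex-sourced CSS rows in the table above can be approximated with a few patterns. The write-up names them only as `--name:`, `#hex` and `\d+px`, so the exact expressions below are assumptions, as are the `cssMetrics` helper and the sample stylesheet.

```javascript
// Assumed patterns; the write-up only names them as --name:, #hex and \d+px.
const CUSTOM_PROPERTY = /--[A-Za-z0-9_-]+\s*:/g; // declarations, not var(--x) uses
const HEX_COLOUR = /#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b/g;
const PX_VALUE = /\d+px/g;

// String.prototype.match with a global regex returns all matches or null.
const countMatches = (text, pattern) => (text.match(pattern) || []).length;

function cssMetrics(css) {
  return {
    customProperties: countMatches(css, CUSTOM_PROPERTY),
    hexColours: countMatches(css, HEX_COLOUR),
    pxValues: countMatches(css, PX_VALUE),
  };
}

// Made-up stylesheet for illustration only.
const sampleCss = `
:root { --bg: #111; --panel: #1a1a2e; }
.panel { color: #fff; width: 120px; border: 1px solid var(--bg); }
`;

console.log(cssMetrics(sampleCss));
```

Regex counting is approximate: an ID selector made only of hex digits (for example `#add`) would be counted as a colour, so the numbers are best read as a trend and not an exact census.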
Opus 4.7 is shorter, declares more top-level const, and carries 17% fewer no-magic-numbers warnings.

The closures in executeAutoplayMove collapse from 5 nested levels to 2, and that improvement is real. But max block-nesting depth across the whole file went up, not down (11 → 15): 4.7 has deeper nesting elsewhere. Pattern wins don't automatically generalize.

eqeqeq regressed: 4.7 introduces two uses of ==/!= where 4.6 had zero. Minor, but it's the kind of discipline slip a linter in the loop would have caught; strict mode by itself does not flag loose equality.

The missing 'use strict' in 4.7 is confirmed by automated regex; it is the single biggest safety-rail regression from the upgrade.
Methodology. JS extracted from each <script> block and parsed with espree (ECMAScript 2022, script mode). Cyclomatic complexity computed per function by walking the AST and counting if / switch case / for / while / catch / && / || / ternary. ESLint 9 run with a flat config; rules above with thresholds: complexity ≤ 10, max-depth ≤ 4, max-lines-per-function ≤ 50, max-nested-callbacks ≤ 3, no-magic-numbers allowing [−1, 0, 1, 2], prefer-const, eqeqeq, no-var. CSS metrics by regex on the <style> block (custom properties = --name:, hex occurrences = #hex, px values = \d+px). Non-functional only — correctness was verified separately with the functional-test suite.
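The per-function complexity count described in the methodology can be sketched as a small ESTree walker. This is an illustration under stated assumptions, not the script that produced the table: it treats for-in, for-of and do-while as loops alongside for and while, skips the `default` case of a switch, and stops at nested function boundaries so each function is scored on its own. The function and constant names are hypothetical.

```javascript
// Node types that each add one decision point.
// Assumption: for-in, for-of and do-while count alongside for and while.
const BRANCH_TYPES = new Set([
  "IfStatement", "ForStatement", "ForInStatement", "ForOfStatement",
  "WhileStatement", "DoWhileStatement", "CatchClause", "ConditionalExpression",
]);
const FUNCTION_TYPES = new Set([
  "FunctionDeclaration", "FunctionExpression", "ArrowFunctionExpression",
]);

function isDecisionPoint(node) {
  if (BRANCH_TYPES.has(node.type)) return true;
  // `default:` has a null test and is not counted (assumption).
  if (node.type === "SwitchCase") return node.test !== null && node.test !== undefined;
  if (node.type === "LogicalExpression") return node.operator === "&&" || node.operator === "||";
  return false;
}

// Complexity of one function node: 1 plus its decision points,
// excluding anything inside nested functions.
function cyclomaticComplexity(functionNode) {
  let complexity = 1;
  const visit = (node, isRoot) => {
    if (!node || typeof node.type !== "string") return; // not an AST node
    if (!isRoot && FUNCTION_TYPES.has(node.type)) return; // scored separately
    if (isDecisionPoint(node)) complexity += 1;
    for (const [key, value] of Object.entries(node)) {
      if (key === "parent") continue; // avoid cycles if parents are attached
      if (Array.isArray(value)) value.forEach((child) => visit(child, false));
      else if (value && typeof value === "object") visit(value, false);
    }
  };
  visit(functionNode, true);
  return complexity;
}

// Hand-built ESTree for: function f() { if (a && b) {} }
const sampleFunction = {
  type: "FunctionDeclaration",
  id: { type: "Identifier", name: "f" },
  params: [],
  body: {
    type: "BlockStatement",
    body: [{
      type: "IfStatement",
      test: {
        type: "LogicalExpression", operator: "&&",
        left: { type: "Identifier", name: "a" },
        right: { type: "Identifier", name: "b" },
      },
      consequent: { type: "BlockStatement", body: [] },
      alternate: null,
    }],
  },
};

console.log(cyclomaticComplexity(sampleFunction)); // 3
```

With espree installed, function nodes could come from `espree.parse(source, { ecmaVersion: 2022, sourceType: "script" })` and be passed to `cyclomaticComplexity` one at a time. Counts from a walker like this can differ from the ESLint complexity rule, which may be one reason the AST and eslint rows in the table do not match exactly.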