Lattice Earned Its Keep.
A field note from a specific session where combinatorial testing via the Lattice CLI surfaced production-class bugs that no code review would have caught. Written immediately after the session so the specifics are still warm.
This note exists because the pattern works, the value is real, and the methodology is repeatable. If you're starting a new engagement with state-bearing models and you're not sure whether Lattice is worth the upfront ceremony — read this and decide.
The engagement: a small-business accounts-payable / cost-of-goods-sold build. The Rails app ingests vendor invoices, runs reconciliation rules, and surfaces findings to a daily brief generated for the operator. The session described here was the schema spine for the AP/COGS layer. By the end of it, the testing layer had grown from 117 tests to 559 tests across 15 Lattice rounds.
Across both batches of rounds: 4 real production-class bugs caught and fixed, 8 missing has_many declarations added, ~163 net new tests, ~1,500 net new assertions.
These are the headline finds. Each one is a bug that no human code reviewer would have spotted by reading the code in isolation.
Bug 1 — SQLite silently rounds decimal(12,2) on write
The reconciliation schema parameterized decimal_edge ∈ {whole_dollar, has_cents, sub_cent_pre_round} as a dimension. The "sub_cent_pre_round" rows wrote values like 12.374 to a decimal(12,2) column and asserted round-trip equality. The test failed: SQLite silently rounded 12.374 to 12.37 (half-up).
Production-relevant implication: any reconciliation engine that wants to detect sub-cent vendor PDF drift has to compare against the original captured total before the canonical round-trip. Comparing against the rounded total (which has lost precision) silently papers over the divergence.
No human read of the code would have surfaced this. The behavior is in the SQLite storage layer, not in any Ruby file.
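The implication can be sketched in plain Ruby by simulating the scale-2 round-trip with BigDecimal rather than going through SQLite. The helper names (`round_trip`, `sub_cent_drift?`) and the sample totals are hypothetical; half-up rounding is assumed because that is what this session observed.

```ruby
require "bigdecimal"

# Simulates a write to a decimal(12,2) column: the value comes back rounded
# to two places (half-up is an assumption taken from this session's findings).
def round_trip(value)
  value.round(2, BigDecimal::ROUND_HALF_UP)
end

# Hypothetical drift check: true when the two totals differ at any precision.
def sub_cent_drift?(total, expected_total)
  total != expected_total
end

captured_total = BigDecimal("12.374")      # as captured from the vendor PDF
stored_total   = round_trip(captured_total) # what the column hands back
expected_total = BigDecimal("12.37")        # what the books say

# Comparing the stored value hides the drift; comparing the capture shows it.
puts "stored vs expected:   drift=#{sub_cent_drift?(stored_total, expected_total)}"
puts "captured vs expected: drift=#{sub_cent_drift?(captured_total, expected_total)}"
```

The point of the sketch is only the ordering: the comparison has to happen before the value passes through the column.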
Bug 2 — Catch-weight cumulative drift on multi-line invoices
Ten lines of unit_net_price = 3.4250 × qty = 1.0001 come to 3.4253425 each, so the unrounded sum is $34.253425, which rounds to $34.25 if you round once at the end. But if each line_total is computed and stored as decimal(12,2) first (rounding each line individually to $3.43) and THEN summed, you get $34.30.
$0.05 divergence per 10-line invoice. On weekly catch-weight produce invoices, that compounds. Over a year, the divergence is real money.
Production-relevant: the reconciliation engine must commit upfront to ONE strategy:
- Sum stored line_totals (matches books, accumulates rounding drift), OR
- Recompute from unit_net_price × qty (matches math, may not match books)
These are NOT equivalent. The lattice tests pin the divergence so a future maintainer who switches strategies surfaces the change visibly rather than silently shipping different numbers.
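The two strategies can be compared directly with BigDecimal. This is a minimal sketch using the figures from the example above; the method names are hypothetical, and half-up rounding to cents is assumed.

```ruby
require "bigdecimal"

UNIT_NET_PRICE = BigDecimal("3.4250")
QTY            = BigDecimal("1.0001")
LINE_COUNT     = 10

def round_cents(value)
  value.round(2, BigDecimal::ROUND_HALF_UP)
end

# Strategy A: round each line_total to cents (as a decimal(12,2) column would),
# then sum the stored values.
def sum_of_stored_line_totals
  Array.new(LINE_COUNT) { round_cents(UNIT_NET_PRICE * QTY) }.sum
end

# Strategy B: sum the unrounded products and round once at the end.
def recomputed_total
  round_cents(Array.new(LINE_COUNT) { UNIT_NET_PRICE * QTY }.sum)
end

puts format("stored line_totals summed: %.2f", sum_of_stored_line_totals)
puts format("recomputed, rounded once:  %.2f", recomputed_total)
puts format("divergence:                %.2f", sum_of_stored_line_totals - recomputed_total)
```

A test that pins both numbers as literals is what makes a later strategy switch fail loudly.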
Also pinned by this round: the Rails sqlite3 adapter uses BigDecimal#round(scale, ROUND_HALF_UP), NOT banker's rounding. Verified at specific halfway values: 17.105 → 17.11 (banker's would give 17.10), 2.045 → 2.05 (banker's would give 2.04). A future adapter or database swap (a Postgres migration, say) that rounds differently would silently change every books-grade total at a 3rd-decimal halfway boundary. The tests fail the day such a swap changes the rounding.
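The halfway values quoted above can be checked with BigDecimal alone. This sketch exercises the two rounding modes, not the adapter itself; `half_up` and `half_even` are hypothetical helper names.

```ruby
require "bigdecimal"

# Half-up: a trailing 5 always rounds away from zero (for positive values).
def half_up(raw)
  BigDecimal(raw).round(2, BigDecimal::ROUND_HALF_UP)
end

# Banker's rounding: a trailing 5 rounds to the nearest even digit.
def half_even(raw)
  BigDecimal(raw).round(2, BigDecimal::ROUND_HALF_EVEN)
end

%w[17.105 2.045].each do |raw|
  puts "#{raw}: half-up=#{half_up(raw).to_s('F')} banker's=#{half_even(raw).to_s('F')}"
end
```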
Bug 3 — SQLite UNIQUE treats NULL as distinct (× 3 tables)
The pattern: a partial unique index declared as:
add_index :vendor_accounts, [ :vendor_id, :effective_to ],
unique: true,
where: "effective_to IS NULL"
The intent: at most one "current" row per vendor (where effective_to IS NULL).
The reality: SQLite (per the SQL standard) treats two NULLs as DISTINCT in a UNIQUE index. So two rows with the same vendor_id and both effective_to = NULL are happily admitted by the unique constraint. The constraint was load-bearing for SCD-2 correctness ("only one current row") and silently didn't enforce.
Fix: drop effective_to from the column list. The WHERE effective_to IS NULL clause already restricts the index scope to current rows; the columns just need to identify the "logical key" within that scope.
Same bug appeared on three different tables — all variations of the same SCD-2 pattern. One was worse because an additional column was also nullable (the "fallback" case). Fixed by splitting into two partial indexes — one for specific-kind rules, one for the nil-fallback.
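The NULL semantics can be modeled in a few lines of plain Ruby. This is a simulation of the rule, not SQLite itself; `current_row?` and `conflicts?` are hypothetical names, and the row hashes stand in for vendor_accounts records.

```ruby
# The index's WHERE clause: only "current" rows are in scope.
def current_row?(row)
  row[:effective_to].nil?
end

# Simulated UNIQUE check for a partial index. A new row conflicts only when
# it is in scope, every key column is non-NULL, and an in-scope row already
# holds the same key. A NULL anywhere in the key never conflicts.
def conflicts?(existing_rows, new_row, key_columns)
  return false unless current_row?(new_row)

  new_key = new_row.values_at(*key_columns)
  return false if new_key.any?(&:nil?)

  existing_rows.select { |row| current_row?(row) }
               .any? { |row| row.values_at(*key_columns) == new_key }
end

current   = { vendor_id: 1, effective_to: nil }
duplicate = { vendor_id: 1, effective_to: nil }

# Original key [:vendor_id, :effective_to]: the NULL makes the rows distinct.
puts "original index rejects duplicate: #{conflicts?([current], duplicate, %i[vendor_id effective_to])}"

# Fixed key [:vendor_id], i.e. the shape of the fix described above:
#   add_index :vendor_accounts, :vendor_id, unique: true, where: "effective_to IS NULL"
puts "fixed index rejects duplicate:    #{conflicts?([current], duplicate, %i[vendor_id])}"
```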
Lesson learned during the fix: never edit a migration in place after it's been run. One agent in the session modified a migration in-place; the DB never re-ran the migration, schema.rb stayed broken. Had to write a follow-up migration to actually apply the fix to the live DB.
Bug 4 — VendorRule.live admitted future-dated rules
The original scope:
scope :live, -> { where(active: true).where(effective_to: nil) }
The bug: no lower bound on effective_from. A rule with effective_from = 2099-01-01 would be picked up as "live" today.
The lattice schema parameterized lifecycle ∈ {brand_new, superseded, expired, future_dated} — and the future_dated rows surfaced the gap immediately.
Fixed by adding .where("effective_from <= ?", Date.current) to the scope. The system reads from vendor_rules.live when applying overrides during invoice extraction; a future-dated rule firing prematurely would silently produce wrong invoice data.
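The before and after behavior of the scope can be mirrored as plain predicates. The `Rule` struct and both method names are hypothetical stand-ins for the ActiveRecord scope, and the reference date is arbitrary.

```ruby
require "date"

# Stand-in for a vendor_rules row with only the columns the scope reads.
Rule = Struct.new(:name, :active, :effective_from, :effective_to, keyword_init: true)

# Original scope logic: active and open-ended, with no lower bound.
def live_before_fix?(rule)
  rule.active && rule.effective_to.nil?
end

# Fixed scope logic: the rule must also have started on or before today.
def live_after_fix?(rule, today)
  live_before_fix?(rule) && rule.effective_from <= today
end

today = Date.new(2025, 1, 15)
rules = [
  Rule.new(name: "brand_new", active: true, effective_from: today, effective_to: nil),
  Rule.new(name: "future_dated", active: true, effective_from: Date.new(2099, 1, 1), effective_to: nil)
]

rules.each do |rule|
  puts "#{rule.name}: before_fix=#{live_before_fix?(rule)} after_fix=#{live_after_fix?(rule, today)}"
end
```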
Missing has_many declarations
This isn't a single bug. It's a recurring pattern caught across multiple rounds. The class: a belongs_to on a child model with no corresponding has_many on the parent.
Why it matters: without the inverse has_many, the parent doesn't know about the relationship. When a parent with children is destroyed, the delete reaches the database and comes back as a raw ActiveRecord::InvalidForeignKey from SQLite's foreign-key enforcement, instead of being stopped at the Rails layer by a has_many declared with dependent: :restrict_with_error. Users see a 500-class error instead of a clean form-validation message.
8 distinct missing declarations across 4 lattice rounds. Reading any single model file would not have shown the gap. The schema looks complete when you read it. The gap only appears when you destroy the parent.
Lattice catches this because the persistence-cascade schema parameterizes "parent has children" × "delete action" as dimensions — every combination gets a test, and the missing has_manys surface as raw FK violations where clean validation errors were expected.
Beyond the headline bugs, the rounds produced:
- 155 new tests, 1,207 new assertions across the spine — pinning behavior that would have been "trust me bro" without them.
- Type contracts pinned that would silently change under common migrations: SQLite default TEXT collation is binary case-sensitive ("ABC" != "abc"); Rails 8.1 silently coerces "" → nil on enum string columns; validator + DB unique-constraint agreement verified across 5 columns.
- Aging-arithmetic invariant explicitly REJECTED: current_amount + days_30 + days_60 + days_90_plus == current_balance is NOT enforced by the schema, and vendor statements legitimately violate it (separate credits lines, etc). A future CHECK constraint would now surface here, not in production.
- Payment-term guard-clause priority order pinned: requires_prepay > eom_offset_days > net_days > fallback. Invisible from the happy-path archetype tests because each archetype only sets one field.
This is the part worth re-applying to future engagements.
1. Forced enumeration is the actual leverage. Lattice's pairwise sampling math is table stakes. The real value is that writing the YAML schema forces you to name every dimension and every value explicitly. When you write decimal_edge: [whole_dollar, has_cents, sub_cent_pre_round], the act of naming "sub_cent_pre_round" as a thing that exists is what surfaces the bug. The pairwise generator turning it into N test rows is just the bench. Said differently: if you wrote the schema but never ran the generator, you'd still get 80% of the value.
2. Parallel agents with worktree isolation. Each round was run by a separate agent operating in an isolated git worktree. Wall-clock for a 5-agent parallel batch: ~10–15 minutes. Same work sequentially: 30–45 minutes. The trade-off: parallel agents can't see each other's work. This means cross-round conflicts surface — one round removed presence validators that another round wrote tests assuming. That conflict is itself useful signal. It represents a real disagreement between two competent reviewers that would otherwise be buried in groupthink. Treat the conflict as evidence that the underlying behavior is genuinely ambiguous, then resolve deliberately.
3. A sharp, opinionated reviewer voice produces decisive code. The reviewing persona used was sharp, opinionated, preferred framework idioms (enum, restrict_with_error, serialize :col, coder: JSON), cut premature abstractions. Used both for review ("find what's wrong") and fix ("make this specific change, surgically"). The voice keeps the work tight.
4. Schema first, tests second, code last. Write the lattice YAML — name the dimensions and values. Run lattice generate --format table — get the covering array. Write one test per row, with expected behavior pinned as a literal (not recomputed in the test). Run the tests — failures are signal either about the code or the schema. If a real bug surfaces, fix the code AND leave a comment in the schema noting what was caught. Those breadcrumbs are the value preserved across years.
5. CI gate. A test asserts every YAML schema has at least one test file that references it. A schema that drifts away from its tests fails CI. The gate is loose (any reference counts) but it prevents the schema-without-tests state from becoming permanent.
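The CI gate in item 5 could look roughly like the helper below. This is a sketch under assumptions: the method name is hypothetical, and "references" is taken to mean that some test file mentions the schema's file name, which matches the loose gate described above.

```ruby
# Returns the schema files whose file name is not mentioned in any test file.
# schema_glob and test_glob are glob patterns chosen by the caller.
def unreferenced_schemas(schema_glob, test_glob)
  test_sources = Dir.glob(test_glob).map { |path| File.read(path) }

  Dir.glob(schema_glob).sort.reject do |schema_path|
    schema_name = File.basename(schema_path)
    # Loose gate: any mention of the file name counts as a reference.
    test_sources.any? { |source| source.include?(schema_name) }
  end
end
```

In a Minitest case this might be used as `assert_empty unreferenced_schemas("test/lattice/*.yml", "test/**/*_test.rb")`, where both paths are placeholders for wherever the schemas and tests actually live.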
Cost: ~6–7 hours of session time for 15 rounds, including the human-side review of agent output and the cherry-pick consolidation. The LLM compute cost was on the order of $5–10 in API charges.
Value (concretely):
- 4 production-class bugs caught BEFORE shipping
- 8 missing has_many declarations caught BEFORE shipping
- ~155 new tests pinning behavior that would otherwise be discoverable only via production failure
- A repeatable methodology now documented for future engagements
The cost-benefit calculation isn't "did Lattice pay for itself this session" — it's "would I rather spend 7 hours up front or 70 hours debugging production failures over the next year." Each one of the 4 bugs above could plausibly have eaten 5–10 hours of investigation in production. The math is wide-margin positive.
Counter-honest: the marginal value of round 6 vs the first 5 was small. Diminishing returns set in around round 7–8 once the obvious dimensions had been parameterized. The high-leverage rounds are the first 3–4 per surface, and the ones triggered by "we added a new state-bearing thing."
Across 15 rounds, the same bug class kept showing up: interactions between dimensions that look fine in isolation.
- SQLite UNIQUE × NULL semantics × partial-index WHERE clauses → Bug 3
- decimal(12,2) storage × cumulative line-sum arithmetic → Bug 2
- Active boolean × effective_to NULL × effective_from temporal bound → Bug 4
- belongs_to declaration × parent destroy × FK constraint → the has_many pattern
Reading the code in isolation: each piece looks fine. Reading the migrations: each migration looks fine. Reading the model: the model looks fine.
The bugs live in the INTERACTIONS. That's exactly what Lattice is built to surface — combinations, not single dimensions. Code review is good at single-dimension correctness. Lattice is good at multi-dimensional correctness.
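To make "combinations, not single dimensions" concrete, here is a sketch that enumerates every combination of two dimensions quoted earlier in this note. It is an exhaustive cross product in plain Ruby, not Lattice's pairwise generator, and putting these two dimensions in one schema is an assumption made for illustration.

```ruby
# Dimension names and values are the ones quoted in this note; combining them
# in a single schema is hypothetical.
DIMENSIONS = {
  decimal_edge: %i[whole_dollar has_cents sub_cent_pre_round],
  lifecycle:    %i[brand_new superseded expired future_dated]
}.freeze

# Full cross product: one row per combination of one value from each dimension.
def all_combinations(dimensions)
  names = dimensions.keys
  first, *rest = dimensions.values
  first.product(*rest).map { |values| names.zip(values).to_h }
end

rows = all_combinations(DIMENSIONS)
puts "#{rows.size} combinations"
rows.each { |row| puts row.values.join(" x ") }
```

With only two dimensions, pairwise coverage and the full product coincide. The sampling only starts to save rows once a third dimension is added.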
When you find yourself asking "would code review have caught this?", the answer for Bugs 1–4 is "no, not by reading individual files." That's the test. If you're working on something where the bugs live in interactions, Lattice is the right tool. If you're working on a one-input pure function, it isn't.
State-bearing models almost always have the interaction shape. That's why this gets pulled out for those specifically. It's not about ceremony or thoroughness — it's about the bug class.