Testing Strategy

Design Rationale

The test suite is organized around subsystem boundaries rather than code layers. Each test module targets a specific contract surface — a schema, an API route group, a pipeline stage, or a persistence layer — because the most common failure mode in Lorewright is a contract mismatch between adjacent subsystems (e.g., a schema field rename that the API handler doesn't reflect). Testing at the contract boundary catches these mismatches earlier than unit-testing individual functions would.

There is no single integration test that exercises the full SRD-to-UI data flow. This is intentional: the pipeline is designed around immutable, generated artifacts (progressions/), which means each stage can be validated independently against known inputs and expected outputs. A full end-to-end test would require SQLite setup, file I/O, and HTTP assertions in a single test case, making it fragile and slow. Instead, the strategy relies on overlapping coverage — pipeline tests verify that generation produces correct JSON, and API tests verify that the API reads and serves that JSON correctly.
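
The overlapping-coverage idea can be sketched as two independent tests that meet at the generated artifact. This is a minimal illustration, not the project's actual code: `generate_progression`, `serve_progression`, the `wizard.json` file name, and the artifact fields are all hypothetical stand-ins. The `tmp_path` parameter matches pytest's built-in temporary-directory fixture.

```python
import json
from pathlib import Path


def generate_progression(out_dir: Path) -> Path:
    """Hypothetical stand-in for a pipeline stage that emits one JSON artifact."""
    artifact = {"class": "wizard", "levels": [{"level": 1, "features": ["Spellcasting"]}]}
    path = out_dir / "wizard.json"
    path.write_text(json.dumps(artifact), encoding="utf-8")
    return path


def serve_progression(artifact_dir: Path, name: str) -> dict:
    """Hypothetical stand-in for an API handler that reads a generated artifact."""
    return json.loads((artifact_dir / f"{name}.json").read_text(encoding="utf-8"))


def test_generation_emits_expected_json(tmp_path: Path) -> None:
    # Pipeline-side check: the artifact written to disk has the expected shape.
    path = generate_progression(tmp_path)
    data = json.loads(path.read_text(encoding="utf-8"))
    assert data["levels"][0]["level"] == 1


def test_api_serves_artifact_unchanged(tmp_path: Path) -> None:
    # API-side check: start from a known artifact instead of running the pipeline.
    known = {"class": "wizard", "levels": []}
    (tmp_path / "wizard.json").write_text(json.dumps(known), encoding="utf-8")
    assert serve_progression(tmp_path, "wizard") == known
```

Neither test depends on the other, yet a change to the artifact shape should break at least one of them, which is the property the strategy relies on.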

Frontend tests in web/src/__tests__/ focus on the choice/decision UI logic (selection caps, dependency resolution, prerequisite validation) because that is where the most complex client-side state lives. The frontend does not duplicate backend validation; it trusts the API contract. This means frontend tests are primarily about user interaction correctness, not data correctness.

Assumptions & Constraints

Tests assume local filesystem access. Pipeline and progression tests read from srd/, write to temporary SQLite databases, and emit progression JSON. They will not run in environments without filesystem access or where the srd/ directory is missing.
No test database fixtures are committed. Each test module creates its own SQLite state from source JSON. This keeps tests self-contained but means they are slower than fixture-based approaches. The trade-off is that tests always validate against current source data rather than stale snapshots.
Pydantic extra="forbid" is the primary schema enforcement mechanism. Many "schema tests" are really exercising Pydantic's strict validation. If a field is added to a response but not to the model, tests fail with a validation error rather than a missing-assertion failure. This is a feature: the schema definition is the test expectation.
The site/ static build is a separate validation surface from web/. site/ validates the public wiki rendering pipeline (markdown → HTML via unified), while web/ validates the application UI. They share no code and have independent dependency trees.
GRPO and TL;DR tests validate dataset structure, not LLM output quality. These pipelines produce training data for external models; the tests verify record shapes, required fields, and partition logic — not whether the generated text is good.
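
To make the extra="forbid" point above concrete, here is a minimal sketch assuming Pydantic v2. The `LevelUpPreview` model and its field names are hypothetical and are not taken from the Lorewright schemas; only `ConfigDict(extra="forbid")`, `model_validate`, and `ValidationError` are real Pydantic API.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class LevelUpPreview(BaseModel):
    """Hypothetical response model; the field names are illustrative only."""

    # Reject any key that is not declared on the model.
    model_config = ConfigDict(extra="forbid")

    character_id: str
    new_level: int


def validates(payload: dict) -> bool:
    """Return True if the payload matches the model exactly."""
    try:
        LevelUpPreview.model_validate(payload)
    except ValidationError:
        return False
    return True


# A payload that matches the model passes.
assert validates({"character_id": "c1", "new_level": 2})

# A handler that starts returning an undeclared field is rejected by the
# model itself, without any field-specific assertion in the test.
assert not validates({"character_id": "c1", "new_level": 2, "hit_dice": "1d8"})
```

The second assertion is the case described above: the test never names the new field, but validation still fails because the model definition acts as the expectation.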

Conceptual Model

The test matrix maps to a layered dependency graph:

Source Data (srd/, homebrew/)
    ↓
Pipeline Tests (refresh, generation, normalizers)
    ↓
Schema Tests (Pydantic model validation)
    ↓
Engine Tests (level-up preview/apply logic)
    ↓
API Tests (HTTP contract, route behavior)
    ↓
Frontend Tests (UI interaction, choice logic)

Each layer trusts the layers upstream of it (those listed above it in the diagram). Pipeline tests verify that source data produces valid normalized records. Schema tests verify that the Pydantic models accept the shapes those pipelines emit. Engine tests verify that the level-up state machine produces correct previews and applies them correctly. API tests verify that FastAPI routes serialize and deserialize correctly. Frontend tests verify that the React UI handles the API responses properly.

When a change spans multiple layers (e.g., adding a new choice type to the level-up flow), you should run tests across all affected layers — typically the schema, engine, and API test modules together. The "Expanded Checks" sections below provide the right groupings for common change types.
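
One way to keep these groupings handy is a small lookup table that turns a change type into a pytest invocation. The sketch below is an assumption, not an existing project script: the module paths are copied from this page, while the change-type keys (`wiki`, `pipeline_or_api`, `level_up`) and the `pytest_args` helper are hypothetical names.

```python
# Test groupings copied from the "Minimum Checks" and "Expanded Checks" sections.
# The change-type keys are hypothetical labels chosen for this sketch.
CHECK_GROUPS: dict[str, list[str]] = {
    "wiki": ["tests/test_wiki_api.py"],
    "pipeline_or_api": [
        "tests/test_pipeline_refresh.py",
        "tests/test_progression_generation.py",
        "tests/test_progression_tool_api.py",
    ],
    "level_up": [
        "tests/test_srd_level_up_schemas.py",
        "tests/test_srd_level_up_engine.py",
        "tests/test_srd_level_up_api.py",
        "tests/test_character_store.py",
    ],
}


def pytest_args(change_types: list[str]) -> list[str]:
    """Return a pytest command with de-duplicated modules, preserving order."""
    modules: list[str] = []
    for change in change_types:
        for module in CHECK_GROUPS[change]:
            if module not in modules:
                modules.append(module)
    return ["pytest", *modules]
```

The returned list can be passed to `subprocess.run(..., check=True)`. The sketch covers only the pytest modules; the npm builds for site/ and web/ still need to be run separately.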

Test Areas

Wiki API and rendering behavior: tests/test_wiki_api.py
Pipeline refresh and normalization behavior: tests/test_pipeline_refresh.py, tests/test_normalizers.py
Progression generation correctness: tests/test_progression_generation.py
Tool API behavior: tests/test_progression_tool_api.py
SRD level-up schema + engine + API behavior: tests/test_srd_level_up_schemas.py, tests/test_srd_level_up_engine.py, tests/test_srd_level_up_api.py
Character-store persistence invariants: tests/test_character_store.py
TL;DR dataset builders and record loading: tests/test_spell_tldr_dataset.py, tests/test_feat_tldr_dataset.py, tests/test_feature_tldr_dataset.py, tests/test_character_level_tldr_dataset.py
GRPO combat enrichment and finalization pipeline: tests/test_grpo_combat_enrichment.py
Frontend unit coverage for the web/ app: web/src/__tests__/

Minimum Checks For Wiki-Surface Changes

npm --prefix site run build
pytest tests/test_wiki_api.py -q

The site build is the primary validation here because it exercises the full markdown rendering pipeline (site/lib/wiki.ts) — wiki-link resolution, @file: reference expansion, and HTML sanitization. If any wiki page has broken frontmatter, an unresolvable link, or malformed markdown, the build will fail. The pytest module validates the FastAPI wiki routes, which serve the same content through a different code path.

Expanded Checks For Pipeline Or API Changes

pytest tests/test_pipeline_refresh.py tests/test_progression_generation.py tests/test_progression_tool_api.py

Run this group when modifying anything in engine/ingest/ or the progression-related routes in server/__init__.py. The three modules cover the full pipeline: source ingestion, artifact generation, and API serving.

Expanded Checks For SRD Level-Up Contract Changes

pytest tests/test_srd_level_up_schemas.py tests/test_srd_level_up_engine.py tests/test_srd_level_up_api.py tests/test_character_store.py

The level-up subsystem has the tightest coupling between schemas, engine logic, and persistence. Changes to the level-up preview/apply flow, choice types, or character state shape should always run all four modules. The character store tests verify that snapshots are written and read correctly, which is the final step of the apply flow.
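
A snapshot round-trip check of the kind described above might look like the following sketch. It assumes a hypothetical single-table layout (`snapshots` with `character_id` and `state` columns) and hypothetical helpers `open_store`, `save_snapshot`, and `load_snapshot`; the real character store may be shaped differently. The database is created fresh inside the test, in line with the no-committed-fixtures constraint.

```python
import json
import sqlite3
from pathlib import Path
from typing import Optional


def open_store(db_path: Path) -> sqlite3.Connection:
    """Open (and create if needed) a hypothetical snapshot store."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snapshots ("
        "character_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    return conn


def save_snapshot(conn: sqlite3.Connection, character_id: str, state: dict) -> None:
    """Write one JSON snapshot per character, replacing any earlier one."""
    conn.execute(
        "INSERT OR REPLACE INTO snapshots (character_id, state) VALUES (?, ?)",
        (character_id, json.dumps(state, sort_keys=True)),
    )
    conn.commit()


def load_snapshot(conn: sqlite3.Connection, character_id: str) -> Optional[dict]:
    """Read a snapshot back, or return None if the character is unknown."""
    row = conn.execute(
        "SELECT state FROM snapshots WHERE character_id = ?", (character_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def test_snapshot_round_trip(tmp_path: Path) -> None:
    db_path = tmp_path / "characters.sqlite"
    state = {"level": 3, "choices": {"feat": "alert"}}

    conn = open_store(db_path)
    save_snapshot(conn, "c1", state)
    conn.close()

    # Reopen the file to show the state was persisted, not held in memory.
    conn = open_store(db_path)
    try:
        assert load_snapshot(conn, "c1") == state
        assert load_snapshot(conn, "missing") is None
    finally:
        conn.close()
```

Closing and reopening the connection is the important design choice here: it ensures the assertion exercises what was actually written to disk at the end of the apply flow.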

The GRPO combat test module (tests/test_grpo_combat_enrichment.py) covers rehydration, exact FIREBALL ally HP transfer, normalized-name alias handling, and the finalized dataset composition path. Run it as the regression check whenever the combat enrichment or transfer helpers change.
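
Because the GRPO and TL;DR tests check dataset structure rather than text quality, their assertions tend to be about record shape and partitioning. The sketch below illustrates that style of check under stated assumptions: the field names in `REQUIRED_FIELDS`, the split names in `ALLOWED_SPLITS`, and the `structural_errors` helper are hypothetical and do not reflect the actual dataset schema.

```python
# Hypothetical record schema: these field and split names are assumptions.
REQUIRED_FIELDS = {"id", "prompt", "split"}
ALLOWED_SPLITS = {"train", "eval"}


def structural_errors(records: list[dict]) -> list[str]:
    """Collect shape problems without judging the text content itself."""
    errors: list[str] = []
    split_by_id: dict[str, str] = {}
    for index, record in enumerate(records):
        # Required-field check: skip further checks if the record is incomplete.
        missing = REQUIRED_FIELDS - set(record)
        if missing:
            errors.append(f"record {index}: missing {sorted(missing)}")
            continue
        if record["split"] not in ALLOWED_SPLITS:
            errors.append(f"record {index}: unknown split {record['split']!r}")
        # Partition check: an id may belong to exactly one split.
        previous = split_by_id.setdefault(record["id"], record["split"])
        if previous != record["split"]:
            errors.append(
                f"record {index}: id {record['id']!r} is in both "
                f"{previous} and {record['split']}"
            )
    return errors
```

A test built on this helper would assert that `structural_errors(records)` is empty for the finalized dataset, which says nothing about whether the generated text is good.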

Frontend Validation

npm --prefix web run test
npm --prefix web run build

Run the static-site build separately from the application build: site/ validates public wiki rendering, while web/ validates the application UI. The TypeScript build (npm --prefix web run build) also serves as a type-checking pass. Because web/src/types.ts mirrors the Python schemas, a build failure here can indicate a backend contract change that hasn't been reflected in the frontend types.

Maintenance Rule

If the route shape, schema fields, or wiki metadata contract change, update the affected tests in the same task.