SQLite Schema
SQLite table and merged-view conventions used by ingest and API read paths.
SQLite Schema
Canonical runtime storage is data/5e-database.sqlite for SRD content and data/lorewright-app.sqlite for application state (characters, decision plans, TL;DR datasets). These two databases serve fundamentally different purposes and have different lifecycle characteristics.
Table Pattern
- SRD content tables follow the naming convention
srd_{edition}_{dataset}(e.g.,srd_2014_classes,srd_2024_spells). Each table has a fixed column schema:row_order,srd_index,name,url,srd_version,data(JSON),imported_at,source_payload(JSON), andnormalized_payload(JSON). Thedatacolumn contains the normalized record used by API queries;source_payloadpreserves the original SRD JSON for traceability. - Homebrew projection tables mirror the SRD table schema but contain records sourced from canonical homebrew JSON files in
homebrew/. - Merged views (e.g.,
srd_2014_classes_merged) union SRD rows with homebrew rows, providing a single query surface for API endpoints that need to include both official and custom content. Endpoints default to merged views so that homebrew entities appear alongside SRD entities without special-case routing. - Dataset metadata is tracked in the
srd_datasetstable, which records the table name, dataset name, SRD version, source file path, row count, and import timestamp for every imported dataset. This enables pipeline diagnostics and staleness detection.
Column Design
Every SRD table stores three JSON representations per record:
| Column | Purpose |
|---|---|
source_payload | Raw JSON from the SRD file, preserved exactly as ingested. Used for debugging and diffing against upstream changes. |
normalized_payload | Output of the family-specific normalizer. Flat, consistent field names. |
data | The query-facing payload, currently identical to normalized_payload. Decoupled to allow future API-specific projections without changing the normalizer. |
The data column has a CHECK (json_valid(data)) constraint, as do source_payload and normalized_payload. This ensures SQLite's json_extract() functions will never encounter malformed JSON at query time.
Indexes are created on srd_index (unique, partial—only where non-null), name, and url for each table. The srd_index is the primary lookup key used by the progression generator and level-up engine.
Query Conventions
- Application reads rely on the
dataJSON payload column. All field access goes throughjson_extract(data, '$.field_name')rather than dedicated columns. This avoids schema migrations when the normalized payload shape evolves—only the normalizer functions need to change. json_extract(...)is used for field-level filtering and sorting in API handlers. For example, the spell catalog route extracts$.levelto include spell level in the response.- Merged views are preferred for endpoints that must include homebrew overlays. Direct SRD tables are used only when homebrew content should be explicitly excluded.
- The spell catalog route uses a fallback chain: it first looks for a merged view (
srd_{edition}_spells_merged), then the plain SRD table (srd_{edition}_spells), and for 2024 requests, falls back to the 2014 tables if no 2024 spell data exists yet.
App Database (lorewright-app.sqlite)
The application database is structurally independent from the SRD database. It stores:
- Characters: identity, metadata, and current state.
- Snapshots: immutable level-up history records. Each successful
applycreates a new snapshot rather than mutating the character record. - Decision plans: client-managed documents that record intended level-up choices before they are committed.
- TL;DR datasets: LLM-generated summaries for spells, feats, features, and character levels, synced from JSONL files via
poetry run lorewright tldr-sync.
The app database is never affected by db-refresh or generate-progressions. It persists across pipeline rebuilds.
Design Rationale
SQLite was chosen over a client-server database (PostgreSQL, MySQL) because the deployment target is a single-machine development tool, not a distributed service. SQLite's zero-configuration setup, file-based storage, and reliable single-writer semantics match the use case well. The tradeoff is limited concurrent write throughput, which is acceptable for the expected workload (single user, infrequent writes).
The JSON-column approach (storing structured data in data TEXT columns rather than fully relational tables) was chosen because the SRD record shapes vary significantly across dataset families. A class record has proficiency choices and spellcasting; a creature record has actions, reactions, and legendary actions; a spell has components and damage scaling. Modeling each family as a fully relational set of tables would produce dozens of tables with complex joins. JSON columns with json_extract() provide flexible querying without schema proliferation.
Storing three JSON representations per row (source, normalized, data) adds storage overhead but eliminates information loss. The source payload enables diffing against upstream SRD updates. The normalized payload captures the exact normalizer output for debugging. The data column provides the stable API contract.
The two-database split (5e-database.sqlite vs lorewright-app.sqlite) enforces a clean boundary between regenerable reference data and irreplaceable user data. The SRD database can be deleted and rebuilt from source JSON at any time. The app database cannot.
Assumptions & Constraints
- SQLite JSON1 extension required: The schema relies heavily on
json_valid()check constraints andjson_extract()queries. SQLite must be compiled with JSON1 support (standard in Python'ssqlite3module since Python 3.9). - Single-writer assumption: The pipeline's destructive refresh (drop + recreate tables) assumes exclusive write access. Running
db-refreshwhile the API server is reading from the same database could produce transient errors. - No foreign keys between databases: The SRD database and app database are separate files. Character records reference class/subclass indexes by string value, not by foreign key. If a progression is removed, stale character references are not automatically detected.
- Table names are derived from file paths: The table naming convention (
srd_{edition}_{dataset}) is computed byengine/ingest/db_utils.pyfrom the source file path. Adding a new SRD dataset file automatically creates a new table—no schema migration needed.
Conceptual Model
The SQLite layer serves as the bridge between raw source content and the API/UI. It occupies a middle position in the data flow:
SRD JSON files → [normalize] → SQLite tables → [generate] → Progression JSON
↑ ↓
Homebrew JSON → [project] → Merged views API / UI
The database is both a queryable cache (for the spell catalog and level-up engine, which query it at runtime) and a staging area (for the progression generator, which reads from it during artifact generation). This dual role means the database schema must serve both real-time query performance and batch generation efficiency.