SQLite Schema

Canonical runtime storage is data/5e-database.sqlite for SRD content and data/lorewright-app.sqlite for application state (characters, decision plans, TL;DR datasets). These two databases serve fundamentally different purposes and have different lifecycle characteristics.

Table Pattern

  • SRD content tables follow the naming convention srd_{edition}_{dataset} (e.g., srd_2014_classes, srd_2024_spells). Each table has a fixed column schema: row_order, srd_index, name, url, srd_version, data (JSON), imported_at, source_payload (JSON), and normalized_payload (JSON). The data column contains the normalized record used by API queries; source_payload preserves the original SRD JSON for traceability.
  • Homebrew projection tables mirror the SRD table schema but contain records sourced from canonical homebrew JSON files in homebrew/.
  • Merged views (e.g., srd_2014_classes_merged) union SRD rows with homebrew rows, providing a single query surface for API endpoints that need to include both official and custom content. Endpoints default to merged views so that homebrew entities appear alongside SRD entities without special-case routing.
  • Dataset metadata is tracked in the srd_datasets table, which records the table name, dataset name, SRD version, source file path, row count, and import timestamp for every imported dataset. This enables pipeline diagnostics and staleness detection.

Column Design

Every SRD table stores three JSON representations per record:

ColumnPurpose
source_payloadRaw JSON from the SRD file, preserved exactly as ingested. Used for debugging and diffing against upstream changes.
normalized_payloadOutput of the family-specific normalizer. Flat, consistent field names.
dataThe query-facing payload, currently identical to normalized_payload. Decoupled to allow future API-specific projections without changing the normalizer.

The data column has a CHECK (json_valid(data)) constraint, as do source_payload and normalized_payload. This ensures SQLite's json_extract() functions will never encounter malformed JSON at query time.

Indexes are created on srd_index (unique, partial—only where non-null), name, and url for each table. The srd_index is the primary lookup key used by the progression generator and level-up engine.

Query Conventions

  • Application reads rely on the data JSON payload column. All field access goes through json_extract(data, '$.field_name') rather than dedicated columns. This avoids schema migrations when the normalized payload shape evolves—only the normalizer functions need to change.
  • json_extract(...) is used for field-level filtering and sorting in API handlers. For example, the spell catalog route extracts $.level to include spell level in the response.
  • Merged views are preferred for endpoints that must include homebrew overlays. Direct SRD tables are used only when homebrew content should be explicitly excluded.
  • The spell catalog route uses a fallback chain: it first looks for a merged view (srd_{edition}_spells_merged), then the plain SRD table (srd_{edition}_spells), and for 2024 requests, falls back to the 2014 tables if no 2024 spell data exists yet.

App Database (lorewright-app.sqlite)

The application database is structurally independent from the SRD database. It stores:

  • Characters: identity, metadata, and current state.
  • Snapshots: immutable level-up history records. Each successful apply creates a new snapshot rather than mutating the character record.
  • Decision plans: client-managed documents that record intended level-up choices before they are committed.
  • TL;DR datasets: LLM-generated summaries for spells, feats, features, and character levels, synced from JSONL files via poetry run lorewright tldr-sync.

The app database is never affected by db-refresh or generate-progressions. It persists across pipeline rebuilds.

Design Rationale

SQLite was chosen over a client-server database (PostgreSQL, MySQL) because the deployment target is a single-machine development tool, not a distributed service. SQLite's zero-configuration setup, file-based storage, and reliable single-writer semantics match the use case well. The tradeoff is limited concurrent write throughput, which is acceptable for the expected workload (single user, infrequent writes).

The JSON-column approach (storing structured data in data TEXT columns rather than fully relational tables) was chosen because the SRD record shapes vary significantly across dataset families. A class record has proficiency choices and spellcasting; a creature record has actions, reactions, and legendary actions; a spell has components and damage scaling. Modeling each family as a fully relational set of tables would produce dozens of tables with complex joins. JSON columns with json_extract() provide flexible querying without schema proliferation.

Storing three JSON representations per row (source, normalized, data) adds storage overhead but eliminates information loss. The source payload enables diffing against upstream SRD updates. The normalized payload captures the exact normalizer output for debugging. The data column provides the stable API contract.

The two-database split (5e-database.sqlite vs lorewright-app.sqlite) enforces a clean boundary between regenerable reference data and irreplaceable user data. The SRD database can be deleted and rebuilt from source JSON at any time. The app database cannot.

Assumptions & Constraints

  • SQLite JSON1 extension required: The schema relies heavily on json_valid() check constraints and json_extract() queries. SQLite must be compiled with JSON1 support (standard in Python's sqlite3 module since Python 3.9).
  • Single-writer assumption: The pipeline's destructive refresh (drop + recreate tables) assumes exclusive write access. Running db-refresh while the API server is reading from the same database could produce transient errors.
  • No foreign keys between databases: The SRD database and app database are separate files. Character records reference class/subclass indexes by string value, not by foreign key. If a progression is removed, stale character references are not automatically detected.
  • Table names are derived from file paths: The table naming convention (srd_{edition}_{dataset}) is computed by engine/ingest/db_utils.py from the source file path. Adding a new SRD dataset file automatically creates a new table—no schema migration needed.

Conceptual Model

The SQLite layer serves as the bridge between raw source content and the API/UI. It occupies a middle position in the data flow:

SRD JSON files → [normalize] → SQLite tables → [generate] → Progression JSON
                                     ↑                              ↓
                              Homebrew JSON → [project] → Merged views    API / UI

The database is both a queryable cache (for the spell catalog and level-up engine, which query it at runtime) and a staging area (for the progression generator, which reads from it during artifact generation). This dual role means the database schema must serve both real-time query performance and batch generation efficiency.

Related Hubs