Bases#
A base is how Katalyst reaches a backend source and maps that source into the domain model.
Every base must include configuration for raw access. Raw access gives Katalyst a stable way to locate content in the source. For a filesystem, that can be a root directory. For SQL, that can be connection information for a specific instance.
A collectionized base keeps that raw access and adds collection definitions. Those definitions map base-native references into named collections and item identities that Katalyst commands can address directly. This is where two-way mapping applies.
Katalyst’s base model covers filesystem and SQLite backends today and is designed to extend to backends such as Postgres, S3, and hosted APIs.
Terms#
The base model uses several named pieces:
| Term | Meaning |
|---|---|
| Base type | A known backend source kind capable of holding collections and items: filesystem and sqlite today; postgresql, mongodb, and others later. |
| Base instance | A specific, connectable instance of a base type, plus the information needed to reach it. |
| Collection mapping | The two-way mapping from a base instance’s contents to collections and items. One mapping may yield more than one collection. |
| Base reference | A base-native locator: a file path, S3 key, table name, or similar backend address. |
| Coordinates | The captured fields that identify a unit within its collection. |
| Scope | The domain level, item or collection, at which a base type attaches a base’s units to the model. |
In config, a base instance declares the collections it maps, and the instance file is where the collection mapping lives. In code, the implementation seam is internal/storage/collection.CollectionDefinition; internal/project consumes it rather than implementing the filesystem mapping inline.
Base readers use codecs to decode a matched unit’s content into the shape checks and inspectors consume. The markdown filesystem reader uses internal/codec/markdownbodytext for frontmatter/body parsing; codecs are shared content adapters, not base backends.
Collectionized bases use a two-way mapping#
When a base is collectionized, mapping has two directions:
- Forward (discovery):
path -> match pattern -> captured groupsbecome the unit’s coordinates. - Reverse (reconstruction):
coordinates -> fill a template -> path.
The reverse direction is not optional. Katalyst needs it the moment
item add notes/dune has to decide what file to create; that is the same
path-reconstruction problem. Today it is the degenerate, stem-only case
(Reference(c, id) -> <dir>/<id>.md); it grows with the layout.
The scope principle#
“What does one matched source unit become?” has no global answer; it is a property each base type declares.
- Markdown filesystem: one file = one item; a directory of files = a collection.
- Tabular (CSV / SQL): one file/table = one collection; its rows = items.
Both mappings are correct because item and collection are domain roles, not file counts. The definition absorbs that difference: alongside the path-to-coordinates mapping, it declares the scope at which a base’s units attach to the collection/item hierarchy.
Implication: item and collection are roles, not file counts. A base that packs many items into one physical unit (rows in a table) and one that spreads a single item across a whole unit (a markdown file) are both valid.
Base capability stack#
- Raw base: Katalyst can connect to the source and reference base-native content.
- Collectionized base: a raw base plus collection definitions that map base-native references into domain collections and items.
Unmatched references are first-class#
Katalyst treats unmatched references as errors rather than silently dropping them. A file inside a configured collection’s scope that matches no pattern is usually a signal of config drift: the pattern is wrong, the file is misplaced, or the project has gained a new shape that has not been modeled yet.
The same evidence can power a future doctor / explain command: list the collections, show representative examples, and surface the base references that matched nothing.
Variants route checks, not membership#
A collection may run different checks on different items via
variants, but that is
a check-engine concern, not a base one. A variant’s discriminator is a
predicate over an item’s metadata: portable across every base type, since
each yields a metadata map (frontmatter for a file, columns for a row). It never
touches the seam: membership, Unmatched, and Reference stay governed by the
definition’s pattern. Discriminating by path would be a base-type-scoped
condition; it is deferred precisely to keep the seam closed for now.
Coordinates are the selector#
In Katalyst, the flat stem identity is the degenerate one-coordinate case:
notes/dune.md becomes the item id dune. Richer layouts (notes/2020/dune)
grow into multiple coordinates parsed from the path. The selector grammar and
the definition’s pattern are two views of the same thing.
Design rationale#
- The contract is two-way, not one-way. Discovery and reconstruction are both core base operations.
- Raw and collectionized are one progression. A base starts with base-native references, then gains collection definitions that make collection-aware operations possible.
- Surface unmatched references. Silent skips hide drift between the base’s real contents and the configured model.
- Coordinates and selectors are one concept. The fields captured from a base reference should be the same fields users and agents use to address the item.
- Prefer an inherently two-way template (
{name}_{year}.md) over inverting an arbitrary regex. A template is bidirectional by construction; a regex is not. - The pattern must own the file extension, or reconstruction is ambiguous when several extensions are allowed.
- Keep collection identity separate from within-collection coordinates. Collection names and item coordinates answer different questions and should stay distinct.
Extension points#
- Core seam:
internal/storagedefinesBaseType,Scope, andReference;internal/projectassemblesBaseInstancevalues, andinternal/storage/collectiondefinesCollectionDefinition. The filesystem implementation maps a directory to a collection and each*.mdfile to an item with a stem id; the SQLite implementation maps a table to a collection and each row to an item. - Extension point: anything that turns a base-native reference into an item
identity (or back) passes through
CollectionDefinition, so backends can be added without touching the check engine, the CRUD verbs, or selector parsing. Multi-coordinate templates, inferred mode, and additional non-filesystem types slot in there.
See also#
- Domain model for the cross-subsystem entity map.
- Collections for the collection and item hierarchy bases expose.
- Configuration for the
precise
.katalyst/bases/surface. go doc ./internal/storagefor the code-level base contracts.