Profile an existing wiki by hand
You have a directory of markdown (a vault, a docs tree, a knowledge base)
and you want a Katalyst schema for it. Rather than guess the conventions,
inspect measures them. This guide turns an existing corpus into a draft
schema by reading the evidence yourself. To hand that judgment to an agent
instead, see Profile an existing wiki with an
agent.
inspect reports evidence (counts and distributions), never
recommendations. Reading the evidence and deciding the schema is your call.
inspect runs in two layers: point it at a directory to profile a raw base
(no project needed), or at a configured collection to profile its items.
The onboarding loop uses both.
1. Survey the directory (raw base layer)
Point inspect at the directory. With no .katalyst/ project it runs the
raw base inspectors:
katalyst inspect ./wiki
file_tree reports the file types and naming conventions per directory. Use it
to decide which directory or prefix you want to inspect more closely. Then run
file_content_shape over that explicit slice:
# Inspection report: ./wiki
## Structural
### file_content_shape (n=5)
_Profile selected files by text, tabular, and tree content structure._
----------------------------------------
selection:
expression : ext = ".md"
files : 5
directories : 1
readable : 5
unsupported : 0
parse failures: 0
----------------------------------------
file types:
TYPE FILES
.md 5
----------------------------------------
coherence:
status: coherent
----------------------------------------
common structure:
- 5/5 Markdown files have an H1
- 4/5 Markdown files have frontmatter key author
- 5/5 Markdown files have frontmatter key status
- 5/5 Markdown files have frontmatter key title
- 4/5 Markdown files have section Review
----------------------------------------
variation:
- frontmatter key author appears in 4/5 Markdown files
----------------------------------------
text:
files : 5
with H1: 5
frontmatter keys:
KEY FILES
status 5
title 5
author 4
----------------------------------------
tabular:
no CSV files selected
----------------------------------------
tree:
no JSON files selected
----------------------------------------
read/parse issues:
none
This layer reports store and content facts, not candidate collections. Here the
Markdown files share enough structure that you can reasonably treat ./wiki as
a single books collection and keep the file with the missing author in mind
as cleanup work.
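To see what a line such as "4/5 Markdown files have frontmatter key author" measures, the count can be reproduced by hand. The sketch below is an illustration only, not Katalyst's implementation: it assumes frontmatter is a leading block delimited by `---` lines with simple top-level `key: value` pairs, the helper names `frontmatter_keys` and `key_presence` are hypothetical, and the sample documents are made up.

```python
from collections import Counter


def frontmatter_keys(text: str) -> set[str]:
    """Return top-level frontmatter keys, assuming a leading '---' ... '---' block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()  # no frontmatter block at the top of the file
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys  # closing delimiter: frontmatter is complete
        # Only unindented "key: value" lines count as top-level keys;
        # indented lines, comments, and list items are skipped.
        if line and line[0] not in " \t#-" and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return set()  # block never closed: treat as no frontmatter


def key_presence(texts: list[str]) -> Counter:
    """Count how many documents contain each frontmatter key."""
    counts = Counter()
    for text in texts:
        counts.update(frontmatter_keys(text))
    return counts


# Hypothetical sample documents, not the files behind the report above.
docs = [
    "---\ntitle: A\nstatus: read\nauthor: X\n---\n# A\n",
    "---\ntitle: B\nstatus: reading\n---\n# B\n",
]
presence = key_presence(docs)
for key, count in sorted(presence.items()):
    print(f"{key}: {count}/{len(docs)}")
```

For files on disk, something like `[p.read_text(encoding="utf-8") for p in pathlib.Path("wiki").glob("*.md")]` could supply the `texts` argument.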
2. Configure the collection
Point a collection at the directory so the field-level layer can run. Minimal config:
# .katalyst/bases/local.yaml
type: filesystem
root: .
collections:
  books:
    path: wiki
3. Inspect the collection (collection layer)
Now inspect the collection by name. Inside the project, inspect runs the
collection inspectors over its items:
katalyst inspect books
object_fields is a data dictionary over the items’ frontmatter: per
field, presence over n, observed types, value cardinality, and the common
values when the set is small:
# Inspection report: books
## Object
### object_fields (n=5)
_A data dictionary over item frontmatter: per-field presence, types, cardinality, and common values._
- author:
  - cardinality: 4
  - present: 4
  - types:
    - string: 4
  - values:
    - Frank Herbert: 1
    - Isaac Asimov: 1
    - Neal Stephenson: 1
    - William Gibson: 1
- status:
  - cardinality: 3
  - present: 5
  - types:
    - string: 5
  - values:
    - read: 3
    - reading: 1
    - to-read: 1
- title:
  - cardinality: 5
  - present: 5
  - types:
    - string: 5
  - values:
    - Dune: 1
    - Dune Messiah: 1
    - Foundation: 1
    - Neuromancer: 1
    - Snow Crash: 1
markdown_body reports the body conventions: single-H1 / H1-matches-title rates
and recurring section headings. For a machine-readable form, add --json; to
save the report, use -o report.md.
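To make the terms present, types, and cardinality concrete, the same kind of data dictionary can be approximated over already-parsed frontmatter. This is a minimal sketch, not the inspector's code: `field_dictionary` is a hypothetical helper, the type names come from Python rather than from Katalyst, and the sample items are invented.

```python
from collections import Counter, defaultdict


def field_dictionary(items: list[dict]) -> dict[str, dict]:
    """Summarize each field: presence, observed types, cardinality, value counts."""
    seen_values = defaultdict(list)
    for item in items:
        for key, value in item.items():
            seen_values[key].append(value)
    report = {}
    for key, seen in seen_values.items():
        # repr() keeps unhashable values (lists, dicts) countable.
        value_counts = Counter(repr(v) for v in seen)
        report[key] = {
            "present": len(seen),
            "types": dict(Counter(type(v).__name__ for v in seen)),
            "cardinality": len(value_counts),
            "values": dict(value_counts),
        }
    return report


# Invented items: one is missing author, one has a non-string author.
items = [
    {"title": "A", "status": "read", "author": "X"},
    {"title": "B", "status": "read"},
    {"title": "C", "status": "to-read", "author": 7},
]
report = field_dictionary(items)
n = len(items)
for field in sorted(report):
    info = report[field]
    print(f"{field}: present {info['present']}/{n}, "
          f"types {info['types']}, cardinality {info['cardinality']}")
```

A field whose `types` entry has more than one key, like `author` here, is the mixed-type case that usually needs cleanup before a type constraint makes sense.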
4. Read the evidence
Translate the counts into schema decisions yourself; the threshold is your judgment, not the tool’s:
| Evidence | What it tells you | A reasonable reading |
|---|---|---|
| object_fields present / n | how often a field appears | nearly every item → required; sometimes → optional |
| object_fields values | a small, stable value set | an enum |
| object_fields types | observed types per field | one consistent type → a type constraint; mixed → a field to clean up first |
| markdown_body heading shape | single-H1, H1-matches-title | markdown_single_h1, markdown_title_matches_h1 |
| markdown_body sections | recurring section headings | a markdown_required_section |
| file_tree naming (step 1) | casing, spaces, extensions | filesystem_name_case (style: kebab), filesystem_path_charset (deny: [" "]) |
| file_content_shape common structure (step 1) | shared frontmatter keys and sections in the selected slice | confidence that the slice is coherent enough to configure as one collection |
The denominator n is always reported, so you decide what “nearly every item”
means. The one item missing author, which also has spaces in its name, is
exactly the kind of file a schema will flag.
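The readings in the table can be written down as explicit rules, which makes the chosen thresholds visible. The sketch below is one possible reading, not something Katalyst does for you: `draft_field_rules`, the 0.8 presence threshold, the `enum_max` limit, and the "fewer distinct values than items" enum heuristic are all assumptions picked for illustration. The evidence is copied from the object_fields report in step 3.

```python
def draft_field_rules(fields: dict, n: int,
                      required_at: float = 0.8, enum_max: int = 5) -> dict:
    """Turn per-field evidence into draft decisions; thresholds are the reader's choice."""
    rules = {}
    for name, info in fields.items():
        rule = {"required": info["present"] / n >= required_at}
        if len(info["types"]) == 1:
            rule["type"] = next(iter(info["types"]))
        else:
            rule["type"] = None  # mixed types: clean up the data first
        # Assumed heuristic: a small value set that repeats across items suggests an enum.
        if info["cardinality"] <= enum_max and info["cardinality"] < info["present"]:
            rule["enum"] = sorted(info["values"])
        rules[name] = rule
    return rules


# Counts copied from the object_fields report (n=5).
evidence = {
    "author": {"present": 4, "types": {"string": 4}, "cardinality": 4,
               "values": {"Frank Herbert": 1, "Isaac Asimov": 1,
                          "Neal Stephenson": 1, "William Gibson": 1}},
    "status": {"present": 5, "types": {"string": 5}, "cardinality": 3,
               "values": {"read": 3, "reading": 1, "to-read": 1}},
    "title": {"present": 5, "types": {"string": 5}, "cardinality": 5,
              "values": {"Dune": 1, "Dune Messiah": 1, "Foundation": 1,
                         "Neuromancer": 1, "Snow Crash": 1}},
}
rules = draft_field_rules(evidence, n=5)
for name in sorted(rules):
    print(name, rules[name])
```

With these assumed thresholds, all three fields come out required and only status gets an enum. Raising `required_at` to 1.0 would make author optional instead, which is the kind of decision the evidence leaves to you.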
5. Draft a schema and check
Add the schema and bind it to the collection:
# .katalyst/schemas/book.yaml
type: object
required: [title, author, status]
properties:
  title: { type: string }
  author: { type: string }
  status: { enum: [read, reading, to-read] }
# .katalyst/bases/local.yaml (extend the collection from step 2)
type: filesystem
root: .
collections:
  books:
    path: wiki
    schema: book
    checks:
      - kind: markdown_single_h1
      - kind: filesystem_name_case
        style: kebab
See Add a schema for the binding details.
Then run check against the draft:
katalyst check books
The files that already follow the conventions pass; the outliers the evidence flagged light up as violations. From there you tighten the schema, relax a field to optional, or fix the stray files, then re-run. That loop, inspect → draft → check → fix the holdouts, is the whole onboarding.
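To preview which files the draft would flag before running the real check, the draft's rules can be mimicked in a few lines. This is a rough stand-in under stated assumptions, not how `katalyst check` works: `violations` is a hypothetical helper, the kebab-case pattern is one common definition of that style, and the file names and frontmatter below are invented.

```python
import re

# Rules mirrored from the draft schema and checks above.
REQUIRED = ["title", "author", "status"]
STATUS_VALUES = {"read", "reading", "to-read"}
# Assumed definition of kebab-case: lowercase alphanumerics joined by single hyphens.
KEBAB = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def violations(filename: str, frontmatter: dict) -> list[str]:
    """List the ways one item departs from the draft conventions."""
    problems = []
    for field in REQUIRED:
        if field not in frontmatter:
            problems.append(f"missing required field: {field}")
    status = frontmatter.get("status")
    if status is not None and status not in STATUS_VALUES:
        problems.append(f"status not in enum: {status}")
    stem = filename.rsplit(".", 1)[0]  # drop the extension before checking the name
    if not KEBAB.fullmatch(stem):
        problems.append(f"name is not kebab-case: {filename}")
    return problems


# Hypothetical items: one conforming file and one outlier.
clean = violations("good-book.md", {"title": "T", "author": "A", "status": "read"})
stray = violations("Stray Note.md", {"title": "T", "status": "read"})
print(clean)
print(stray)
```

An empty list means the item passes this sketch; the outlier reports both the missing field and the file name, mirroring the two kinds of holdout described in step 4.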
See also
- Profile an existing wiki with an agent: the same loop, driven by an agent.
- Inspectors reference: every inspector and what it reports.
- Add a schema: bind the draft to a collection.