Profile an existing wiki with an agent
The by-hand guide has you read inspector evidence and decide the schema. This guide hands that
judgment to an agent: inspect supplies the measurements, the agent supplies
the thresholds, collection-boundary decisions, and the draft. Katalyst is the
instrument; the agent is the profiler.
The split is deliberate. Inspectors are deterministic and never recommend;
deciding that a field present in 94% of files should be required, or that a
directory should be a collection, is the agent’s call. Keep that division
and the loop stays debuggable.
1. Give the agent the raw base evidence
Run inspect on the directory with --json so the agent gets structured
records: one per inspector, each carrying the unit count n as the
denominator:
    katalyst inspect ./wiki --json
With no project, this runs the raw base layer. The useful records are
file_tree, which shows how the directory is laid out, and
file_content_shape, which reports shared structure in selected files. Feed the
output to the agent. Tell it the contract: every record is evidence, not a
recommendation; it must choose its own thresholds and justify them.
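As a minimal sketch of preparing that evidence for the agent, the snippet below indexes the JSON records by inspector and pulls out each unit count. The record shape is an assumption: the key names `inspector` and `n` and the sample payload are hypothetical stand-ins, so check them against the Inspectors reference before relying on them.

```python
import json

# Hypothetical sample of `katalyst inspect ./wiki --json` output.
# The key names ("inspector", "n") are assumptions for illustration only.
SAMPLE_OUTPUT = """
[
  {"inspector": "file_tree", "n": 120},
  {"inspector": "file_content_shape", "n": 118}
]
"""


def index_records(raw_json: str) -> dict:
    """Map each inspector name to its record so the agent can cite it."""
    records = json.loads(raw_json)
    return {record["inspector"]: record for record in records}


def denominators(indexed: dict) -> dict:
    """Pull out the unit count n that each record's figures are measured against."""
    return {name: record["n"] for name, record in indexed.items()}


evidence = index_records(SAMPLE_OUTPUT)
print(denominators(evidence))
```

Keeping the denominator next to each record lets the agent state its thresholds as fractions of n rather than as bare counts.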
2. Let the agent cluster, configure, and profile fields
A capable agent then:
- Drafts candidate collections from the raw base evidence. inspect shows the directory layout and the shared content structure; the agent decides which files belong together, names the collection, and drafts .katalyst/bases/* pointing each collection at its directory.
- Profiles the fields by inspecting each new collection. katalyst inspect <collection> --json runs the collection layer, whose object_fields record is the per-field data dictionary (presence, types, values).
- Sets thresholds from that evidence (e.g. fields in ≥95% of items become required, a small stable value set becomes an enum, a consistent type becomes a type constraint) and drafts .katalyst/schemas/*.
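The threshold step can be sketched as follows. This is an illustration of the kind of rule the agent might state, not Katalyst's behavior: the `FieldEvidence` layout, the 0.95 cutoff, and the `enum_max_values` limit are all hypothetical choices that the agent would need to justify from the actual object_fields record.

```python
from dataclasses import dataclass


@dataclass
class FieldEvidence:
    """Hypothetical per-field summary; not Katalyst's actual record layout."""
    name: str
    present: int          # items that carry the field
    n: int                # total items in the collection (the denominator)
    distinct_values: int  # number of distinct values observed


def classify(field: FieldEvidence,
             required_at: float = 0.95,
             enum_max_values: int = 8) -> dict:
    """Apply explicit, adjustable thresholds to one field's evidence."""
    presence = field.present / field.n if field.n else 0.0
    return {
        "field": field.name,
        "presence": round(presence, 3),
        # Required only when presence meets the stated cutoff.
        "required": presence >= required_at,
        # Enum candidate only when the observed value set is small.
        "enum_candidate": 0 < field.distinct_values <= enum_max_values,
    }


# Made-up numbers: a field in 113 of 120 items falls just under the cutoff.
status = classify(FieldEvidence("status", present=113, n=120, distinct_values=4))
title = classify(FieldEvidence("title", present=120, n=120, distinct_values=120))
print(status)
print(title)
```

Passing the thresholds as parameters keeps them visible in the agent's justification and easy to change between iterations.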
A prompt that works:
You are profiling a markdown wiki. Here is katalyst inspect --json output. Propose .katalyst/ schema and collection files. Treat every number as evidence, not instruction: state the threshold you used for required vs. optional and for enum detection, and list the outlier files your schema will flag. Do not invent fields the evidence does not show.
3. Check and iterate
Have the agent run check against its draft and read the violations:
    katalyst check books
The files that already conform pass; the outliers light up. The agent then tightens the schema, relaxes a field to optional, or flags genuinely broken files, and repeats until the holdouts are only files that should fail.
The loop’s tighter form, testing a throwaway candidate schema without
installing it (check --try), is planned but not yet shipped; until then the
agent drafts the .katalyst/ files and validates with the normal check.
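The check-and-revise loop described in this step can be sketched as a small driver. Everything here is hypothetical scaffolding: `check` stands in for running katalyst check and collecting the failing files, `revise` stands in for the agent editing its draft, and the stubs at the bottom only simulate that behavior. How the real violations are parsed is left out because the output format is not shown here.

```python
from typing import Callable, List, Set


def iterate(check: Callable[[], List[str]],
            revise: Callable[[List[str]], None],
            expected_failures: Set[str],
            max_rounds: int = 5) -> List[str]:
    """Repeat check -> revise until only the expected holdouts fail."""
    for _ in range(max_rounds):
        failing = check()
        unexpected = [f for f in failing if f not in expected_failures]
        if not unexpected:
            return failing      # only files that should fail remain
        revise(unexpected)      # agent adjusts the draft schema
    return check()              # round limit reached; report the current state


# Stubs simulating a draft that gets relaxed one outlier at a time.
state = {"failing": ["a.md", "b.md", "broken.md"]}


def fake_check() -> List[str]:
    return list(state["failing"])


def fake_revise(unexpected: List[str]) -> None:
    # Pretend the revision makes the first unexpected file conform.
    state["failing"] = [f for f in state["failing"] if f != unexpected[0]]


result = iterate(fake_check, fake_revise, expected_failures={"broken.md"})
print(result)
```

The round limit is a design choice: it stops the agent from relaxing the schema indefinitely and forces a human look at whatever still fails.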
See also
- Profile an existing wiki by hand: the same loop, you reading the evidence.
- Inspectors reference: the evidence each inspector emits.
- Add a schema: how a draft binds to a collection.