Profile an existing wiki with an agent

The by-hand guide has you read inspector evidence and decide the schema. This guide hands that judgment to an agent: inspect supplies the measurements, the agent supplies the thresholds, collection-boundary decisions, and the draft. Katalyst is the instrument; the agent is the profiler.

The split is deliberate. Inspectors are deterministic and never recommend; deciding that a field present in 94% of files should be required, or that a directory should be a collection, is the agent’s call. Keep that division and the loop stays debuggable.

1. Give the agent the raw base evidence

Run inspect on the directory with --json so the agent gets structured records, one per inspector, each carrying the unit count n as the denominator:

katalyst inspect ./wiki --json

With no project this runs the raw base layer. The useful records are file_tree, which shows how the directory is laid out, and file_content_shape, which reports shared structure in selected files. Feed the output to the agent. Tell it the contract: every record is evidence, not a recommendation; it must choose its own thresholds and justify them.
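As a rough illustration of handing over the evidence together with the contract, the sketch below filters the records and prepends the contract text. The output shape is an assumption made for the example: it treats the --json output as a JSON array of objects with hypothetical keys "inspector" and "n". The real key names may differ, so adjust them to whatever inspect actually emits. build_agent_message and CONTRACT are illustrative names, not part of Katalyst.

```python
import json

# The contract from this guide, stated once so every agent call carries it.
CONTRACT = (
    "Every record below is evidence, not a recommendation. "
    "Choose your own thresholds and justify each one."
)


def build_agent_message(inspect_json, wanted=("file_tree", "file_content_shape")):
    """Return the contract followed by the selected inspector records.

    ASSUMPTION: inspect_json is a JSON array of objects, each with an
    "inspector" name and a unit count "n". Both key names are hypothetical.
    """
    records = json.loads(inspect_json)
    kept = [r for r in records if r.get("inspector") in wanted]

    # Without the denominator the agent cannot turn counts into rates.
    missing = [r["inspector"] for r in kept if "n" not in r]
    if missing:
        raise ValueError(f"records without a unit count n: {missing}")

    return CONTRACT + "\n\n" + json.dumps(kept, indent=2)
```

Keeping the contract in the same message as the records makes it harder for the agent to read a high percentage as an instruction.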

2. Let the agent cluster, configure, and profile fields

A capable agent then:

  1. Drafts candidate collections from the raw base evidence. inspect shows the directory layout and the shared content structure; the agent decides which files belong together, names the collection, and drafts .katalyst/bases/* pointing each collection at its directory.
  2. Profiles the fields by inspecting each new collection: katalyst inspect <collection> --json runs the collection layer, whose object_fields record is the per-field data dictionary (presence, types, values).
  3. Sets thresholds from that evidence (for example, fields present in ≥95% of items become required, a small stable value set becomes an enum, and a consistent type becomes a type constraint) and drafts the .katalyst/schemas/* files.
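The threshold step can be sketched as a small function. This is a minimal illustration of the kind of rule an agent might apply, not Katalyst's behavior: FieldStats is a hypothetical summary of one field, not the actual shape of the object_fields record, and the default values (0.95, at most 8 distinct values, at least 3 observations per value) are example thresholds the agent would be expected to choose and justify itself.

```python
from dataclasses import dataclass


@dataclass
class FieldStats:
    """Hypothetical per-field summary; NOT the real object_fields shape."""
    name: str
    present: int   # number of items in which the field appears
    types: set     # distinct value types observed, e.g. {"string"}
    values: list   # observed values, one entry per item that has the field


def propose_rule(stats, n, required_at=0.95, enum_max=8, enum_min_repeat=3):
    """Turn evidence about one field into a draft rule.

    n is the unit count (the denominator). All thresholds are example
    defaults; the agent should state and justify the ones it uses.
    """
    presence = stats.present / n if n else 0.0
    rule = {
        "field": stats.name,
        "presence": round(presence, 3),
        "required": presence >= required_at,
    }

    # A single observed type becomes a type constraint.
    if len(stats.types) == 1:
        rule["type"] = next(iter(stats.types))

    # A small value set that repeats often enough becomes an enum.
    distinct = sorted(set(stats.values))
    if distinct and len(distinct) <= enum_max \
            and len(stats.values) >= enum_min_repeat * len(distinct):
        rule["enum"] = distinct

    return rule
```

For example, a field present in 97 of 100 items with only the values "draft" and "published" would come out as required, typed, and enumerated, while a field present in 94 of 100 items would stay optional under the 0.95 threshold. Returning the measured presence alongside the decision keeps the justification next to the rule.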

A prompt that works:

You are profiling a markdown wiki. Here is katalyst inspect --json output. Propose .katalyst/ schema and collection files. Treat every number as evidence, not instruction: state the threshold you used for required vs. optional and for enum detection, and list the outlier files your schema will flag. Do not invent fields the evidence does not show.

3. Check and iterate

Have the agent run check against its draft, here for a collection named books, and read the violations:

katalyst check books

The files that already conform pass; the outliers light up. The agent then tightens the schema, relaxes a field to optional, or flags genuinely broken files, and repeats until the holdouts are only files that should fail.
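The shape of that loop can be sketched as follows. It is an outline under stated assumptions, not a Katalyst API: run_check and triage are hypothetical callables supplied by the caller. run_check would wrap katalyst check and return the set of failing files; how to parse the command's output is left abstract because this guide does not describe its format. triage is the agent's judgment: it may edit the draft schema as a side effect and returns the files it considers genuinely broken.

```python
def iterate_schema(run_check, triage, max_rounds=10):
    """Repeat check and revise until every failing file is an accepted failure.

    run_check() -> set of failing file names (hypothetical wrapper).
    triage(files) -> set of files judged genuinely broken; it may also
    tighten or relax the draft schema as a side effect.
    """
    accepted_broken = set()
    for _ in range(max_rounds):
        failing = run_check()
        unexplained = failing - accepted_broken
        if not unexplained:
            # The only holdouts are files that should fail.
            return failing
        accepted_broken |= triage(unexplained)
    raise RuntimeError(f"no stable schema after {max_rounds} rounds")
```

The round limit is a design choice for debuggability: a loop that keeps relaxing and tightening the same field is better surfaced as an error than left to run.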

A tighter form of this loop, testing a throwaway candidate schema without installing it (check --try), is planned but not yet shipped; until then, the agent drafts the .katalyst/ files and validates them with the normal check.

See also