I'm sharing Steward, an open-source governance engine for evaluating AI outputs against human-authored contracts.
The goal: deterministic, traceable verdicts (PROCEED / ESCALATE / BLOCKED) with evidence-backed accountability.
Crate structure:
steward-core → Deterministic evaluation (zero LLM calls)
steward-runtime → Optional async LLM orchestration
steward-cli → Pipe-friendly CLI
Key design decisions:
- Parallel lenses with tokio::join! — Five independent evaluators fan out; a synthesizer fans in. Each lens implements a Lens trait; the runtime wraps them in an async LensAgent. Reasonable pattern or overengineered?
- BTreeMap for determinism — All config maps use BTreeMap, and LensType derives Ord, to guarantee a stable iteration order. Worth the overhead vs IndexMap?
- ProviderFactory trait for LLM providers — Dynamic registration via ProviderRegistry instead of a ProviderType enum. Adds complexity but avoids enum bloat as providers grow.
- Hard boundary: no LLM in steward-core — The synthesizer is a strict policy machine. All LLM assistance lives in steward-runtime. Is this separation clear from the API?
- Fallback chain — On LLM failure: Cache → Simpler Model → Deterministic (core) → Escalate. Configured as an ordered Vec. Suggestions on this pattern?
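To make the parallel-lens question concrete, here is a minimal synchronous sketch of the fan-out/fan-in shape, using std threads in place of tokio tasks. All names (Verdict, LengthLens, KeywordLens, run_lenses) are illustrative, not Steward's actual API:

```rust
use std::thread;

// Illustrative verdict type; ordered so the synthesizer can take the most severe result.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Verdict {
    Proceed,
    Escalate,
    Blocked,
}

// Simplified stand-in for the Lens trait: each lens evaluates an output independently.
trait Lens: Send + Sync {
    fn evaluate(&self, output: &str) -> Verdict;
}

struct LengthLens;
impl Lens for LengthLens {
    fn evaluate(&self, output: &str) -> Verdict {
        if output.len() > 1000 { Verdict::Escalate } else { Verdict::Proceed }
    }
}

struct KeywordLens;
impl Lens for KeywordLens {
    fn evaluate(&self, output: &str) -> Verdict {
        if output.contains("forbidden") { Verdict::Blocked } else { Verdict::Proceed }
    }
}

// Fan out one thread per lens, then fan in: here the "synthesizer" is just
// max(), which picks the most severe verdict (Proceed < Escalate < Blocked).
fn run_lenses(lenses: &[Box<dyn Lens>], output: &str) -> Verdict {
    thread::scope(|s| {
        let handles: Vec<_> = lenses
            .iter()
            .map(|lens| s.spawn(|| lens.evaluate(output)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("lens panicked"))
            .max()
            .unwrap_or(Verdict::Proceed)
    })
}
```

In the async version, the five spawns would become futures joined with tokio::join! (or a JoinSet), but the fan-out/fan-in structure is the same.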
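On the BTreeMap point, the determinism win is that iteration order depends only on the key ordering, never on insertion order or hashing. A small sketch with a hypothetical LensType (variant names invented for illustration):

```rust
use std::collections::BTreeMap;

// Deriving Ord lets LensType key a BTreeMap; derived Ord follows
// declaration order, so iteration is always Safety, Compliance, Quality.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum LensType {
    Safety,
    Compliance,
    Quality,
}

// Returns the lenses in the order any consumer will ever see them.
fn lens_order(weights: &BTreeMap<LensType, u32>) -> Vec<LensType> {
    weights.keys().copied().collect()
}
```

IndexMap would instead preserve insertion order, which is deterministic per run but shifts if config construction order changes; BTreeMap pins the order to the type itself.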
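The ProviderFactory/ProviderRegistry trade-off is easier to weigh with a skeleton in front of you. This is a guess at the minimal shape (the real traits presumably carry config and async methods; EchoProvider is a made-up example):

```rust
use std::collections::BTreeMap;

trait Provider {
    fn name(&self) -> &str;
}

trait ProviderFactory {
    fn create(&self) -> Box<dyn Provider>;
}

// A trivial provider used only to exercise the registry.
struct EchoProvider;
impl Provider for EchoProvider {
    fn name(&self) -> &str { "echo" }
}

struct EchoFactory;
impl ProviderFactory for EchoFactory {
    fn create(&self) -> Box<dyn Provider> { Box::new(EchoProvider) }
}

// The registry maps provider ids to factories. Adding a provider means
// registering a new factory, not editing a central ProviderType enum.
#[derive(Default)]
struct ProviderRegistry {
    factories: BTreeMap<String, Box<dyn ProviderFactory>>,
}

impl ProviderRegistry {
    fn register(&mut self, id: &str, factory: Box<dyn ProviderFactory>) {
        self.factories.insert(id.to_string(), factory);
    }

    fn create(&self, id: &str) -> Option<Box<dyn Provider>> {
        self.factories.get(id).map(|f| f.create())
    }
}
```

The cost is dynamic dispatch and stringly-typed lookup; the benefit is that third-party crates can register providers without touching steward-runtime.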
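And one way to picture the fallback chain: walk the ordered Vec until a step produces a verdict, treating Escalate as the terminal step. The Fallback and Outcome types below are invented for the sketch; only the Cache → Simpler Model → Deterministic → Escalate ordering comes from the design above:

```rust
// Hypothetical step enum mirroring the configured chain.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Fallback {
    Cache,
    SimplerModel,
    Deterministic,
    Escalate,
}

#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    Verdict(&'static str),
    Escalated,
}

// try_step stands in for the real handlers: Some(verdict) on success, None on failure.
// The chain short-circuits on the first success; Escalate always terminates it.
fn run_chain(
    chain: &[Fallback],
    try_step: impl Fn(Fallback) -> Option<&'static str>,
) -> Outcome {
    for &step in chain {
        if step == Fallback::Escalate {
            return Outcome::Escalated;
        }
        if let Some(verdict) = try_step(step) {
            return Outcome::Verdict(verdict);
        }
    }
    Outcome::Escalated
}
```

One question this raises for the Vec-based config: should the runtime validate that Escalate (or some terminal step) is always present, or append it implicitly?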
Also includes:
- Python bindings (PyO3/maturin)
- Node.js bindings (napi-rs)
- Domain packs: healthcare, finance, legal, education, HR
Repo: Steward GitHub
This is a work in progress—contributors welcome.
Happy to receive any feedback.
Thank you!