@hackage / shikumi-eval

Typed evaluation framework for shikumi LM programs (EP-8)

Latest: 0.1.0.0

About

Metadata

  • Last updated by shinzui
  • License BSD-3-Clause
  • Maintained by: nadeem@gmail.com

  • Lottery factor: 1

Readme

The evaluation framework for shikumi. It provides:

  • The data model owned by this package: Example, Prediction, Dataset, Metric, Score, and Report (MasterPlan integration point #5).
  • Built-in metrics, both pure and LM-backed.
  • An evaluate runner that scores a Shikumi.Program.Program over a typed dataset, with bounded parallelism and per-example error boundaries.
  • Golden testing that pins a program's behaviour deterministically under a mock or replayed LM.
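
To make the runner's two guarantees concrete (bounded parallelism and per-example error boundaries), here is a minimal sketch that uses only base. It is not the shikumi-eval API: the names Example, Score, Metric, Report, exactMatch, and evaluateSketch are hypothetical stand-ins defined here, and the program is modelled as a plain function i -> IO o rather than a Shikumi.Program.Program.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Exception (SomeException, bracket_, try)

-- Hypothetical stand-ins for the package's data model; the real
-- shikumi-eval types may have different names and fields.
data Example i o = Example { exInput :: i, exExpected :: o }

newtype Score = Score Double deriving (Eq, Show)

-- A metric compares the expected output with the predicted one.
type Metric o = o -> o -> Score

-- A pure metric: 1 on exact match, 0 otherwise.
exactMatch :: Eq o => Metric o
exactMatch expected predicted = Score (if expected == predicted then 1 else 0)

-- One outcome per example, in dataset order: either the caught error
-- (rendered as text) or the score.
data Report = Report
  { outcomes  :: [Either String Score]
  , meanScore :: Double  -- mean over successfully scored examples only
  } deriving (Show)

-- Score a program over a dataset with at most maxWorkers examples
-- running at once. A failure in one example does not abort the others.
evaluateSketch :: Int -> (i -> IO o) -> Metric o -> [Example i o] -> IO Report
evaluateSketch maxWorkers program metric dataset = do
  sem     <- newQSem maxWorkers
  slots   <- mapM (spawn sem) dataset
  results <- mapM takeMVar slots
  let scores = [s | Right (Score s) <- results]
      mean   = if null scores then 0 else sum scores / fromIntegral (length scores)
  pure (Report results mean)
  where
    -- Fork one thread per example; the semaphore bounds how many run.
    spawn sem ex = do
      slot <- newEmptyMVar
      _ <- forkIO $ do
        r <- try (bracket_ (waitQSem sem) (signalQSem sem) (runOne ex))
        putMVar slot (either (\(e :: SomeException) -> Left (show e)) Right r)
      pure slot
    -- Force the score inside the boundary so that exceptions raised by
    -- a pure metric are caught as well.
    runOne ex = do
      predicted <- program (exInput ex)
      pure $! metric (exExpected ex) predicted
```

Under these assumptions, a call such as evaluateSketch 4 program exactMatch dataset returns one outcome per example in dataset order, with a failing example reported as a Left value instead of aborting the run. A real runner would likely distinguish error kinds and treat asynchronous exceptions more carefully than this catch-all does.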