
Scorebook is a flexible and extensible framework for evaluating models. It provides clear contracts for data loading, model inference, and metrics computation. Scorebook makes it easy to run comprehensive evaluations across different datasets, models, hyperparameters, and metrics.
In addition, Scorebook integrates with Trismik's services, such as adaptive testing and the Trismik Dashboard, to enable advanced evaluations and streamlined storage, management, and visualization of results.
The examples throughout this documentation are runnable: each one demonstrates an evaluation component with a complete implementation and its expected output.
Key Scorebook Features
- Flexible Data Loading: Built-in support for Hugging Face datasets, CSV, JSON, and Python lists
- Model Agnostic: Works with any model or inference provider
- Extensible Metric Engine: Use the included metrics or implement your own
- Automated Sweeping: Automatically evaluate every combination of model hyperparameters in a configuration grid
- Rich Results: Export results to JSON, CSV, or structured formats like pandas DataFrames
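
To make the idea behind automated sweeping concrete, here is a minimal sketch in plain Python. It does not use Scorebook's API: `toy_inference`, `accuracy`, `sweep`, the toy dataset, and the grid values are all hypothetical names chosen for illustration. It only shows the general pattern of expanding a hyperparameter grid into individual configurations and scoring each one with a metric.

```python
from itertools import product

# Hypothetical toy dataset for illustration; not a Scorebook dataset object.
dataset = [
    {"input": "2 + 2", "label": "4"},
    {"input": "7 + 5", "label": "12"},
    {"input": "10 - 7", "label": "3"},
]


def toy_inference(items, temperature, max_tokens):
    """Hypothetical stand-in for a model call.

    It solves each arithmetic expression and truncates the answer to
    `max_tokens` characters. `temperature` is accepted but ignored here.
    """
    outputs = []
    for item in items:
        a, op, b = item["input"].split()
        value = int(a) + int(b) if op == "+" else int(a) - int(b)
        outputs.append(str(value)[:max_tokens])
    return outputs


def accuracy(outputs, labels):
    """Fraction of outputs that exactly match their labels."""
    correct = sum(o == l for o, l in zip(outputs, labels))
    return correct / len(labels)


def sweep(grid):
    """Yield one configuration dict per combination of grid values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


grid = {"temperature": [0.0, 0.7], "max_tokens": [1, 8]}
labels = [item["label"] for item in dataset]

# Evaluate every configuration and collect one result row per configuration.
results = []
for config in sweep(grid):
    outputs = toy_inference(dataset, **config)
    results.append({**config, "accuracy": accuracy(outputs, labels)})

for row in results:
    print(row)
```

Each result row is a flat dictionary, so a list of such rows can be written out with the standard `json` or `csv` modules or loaded into a tabular structure for comparison across configurations.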