Introduction to Scorebook

Scorebook is a flexible and extensible framework for evaluating models. It provides clear contracts for data loading, model inference, and metrics computation. Scorebook makes it easy to run comprehensive evaluations across different datasets, models, hyperparameters, and metrics.

In addition, Scorebook integrates with Trismik's services, such as adaptive testing and the Trismik Dashboard, to enable advanced evaluations and streamlined storage, management, and visualization of results.

Throughout this documentation, Scorebook's examples serve as runnable demonstrations of evaluation components, with complete implementations and expected outputs.

Key Scorebook Features

Flexible Data Loading: Built-in support for Hugging Face datasets, CSV, JSON, and Python lists
Model Agnostic: Works with any model or inference provider
Extensible Metric Engine: Use the built-in metrics or implement your own
Automated Sweeping: Evaluate combinations of model hyperparameter configurations automatically
Rich Results: Export results to JSON, CSV, or structured formats like pandas DataFrames
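
To make the sweeping and results features more concrete, here is a minimal sketch in plain Python of what evaluating every hyperparameter combination involves. It does not use Scorebook's API: the names expand_grid, toy_inference, accuracy, and run_sweep, the two hyperparameters, and the four-item dataset are all hypothetical stand-ins chosen for illustration.

```python
import json
from itertools import product


def expand_grid(grid):
    """Return one dict per combination of the listed hyperparameter values."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


def toy_inference(value, threshold, bias):
    """Hypothetical stand-in for a model: labels an integer as 'high' or 'low'."""
    return "high" if value + bias >= threshold else "low"


def accuracy(predictions, labels):
    """Fraction of predictions that exactly match their label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def run_sweep(dataset, grid):
    """Score the toy model once per hyperparameter configuration."""
    labels = [item["label"] for item in dataset]
    results = []
    for config in expand_grid(grid):
        predictions = [toy_inference(item["input"], **config) for item in dataset]
        results.append({**config, "accuracy": accuracy(predictions, labels)})
    return results


# Illustrative data and grid (assumptions, not taken from Scorebook).
DATASET = [
    {"input": 10, "label": "low"},
    {"input": 40, "label": "low"},
    {"input": 60, "label": "high"},
    {"input": 90, "label": "high"},
]
GRID = {"threshold": [30, 50, 70], "bias": [0, 10]}

results = run_sweep(DATASET, GRID)
best = max(results, key=lambda r: r["accuracy"])

# Each result is a flat dict, so it serializes directly to JSON.
print(json.dumps(best))
```

Because each result is a flat dictionary of hyperparameters plus a score, the same list could be written to CSV or loaded into a DataFrame without reshaping, which is the kind of export the feature list describes.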
