Evaluations
Comprehensive Model Evaluation with Scorebook
Scorebook's core `evaluate` function provides a flexible framework for assessing model performance across datasets, hyperparameters, and metrics. Whether you're running simple accuracy checks or complex adaptive evaluations, Scorebook handles the orchestration while giving you full control over the evaluation process.
Basic Evaluation Structure
All Scorebook evaluations follow the same fundamental pattern:
```python
from scorebook import evaluate, EvalDataset
from scorebook.metrics import Accuracy

# Basic evaluation: my_dataset is an EvalDataset scored with Accuracy
results = evaluate(
    inference=my_inference_function,
    datasets=my_dataset,
    hyperparameters={"temperature": 0.7},
)
```
The `evaluate` function requires:
- Inference callable: a function that generates model predictions for dataset items
- Datasets: one or more evaluation datasets with associated metrics

It also accepts optional parameters for hyperparameters, experiment tracking, and result formatting.
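To make the inference callable concrete, here is a minimal sketch in plain Python. The item format (a dict with an `"input"` key) and the way hyperparameters are passed are illustrative assumptions, not Scorebook's confirmed contract; consult the Scorebook documentation for the exact signature your version expects:

```python
# Hypothetical sketch of an inference callable: it receives a batch of
# dataset items plus hyperparameters and returns one prediction per item.
# The "input" key and hyperparameter handling are assumptions for
# illustration, not Scorebook's documented API.
def my_inference_function(items, hyperparameters=None):
    temperature = (hyperparameters or {}).get("temperature", 1.0)
    predictions = []
    for item in items:
        # Replace this stub with a real model call (e.g. an LLM client
        # invoked with the given temperature).
        predictions.append(f"echo:{item['input']}")
    return predictions
```

The key property is that the callable is pure orchestration-agnostic code: Scorebook supplies the items and hyperparameters, and your function only needs to return predictions in a form the dataset's metrics can score.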
Next Steps
- For hyperparameter optimization: Explore Hyperparameters
- For metric customization: Check out Metric Scoring
- For result analysis: See Results
- For team collaboration: Learn about Uploading Results
- For efficiency gains: Try Adaptive Evaluations