
Inference

Flexible Inference Implementation for Scorebook Evaluations

Scorebook's signature evaluate function requires a callable argument for its inference parameter. This callable encapsulates a model's inference process, generating a list of predictions from a list of input evaluation items.


Inference Callable Requirements

All inference callables in Scorebook share the same basic contract, regardless of how they are implemented:

An inference callable must:

  • Accept a list of evaluation items
  • Accept hyperparameters as kwargs
  • Return a list of model outputs for scoring

Inference Callable Implementations

Because only this contract matters, inference can be implemented using any callable type, such as functions, methods, classes, or callable objects.

Inference Functions

The most straightforward approach is defining a single function that handles the entire inference process:

from typing import Any, Dict, List

# A basic inference function implementation
def inference(evaluation_items: List[Dict[str, Any]], **hyperparameters: Any) -> List[Any]:
    # Configure the model once, outside the item loop
    model = get_model()
    model.temperature = hyperparameters.get("temperature")

    # Generate one prediction per evaluation item
    predictions = []
    for item in evaluation_items:
        prediction = model.predict(item["question"])
        predictions.append(prediction)

    return predictions
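
Once defined, the function is passed to evaluate via its inference parameter. The call site below is an illustrative assumption, only the inference parameter is documented here; the item shape and hyperparameter passthrough are sketched for context:

from scorebook import evaluate

evaluation_items = [
    {"question": "What is 2 + 2?"},
    {"question": "What is the capital of France?"},
]

# Hypothetical call site: passing the items positionally and forwarding a
# hyperparameter are assumptions, not a fixed signature
results = evaluate(evaluation_items, inference=inference, temperature=0.0)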

Advanced Inference Implementations

As projects grow, inference functions can be expanded into more modular and reusable components. Instead of handling all logic in a single function, you can compose inference using Scorebook’s InferencePipeline.

from scorebook import InferencePipeline

# Create an inference pipeline
inference_pipeline = InferencePipeline(
    model="model-name",              # Optionally specify the model name
    preprocessor=preprocessor,       # Prepares evaluation items for model input
    inference_function=inference,    # Generates raw model output for structured inputs
    postprocessor=postprocessor,     # Parses model outputs to extract the response for scoring
)

Pipelines let you break the process into distinct stages of preprocessing, inference, and postprocessing, making it easier to manage complexity, reuse components, or plug in different models.
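
As a rough sketch, a preprocessor might map raw evaluation items to prompt strings, while a postprocessor extracts the answer text from raw model outputs. The function bodies and the "question" field below are illustrative assumptions, not part of Scorebook's API:

from typing import Any, Dict, List

# Hypothetical pipeline stages; the item fields and parsing logic are assumptions
def preprocessor(evaluation_items: List[Dict[str, Any]]) -> List[str]:
    # Turn each raw evaluation item into a prompt string for the model
    return [item["question"] for item in evaluation_items]

def postprocessor(raw_outputs: List[str]) -> List[str]:
    # Strip whitespace so only the answer text is passed to scoring
    return [output.strip() for output in raw_outputs]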

In practice, more advanced scenarios often build on one of the following approaches:

  • Asynchronous Inference: Run inference concurrently using async callables (see the first sketch after this list)
  • Batch Inference: Process items in batches to improve throughput with local models
  • Cloud Inference: Integrate with hosted providers such as OpenAI or Anthropic (see the second sketch after this list)

Each of these builds on the same callable contract described above.
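
For example, an asynchronous inference callable can fan out one request per item with asyncio. This is a minimal sketch: async_predict is a hypothetical stand-in for a real async model call, and it assumes evaluate can await an async callable:

import asyncio
from typing import Any, Dict, List

# Hypothetical async model call; stands in for a real network request
async def async_predict(question: str, temperature: float) -> str:
    await asyncio.sleep(0)  # placeholder for I/O latency
    return f"answer to: {question}"

# An async inference callable following the same contract
async def async_inference(
    evaluation_items: List[Dict[str, Any]], **hyperparameters: Any
) -> List[str]:
    temperature = hyperparameters.get("temperature", 0.0)
    # Fan out one request per item and preserve input order
    tasks = [async_predict(item["question"], temperature) for item in evaluation_items]
    return list(await asyncio.gather(*tasks))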
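
Similarly, a cloud inference callable can wrap a provider SDK. The sketch below uses the OpenAI Python SDK; the model name and the "question" item field are illustrative assumptions:

from typing import Any, Dict, List

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def cloud_inference(evaluation_items: List[Dict[str, Any]], **hyperparameters: Any) -> List[str]:
    predictions = []
    for item in evaluation_items:
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # assumed model name for illustration
            messages=[{"role": "user", "content": item["question"]}],
            temperature=hyperparameters.get("temperature", 0.0),
        )
        predictions.append(response.choices[0].message.content)
    return predictions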


Next Steps