
Inference

Flexible Inference Implementation for Scorebook Evaluations

Scorebook's signature evaluate function requires a callable argument for its inference parameter. This callable encapsulates a model's inference process, generating a list of predictions from a list of input evaluation items.


Inference Callable Requirements

All inference callables in Scorebook share the same basic contract, regardless of how they are implemented:

An inference callable must:

  • Accept a list of evaluation items
  • Accept hyperparameters as kwargs
  • Return a list of model outputs for scoring

Inference Callable Implementations

Because only this contract matters, inference can be implemented using any callable type, such as functions, methods, classes, or callable objects.

Inference Functions

The most straightforward approach is defining a single function that handles the entire inference process:

from typing import Any, Dict, List

# A basic inference function implementation
def inference(evaluation_items: List[Dict[str, Any]], **hyperparameters: Any) -> List[Any]:
    # Configure the model once, outside the item loop
    model = get_model()
    model.temperature = hyperparameters.get("temperature")

    # Generate one prediction per evaluation item
    predictions = []
    for item in evaluation_items:
        prediction = model.predict(item["question"])
        predictions.append(prediction)

    return predictions
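
Once defined, the function is passed to evaluate via its inference parameter. The call site below is an illustrative assumption, only the inference parameter is documented here; the item shape and hyperparameter passthrough are sketched for context:

from scorebook import evaluate

evaluation_items = [
    {"question": "What is 2 + 2?"},
    {"question": "What is the capital of France?"},
]

# Hypothetical call site: passing the items positionally and forwarding a
# hyperparameter are assumptions, not a fixed signature
results = evaluate(evaluation_items, inference=inference, temperature=0.0)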

Advanced Inference Implementations

As projects grow, inference functions can be expanded into more modular and reusable components. Instead of handling all logic in a single function, you can compose inference using Scorebook’s InferencePipeline.

from scorebook import InferencePipeline

# Create an inference pipeline
inference_pipeline = InferencePipeline(
    model="model-name",              # Optionally specify the model name
    preprocessor=preprocessor,       # Prepares evaluation items for model input
    inference_function=inference,    # Generates raw model output for structured inputs
    postprocessor=postprocessor,     # Parses model outputs to extract the response for scoring
)

Pipelines let you break the process into distinct stages of preprocessing, inference, and postprocessing, making it easier to manage complexity, reuse components, or plug in different models.
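
As a rough sketch, a preprocessor might map raw evaluation items to prompt strings, while a postprocessor extracts the answer text from raw model outputs. The function bodies and the "question" field below are illustrative assumptions, not part of Scorebook's API:

from typing import Any, Dict, List

# Hypothetical pipeline stages; the item fields and parsing logic are assumptions
def preprocessor(evaluation_items: List[Dict[str, Any]]) -> List[str]:
    # Turn each raw evaluation item into a prompt string for the model
    return [item["question"] for item in evaluation_items]

def postprocessor(raw_outputs: List[str]) -> List[str]:
    # Strip whitespace so only the answer text is passed to scoring
    return [output.strip() for output in raw_outputs]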

In practice, more advanced scenarios often build on one of the following approaches:

  • Asynchronous Inference: Run inference concurrently using async callables (see the first sketch after this list)
  • Batch Inference: Process items in batches to improve throughput with local models
  • Cloud Inference: Integrate with hosted providers such as OpenAI or Anthropic (see the second sketch after this list)

Each of these builds on the same callable contract described above.
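
For example, an asynchronous inference callable can fan out one request per item with asyncio. This is a minimal sketch: async_predict is a hypothetical stand-in for a real async model call, and it assumes evaluate can await an async callable:

import asyncio
from typing import Any, Dict, List

# Hypothetical async model call; stands in for a real network request
async def async_predict(question: str, temperature: float) -> str:
    await asyncio.sleep(0)  # placeholder for I/O latency
    return f"answer to: {question}"

# An async inference callable following the same contract
async def async_inference(
    evaluation_items: List[Dict[str, Any]], **hyperparameters: Any
) -> List[str]:
    temperature = hyperparameters.get("temperature", 0.0)
    # Fan out one request per item and preserve input order
    tasks = [async_predict(item["question"], temperature) for item in evaluation_items]
    return list(await asyncio.gather(*tasks))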
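
Similarly, a cloud inference callable can wrap a provider SDK. The sketch below uses the OpenAI Python SDK; the model name and the "question" item field are illustrative assumptions:

from typing import Any, Dict, List

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def cloud_inference(evaluation_items: List[Dict[str, Any]], **hyperparameters: Any) -> List[str]:
    predictions = []
    for item in evaluation_items:
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # assumed model name for illustration
            messages=[{"role": "user", "content": item["question"]}],
            temperature=hyperparameters.get("temperature", 0.0),
        )
        predictions.append(response.choices[0].message.content)
    return predictions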


Next Steps