Batch Inference

Optimizing Performance with Local Batched Inference

Scorebook supports local batched inference: an inference function or pipeline can set a batch size to process multiple evaluation items simultaneously, improving performance and efficiency for local model inference.
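The batch inference function shown in the next section calls a module-level pipeline object. As a minimal, hypothetical setup (the model name and device settings are placeholders, not part of Scorebook), a Hugging Face transformers text-generation pipeline could be created like this:

from transformers import pipeline as hf_pipeline

# Hypothetical local model setup; substitute any locally available chat model.
pipeline = hf_pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model name
    device_map="auto",
)

# Batched generation needs a padding token; fall back to EOS if none is defined.
if pipeline.tokenizer.pad_token_id is None:
    pipeline.tokenizer.pad_token_id = pipeline.tokenizer.eos_token_id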


Batch Inference Function

from typing import Any, Dict, List


def batch_inference(
    preprocessed_items: List[Dict[str, Any]], **hyperparameter_config: Any
) -> List[Any]:
    """Run batch inference on multiple preprocessed eval items simultaneously.

    Args:
        preprocessed_items: List of preprocessed evaluation items.
        **hyperparameter_config: Model hyperparameters.

    Returns:
        A list of model outputs for all evaluation items.
    """
    # Extract messages from all preprocessed items
    all_messages = [item["messages"] for item in preprocessed_items]

    # Perform batch inference - the pipeline will process multiple inputs together
    batch_outputs = pipeline(
        all_messages,
        temperature=hyperparameter_config["temperature"],
        max_new_tokens=hyperparameter_config.get("max_new_tokens", 256),
        do_sample=True,
        batch_size=hyperparameter_config.get("batch_size", 1),
    )

    return list(batch_outputs)
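As a rough usage sketch (the items and hyperparameter values below are illustrative assumptions, not Scorebook's API), the function can be called directly on a list of preprocessed items:

# Hypothetical preprocessed items in chat-message format.
items = [
    {"messages": [{"role": "user", "content": "What is 2 + 2?"}]},
    {"messages": [{"role": "user", "content": "Name a prime number."}]},
]

# temperature is required by the function above; batch_size and max_new_tokens are optional.
outputs = batch_inference(items, temperature=0.7, max_new_tokens=64, batch_size=2)
print(outputs)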

To run this batched inference pipeline in a complete evaluation and view its results, run Scorebook's Example 4.