Batch Inference

Optimizing Performance with Local Batched Inference

Scorebook supports local batched inference: an inference function or pipeline can set a batch size to process multiple evaluation items simultaneously, improving performance and efficiency for local model inference.
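The batch inference function shown in the next section calls a module-level pipeline object. As a minimal, hypothetical setup (the model name and device settings are placeholders, not part of Scorebook), a Hugging Face transformers text-generation pipeline could be created like this:

from transformers import pipeline as hf_pipeline

# Hypothetical local model setup; substitute any locally available chat model.
pipeline = hf_pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model name
    device_map="auto",
)

# Batched generation needs a padding token; fall back to EOS if none is defined.
if pipeline.tokenizer.pad_token_id is None:
    pipeline.tokenizer.pad_token_id = pipeline.tokenizer.eos_token_id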


Batch Inference Function

from typing import Any, Dict, List


def batch_inference(
    preprocessed_items: List[Dict[str, Any]], **hyperparameter_config: Any
) -> List[Any]:
    """Run batch inference on multiple preprocessed eval items simultaneously.

    Args:
        preprocessed_items: List of preprocessed evaluation items.
        **hyperparameter_config: Model hyperparameters.

    Returns:
        A list of model outputs for all evaluation items.
    """
    # Extract messages from all preprocessed items
    all_messages = [item["messages"] for item in preprocessed_items]

    # Perform batch inference - the pipeline will process multiple inputs together
    batch_outputs = pipeline(
        all_messages,
        temperature=hyperparameter_config["temperature"],
        max_new_tokens=hyperparameter_config.get("max_new_tokens", 256),
        do_sample=True,
        batch_size=hyperparameter_config.get("batch_size", 1),
    )

    return list(batch_outputs)
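As a rough usage sketch (the items and hyperparameter values below are illustrative assumptions, not Scorebook's API), the function can be called directly on a list of preprocessed items:

# Hypothetical preprocessed items in chat-message format.
items = [
    {"messages": [{"role": "user", "content": "What is 2 + 2?"}]},
    {"messages": [{"role": "user", "content": "Name a prime number."}]},
]

# temperature is required by the function above; batch_size and max_new_tokens are optional.
outputs = batch_inference(items, temperature=0.7, max_new_tokens=64, batch_size=2)
print(outputs)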

To run this batched inference pipeline in a complete evaluation and view its results, run Scorebook's Example 4.