Batch Inference
Optimizing Performance with Local Batched Inference
Scorebook supports local batched inference. An inference function or pipeline can set a batch size so that multiple evaluation items are processed simultaneously, improving throughput and efficiency for local model inference.
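The batch inference function in the next section calls an object named pipeline that is not defined in this section. A minimal sketch of how such a pipeline might be created, assuming a Hugging Face transformers text-generation pipeline; the model name and settings are illustrative assumptions, not part of Scorebook:

# Sketch only: the model name and pipeline settings are illustrative assumptions.
from transformers import pipeline as hf_pipeline

pipeline = hf_pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",  # any local chat model could be used here
    device_map="auto",
)

# Batched generation requires a padding token; fall back to EOS if none is set.
if pipeline.tokenizer.pad_token_id is None:
    pipeline.tokenizer.pad_token = pipeline.tokenizer.eos_token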
Batch Inference Function
from typing import Any, Dict, List


def batch_inference(
    preprocessed_items: List[Dict[str, Any]], **hyperparameter_config: Any
) -> List[Any]:
    """Run batch inference on multiple preprocessed eval items simultaneously.

    Args:
        preprocessed_items: List of preprocessed evaluation items.
        hyperparameter_config: Model hyperparameters.

    Returns:
        A list of model outputs for all evaluation items.
    """
    # Extract the chat messages from every preprocessed item
    all_messages = [item["messages"] for item in preprocessed_items]

    # Perform batch inference - the pipeline processes multiple inputs together
    batch_outputs = pipeline(
        all_messages,
        temperature=hyperparameter_config["temperature"],
        max_new_tokens=hyperparameter_config.get("max_new_tokens", 256),
        do_sample=True,
        batch_size=hyperparameter_config.get("batch_size", 1),
    )

    return list(batch_outputs)
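A quick way to exercise this function outside of a full evaluation, assuming the hypothetical pipeline setup sketched above; the two evaluation items below are hand-written for illustration and are not part of a Scorebook dataset:

# Illustrative only: two fake preprocessed items in the chat-messages format
# the function above expects, run through batch_inference with a batch size of 2.
items = [
    {"messages": [{"role": "user", "content": "What is 2 + 2?"}]},
    {"messages": [{"role": "user", "content": "Name a prime number above 10."}]},
]

outputs = batch_inference(
    items,
    temperature=0.7,
    max_new_tokens=64,
    batch_size=2,
)

for item, output in zip(items, outputs):
    print(item["messages"][0]["content"], "->", output)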
To run this batched inference pipeline as part of a complete evaluation and view its results, see Scorebook's Example 4.