Uploading Results

Uploading Evaluation Results to Trismik's Dashboard

After running evaluations with Scorebook, you can upload your results to the Trismik platform for centralized tracking, analysis, and collaboration. This enables you to visualize performance trends, compare different models, and share results with your team.

Prerequisites

Before uploading results to Trismik, you need:

  1. Valid Trismik API credentials - Get your API key from the Trismik dashboard
  2. A Trismik project - Create a project on the Trismik dashboard to organize your evaluations
  3. Authentication setup - Configure your API key either via environment variable or login

Authentication

Set your API key as an environment variable:

export TRISMIK_API_KEY="your-api-key-here"

Login Function

Alternatively, log in programmatically using the login() function:

import os
from scorebook import login

api_key = os.environ.get("TRISMIK_API_KEY")
login(api_key)

The login() function saves your API key locally for future use. You only need to call it once per environment.
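
If the environment variable might be unset, a small guard avoids passing None to login(). This is an optional sketch built only on the calls shown above; the error message is illustrative.

import os

from scorebook import login

# Fail fast with a clear message if no key is available.
api_key = os.environ.get("TRISMIK_API_KEY")
if not api_key:
    raise RuntimeError("TRISMIK_API_KEY is not set; export it before calling login().")

login(api_key)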


Uploading Results

Automatic Upload

When authenticated, you can enable automatic result uploads by providing experiment_id and project_id to the evaluate() function:

from scorebook import evaluate, EvalDataset
from scorebook.metrics import Accuracy

# Set up your inference function
def my_inference(eval_items, **hyperparameters):
    # Your inference logic here
    pass

# Load your dataset
dataset = EvalDataset.from_json(
    file_path="path/to/dataset.json",
    label="answer",
    metrics=Accuracy
)

# Run evaluation with automatic upload
results = evaluate(
    inference=my_inference,
    datasets=dataset,
    hyperparameters={"temperature": 0.7},
    experiment_id="my-experiment",  # Creates experiment if it doesn't exist
    project_id="your-project-id",  # Must exist on Trismik dashboard
    metadata={"model": "gpt-4", "version": "1.0"},
    return_items=True
)

Manual Upload Control

You can explicitly control result uploading with the upload_results parameter:

# Force upload even when no experiment_id is provided
results = evaluate(
    inference=my_inference,
    datasets=dataset,
    upload_results=True,
    project_id="your-project-id"
)

# Disable upload even when authenticated
results = evaluate(
    inference=my_inference,
    datasets=dataset,
    upload_results=False
)

# Auto mode (default) - uploads if authenticated and IDs provided
results = evaluate(
    inference=my_inference,
    datasets=dataset,
    upload_results="auto"  # This is the default
)

Understanding Upload Behavior

Condition                                               Upload Behavior
upload_results=True + authenticated                     Always uploads
upload_results=True + not authenticated                 Upload fails
upload_results=False                                    Never uploads
upload_results="auto" + authenticated + IDs provided    Uploads automatically
upload_results="auto" + not authenticated               No upload

Metadata and Organization

Experiment Organization

  • Projects: Top-level containers for related experiments
  • Experiments: Specific evaluation campaigns (created automatically if they don't exist)
  • Runs: Individual evaluation executions with specific hyperparameters (see the sketch below)
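
Based on these definitions, a common pattern is to keep project_id and experiment_id fixed and let each evaluation execution land as its own run. The loop below is a hedged sketch of that idea, reusing the my_inference function, dataset, and IDs shown elsewhere on this page:

# Each evaluation execution becomes a run under the same experiment,
# differing only in its hyperparameters.
for temperature in (0.2, 0.7):
    evaluate(
        inference=my_inference,
        datasets=dataset,
        hyperparameters={"temperature": temperature},
        experiment_id="prompt-optimization",
        project_id="your-project-id",
        metadata={"temperature": temperature}
    )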

Adding Metadata

Include relevant metadata to enhance result tracking:

metadata = {
    "model": "microsoft/Phi-4-mini-instruct",
    "version": "1.2.0",
    "dataset_version": "v2",
    "notes": "Testing new prompt template",
    "environment": "production"
}

results = evaluate(
    inference=my_inference,
    datasets=dataset,
    experiment_id="prompt-optimization",
    project_id="your-project-id",
    metadata=metadata
)
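
Because metadata is a plain dictionary, you can also assemble it programmatically. The helper below is hypothetical and uses only the standard library; it records a timestamp and, when the code runs inside a git checkout, the current commit, then passes the result to evaluate() as above.

import subprocess
from datetime import datetime, timezone

def build_metadata(model_name):
    """Illustrative helper: collect reproducibility details for an upload."""
    metadata = {
        "model": model_name,
        "timestamp": datetime.now(timezone.utc).isoformat()
    }
    try:
        # Record the current commit when running inside a git checkout.
        commit = subprocess.check_output(["git", "rev-parse", "--short", "HEAD"])
        metadata["git_commit"] = commit.decode().strip()
    except (OSError, subprocess.CalledProcessError):
        pass  # git unavailable or not a repository
    return metadata

results = evaluate(
    inference=my_inference,
    datasets=dataset,
    experiment_id="prompt-optimization",
    project_id="your-project-id",
    metadata=build_metadata("microsoft/Phi-4-mini-instruct")
)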

Verification

After uploading, verify your results appear on the Trismik dashboard:

  1. Navigate to your project
  2. Check the experiment list
  3. View individual run details and metrics

For a complete runnable example, see Scorebook Example 8, which demonstrates the full upload workflow.