At Trismik, we're rethinking how large language models are evaluated. Our adaptive testing, inspired by psychometrics, enables more experiments and deeper insights at no extra cost - making every evaluation fast, rigorous, and trustworthy.
Trismik began at the University of Cambridge, where Professor Nigel Collier - one of the world's most cited NLP researchers - saw first-hand how time-consuming it was to create evaluation datasets, re-run benchmarks, and make sense of complex results.
Curiosity led him to Computerized Adaptive Testing (CAT), a method from psychometrics that evaluates people by selecting only the most informative questions. He wondered: what if AI could be assessed as efficiently and fairly as humans are? That idea became the foundation of Trismik's proprietary adaptive evaluation approach.
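Trismik's own approach is proprietary, but for a flavour of what CAT involves, here is a minimal illustrative sketch under standard psychometric assumptions: a two-parameter logistic (2PL) item response model, with each next question chosen to maximise Fisher information at the current ability estimate. All function names, parameters, and the item bank below are hypothetical.

```python
import math
import random

def p_correct(theta, a, b):
    """2PL item response function: probability of a correct answer
    given ability theta, discrimination a, and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def estimate_theta(responses, grid):
    """Maximum-likelihood ability estimate over a coarse grid."""
    def log_lik(theta):
        ll = 0.0
        for (a, b), correct in responses:
            p = p_correct(theta, a, b)
            ll += math.log(p if correct else 1.0 - p)
        return ll
    return max(grid, key=log_lik)

def adaptive_test(item_bank, true_theta, n_items=10):
    grid = [g / 10.0 for g in range(-40, 41)]  # abilities -4.0 .. 4.0
    theta, responses, remaining = 0.0, [], list(item_bank)
    for _ in range(n_items):
        # Ask the unanswered item most informative at the current estimate.
        item = max(remaining, key=lambda ab: item_information(theta, *ab))
        remaining.remove(item)
        # Simulate the test-taker's (or model's) answer.
        correct = random.random() < p_correct(true_theta, *item)
        responses.append((item, correct))
        theta = estimate_theta(responses, grid)
    return theta

# Hypothetical item bank: (discrimination a, difficulty b) per question.
bank = [(random.uniform(0.5, 2.0), random.uniform(-3, 3)) for _ in range(200)]
print(adaptive_test(bank, true_theta=1.2))
```

Because each question is posed where it is most informative about the current estimate, the ability score converges with far fewer items than a fixed-length test - the efficiency that makes CAT attractive for evaluation.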
In 2023, Nigel met co-founder Rebekka Mikkola, a repeat founder and former Salesforce enterprise sales executive with a passion for building in AI. Backed early by Cambridge Enterprise and in a design partnership with a major UK telco, they built Trismik's first MVP.
In 2025, Nigel's former postdoc from Cambridge and ex-Amazon scientist, Marco Basaldella, joined as CTO, completing a founding team that blends science, engineering, and commercial expertise.
We're preparing for our first product launch in September 2025, introducing both classical and adaptive evaluation on a platform designed to help researchers and practitioners interrogate their experiments with clarity and confidence.
In early 2026, we'll extend this platform to enterprise customers and enable secure evaluations on users' own proprietary data, with advanced experiment tracking and rich visualisation. As part of this roadmap, we are working towards SOC 2 compliance to meet the highest standards of security and trust.
Looking further ahead, we see Trismik becoming the leading platform for rigorous LLM experimentation - trusted by scientists, engineers, and industry practitioners alike. Our goal is a flexible test delivery platform that supports multiple modes of evaluation, evolving from classical and adaptive methods into advanced, domain-specific approaches as the field matures.