A Python package for analysing data from Inspect logs using the statistical modeling techniques presented in "HiBayES: A Hierarchical Bayesian Modeling Framework for AI Evaluation Statistics" and "Skewed Score: A Statistical Framework to Assess Autograders".
Please see the docs for installation instructions, examples and a breakdown of the main features.
@article{luettgau2025hibayeshierarchicalbayesianmodeling,
title={HiBayES: A Hierarchical Bayesian Modeling Framework for AI Evaluation Statistics},
author={Lennart Luettgau and Harry Coppock and Magda Dubois and Christopher Summerfield and Cozmin Ududec},
year={2025},
archivePrefix={arXiv},
eprint={2505.05602},
url={https://arxiv.org/abs/2505.05602},
}