Proposal summary
When we run multiple optimizations, for example with GEPA, the optimizer uses a Pareto front that is evaluated against a holdout / validation set, so scores improve internally. Sometimes the "best" score reported by the SDK does not match the one shown in the UI, and sometimes we might not want just the highest score.
We should be able to control from the SDK (if we wish) which trial is the "best". Example of an issue.
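As a rough illustration of the control being requested (not of the issue itself), the sketch below shows one possible shape for a pluggable "best trial" selector. Everything in it is hypothetical: `Trial`, `pick_best`, `highest_validation`, `weighted`, and the metric names are made up for this example and are not part of any existing SDK.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Trial:
    """Hypothetical record for one optimization trial (not an SDK type)."""
    trial_id: str
    train_score: float
    validation_score: float
    metrics: Dict[str, float] = field(default_factory=dict)


# A selector takes all trials and returns the one to treat as "best".
Selector = Callable[[List[Trial]], Trial]


def highest_validation(trials: List[Trial]) -> Trial:
    """Assumed default: pick the trial with the highest validation score."""
    return max(trials, key=lambda t: t.validation_score)


def weighted(weights: Dict[str, float]) -> Selector:
    """Build a selector that ranks trials by a weighted sum of their metrics."""
    def select(trials: List[Trial]) -> Trial:
        return max(
            trials,
            key=lambda t: sum(w * t.metrics.get(name, 0.0)
                              for name, w in weights.items()),
        )
    return select


def pick_best(trials: List[Trial], selector: Optional[Selector] = None) -> Trial:
    """Return the "best" trial, using the caller's selector if one is given."""
    if not trials:
        raise ValueError("no trials to choose from")
    return (selector or highest_validation)(trials)


# Illustrative data only; these numbers are invented for the example.
trials = [
    Trial("t1", 0.90, 0.70, {"accuracy": 0.70, "cost": 0.10}),
    Trial("t2", 0.80, 0.75, {"accuracy": 0.75, "cost": 0.90}),
    Trial("t3", 0.85, 0.72, {"accuracy": 0.72, "cost": 0.30}),
]

default_best = pick_best(trials)
custom_best = pick_best(trials, weighted({"accuracy": 1.0, "cost": -0.5}))
```

With this data the default selector and the weighted selector choose different trials, which is the situation described above: the highest score is not always the trial we want to treat as "best".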
Motivation
Fine-grained control to ensure the SDK matches the results shown in the UI. In addition, with custom score metrics that weight several outcomes, the trial we want might not always be the one with the highest score.