Hi, I’m interested in how the perplexity (PPL) metric changes over the course of training. What is the lowest perplexity the model can achieve?