Thank you for open-sourcing your work — it’s greatly appreciated.
I’ve noticed a discrepancy between the training and inference configurations:
- `MAX_SEQ_LEN` is set to 2048 during training
- It defaults to 4096 during inference
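
For concreteness, here is a minimal sketch of the mismatch as I read it. The file names and structure are my assumptions for illustration only; the 2048/4096 values are the ones I found in the repo:

```python
# Hypothetical illustration -- file names and layout are my assumptions;
# only the two MAX_SEQ_LEN values come from this repo.

# training side (e.g. something like a training config module)
MAX_SEQ_LEN = 2048  # training sequences appear to be capped at 2048 tokens

# inference side (e.g. something like an inference entry point)
MAX_SEQ_LEN = 4096  # the inference default allows contexts up to 4096 tokens
```

If the larger inference default is intentional (for example, if the positional encoding is expected to extrapolate beyond the training length), a short note documenting that would be very helpful.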
Could you please clarify the reasoning behind this difference?