Thank you for open-sourcing your work — it’s greatly appreciated.
I’ve noticed a discrepancy between the training and inference configurations:
- `MAX_SEQ_LEN` is set to 2048 during training
- It defaults to 4096 during inference
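
For concreteness, here is a minimal sketch of the mismatch as I read it. The file names and structure are my assumptions for illustration only; the 2048/4096 values are the ones I found in the repo:

```python
# Hypothetical illustration -- file names and layout are my assumptions;
# only the two MAX_SEQ_LEN values come from this repo.

# training side (e.g. something like a training config module)
MAX_SEQ_LEN = 2048  # training sequences appear to be capped at 2048 tokens

# inference side (e.g. something like an inference entry point)
MAX_SEQ_LEN = 4096  # the inference default allows contexts up to 4096 tokens
```

If the larger inference default is intentional (for example, if the positional encoding is expected to extrapolate beyond the training length), a short note documenting that would be very helpful.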
Could you please clarify the reasoning behind this difference?