BucketIterator cannot actually batch examples of similar lengths together unless sort_within_batch=True #1097

Open
@shuaihuaiyi

🐛 Bug

Describe the bug

def pool(data, batch_size, key, batch_size_fn=lambda new, count, sofar: count,

According to the pool function, especially lines 291 to 293, examples in a chunk are not sorted unless sort_within_batch=True. When sort_within_batch=False, BucketIterator behaves just like a plain Iterator. I have tested BucketIterator with sort_within_batch=False, and each generated batch does seem to be padded to a common length with many <pad> tokens.
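To illustrate the behavior described above, here is a minimal, self-contained sketch. It is not torchtext's code: `chunked`, `pool_sketch`, `padding_needed`, and the `chunk_factor=100` default are hypothetical names and values chosen for illustration, assuming only that the data is split into large chunks, each chunk is sorted only when `sort_within_batch` is set, and the chunk is then cut into batches.

```python
import random


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def pool_sketch(data, batch_size, key, sort_within_batch=False, chunk_factor=100):
    """Simplified stand-in for the reported behavior (not torchtext's implementation).

    `chunk_factor` is an assumed value for how many batches make up one chunk.
    """
    for chunk in chunked(data, batch_size * chunk_factor):
        if sort_within_batch:
            # Only in this branch do examples of similar length end up adjacent.
            chunk = sorted(chunk, key=key)
        yield from chunked(chunk, batch_size)


def padding_needed(batches):
    """Count the pad tokens needed to pad every example to its batch's max length."""
    return sum(max(map(len, b)) * len(b) - sum(map(len, b)) for b in batches)


# Synthetic examples with random lengths between 1 and 50 tokens.
rng = random.Random(0)
examples = [["tok"] * rng.randint(1, 50) for _ in range(1000)]

unsorted_batches = list(pool_sketch(examples, 10, key=len, sort_within_batch=False))
sorted_batches = list(pool_sketch(examples, 10, key=len, sort_within_batch=True))

print("pad tokens without sorting:", padding_needed(unsorted_batches))
print("pad tokens with sorting:   ", padding_needed(sorted_batches))
```

In this sketch the unsorted path produces exactly the same batches as slicing the data in its original order, which is the "acts like Iterator" behavior reported here, while the sorted path needs far less padding.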

To Reproduce
Steps to reproduce the behavior:
You don't need to

Expected behavior
BucketIterator should batch examples of similar lengths together, even when sort_within_batch=False.
