Fix map() example in datasets documentation: define tokenizer before use #7704

Open
wants to merge 1 commit into
base: main
from

Conversation

Sanjaykumar030

Problem

The datasets.Dataset.map() example in the documentation demonstrates batched processing with a tokenizer object that is never imported or defined. Copying and running the example as-is raises a NameError.
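
The failure can be reproduced without any library. Below is a minimal sketch, where `apply_to_batch` is a hypothetical stand-in for the way `map()` calls a user function on a batch (it is not part of `datasets`). It shows that defining the lambda succeeds and the error only surfaces once the function is actually called:

```python
def apply_to_batch(fn, batch):
    # Hypothetical stand-in for map(): it simply calls fn on one batch.
    return fn(batch)


# Defining the lambda succeeds even though `tokenizer` does not exist,
# because names in a function body are resolved only when it is called.
tokenize = lambda batch: tokenizer(batch["text"])

try:
    apply_to_batch(tokenize, {"text": ["hello world"]})
    error_message = None
except NameError as err:
    # The undefined name is reported at call time.
    error_message = str(err)

print(error_message)
```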

Correction

This PR fixes the issue by importing and initializing the tokenizer from the Transformers library (AutoTokenizer.from_pretrained("bert-base-uncased")) before it is used, so the example is self-contained and runs as written.
New users can then copy the example directly and see how the tokenizer fits into the map() workflow.
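
For reference, a self-contained version along these lines might look like the sketch below. It is not the exact text of the documentation change: the toy in-memory dataset and the helper names `tokenize_batch` and `run_example` are assumptions made for illustration. It needs the `datasets` and `transformers` packages, and `from_pretrained` downloads the tokenizer files on first use.

```python
def tokenize_batch(batch, tokenizer):
    """Tokenize the "text" column of one batch (a dict of column -> list)."""
    return tokenizer(batch["text"])


def run_example():
    # Third-party imports live inside the function so the helper above can be
    # used without them; this part requires `datasets` and `transformers`.
    from datasets import Dataset
    from transformers import AutoTokenizer

    # Define the tokenizer BEFORE it is used in map(); leaving this line out
    # is what raises the NameError described above.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    # Toy in-memory dataset (an assumption, chosen to avoid a dataset download).
    ds = Dataset.from_dict({"text": ["a great movie", "not my favourite"]})

    # With batched=True, map() passes a dict of lists to the function.
    return ds.map(lambda batch: tokenize_batch(batch, tokenizer), batched=True)
```

Calling `run_example()` returns a dataset that keeps the original `text` column and adds the columns produced by the tokenizer.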

Closes #7703

@Sanjaykumar030
Author

Hi @lhoestq, just a gentle follow-up on this doc fix PR (#7704). Let me know if any changes are needed — happy to update.
Hope this improvement helps users run the example without confusion!

Development

Successfully merging this pull request may close these issues.

[Docs] map() example uses undefined tokenizer — causes NameError