Fix map() example in datasets documentation: define tokenizer before use #7704

Sanjaykumar030 · 2025-07-26T14:18:17Z

Problem

The current datasets.Dataset.map() example in the documentation demonstrates batched processing with a tokenizer object that is never imported or defined. Users who copy and run the example as-is get a NameError.
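To illustrate the failure mode, here is a minimal sketch in plain Python. It is not the documentation example itself: `encode` and `try_encode` are hypothetical names used only for illustration, and the batch is a hand-written dict rather than a real dataset.

```python
# Hypothetical reduction of the problem: the callback refers to a name
# (`tokenizer`) that is never imported or assigned anywhere.
def encode(batch):
    return tokenizer(batch["text"])  # name lookup happens only when called


def try_encode(batch):
    """Call encode() and return the NameError message, or None if it succeeded."""
    try:
        encode(batch)
    except NameError as err:
        return str(err)
    return None


# Defining encode() raised nothing; the error appears on the first call.
message = try_encode({"text": ["a great movie"]})
print(message)
```

Python resolves the name only when the function body runs, so the example fails at the point where map() first invokes the callback, not where the callback is written.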

Correction

This PR fixes the issue by explicitly importing and initializing the tokenizer using the Transformers library (AutoTokenizer.from_pretrained("bert-base-uncased")), making the example self-contained and runnable without errors.
This will help new users understand the workflow and apply the method correctly.
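A self-contained version of the batched example might look like the sketch below. It is an assumption about the shape of the fix, not a copy of the PR diff: `tokenize_batch` and `run_example` are hypothetical helper names, and the two-row in-memory dataset stands in for whatever dataset the documentation loads. The library imports sit inside `run_example` only so that the helper can be exercised without the libraries installed.

```python
def tokenize_batch(batch, tokenizer):
    """Tokenize the "text" column of one batch (a dict of column name -> list)."""
    return tokenizer(batch["text"])


def run_example():
    """Build a tiny dataset and tokenize it in batches (needs datasets + transformers)."""
    from datasets import Dataset
    from transformers import AutoTokenizer

    # The step the original example was missing: define the tokenizer before use.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    # Illustrative in-memory data, assumed here instead of a downloaded dataset.
    ds = Dataset.from_dict({"text": ["a great movie", "not my favorite"]})

    # batched=True passes a dict of lists, so the tokenizer sees a list of strings.
    return ds.map(lambda batch: tokenize_batch(batch, tokenizer), batched=True)
```

Calling `run_example()` with both libraries installed downloads the tokenizer files on first use and returns the dataset with the tokenizer's output columns added alongside `text`.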

Closes #7703

Sanjaykumar030 · 2025-08-01T13:48:35Z

Hi @lhoestq, just a gentle follow-up on this doc fix PR (#7704). Let me know if any changes are needed — happy to update.
Hope this improvement helps users run the example without confusion!

Fix map() example by adding tokenizer initialization to prevent NameError (commit 3249f3f)

Sanjaykumar030 mentioned this pull request Jul 27, 2025

[Docs] map() example uses undefined tokenizer — causes NameError #7703

Open
