tsibley (Contributor) commented Feb 20, 2020

Nice tool! I just want to use it on TSV files too. See the commit messages for details. :-)

The csv module documentation says opening files with newline='' is necessary in some cases, such as for parsing newlines embedded in quoted fields. See <https://docs.python.org/3/library/csv.html#id3>.
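
A minimal standard-library illustration of the embedded-newline case: a quoted field containing a newline round-trips intact when the file is opened with newline=''. The temporary file and the sample rows are made up for the example.

```python
import csv
import os
import tempfile

rows = [["id", "note"], ["1", "first line\nsecond line"], ["2", "plain"]]

# Write a CSV whose second data row has a newline inside a quoted field.
with tempfile.NamedTemporaryFile("w", suffix=".csv", newline="", delete=False) as fp:
    csv.writer(fp).writerows(rows)
    path = fp.name

# With newline="" the file object does no newline translation, so the csv
# module decides where records end and the embedded newline stays in the field.
with open(path, newline="") as fp:
    parsed = list(csv.reader(fp))

os.remove(path)
print(parsed[1])  # ['1', 'first line\nsecond line']
```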

For seekable streams, the delimiter is sniffed from the first 1MB of data. This should give the sniffer enough rows even for datasets with very long rows, without increasing memory usage much.
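
A rough sketch of "sniff, then rewind" using csv.Sniffer. The helper name sniff_dialect, the restriction to comma and tab, and the exact sample size are assumptions for illustration, not csv-diff's actual code. Note that read(n) on a text stream counts characters rather than bytes, so "1MB" is approximate.

```python
import csv
import io

# Roughly "the first 1MB". On a text stream, read(n) counts characters, not bytes.
SAMPLE_SIZE = 1024 * 1024


def sniff_dialect(fp, sample_size=SAMPLE_SIZE):
    """Guess the CSV dialect of a seekable text stream, then rewind it.

    Hypothetical helper for illustration; not csv-diff's actual code.
    """
    if not fp.seekable():
        raise ValueError("can't sniff a non-seekable stream")
    start = fp.tell()
    sample = fp.read(sample_size)
    fp.seek(start)  # rewind so the real reader sees every row
    # Only consider comma and tab, so the sniffer can't settle on an odd character.
    return csv.Sniffer().sniff(sample, delimiters=",\t")


tsv = io.StringIO("id\tname\n1\talice\n2\tbob\n")
dialect = sniff_dialect(tsv)
rows = list(csv.reader(tsv, dialect))
print(repr(dialect.delimiter))  # '\t'
print(rows)  # [['id', 'name'], ['1', 'alice'], ['2', 'bob']]
```

Rewinding matters because the sample is consumed from the same stream the reader will later parse; without the seek, the reader would start wherever the sample ended.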

A csv.Dialect may also be passed directly to load_csv() for programmatic use. This is useful when you want to disable sniffing, or when one or both of the files aren't seekable, so sniffing can't work.
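
A hypothetical sketch of a load_csv() that takes an optional dialect, assuming a simplified signature load_csv(fp, dialect=None) that returns a list of row dicts (csv-diff's real function may differ). The NonSeekable class is also made up here, standing in for a pipe or stdin.

```python
import csv
import io


def load_csv(fp, dialect=None):
    """Hypothetical load_csv() sketch that accepts an optional csv dialect.

    Not csv-diff's real signature. With no dialect, a seekable stream is
    sniffed; a non-seekable one falls back to the comma-separated "excel" dialect.
    """
    if dialect is None:
        dialect = "excel"
        if fp.seekable():
            start = fp.tell()
            sample = fp.read(1024 * 1024)
            fp.seek(start)
            try:
                dialect = csv.Sniffer().sniff(sample, delimiters=",\t")
            except csv.Error:
                pass  # sniffing failed; keep the "excel" default
    return list(csv.DictReader(fp, dialect=dialect))


class NonSeekable(io.StringIO):
    """Stands in for a pipe or stdin, which can't be rewound after sniffing."""

    def seekable(self):
        return False


data = "id\tname\n1\talice\n2\tbob\n"

sniffed = load_csv(io.StringIO(data))  # seekable: tab delimiter is detected
explicit = load_csv(NonSeekable(data), dialect=csv.excel_tab)  # explicit dialect
fallback = load_csv(NonSeekable(data))  # no sniffing possible: parsed as commas

print(sniffed == explicit)  # True
print(fallback[0])  # {'id\tname': '1\talice'}
```

The fallback case shows why an explicit dialect helps: without sniffing, a TSV read as comma-separated collapses each line into a single column.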

simonw (Owner) commented Feb 29, 2020

I love this improvement. Thanks for teaching me about the csv.Sniffer mechanism!

simonw merged commit 140fe0d into simonw:master Feb 29, 2020
simonw added a commit that referenced this pull request Feb 29, 2020