
Simplify csv reader/writer #97

@ghost

Description

The stumbling block for many users is creating the proper SQL table structure before the CSV data can be loaded.

If the CSV file contains a header, then the database table creation can be based on a simple analysis of the header line and a sample of the remaining rows. Consider a simple wrapper around the SQL CREATE + COPY (INTO/TO) action.

monetdbe.read_csv(csv_file, sep=',', quotechar='"', null='')

The parallel MonetDB CSV reader should make a real difference here compared to plain Python and pandas.
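A minimal sketch of what such a wrapper could look like, using only the DB-API style connect()/cursor()/execute() calls monetdbe already exposes; the read_csv helper itself, its naive type inference, and the COPY OFFSET/DELIMITERS details are illustrative assumptions, not existing API:

import csv
import os
import monetdbe

def read_csv(con, path, table, sep=',', quotechar='"', null='', sample=100):
    """Derive CREATE TABLE from the CSV header plus a row sample, then bulk-load via COPY INTO."""
    with open(path, newline='') as f:
        reader = csv.reader(f, delimiter=sep, quotechar=quotechar)
        header = next(reader)
        rows = [row for row, _ in zip(reader, range(sample))]

    def castable(cast, value):
        try:
            cast(value)
            return True
        except ValueError:
            return False

    def infer(col):
        # Guess a column type from the sampled values; fall back to TEXT.
        values = [row[col] for row in rows if row[col] != null]
        for cast, sqltype in ((int, 'BIGINT'), (float, 'DOUBLE')):
            if values and all(castable(cast, v) for v in values):
                return sqltype
        return 'TEXT'

    columns = ', '.join(f'"{name}" {infer(i)}' for i, name in enumerate(header))
    cur = con.cursor()
    cur.execute(f'CREATE TABLE "{table}" ({columns})')
    # OFFSET 2 skips the header line; the parallel CSV loader handles the rest.
    cur.execute(
        f"COPY OFFSET 2 INTO \"{table}\" FROM '{os.path.abspath(path)}' "
        f"USING DELIMITERS '{sep}', E'\\n', '{quotechar}' NULL AS '{null}'"
    )
    con.commit()

con = monetdbe.connect(':memory:')
read_csv(con, 'data/files/example.csv', 'example')

Only the header analysis happens in Python on a small sample; the load itself is delegated entirely to the server-side COPY INTO, which is where the parallel reader pays off.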

It could also be aligned with the pandas.read_csv() interface
import pandas as pd

data = pd.read_csv(
    "data/files/complex_data_example.tsv",    # relative python path to subdirectory
    sep='\t',                                 # Tab-separated value file
    quotechar="'",                            # single quote allowed as quote character
    dtype={"salary": int},                    # Parse the salary column as an integer
    usecols=['name', 'birth_date', 'salary'], # Only load the three columns specified
    parse_dates=['birth_date'],               # Interpret the birth_date column as a date
    skiprows=10,                              # Skip the first 10 rows of the file
    na_values=['.', '??'],                    # Take any '.' or '??' values as NA
)
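For monetdbe the aligned call could then look roughly like this; every keyword below simply mirrors the pandas one above and is hypothetical until implemented:

monetdbe.read_csv(
    "data/files/complex_data_example.tsv",
    sep='\t',
    quotechar="'",
    usecols=['name', 'birth_date', 'salary'],
    parse_dates=['birth_date'],
    skiprows=10,
    na_values=['.', '??'],
)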

The complementary action is monetdbe.write_csv().
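A correspondingly small sketch for the export direction, again assuming only the existing cursor()/execute() calls; the write_csv name and the wrapping of MonetDB's COPY SELECT ... INTO <file> statement are illustrative:

import os

def write_csv(con, table, path, sep=',', quotechar='"', null=''):
    """Dump an existing table to a CSV file via COPY SELECT ... INTO <file>."""
    cur = con.cursor()
    cur.execute(
        f"COPY SELECT * FROM \"{table}\" INTO '{os.path.abspath(path)}' "
        f"USING DELIMITERS '{sep}', E'\\n', '{quotechar}' NULL AS '{null}'"
    )

write_csv(con, 'example', 'example_out.csv')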

Labels

enhancement (New feature or request)