Dataset API proposal
Aleksandr Sorokoumov edited this page Jul 5, 2014 · 13 revisions
As part of the Incanter and core.matrix integration process, the idea is to evolve the dataset type that already exists in core.matrix and use it in Incanter.
To do that, Incanter's dataset functions should be implemented in core.matrix.
(column-count ds)
Returns the number of columns in the dataset.
(column-names ds)
Returns the column names of the dataset.
(column-name ds idx)
Returns the column name at the given index.
(get-column ds k)
Returns the column identified by k, which may be either a column index or a column name.
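The accessors above can be sketched in plain Clojure to pin down their behaviour. This assumes a hypothetical in-memory representation (a map holding a vector of column names and a vector of column vectors) and uses only clojure.core; it is not core.matrix's actual dataset implementation.

```clojure
;; Hypothetical representation, assumed for this sketch only:
;; {:column-names [name ...] :columns [column-vector ...]}
(def ds {:column-names [:x :category]
         :columns      [[10 15 30] [:red :blue :green]]})

(defn column-count [ds]
  (count (:column-names ds)))

(defn column-names [ds]
  (:column-names ds))

(defn column-name [ds idx]
  (nth (:column-names ds) idx))

(defn get-column
  "Returns the column at index k when k is an integer, otherwise the
  column named k. Returns nil if no such column exists."
  [ds k]
  (let [idx (if (integer? k)
              k
              (first (keep-indexed (fn [i n] (when (= n k) i))
                                   (:column-names ds))))]
    (when idx
      (get (:columns ds) idx))))
```

Treating integers as indices means a dataset whose column names are themselves integers would be ambiguous; this sketch simply prefers the index.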
(select-columns ds cols)
Returns a new dataset containing the columns in cols, in the specified order.
(except-columns ds cols)
Returns a new dataset with all columns except those specified in cols.
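A minimal sketch of select-columns and except-columns, assuming the same hypothetical map-of-column-vectors representation (not core.matrix's real dataset type). Columns may be named or given by integer index, and every entry of cols is assumed to exist.

```clojure
(def ds {:column-names [:a :b :c]
         :columns      [[1 2] [3 4] [5 6]]})

(defn- column-index
  "Index of column k; integers are passed through unchanged."
  [ds k]
  (if (integer? k)
    k
    (first (keep-indexed (fn [i n] (when (= n k) i)) (:column-names ds)))))

(defn select-columns
  "New dataset holding only the columns in cols, in the order given."
  [ds cols]
  (let [idxs (map #(column-index ds %) cols)]
    {:column-names (mapv #(nth (:column-names ds) %) idxs)
     :columns      (mapv #(nth (:columns ds) %) idxs)}))

(defn except-columns
  "New dataset holding every column not listed in cols, in original order."
  [ds cols]
  (let [dropped (set (map #(column-index ds %) cols))]
    (select-columns ds (remove dropped (range (count (:column-names ds)))))))
```

Implementing except-columns in terms of select-columns keeps the two consistent: the second is just the first applied to the complementary set of indices.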
(conj-columns & args)
Returns a dataset created by combining columns of the given datasets and/or collections.
(add-column ds col)
Adds the column col to the dataset.
(add-derived-column ds col-name from-cols f)
Adds a column named col-name to the dataset; its value in each row is computed by applying f to that row's values in the existing columns from-cols.
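The row-wise behaviour of add-derived-column can be illustrated with a short sketch. It assumes a hypothetical representation (a map of :column-names plus a vector of column vectors) and passes f one argument per source column, in the order listed in from-cols.

```clojure
(def ds {:column-names [:price :qty]
         :columns      [[3 4 1] [2 1 10]]})

(defn- column-index [ds k]
  (if (integer? k)
    k
    (first (keep-indexed (fn [i n] (when (= n k) i)) (:column-names ds)))))

(defn add-derived-column
  "Appends a column named col-name. Its value in each row is f applied to
  that row's values from the columns in from-cols."
  [ds col-name from-cols f]
  (let [sources (map #(nth (:columns ds) (column-index ds %)) from-cols)
        ;; (map f col1 col2 ...) walks the source columns in lockstep,
        ;; so f sees exactly one row's values per call.
        values  (vec (apply map f sources))]
    (-> ds
        (update :column-names conj col-name)
        (update :columns conj values))))
```

For example, (add-derived-column ds :total [:price :qty] *) adds a :total column holding the per-row products.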
(rename-columns ds col-map)
Renames columns according to col-map, a map from old column names to new column names.
(replace-column ds col-name vs)
Replaces the values of column col-name with vs.
(transform-column ds col-name f & args)
Applies f, with any additional args, to each value of column col-name and replaces the column with the resulting values.
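The three column-rewriting functions can be sketched over a hypothetical map-of-column-vectors representation, assumed here for illustration only. Note that transform-column calls (apply f value args) once per element, so extra args act as trailing arguments to f.

```clojure
(def ds {:column-names [:x :y]
         :columns      [[1 2 3] [4 5 6]]})

(defn- column-index [ds k]
  (if (integer? k)
    k
    (first (keep-indexed (fn [i n] (when (= n k) i)) (:column-names ds)))))

(defn rename-columns
  "Names found in col-map are replaced; all other names are kept."
  [ds col-map]
  (update ds :column-names #(mapv (fn [n] (get col-map n n)) %)))

(defn replace-column [ds col-name vs]
  (assoc-in ds [:columns (column-index ds col-name)] (vec vs)))

(defn transform-column [ds col-name f & args]
  (let [idx (column-index ds col-name)]
    (update-in ds [:columns idx]
               (fn [col] (mapv #(apply f % args) col)))))
```

With this convention, (transform-column ds :y * 10) multiplies every :y value by 10.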
(get-row ds idx)
Returns the row at the given index.
(conj-rows & args)
Returns a dataset created by combining the rows of the given datasets and/or collections.
(row-count ds)
Returns the number of rows in the dataset.
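A sketch of the row functions, assuming a hypothetical column-major map representation. Because data is stored by column, a row is assembled by taking the same index from every column. This conj-rows assumes its first argument is a dataset and that any later datasets share its column order.

```clojure
(def ds {:column-names [:x :y]
         :columns      [[1 2] [:a :b]]})

(defn row-count [ds]
  ;; All columns have the same length, so the first one is enough.
  (count (first (:columns ds))))

(defn get-row [ds idx]
  (mapv #(nth % idx) (:columns ds)))

(defn conj-rows
  "Appends rows to the first argument (assumed to be a dataset). Each later
  argument is a dataset with the same column order or a collection of row
  vectors."
  [& args]
  (reduce (fn [acc x]
            (let [rows (if (map? x)
                         (map #(get-row x %) (range (row-count x)))
                         x)]
              (reduce (fn [acc row]
                        ;; conj each value of the row onto its own column
                        (update acc :columns #(mapv conj % row)))
                      acc
                      rows)))
          (first args)
          (rest args)))
```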
(from-matrix m)
Creates a dataset from the array m.
(to-matrix ds)
Creates a matrix from the dataset.
(from-map m)
Creates a dataset from a map of column names to sequences of column values.
(to-map ds)
Returns a map of column names to sequences of column values.
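The conversion functions can be sketched as follows, assuming a hypothetical representation in which a dataset is a map of column names and column vectors, and a matrix is a row-major vector of row vectors. The default column names 0, 1, 2, ... chosen by from-matrix are an assumption of this sketch; the proposal does not say how columns are named.

```clojure
(def m {:x [1 2] :y [3 4]})

(defn from-map
  "Column order follows (keys m); small literal maps keep insertion order,
  larger hash maps may not."
  [m]
  {:column-names (vec (keys m))
   :columns      (mapv vec (vals m))})

(defn to-map [ds]
  (zipmap (:column-names ds) (:columns ds)))

(defn to-matrix
  "Transposes the stored columns into a row-major vector of rows."
  [ds]
  (apply mapv vector (:columns ds)))

(defn from-matrix
  "Transposes rows into columns and names them 0, 1, 2, ... (hypothetical)."
  [m]
  (let [cols (apply mapv vector m)]
    {:column-names (vec (range (count cols)))
     :columns      cols}))
```

Since storage is column-major here, both matrix conversions reduce to a transpose, (apply mapv vector ...).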
(get-element ds c r)
Returns the element at column c and row r.
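In a column-major layout, get-element is two lookups: first the column, then the row within it. This sketch assumes the hypothetical map-of-column-vectors representation and, as an assumption, accepts c as either a column name or an integer index.

```clojure
(def ds {:column-names [:x :y]
         :columns      [[1 2 3] [4 5 6]]})

(defn get-element
  "Element at column c (a name or an integer index) and row r."
  [ds c r]
  (let [idx (if (integer? c)
              c
              (first (keep-indexed (fn [i n] (when (= n c) i))
                                   (:column-names ds))))]
    (nth (nth (:columns ds) idx) r)))
```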
(query-dataset ds query-map)
Queries the dataset using query-map and returns a new dataset containing only the rows that satisfy it.
The query-map uses the dataset's column names as keys and a simple variant of the MongoDB query language for the values.
For instance, given a dataset with two columns, :x and :category, to query for rows where :x equals 10, use the following query-map: {:x 10}.
To indicate that :x should be between 10 and 20, use {:x {:$gt 10 :$lt 20}}.
To also require :category to be :red, :green, or :blue, use :$in: {:x {:$gt 10 :$lt 20} :category {:$in #{:green :blue :red}}}
To require instead that :category is not :red, :green, or :blue, use :$nin: {:x {:$gt 10 :$lt 20} :category {:$nin #{:green :blue :red}}}
Query terms include :$gt, :$lt, :$gte, :$lte, :$eq, :$ne, :$in, :$nin, and :$fn.
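The query semantics described above can be sketched as a row filter. This assumes a hypothetical map-of-column-vectors dataset, expects a set as the argument to :$in and :$nin, and treats :$fn as taking a one-argument predicate on the column value; these are assumptions of the sketch, not confirmed details of the proposal.

```clojure
(def query-ops
  ;; Each operator is called as (op column-value argument).
  {:$gt  >
   :$lt  <
   :$gte >=
   :$lte <=
   :$eq  =
   :$ne  not=
   :$in  (fn [v s] (contains? s v))        ; s is expected to be a set
   :$nin (fn [v s] (not (contains? s v)))
   :$fn  (fn [v f] (boolean (f v)))})      ; f is an arbitrary predicate

(defn- term-matches?
  "A plain value means equality; a map means every operator must hold."
  [v term]
  (if (map? term)
    (every? (fn [[op arg]]
              (if-let [f (get query-ops op)]
                (f v arg)
                (throw (ex-info "Unknown query term" {:term op}))))
            term)
    (= v term)))

(defn query-dataset [ds query-map]
  (let [names (:column-names ds)
        rows  (apply map (fn [& vs] (zipmap names vs)) (:columns ds))
        kept  (filter (fn [row]
                        (every? (fn [[col term]] (term-matches? (get row col) term))
                                query-map))
                      rows)]
    {:column-names names
     :columns      (mapv (fn [c] (mapv #(get % c) kept)) names)}))

(def ds {:column-names [:x :category]
         :columns      [[5 12 15 18 25] [:red :green :yellow :red :blue]]})
```

Applied to this ds, the :$in query from the text keeps the rows with :x 12 and 18, while the :$nin version keeps only the :yellow row.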
(group-by ds cols)
Groups the rows by the values of the columns cols and returns a map of datasets, one per distinct combination of those values, keyed by that combination.
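A sketch of group-by under a hypothetical map-of-column-vectors representation. Keying each group by a map from grouping column to value (for example {:dept :a}) is a choice made for this sketch. The namespace excludes clojure.core/group-by so the proposal's name can be reused, and the core version is called fully qualified inside it.

```clojure
(ns dataset-sketch.group-by
  (:refer-clojure :exclude [group-by]))

(def ds {:column-names [:dept :salary]
         :columns      [[:a :b :a] [10 20 30]]})

(defn group-by
  "Groups rows by their values in cols. Each key maps grouping column to
  value; each value is a dataset of the matching rows, in original order."
  [ds cols]
  (let [names  (:column-names ds)
        rows   (apply map (fn [& vs] (zipmap names vs)) (:columns ds))
        groups (clojure.core/group-by #(select-keys % cols) rows)]
    (into {}
          (for [[k rs] groups]
            [k {:column-names names
                :columns      (mapv (fn [c] (mapv #(get % c) rs)) names)}]))))
```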
(join ds & args)
Returns a dataset created by right-joining two or more datasets.
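The proposal does not spell out what args contains, so this sketch uses a hypothetical two-dataset helper, right-join, over the same assumed map-of-column-vectors representation. Every row of right is kept. When several left rows share a key, only the first is used, which is a simplification. Right rows with no match get nil in the left-only columns, and shared non-key columns take the right-hand value.

```clojure
(defn- ds->rows [ds]
  (apply map (fn [& vs] (zipmap (:column-names ds) vs)) (:columns ds)))

(defn- rows->ds [names rows]
  {:column-names names
   :columns      (mapv (fn [c] (mapv #(get % c) rows)) names)})

(defn right-join
  "Hypothetical right join of left and right on the key columns in on."
  [on left right]
  (let [;; index left rows by their key values, keeping the first match
        index (reduce (fn [m row]
                        (let [k (select-keys row on)]
                          (if (contains? m k) m (assoc m k row))))
                      {}
                      (ds->rows left))
        names (vec (distinct (concat (:column-names left)
                                     (:column-names right))))]
    (rows->ds names
              (map #(merge (get index (select-keys % on)) %)
                   (ds->rows right)))))

(def left  {:column-names [:id :name]  :columns [[1 2] ["ann" "bob"]]})
(def right {:column-names [:id :score] :columns [[2 3] [90 75]]})
```

Joining more than two datasets could then be expressed as a reduction, for example (reduce #(right-join [:id] %1 %2) [a b c]).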