
Dataset API proposal

Aleksandr Sorokoumov edited this page Jul 5, 2014 · 13 revisions

Motivation

As part of the integration of Incanter and core.matrix, the idea is to evolve the existing core.matrix dataset type and use it in Incanter.

To do that, Incanter's dataset functions should be implemented in core.matrix.

API

column-count

(column-count ds)

Returns the number of columns in the dataset.

column-names

(column-names ds)

Returns the column names of the dataset.

column-name

(column-name ds idx)

Returns the column name at the given index.

get-column

(get-column ds k)

Returns the column identified by k, which may be a column index or a column name.

select-columns

(select-columns ds cols)

Produces a new dataset with the columns in the specified order.

except-columns

(except-columns ds cols)

Returns a new dataset with all columns except the specified ones.

conj-columns

(conj-columns & args)

Returns a dataset created by combining columns of the given datasets and/or collections.

add-column

(add-column ds col)

Adds a column to the dataset.

add-derived-column

(add-derived-column ds col-name from-cols f)

Adds a column to a dataset that is a function of existing columns.
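As an illustration of the intended behaviour, here is a minimal sketch in plain Clojure. It assumes a toy model in which a dataset is simply a map from column name to a vector of values; this is not the core.matrix representation, and `add-derived-column-sketch` is a hypothetical name used only for this example.

```clojure
;; Toy model (an assumption): a dataset is a map of column name -> vector of values.
(defn add-derived-column-sketch
  "Returns ds with an extra column col-name whose values are computed by
  applying f row-wise to the values of the columns listed in from-cols."
  [ds col-name from-cols f]
  ;; (map ds from-cols) looks up each source column; mapv then walks them in step.
  (assoc ds col-name (apply mapv f (map ds from-cols))))

(def derived-example
  (add-derived-column-sketch {:a [1 2 3] :b [10 20 30]} :sum [:a :b] +))
;; derived-example is equal to {:a [1 2 3], :b [10 20 30], :sum [11 22 33]}
```

The function f receives one argument per source column, in the order given by from-cols.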

rename-columns

(rename-columns ds col-map)

Renames columns according to a map from old column names to new column names.

replace-column

(replace-column ds col-name vs)

Replaces a column in the dataset with new values.

transform-column

(transform-column ds col-name f & args)

Applies function f & args to the specified column of dataset and replaces the column with the resulting new values.
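A minimal sketch of this behaviour in plain Clojure follows. It assumes that f is applied to each value of the column, with args passed as additional arguments, and it models a dataset as a map from column name to a vector of values. Both points are assumptions made for illustration, and `transform-column-sketch` is a hypothetical name.

```clojure
;; Toy model (an assumption): a dataset is a map of column name -> vector of values.
(defn transform-column-sketch
  "Returns ds with column col-name replaced by (f v arg1 arg2 ...) for each value v.
  Element-wise application is an assumption about the proposed semantics."
  [ds col-name f & args]
  (update ds col-name (fn [vs] (mapv #(apply f % args) vs))))

(def transformed-example
  (transform-column-sketch {:a [1 2 3] :b [4 5 6]} :a * 10))
;; transformed-example is equal to {:a [10 20 30], :b [4 5 6]}
```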

get-row

(get-row ds idx)

Returns the row at the given index.

conj-rows

(conj-rows & args)

Returns a dataset created by combining the rows of the given datasets and/or collections.

row-count

(row-count ds)

Returns the number of rows in the dataset.

from-matrix

(from-matrix m)

Creates a dataset from a matrix.

to-matrix

(to-matrix ds)

Creates a matrix from the dataset.

from-map

(from-map m)

Creates a dataset from a map of column names to lists of values.

to-map

(to-map ds)

Returns a map of column names to lists of values.
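The relationship between from-map and to-map can be sketched in plain Clojure. The dataset representation used here (a map with :column-names and :columns keys) and the function names `from-map-sketch` and `to-map-sketch` are assumptions made for illustration, not the core.matrix implementation.

```clojure
;; Toy representation (an assumption):
;; {:column-names [name ...] :columns [[values of first column] ...]}
(defn from-map-sketch
  "Builds a toy dataset from a map of column name -> sequence of values."
  [m]
  (let [names (vec (keys m))]
    {:column-names names
     :columns      (mapv #(vec (get m %)) names)}))

(defn to-map-sketch
  "Returns a map of column name -> vector of values for a toy dataset."
  [ds]
  (zipmap (:column-names ds) (:columns ds)))

(def round-trip
  (to-map-sketch (from-map-sketch {:a [1 2] :b [3 4]})))
;; round-trip is equal to {:a [1 2], :b [3 4]}
```

In this sketch the two functions are inverses of each other up to the sequence type of the values, which are always returned as vectors.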

get-element

(get-element ds c r)

Returns the element at column c and row r.

query-dataset

(query-dataset ds query-map)

Queries the given dataset using the query-map, returning a new dataset that contains only the rows satisfying it.
The query-map uses the dataset's column names as keys and a simple variant of the MongoDB query language.

For instance, given a dataset with two columns, :x and :category, to query for rows where :x equals 10, use the following query-map: {:x 10}.

To indicate that :x should be between 10 and 20, use {:x {:$gt 10 :$lt 20}}.

To indicate that :category should also be either :red, :green, or :blue, use :$in: {:x {:$gt 10 :$lt 20} :category {:$in #{:green :blue :red}}}

And to indicate that :category should not be :red, :green, or :blue, use :$nin: {:x {:$gt 10 :$lt 20} :category {:$nin #{:green :blue :red}}}

Query terms include :$gt, :$lt, :$gte, :$lte, :$eq, :$ne, :$in, :$nin, and :$fn.
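To make the query-map semantics concrete, here is a minimal sketch in plain Clojure that evaluates a query-map against rows represented as maps. It is an illustration under stated assumptions rather than the proposed implementation: the names `query-ops`, `matches?` and `query-rows` are hypothetical, :$in and :$nin are assumed to take sets, and :$fn is assumed to take a one-argument predicate.

```clojure
;; Each query term maps to a two-argument test: (test column-value term-argument).
(def query-ops
  {:$gt  (fn [v x] (> v x))
   :$lt  (fn [v x] (< v x))
   :$gte (fn [v x] (>= v x))
   :$lte (fn [v x] (<= v x))
   :$eq  (fn [v x] (= v x))
   :$ne  (fn [v x] (not= v x))
   :$in  (fn [v xs] (contains? xs v))        ; assumes xs is a set
   :$nin (fn [v xs] (not (contains? xs v)))  ; assumes xs is a set
   :$fn  (fn [v f] (boolean (f v)))})        ; assumes f is a predicate

(defn matches?
  "True if row (a map of column name -> value) satisfies every entry of query-map.
  A map-valued condition is read as query terms; any other value means equality."
  [query-map row]
  (every? (fn [[col condition]]
            (let [v (get row col)]
              (if (map? condition)
                (every? (fn [[op arg]] ((query-ops op) v arg)) condition)
                (= v condition))))
          query-map))

(defn query-rows
  "Returns the vector of rows that satisfy query-map."
  [rows query-map]
  (filterv #(matches? query-map %) rows))

(def rows
  [{:x 5  :category :red}
   {:x 15 :category :red}
   {:x 15 :category :yellow}
   {:x 10 :category :blue}])

(query-rows rows {:x 10})
;; => [{:x 10, :category :blue}]
(query-rows rows {:x {:$gt 10 :$lt 20} :category {:$in #{:green :blue :red}}})
;; => [{:x 15, :category :red}]
```

All conditions in a query-map are combined with logical AND, both across columns and across the terms given for a single column. An unknown query term is not handled in this sketch and would throw.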

group-by

(group-by ds cols)

Returns a map of datasets, keyed by the values of the grouping columns.
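One possible reading of this grouping can be sketched in plain Clojure, with rows represented as maps. The shape of the keys is an assumption: here each key is a map from grouping column to its value, and each group is a vector of rows rather than a dataset. `group-rows-sketch` is a hypothetical name.

```clojure
(defn group-rows-sketch
  "Groups rows (maps of column name -> value) by their values in cols.
  Each key of the result is a map of grouping column -> value (an assumption)."
  [rows cols]
  (group-by #(select-keys % cols) rows))

(def grouped-example
  (group-rows-sketch [{:k :a :v 1} {:k :b :v 2} {:k :a :v 3}] [:k]))
;; grouped-example is equal to
;; {{:k :a} [{:k :a, :v 1} {:k :a, :v 3}]
;;  {:k :b} [{:k :b, :v 2}]}
```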

join

(join ds & args)

Returns a dataset created by right-joining two or more datasets.