
Dataset API proposal

Aleksandr Sorokoumov edited this page Jul 5, 2014 · 13 revisions

Motivation

As part of the Incanter and core.matrix integration effort, the idea is to evolve the existing core.matrix dataset type and use it in Incanter.

In order to do that, Incanter dataset functions should be implemented in core.matrix.

API

column-names

(column-names ds)

Returns a persistent vector containing the column names in the same order as they appear in the dataset.

column-name

(column-name ds idx)

Returns the column name at the given index.
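
To make the two accessors concrete, here is a minimal sketch in plain Clojure. It assumes a toy dataset model (a map with `:column-names` and column-major `:columns`), which is an illustration only and not the core.matrix dataset type.

```clojure
;; Toy model (an assumption, not the core.matrix dataset type):
;; a dataset is a map holding column names and column-major data.
(def ds {:column-names [:a :b :c]
         :columns      [[1 2 3] [4 5 6] [7 8 9]]})

(defn column-names
  "Returns the column names as a persistent vector, in dataset order."
  [ds]
  (vec (:column-names ds)))

(defn column-name
  "Returns the column name at index idx."
  [ds idx]
  (nth (:column-names ds) idx))

(column-names ds)   ;; => [:a :b :c]
(column-name ds 1)  ;; => :b
```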

select-columns

(select-columns ds cols)

Produces a new dataset containing the given columns in the specified order. cols is a collection of column names.

except-columns

(except-columns ds cols)

Returns a new dataset with all columns except the specified ones.
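
The sketch below illustrates how select-columns and except-columns could relate to each other. It uses a hypothetical plain-map dataset model (`:column-names` plus column-major `:columns`), not the actual core.matrix representation.

```clojure
;; Toy dataset model, assumed for illustration only.
(def ds {:column-names [:a :b :c]
         :columns      [[1 2 3] [4 5 6] [7 8 9]]})

(defn select-columns
  "Returns a new dataset holding only cols, in the order given."
  [ds cols]
  (let [by-name (zipmap (:column-names ds) (:columns ds))]
    {:column-names (vec cols)
     :columns      (mapv by-name cols)}))

(defn except-columns
  "Returns a new dataset without the columns named in cols."
  [ds cols]
  (let [excluded (set cols)]
    (select-columns ds (remove excluded (:column-names ds)))))

(select-columns ds [:c :a])
;; => {:column-names [:c :a], :columns [[7 8 9] [1 2 3]]}
(except-columns ds [:b])
;; => {:column-names [:a :c], :columns [[1 2 3] [7 8 9]]}
```

Expressing except-columns through select-columns keeps the original column order for whatever remains.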

merge-columns

(merge-columns & args)

Returns a dataset created by combining columns of the given datasets.
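
A possible reading of merge-columns, sketched on a toy plain-map dataset model (an assumption, not the core.matrix type). The sketch further assumes that the inputs have the same number of rows and distinct column names; it does not check either.

```clojure
;; Toy dataset model, assumed for illustration only.
(def left  {:column-names [:a :b] :columns [[1 2] [3 4]]})
(def right {:column-names [:c]    :columns [[5 6]]})

(defn merge-columns
  "Concatenates the columns of the given datasets, left to right."
  [& datasets]
  {:column-names (vec (mapcat :column-names datasets))
   :columns      (vec (mapcat :columns datasets))})

(merge-columns left right)
;; => {:column-names [:a :b :c], :columns [[1 2] [3 4] [5 6]]}
```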

add-column

(add-column ds col)

Adds a column to the dataset.

rename-columns

(rename-columns ds col-map)

Renames columns based on a map of old-name to new-name pairs.
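
As a rough illustration of the renaming rule, the following sketch works on a toy plain-map dataset model (assumed here, not the core.matrix type). Columns that are absent from col-map keep their names.

```clojure
;; Toy dataset model, assumed for illustration only.
(def ds {:column-names [:a :b :c]
         :columns      [[1 2 3] [4 5 6] [7 8 9]]})

(defn rename-columns
  "Renames columns using col-map, a map of old name to new name."
  [ds col-map]
  (update ds :column-names
          (fn [names] (mapv #(get col-map % %) names))))

(:column-names (rename-columns ds {:a :x, :c :z}))
;; => [:x :b :z]
```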

replace-column

(replace-column ds col-name vs)

Replaces the values of a column in the dataset with new values.

update-column

(update-column ds col-name f & args)

Applies function f & args to the specified column of the dataset and replaces the column with the resulting new values.
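
The relationship between replace-column and update-column can be sketched as follows, on a toy plain-map dataset model (an assumption, not the core.matrix type). The sketch assumes that f is applied to each element of the column; the description could also be read as applying f to the column as a whole.

```clojure
;; Toy dataset model, assumed for illustration only.
(def ds {:column-names [:a :b]
         :columns      [[1 2 3] [4 5 6]]})

(defn replace-column
  "Replaces the values of column col-name with vs."
  [ds col-name vs]
  (let [idx (.indexOf (:column-names ds) col-name)]
    (assoc-in ds [:columns idx] (vec vs))))

(defn update-column
  "Applies f (with extra args) to every element of column col-name.
   Element-wise application is an assumption of this sketch."
  [ds col-name f & args]
  (let [idx (.indexOf (:column-names ds) col-name)
        col (get-in ds [:columns idx])]
    (replace-column ds col-name (map #(apply f % args) col))))

(:columns (replace-column ds :a [7 8 9]))  ;; => [[7 8 9] [4 5 6]]
(:columns (update-column ds :b * 10))      ;; => [[1 2 3] [40 50 60]]
```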

get-row

(get-row ds idx)

Returns the row at the given index.

conj-rows

(conj-rows & args)

Returns a dataset created by combining the rows of the given datasets and/or collections.
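
A minimal sketch of the two row operations, using a toy plain-map dataset model with column-major storage (an assumption, not the core.matrix type). For brevity, conj-rows here accepts only datasets with identical columns; handling of plain collections is left out.

```clojure
;; Toy dataset model, assumed for illustration only.
(def ds1 {:column-names [:a :b] :columns [[1 2] [3 4]]})
(def ds2 {:column-names [:a :b] :columns [[5] [6]]})

(defn get-row
  "Returns the row at idx as a vector of values, one per column."
  [ds idx]
  (mapv #(nth % idx) (:columns ds)))

(defn conj-rows
  "Appends the rows of the given datasets, column by column.
   Assumes every dataset has the same columns in the same order."
  [& datasets]
  {:column-names (:column-names (first datasets))
   :columns      (apply mapv
                        (fn [& cols] (vec (apply concat cols)))
                        (map :columns datasets))})

(get-row ds1 1)                  ;; => [2 4]
(:columns (conj-rows ds1 ds2))   ;; => [[1 2 5] [3 4 6]]
(get-row (conj-rows ds1 ds2) 2)  ;; => [5 6]
```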

from-matrix

(from-matrix m)

Creates a dataset from an array.

to-matrix

(to-matrix ds)

Creates a matrix from a dataset.

from-map

(from-map m)

Creates a dataset from a map of column names to lists of values.

to-map

(to-map ds)

Returns a map of column names to lists of values.
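
The four conversions (from-matrix, to-matrix, from-map, to-map) might look roughly like this on a toy plain-map dataset model (assumed here, not the core.matrix type). Two further assumptions: a matrix is a non-empty vector of row vectors, and from-matrix names columns by their integer index.

```clojure
(defn from-matrix
  "Builds a dataset from a vector of rows; columns are named 0..n-1
   (the naming scheme is an assumption of this sketch)."
  [m]
  {:column-names (vec (range (count (first m))))
   :columns      (apply mapv vector m)})        ; transpose rows -> columns

(defn to-matrix
  "Returns the dataset contents as a vector of row vectors."
  [ds]
  (apply mapv vector (:columns ds)))            ; transpose columns -> rows

(defn from-map
  "Builds a dataset from a map of column name -> values."
  [m]
  {:column-names (vec (keys m))
   :columns      (mapv vec (vals m))})

(defn to-map
  "Returns a map of column name -> vector of values."
  [ds]
  (zipmap (:column-names ds) (:columns ds)))

(:columns (from-matrix [[1 2] [3 4] [5 6]]))  ;; => [[1 3 5] [2 4 6]]
(to-matrix (from-matrix [[1 2] [3 4]]))       ;; => [[1 2] [3 4]]
(to-map (from-map {:a [1 2] :b [3 4]}))       ;; => {:a [1 2], :b [3 4]}
```

Note that from-map takes its column order from the map's iteration order, which Clojure does not guarantee for large maps.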

get-element

(get-element ds c r)

Returns the element at the given column and row.
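
A short sketch of element access on a toy plain-map dataset model (an assumption, not the core.matrix type). It assumes that c is a column name and r a zero-based row index; the proposal does not say whether c may also be a column index.

```clojure
;; Toy dataset model, assumed for illustration only.
(def ds {:column-names [:a :b]
         :columns      [[1 2 3] [4 5 6]]})

(defn get-element
  "Returns the value in column c (a column name) at row index r."
  [ds c r]
  (let [idx (.indexOf (:column-names ds) c)]
    (get-in ds [:columns idx r])))

(get-element ds :b 2)  ;; => 6
```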

group-by

(group-by ds cols)

Returns a map of datasets, where keys are grouping columns.
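
One way to picture the grouping result is sketched below on a toy plain-map dataset model (an assumption, not the core.matrix type). The function is named `ds-group-by` (a hypothetical name) to avoid shadowing `clojure.core/group-by`, and it assumes each key of the result is a map from grouping column to the value shared by that group.

```clojure
;; Toy dataset model, assumed for illustration only.
(def ds {:column-names [:k :v]
         :columns      [[:x :y :x] [1 2 3]]})

(defn ds-group-by
  "Splits ds into one dataset per distinct combination of values in cols.
   Keys of the result are maps such as {:k :x} (an assumed key shape)."
  [ds cols]
  (let [names  (:column-names ds)
        rows   (apply map vector (:columns ds))   ; row-major view
        key-of (fn [row] (select-keys (zipmap names row) cols))]
    (into {}
          (for [[k group-rows] (group-by key-of rows)]
            [k {:column-names names
                :columns      (apply mapv vector group-rows)}]))))

(ds-group-by ds [:k])
;; => {{:k :x} {:column-names [:k :v], :columns [[:x :x] [1 3]]},
;;     {:k :y} {:column-names [:k :v], :columns [[:y] [2]]}}
```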

join

(join ds & args)

Returns a dataset created by right-joining two or more datasets.
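
The proposal does not spell out the remaining arguments of join, so the sketch below only illustrates right-join semantics for two datasets. Everything in it is hypothetical: the toy plain-map dataset model, the `row-maps` helper, and the `right-join` name with its single key column k. Every row of the right dataset is kept, and columns from the left dataset are nil where no match exists.

```clojure
;; Toy dataset model, assumed for illustration only.
(def people {:column-names [:id :name]  :columns [[1 2] ["ann" "bob"]]})
(def scores {:column-names [:id :score] :columns [[2 3] [90 70]]})

(defn row-maps
  "Hypothetical helper: views a dataset as a seq of column->value maps."
  [ds]
  (map #(zipmap (:column-names ds) %)
       (apply map vector (:columns ds))))

(defn right-join
  "Right join of left and right on key column k (hypothetical signature)."
  [left right k]
  (let [names       (vec (distinct (concat (:column-names right)
                                           (:column-names left))))
        left-by-key (group-by #(get % k) (row-maps left))
        joined      (for [r (row-maps right)
                          ;; [nil] keeps unmatched right rows
                          l (get left-by-key (get r k) [nil])]
                      (merge l r))]
    {:column-names names
     :columns      (mapv (fn [n] (mapv #(get % n) joined)) names)}))

(right-join people scores :id)
;; => {:column-names [:id :score :name],
;;     :columns [[2 3] [90 70] ["bob" nil]]}
```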