reduce overhead of fit! over update #151

@ablaom

Description

Currently, if one calls fit! on a machine without changing rows, there is overhead from calls to selectrows. These calls are required because the row selection is not cached in the machine, and they are slow because rows are selected through the Tables.jl interface, which does not support random access to rows.

This is mostly an issue for an iterative model whose iteration parameter one is (externally) increasing while tracking performance. See JuliaAI/MLJ.jl#122 (comment) for context. In that example the slowdown is about 2-3 times.
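To make the scenario concrete, here is a rough sketch of the kind of externally-controlled iteration loop where the overhead bites; the names (`mach`, `train`, `n_iterations`) are illustrative, not actual MLJ identifiers from the linked discussion:

```julia
# Illustrative only: warm-restarting an iterative model by bumping its
# iteration parameter and calling fit! repeatedly. Although `update` makes
# each refit cheap, every fit! call re-selects the same training rows
# through the Tables.jl interface — that repeated selection is the overhead.
for n in 100:100:1000
    model.n_iterations = n
    fit!(mach, rows=train)   # pays selectrows(X, train) on every call
end
```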

Since Tables.jl does not support fast row access (although do see the discussion at JuliaData/Tables.jl#123), it seems we need to cache (references to) data selections in the machines, and not just the previous row indices. For a regular machine this seems innocuous enough, but for nodal machines in a multi-model learning network it could be limiting for big data sets. Currently learning network nodes do not cache any data. We could make caching optional, but I'm not sure what the best interface point for this would be.
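A minimal sketch of the caching idea, with field names (`previous_rows`, `cached_data`) and structure that are assumptions for illustration, not the actual Machine definition:

```julia
# Hypothetical sketch: cache the selected data alongside the row indices,
# so a repeat fit! with unchanged rows skips selectrows entirely.
mutable struct Machine{M}
    model::M
    args::Tuple                                # training arguments, e.g. (X, y)
    previous_rows::Union{Nothing,Vector{Int}}  # rows used in the last fit
    cached_data::Union{Nothing,Tuple}          # selections for those rows
end

function fit!(mach::Machine; rows=nothing)
    if rows == mach.previous_rows && mach.cached_data !== nothing
        data = mach.cached_data            # cache hit: no selectrows calls
    else
        data = map(arg -> selectrows(arg, rows), mach.args)
        mach.previous_rows = rows          # cache for subsequent calls
        mach.cached_data = data
    end
    # ... dispatch to fit or update on mach.model using `data` ...
end
```

The trade-off discussed above shows up directly here: `cached_data` holds (references to) the selected data, which is cheap for one machine but multiplies across the nodal machines of a learning network on big data sets, which is why caching might need to be optional.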
