reduce overhead of fit! over update #151

@ablaom

Description

Currently, if one calls fit! on a machine without changing rows, there is overhead from calls to selectrows. These calls are required because the row selection is not cached in the machine, and they are slow because rows are selected through the Tables.jl interface, which does not support random access to rows.

This is mostly an issue for an iterative model whose iteration parameter one is (externally) increasing while tracking performance. See JuliaAI/MLJ.jl#122 (comment) for context. In that example the slowdown is about 2-3 times.
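To make the scenario concrete, here is a rough sketch of the kind of externally-controlled iteration loop where the overhead bites; the names (`mach`, `train`, `n_iterations`) are illustrative, not actual MLJ identifiers from the linked discussion:

```julia
# Illustrative only: warm-restarting an iterative model by bumping its
# iteration parameter and calling fit! repeatedly. Although `update` makes
# each refit cheap, every fit! call re-selects the same training rows
# through the Tables.jl interface — that repeated selection is the overhead.
for n in 100:100:1000
    model.n_iterations = n
    fit!(mach, rows=train)   # pays selectrows(X, train) on every call
end
```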

Since Tables.jl does not support fast row access (although do see the discussion at JuliaData/Tables.jl#123), it seems we need to cache (references to) data selections in the machines, and not just the previous row indices. For a regular machine this seems innocuous enough, but for nodal machines in a multi-model learning network it could be limiting for big data sets. Currently learning network nodes do not cache any data. We could make caching optional, but I'm not sure what the best interface point for this would be.
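A minimal sketch of the caching idea, with field names (`previous_rows`, `cached_data`) and structure that are assumptions for illustration, not the actual Machine definition:

```julia
# Hypothetical sketch: cache the selected data alongside the row indices,
# so a repeat fit! with unchanged rows skips selectrows entirely.
mutable struct Machine{M}
    model::M
    args::Tuple                                # training arguments, e.g. (X, y)
    previous_rows::Union{Nothing,Vector{Int}}  # rows used in the last fit
    cached_data::Union{Nothing,Tuple}          # selections for those rows
end

function fit!(mach::Machine; rows=nothing)
    if rows == mach.previous_rows && mach.cached_data !== nothing
        data = mach.cached_data            # cache hit: no selectrows calls
    else
        data = map(arg -> selectrows(arg, rows), mach.args)
        mach.previous_rows = rows          # cache for subsequent calls
        mach.cached_data = data
    end
    # ... dispatch to fit or update on mach.model using `data` ...
end
```

The trade-off discussed above shows up directly here: `cached_data` holds (references to) the selected data, which is cheap for one machine but multiplies across the nodal machines of a learning network on big data sets, which is why caching might need to be optional.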
