Currently, if one calls `fit!` on a machine without changing the rows, there is an overhead from calls to `selectrows`. These calls are required because the row selection is not cached in the machine, and they are costly because `selectrows` is slow: rows are selected through the Tables.jl interface, which does not support random access to rows.
This is mostly an issue for an iterative model that one is (externally) controlling, calling `fit!` again each time the iteration parameter is increased. See JuliaAI/MLJ.jl#122 (comment) for context; in that example the slowdown is roughly 2-3x.
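To make the overhead concrete, here is a minimal benchmark sketch along the lines of the linked example (the table, its size and the row range are invented for illustration; `selectrows` comes from MLJBase and `@btime` from BenchmarkTools):

```julia
using MLJBase, Tables, BenchmarkTools

# A column table, selected through the generic Tables.jl path.
X = Tables.columntable((x1 = rand(10^6), x2 = rand(10^6)))
rows = 1:500_000

# This cost is currently paid on every `fit!(mach, rows=rows)` call,
# even when `rows` has not changed since the previous call:
@btime selectrows($X, $rows);

# Caching the selection would pay it once and reuse the result thereafter:
Xcached = selectrows(X, rows);
```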
Since Tables.jl does not support fast random row access (although do see the discussion at JuliaData/Tables.jl#123), it seems we need to cache (references to) the data selections in the machines, and not just the previously used row indices. For a regular machine this seems harmless enough, but for nodal machines in a multi-model learning network it could be limiting for big data sets. Currently learning network nodes do not cache any data. We could make caching optional, but I'm not sure where the best interface point for that would be; a rough sketch of one possibility is below.
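For discussion only: everything in the following sketch is hypothetical (the `CachingMachine` struct, the `previous_rows`/`resampled_data` fields and the `cache` flag are not existing MLJBase API). It just illustrates keying the cached selection on the previously used rows, with an opt-out for big data:

```julia
using MLJBase  # for selectrows

# Hypothetical machine-like wrapper that remembers the last row selection.
mutable struct CachingMachine{M}
    model::M
    args::Tuple                                       # training data, e.g. (X, y)
    previous_rows::Union{Nothing,AbstractVector{Int}}
    resampled_data::Union{Nothing,Tuple}
    cache::Bool                                       # opt out for nodal machines / big data
end

CachingMachine(model::M, args...; cache=true) where M =
    CachingMachine{M}(model, args, nothing, nothing, cache)

# Return the row-selected training data, reusing the cached selection when the
# requested rows are unchanged; otherwise go through the slow Tables.jl path.
function training_data!(mach::CachingMachine, rows)
    if mach.cache && rows == mach.previous_rows
        return mach.resampled_data
    end
    data = map(arg -> selectrows(arg, rows), mach.args)
    if mach.cache
        mach.previous_rows = rows
        mach.resampled_data = data
    end
    return data
end
```

With something like this, calling `training_data!(mach, rows)` twice with the same `rows` hits the Tables.jl path only once, while `cache=false` keeps a nodal machine in a learning network from holding a second copy of large data.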