Update docs for index selection API

MilesCranmer · MilesCranmer · commit 22b5b7b4c570 · 2024-02-17T19:11:53.000Z
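The pattern documented in this change works like this: plain data uses the index chosen at fit time, and a named tuple `(data=..., idx=...)` picks an equation explicitly. In the multi-target case, `idx` is a vector. The sketch below shows that pattern in self-contained Julia. `ToyFit`, `ToyMultiFit`, and `toy_predict` are hypothetical stand-ins that store each Pareto front as a vector of plain Julia functions. They are not SymbolicRegression.jl's types or its `predict` implementation.

```julia
# Hypothetical toy types, NOT SymbolicRegression.jl internals.
# Each "Pareto front" is a vector of plain functions; `best_idx` plays the
# role of the index picked by `selection_method` at fit time.
struct ToyFit
    equations::Vector{Function}
    best_idx::Int
end

struct ToyMultiFit
    equations::Vector{Vector{Function}}  # one front per target
    best_idx::Vector{Int}                # one default index per target
end

# Plain data: evaluate the equation selected at fit time.
toy_predict(m::ToyFit, X::AbstractVector) = m.equations[m.best_idx].(X)

# Named tuple (data=..., idx=...): override the selection at prediction time.
toy_predict(m::ToyFit, nt::NamedTuple{(:data, :idx)}) = m.equations[nt.idx].(nt.data)

# Multi-target: one prediction vector per target.
toy_predict(m::ToyMultiFit, X::AbstractVector) =
    [m.equations[j][m.best_idx[j]].(X) for j in eachindex(m.equations)]

# Multi-target override: `idx` holds one index per target.
toy_predict(m::ToyMultiFit, nt::NamedTuple{(:data, :idx)}) =
    [m.equations[j][nt.idx[j]].(nt.data) for j in eachindex(m.equations)]

X = [1.0, 2.0, 3.0]
single = ToyFit(Function[x -> 1.0, x -> 2x, x -> x^2 + 1], 3)
toy_predict(single, X)                  # default index 3 → [2.0, 5.0, 10.0]
toy_predict(single, (data=X, idx=2))    # override       → [2.0, 4.0, 6.0]

multi = ToyMultiFit([Function[x -> x, x -> 3x], Function[x -> -x, x -> x + 1]], [1, 2])
toy_predict(multi, (data=X, idx=[2, 1]))  # → [[3.0, 6.0, 9.0], [-1.0, -2.0, -3.0]]
```

In this toy, dispatching on `NamedTuple{(:data, :idx)}` leaves the plain `toy_predict(m, X)` call unchanged. The override is strictly opt-in.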
diff --git a/README.md b/README.md
@@ -153,16 +153,20 @@ predict(mach, X)
 ```
 
 This will make predictions using the expression
-selected using the function passed to `selection_method`.
-By default this selection is made a mix of accuracy and complexity.
-For example, we can make predictions using expression 2 with:
+selected by `model.selection_method`,
+which by default balances accuracy and complexity.
+
+You can override this selection and select an equation from
+the Pareto front manually with:
 
 ```julia
-mach.model.selection_method = Returns(2)
-predict(mach, X)
+predict(mach, (data=X, idx=2))
 ```
 
-For fitting multiple outputs, one can use `MultitargetSRRegressor`.
+where `idx=2` selects the second equation on the Pareto front.
+
+For fitting multiple outputs, one can use `MultitargetSRRegressor`
+(pass a vector of indices to `idx` in `predict`, one per output, to select specific equations).
 For a full list of options available to each regressor, see the [API page](https://astroautomata.com/SymbolicRegression.jl/dev/api/).
 
 ### Low-Level Interface
diff --git a/src/MLJInterface.jl b/src/MLJInterface.jl
@@ -500,11 +500,14 @@ function tag_with_docstring(model_name::Symbol, description::String, bottom_matt
         Note that if you pass complex data `::Complex{L}`, then the loss
         type will automatically be set to `L`.
-    - `selection_method::Function`: Function to selection expression from
+    - `selection_method::Function`: Function to select an expression from
-        the Pareto frontier for use in `predict`. See `SymbolicRegression.MLJInterfaceModule.choose_best`
-        for an example. This function should return a single integer specifying
-        the index of the expression to use. By default, `choose_best` maximizes
+        the Pareto frontier for use in `predict`.
+        See `SymbolicRegression.MLJInterfaceModule.choose_best` for an example.
+        This function should return a single integer specifying
+        the index of the expression to use. By default, `choose_best` maximizes
         the score (a pound-for-pound rating) of expressions reaching the threshold
-        of 1.5x the minimum loss. To fix the index at `5`, you could just write `Returns(5)`.
+        of 1.5x the minimum loss. To override this at prediction time, you can pass
+        a named tuple with keys `data` and `idx` to `predict`. See the Operations
+        section for details.
     - `dimensions_type::AbstractDimensions`: The type of dimensions to use when storing
         the units of the data. By default this is `DynamicQuantities.SymbolicDimensions`.
     """
@@ -515,7 +518,7 @@ function tag_with_docstring(model_name::Symbol, description::String, bottom_matt
     - `predict(mach, Xnew)`: Return predictions of the target given features `Xnew`, which
       should have same scitype as `X` above. The expression used for prediction is defined
-      by the `selection_method` function, which can be seen by viewing `report(mach).best_idx`.
+      by the `selection_method` function; the chosen index is available as `report(mach).best_idx`.
-    - `predict(mach, (; data=Xnew, idx=i))`: Return predictions of the target given features
+    - `predict(mach, (data=Xnew, idx=i))`: Return predictions of the target given features
       `Xnew`, which should have same scitype as `X` above. By passing a named tuple with keys
       `data` and `idx`, you are able to specify the equation you wish to evaluate in `idx`.
 
@@ -578,7 +581,8 @@ eval(
     Note that unlike other regressors, symbolic regression stores a list of
     trained models. The model chosen from this list is defined by the function
     `selection_method` keyword argument, which by default balances accuracy
-    and complexity.
+    and complexity. You can override this at prediction time by passing a named
+    tuple with keys `data` and `idx`.
 
     """,
             r"^    " => "",
@@ -590,7 +594,8 @@ eval(
     The fields of `fitted_params(mach)` are:
 
     - `best_idx::Int`: The index of the best expression in the Pareto frontier,
-       as determined by the `selection_method` function.
+       as determined by the `selection_method` function. Override in `predict` by passing
+       a named tuple with keys `data` and `idx`.
     - `equations::Vector{Node{T}}`: The expressions discovered by the search, represented
       in a dominating Pareto frontier (i.e., the best expressions found for
       each complexity). `T` is equal to the element type
@@ -701,7 +706,8 @@ eval(
     Note that unlike other regressors, symbolic regression stores a list of lists of
     trained models. The models chosen from each of these lists is defined by the function
     `selection_method` keyword argument, which by default balances accuracy
-    and complexity.
+    and complexity. You can override this at prediction time by passing a named
+    tuple with keys `data` and `idx`.
 
     """,
             r"^    " => "",
@@ -713,7 +719,8 @@ eval(
     The fields of `fitted_params(mach)` are:
 
     - `best_idx::Vector{Int}`: The index of the best expression in each Pareto frontier,
-      as determined by the `selection_method` function.
+      as determined by the `selection_method` function. Override in `predict` by passing
+      a named tuple with keys `data` and `idx`.
     - `equations::Vector{Vector{Node{T}}}`: The expressions discovered by the search, represented
       in a dominating Pareto frontier (i.e., the best expressions found for
       each complexity). The outer vector is indexed by target variable, and the inner
@@ -727,7 +734,8 @@ eval(
     The fields of `report(mach)` are:
 
     - `best_idx::Vector{Int}`: The index of the best expression in each Pareto frontier,
-       as determined by the `selection_method` function.
+       as determined by the `selection_method` function. Override in `predict` by passing
+       a named tuple with keys `data` and `idx`.
     - `equations::Vector{Vector{Node{T}}}`: The expressions discovered by the search, represented
       in a dominating Pareto frontier (i.e., the best expressions found for
       each complexity). The outer vector is indexed by target variable, and the inner
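The `selection_method` docstring describes a default rule. It keeps the expressions whose loss is within 1.5x of the minimum loss, then picks the one with the highest score. That rule can be sketched in plain Julia as follows. `choose_best_sketch` and its keyword arguments are hypothetical. The real `SymbolicRegression.MLJInterfaceModule.choose_best` may take different arguments and handle edge cases differently.

```julia
# Hypothetical sketch of the default selection rule described in the
# docstring; NOT the library's `choose_best`. Returns a single integer
# index into the Pareto front, as `selection_method` is required to do.
function choose_best_sketch(; losses::AbstractVector{<:Real}, scores::AbstractVector{<:Real})
    length(losses) == length(scores) ||
        throw(ArgumentError("losses and scores must have equal length"))
    # Keep only expressions whose loss reaches the 1.5x-minimum threshold.
    threshold = 1.5 * minimum(losses)
    candidates = findall(<=(threshold), losses)
    # Among those, return the index with the highest score.
    return candidates[argmax(scores[candidates])]
end

losses = [1.0, 0.5, 0.4, 0.39]
scores = [0.0, 0.7, 0.2, 0.05]
choose_best_sketch(; losses, scores)  # threshold 0.585 keeps 2, 3, 4; index 2 has the top score
```

The result is an index, not an expression. A value picked this way at fit time is what `predict(mach, (data=Xnew, idx=i))` lets you override.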