
Commit 9438597

Merge tag 'v0.23.2' into graph-nodes

[Diff since v0.23.1](v0.23.1...v0.23.2)

**Merged pull requests:**

- Formatting overhaul (#278) (@MilesCranmer)
- Avoid julia-formatter on pre-commit.ci (#279) (@MilesCranmer)
- Make it easier to select expression from Pareto front for evaluation (#289) (@MilesCranmer)

**Closed issues:**

- Garbage collection too passive on worker processes (#237)
- How can I set the maximum number of nests? (#285)

2 parents: da938ce + e18c742

File tree: 4 files changed (+119 -71 lines)

Project.toml (1 addition & 1 deletion)

````diff
@@ -1,7 +1,7 @@
 name = "SymbolicRegression"
 uuid = "8254be44-1295-4e6a-a16d-46603ac705cb"
 authors = ["MilesCranmer <[email protected]>"]
-version = "0.23.1"
+version = "0.23.2"
 
 [deps]
 Compat = "34da2185-b29b-5c13-b0c7-acf172513d20"
````

README.md (17 additions & 10 deletions)

````diff
@@ -1,24 +1,24 @@
+<!-- prettier-ignore-start -->
 <div align="center">
 
 SymbolicRegression.jl searches for symbolic expressions which optimize a particular objective.
 
 https://github.com/MilesCranmer/SymbolicRegression.jl/assets/7593028/f5b68f1f-9830-497f-a197-6ae332c94ee0
 
-<!-- prettier-ignore-start -->
 | Latest release | Documentation | Forums | Paper |
 | :---: | :---: | :---: | :---: |
 | [![version](https://juliahub.com/docs/SymbolicRegression/version.svg)](https://juliahub.com/ui/Packages/SymbolicRegression/X2eIS) | [![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://astroautomata.com/SymbolicRegression.jl/dev/) | [![Discussions](https://img.shields.io/badge/discussions-github-informational)](https://github.com/MilesCranmer/PySR/discussions) | [![Paper](https://img.shields.io/badge/arXiv-2305.01582-b31b1b)](https://arxiv.org/abs/2305.01582) |
 
 | Build status | Coverage |
 | :---: | :---: |
 | [![CI](https://github.com/MilesCranmer/SymbolicRegression.jl/workflows/CI/badge.svg)](.github/workflows/CI.yml) | [![Coverage Status](https://coveralls.io/repos/github/MilesCranmer/SymbolicRegression.jl/badge.svg?branch=master)](https://coveralls.io/github/MilesCranmer/SymbolicRegression.jl?branch=master)<br>[![Aqua QA](https://raw.githubusercontent.com/JuliaTesting/Aqua.jl/master/badge.svg)](https://github.com/JuliaTesting/Aqua.jl) |
-<!-- prettier-ignore-end >
 
 Check out [PySR](https://github.com/MilesCranmer/PySR) for
 a Python frontend.
 [Cite this software](https://arxiv.org/abs/2305.01582)
 
 </div>
+<!-- prettier-ignore-end >
 
 **Contents**:
 
@@ -153,16 +153,20 @@ predict(mach, X)
 ```
 
 This will make predictions using the expression
-selected using the function passed to `selection_method`.
-By default this selection is made a mix of accuracy and complexity.
-For example, we can make predictions using expression 2 with:
+selected by `model.selection_method`,
+which by default is a mix of accuracy and complexity.
+
+You can override this selection and select an equation from
+the Pareto front manually with:
 
 ```julia
-mach.model.selection_method = Returns(2)
-predict(mach, X)
+predict(mach, (data=X, idx=2))
 ```
 
-For fitting multiple outputs, one can use `MultitargetSRRegressor`.
+where here we choose to evaluate the second equation.
+
+For fitting multiple outputs, one can use `MultitargetSRRegressor`
+(and pass an array of indices to `idx` in `predict` for selecting specific equations).
 For a full list of options available to each regressor, see the [API page](https://astroautomata.com/SymbolicRegression.jl/dev/api/).
@@ -223,9 +227,12 @@ The `output` array will contain the result of the tree at each of the 100 rows.
 This `did_succeed` flag detects whether an evaluation was successful, or whether
 encountered any NaNs or Infs during calculation (such as, e.g., `sqrt(-1)`).
 
-## Constructing trees
+## Constructing expressions
+
+Expressions are represented as the `Node` type which is developed
+in the [DynamicExpressions.jl](https://github.com/SymbolicML/DynamicExpressions.jl/) package.
 
-You can also manipulate and construct trees directly. For example:
+You can manipulate and construct expressions directly. For example:
 
 ```julia
 import SymbolicRegression: Options, Node, eval_tree_array
````
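As context for the README change above, here is a minimal end-to-end sketch of the new prediction workflow. The data, operator choices, and `niterations` value are illustrative, not taken from this commit:

```julia
using MLJ, SymbolicRegression

# Illustrative toy data (names and values are hypothetical).
X = (a=rand(100), b=rand(100))
y = @. 2.0 * cos(X.a * 23.5) - X.b^2

model = SRRegressor(binary_operators=[+, -, *], unary_operators=[cos], niterations=30)
mach = machine(model, X, y)
fit!(mach)

# Default: the equation picked by `selection_method` (a mix of accuracy and complexity).
yhat = predict(mach, X)

# New in v0.23.2: evaluate a specific equation from the Pareto front by index.
yhat2 = predict(mach, (data=X, idx=2))
```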

src/MLJInterface.jl (70 additions & 60 deletions)

````diff
@@ -274,22 +274,12 @@ function prediction_warn()
     @warn "Evaluation failed either due to NaNs detected or due to unfinished search. Using 0s for prediction."
 end
 
-@inline function wrap_units(v, y_units, i::Integer)
-    if y_units === nothing
-        return v
-    else
-        return (yi -> Quantity(yi, y_units[i])).(v)
-    end
-end
-@inline function wrap_units(v, y_units, ::Nothing)
-    if y_units === nothing
-        return v
-    else
-        return (yi -> Quantity(yi, y_units)).(v)
-    end
-end
+wrap_units(v, ::Nothing, ::Integer) = v
+wrap_units(v, ::Nothing, ::Nothing) = v
+wrap_units(v, y_units, i::Integer) = (yi -> Quantity(yi, y_units[i])).(v)
+wrap_units(v, y_units, ::Nothing) = (yi -> Quantity(yi, y_units)).(v)
 
-function prediction_fallback(::Type{T}, m::SRRegressor, Xnew_t, fitresult) where {T}
+function prediction_fallback(::Type{T}, ::SRRegressor, Xnew_t, fitresult, _) where {T}
     prediction_warn()
     out = fill!(similar(Xnew_t, T, axes(Xnew_t, 2)), zero(T))
     return wrap_units(out, fitresult.y_units, nothing)
````
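The `wrap_units` rewrite replaces runtime `=== nothing` checks with method dispatch on `::Nothing`, so the no-units case is selected by the method table rather than a branch. A self-contained sketch of the same pattern, with illustrative names that are not part of the package:

```julia
# Dispatch-on-Nothing pattern: the `nothing` case gets its own method,
# so no `if units === nothing` branch is needed in the hot path.
scale_by(v, ::Nothing) = v                      # no factor: pass through unchanged
scale_by(v, factor) = (x -> factor * x).(v)     # factor present: broadcast it

scale_by([1.0, 2.0], nothing)  # -> [1.0, 2.0]
scale_by([1.0, 2.0], 3.0)      # -> [3.0, 6.0]
```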
````diff
@@ -303,11 +293,11 @@ function prediction_fallback(
             fill!(similar(Xnew_t, T, axes(Xnew_t, 2)), zero(T)), fitresult.y_units, i
         ) for i in 1:(fitresult.num_targets)
     ]
-    out_matrix = reduce(hcat, out_cols)
+    out_matrix = hcat(out_cols...)
     if !fitresult.y_is_table
         return out_matrix
     else
-        return MMI.table(out_matrix; names=fitresult.y_variable_names, prototype=prototype)
+        return MMI.table(out_matrix; names=fitresult.y_variable_names, prototype)
     end
 end
````
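The `prototype=prototype` to `prototype` change in this hunk uses Julia's keyword-argument shorthand: when a local variable shares the keyword's name, the value may be omitted. A small illustration with a hypothetical stand-in for `MMI.table`:

```julia
# Hypothetical stand-in, only to demonstrate the shorthand.
make_table(m; names=nothing, prototype=nothing) = (m, names, prototype)

names = [:y1, :y2]
prototype = (y1=[0.0], y2=[0.0])

# These two calls are equivalent: `prototype` alone expands to `prototype=prototype`.
make_table([1 2; 3 4]; names, prototype)
make_table([1 2; 3 4]; names=names, prototype=prototype)
```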

````diff
@@ -344,50 +334,58 @@ function MMI.fitted_params(m::AbstractSRRegressor, fitresult)
     )
 end
 
-function MMI.predict(m::SRRegressor, fitresult, Xnew)
-    params = full_report(m, fitresult; v_with_strings=Val(false))
-    Xnew_t, variable_names, X_units = get_matrix_and_info(Xnew, m.dimensions_type)
-    T = promote_type(eltype(Xnew_t), fitresult.types.T)
-    if length(params.equations) == 0
-        return prediction_fallback(T, m, Xnew_t, fitresult)
-    end
-    X_units_clean = clean_units(X_units)
-    validate_variable_names(variable_names, fitresult)
-    validate_units(X_units_clean, fitresult.X_units)
-    eq = params.equations[params.best_idx]
-    out, completed = eval_tree_array(eq, Xnew_t, fitresult.options)
-    if !completed
-        return prediction_fallback(T, m, Xnew_t, fitresult)
+function eval_tree_mlj(
+    tree::Node, X_t, m::AbstractSRRegressor, ::Type{T}, fitresult, i, prototype
+) where {T}
+    out, completed = eval_tree_array(tree, X_t, fitresult.options)
+    if completed
+        return wrap_units(out, fitresult.y_units, i)
     else
-        return wrap_units(out, fitresult.y_units, nothing)
+        return prediction_fallback(T, m, X_t, fitresult, prototype)
     end
 end
-function MMI.predict(m::MultitargetSRRegressor, fitresult, Xnew)
+
+function MMI.predict(m::M, fitresult, Xnew; idx=nothing) where {M<:AbstractSRRegressor}
+    if Xnew isa NamedTuple && (haskey(Xnew, :idx) || haskey(Xnew, :data))
+        @assert(
+            haskey(Xnew, :idx) && haskey(Xnew, :data) && length(keys(Xnew)) == 2,
+            "If specifying an equation index during prediction, you must use a named tuple with keys `idx` and `data`."
+        )
+        return MMI.predict(m, fitresult, Xnew.data; idx=Xnew.idx)
+    end
+
     params = full_report(m, fitresult; v_with_strings=Val(false))
     prototype = MMI.istable(Xnew) ? Xnew : nothing
     Xnew_t, variable_names, X_units = get_matrix_and_info(Xnew, m.dimensions_type)
     T = promote_type(eltype(Xnew_t), fitresult.types.T)
+
+    if isempty(params.equations) || any(isempty, params.equations)
+        @warn "Equations not found. Returning 0s for prediction."
+        return prediction_fallback(T, m, Xnew_t, fitresult, prototype)
+    end
+
     X_units_clean = clean_units(X_units)
     validate_variable_names(variable_names, fitresult)
     validate_units(X_units_clean, fitresult.X_units)
-    equations = params.equations
-    if any(t -> length(t) == 0, equations)
-        return prediction_fallback(T, m, Xnew_t, fitresult, prototype)
-    end
-    best_idx = params.best_idx
-    outs = []
-    for (i, (best_i, eq)) in enumerate(zip(best_idx, equations))
-        out, completed = eval_tree_array(eq[best_i], Xnew_t, fitresult.options)
-        if !completed
-            return prediction_fallback(T, m, Xnew_t, fitresult, prototype)
+
+    idx = idx === nothing ? params.best_idx : idx
+
+    if M <: SRRegressor
+        return eval_tree_mlj(
+            params.equations[idx], Xnew_t, m, T, fitresult, nothing, prototype
+        )
+    elseif M <: MultitargetSRRegressor
+        outs = [
+            eval_tree_mlj(
+                params.equations[i][idx[i]], Xnew_t, m, T, fitresult, i, prototype
+            ) for i in eachindex(idx, params.equations)
+        ]
+        out_matrix = reduce(hcat, outs)
+        if !fitresult.y_is_table
+            return out_matrix
+        else
+            return MMI.table(out_matrix; names=fitresult.y_variable_names, prototype)
         end
-        push!(outs, wrap_units(out, fitresult.y_units, i))
-    end
-    out_matrix = reduce(hcat, outs)
-    if !fitresult.y_is_table
-        return out_matrix
-    else
-        return MMI.table(out_matrix; names=fitresult.y_variable_names, prototype=prototype)
     end
 end
````
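To make the new calling convention concrete, here is a stripped-down, runnable sketch of how the unified `predict` unwraps a `(data=..., idx=...)` named tuple and re-dispatches on itself. The helper names and `fitresult` contents are hypothetical stand-ins, not the package internals:

```julia
# Hypothetical stand-ins so the sketch runs on its own:
evaluate(eq, X) = eq.(X)
fitresult = (best_idx=3, equations=[x -> x, x -> x^2, x -> 2x])

function predict_sketch(fitresult, Xnew; idx=nothing)
    # A `(data=..., idx=...)` named tuple is unwrapped and re-dispatched,
    # mirroring the check at the top of the new `MMI.predict`:
    if Xnew isa NamedTuple && (haskey(Xnew, :idx) || haskey(Xnew, :data))
        @assert haskey(Xnew, :idx) && haskey(Xnew, :data) && length(keys(Xnew)) == 2
        return predict_sketch(fitresult, Xnew.data; idx=Xnew.idx)
    end
    # Otherwise fall back to the index chosen by the selection method:
    chosen = idx === nothing ? fitresult.best_idx : idx
    return evaluate(fitresult.equations[chosen], Xnew)
end

predict_sketch(fitresult, [1.0, 2.0])               # uses best_idx -> 2x
predict_sketch(fitresult, (data=[1.0, 2.0], idx=2)) # overrides   -> x^2
```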

````diff
@@ -508,11 +506,14 @@ function tag_with_docstring(model_name::Symbol, description::String, bottom_matt
     Note that if you pass complex data `::Complex{L}`, then the loss
     type will automatically be set to `L`.
 - `selection_method::Function`: Function to selection expression from
-    the Pareto frontier for use in `predict`. See `SymbolicRegression.MLJInterfaceModule.choose_best`
-    for an example. This function should return a single integer specifying
-    the index of the expression to use. By default, `choose_best` maximizes
+    the Pareto frontier for use in `predict`.
+    See `SymbolicRegression.MLJInterfaceModule.choose_best` for an example.
+    This function should return a single integer specifying
+    the index of the expression to use. By default, this maximizes
     the score (a pound-for-pound rating) of expressions reaching the threshold
-    of 1.5x the minimum loss. To fix the index at `5`, you could just write `Returns(5)`.
+    of 1.5x the minimum loss. To override this at prediction time, you can pass
+    a named tuple with keys `data` and `idx` to `predict`. See the Operations
+    section for details.
 - `dimensions_type::AbstractDimensions`: The type of dimensions to use when storing
     the units of the data. By default this is `DynamicQuantities.SymbolicDimensions`.
 """
@@ -523,6 +524,9 @@ function tag_with_docstring(model_name::Symbol, description::String, bottom_matt
 - `predict(mach, Xnew)`: Return predictions of the target given features `Xnew`, which
     should have same scitype as `X` above. The expression used for prediction is defined
     by the `selection_method` function, which can be seen by viewing `report(mach).best_idx`.
+- `predict(mach, (data=Xnew, idx=i))`: Return predictions of the target given features
+    `Xnew`, which should have same scitype as `X` above. By passing a named tuple with keys
+    `data` and `idx`, you are able to specify the equation you wish to evaluate in `idx`.
 
 $(bottom_matter)
 """
@@ -583,7 +587,8 @@ eval(
     Note that unlike other regressors, symbolic regression stores a list of
     trained models. The model chosen from this list is defined by the function
     `selection_method` keyword argument, which by default balances accuracy
-    and complexity.
+    and complexity. You can override this at prediction time by passing a named
+    tuple with keys `data` and `idx`.
 
     """,
     r"^ " => "",
@@ -595,7 +600,8 @@ eval(
     The fields of `fitted_params(mach)` are:
 
     - `best_idx::Int`: The index of the best expression in the Pareto frontier,
-      as determined by the `selection_method` function.
+      as determined by the `selection_method` function. Override in `predict` by passing
+      a named tuple with keys `data` and `idx`.
     - `equations::Vector{Node{T}}`: The expressions discovered by the search, represented
       in a dominating Pareto frontier (i.e., the best expressions found for
       each complexity). `T` is equal to the element type
@@ -608,7 +614,8 @@ eval(
     The fields of `report(mach)` are:
 
     - `best_idx::Int`: The index of the best expression in the Pareto frontier,
-      as determined by the `selection_method` function.
+      as determined by the `selection_method` function. Override in `predict` by passing
+      a named tuple with keys `data` and `idx`.
     - `equations::Vector{Node{T}}`: The expressions discovered by the search, represented
       in a dominating Pareto frontier (i.e., the best expressions found for
       each complexity).
@@ -705,7 +712,8 @@ eval(
     Note that unlike other regressors, symbolic regression stores a list of lists of
     trained models. The models chosen from each of these lists is defined by the function
     `selection_method` keyword argument, which by default balances accuracy
-    and complexity.
+    and complexity. You can override this at prediction time by passing a named
+    tuple with keys `data` and `idx`.
 
     """,
     r"^ " => "",
@@ -717,7 +725,8 @@ eval(
     The fields of `fitted_params(mach)` are:
 
     - `best_idx::Vector{Int}`: The index of the best expression in each Pareto frontier,
-      as determined by the `selection_method` function.
+      as determined by the `selection_method` function. Override in `predict` by passing
+      a named tuple with keys `data` and `idx`.
     - `equations::Vector{Vector{Node{T}}}`: The expressions discovered by the search, represented
       in a dominating Pareto frontier (i.e., the best expressions found for
       each complexity). The outer vector is indexed by target variable, and the inner
@@ -731,7 +740,8 @@ eval(
     The fields of `report(mach)` are:
 
     - `best_idx::Vector{Int}`: The index of the best expression in each Pareto frontier,
-      as determined by the `selection_method` function.
+      as determined by the `selection_method` function. Override in `predict` by passing
+      a named tuple with keys `data` and `idx`.
     - `equations::Vector{Vector{Node{T}}}`: The expressions discovered by the search, represented
       in a dominating Pareto frontier (i.e., the best expressions found for
       each complexity). The outer vector is indexed by target variable, and the inner
````

test/test_mlj.jl (31 additions & 0 deletions)

````diff
@@ -43,8 +43,17 @@ end
     fit!(mach)
     rep = report(mach)
     @test occursin("a", rep.equation_strings[rep.best_idx])
+    ypred_good = predict(mach, X)
     @test sum(abs2, predict(mach, X) .- y) / length(y) < 1e-5
 
+    @testset "Check that we can choose the equation" begin
+        ypred_same = predict(mach, (data=X, idx=rep.best_idx))
+        @test ypred_good == ypred_same
+
+        ypred_bad = predict(mach, (data=X, idx=1))
+        @test ypred_good != ypred_bad
+    end
+
     @testset "Smoke test SymbolicUtils" begin
         eqn = node_to_symbolic(rep.equations[rep.best_idx], model)
         n = symbolic_to_node(eqn, model)
````
````diff
@@ -63,6 +72,28 @@ end
     @test all(
         eq -> occursin("a", eq), [rep.equation_strings[i][rep.best_idx[i]] for i in 1:3]
     )
+    ypred_good = predict(mach, X)
+
+    @testset "Test that we can choose the equation" begin
+        ypred_same = predict(mach, (data=X, idx=rep.best_idx))
+        @test ypred_good == ypred_same
+
+        ypred_bad = predict(mach, (data=X, idx=[1, 1, 1]))
+        @test ypred_good != ypred_bad
+
+        ypred_mixed = predict(mach, (data=X, idx=[rep.best_idx[1], 1, rep.best_idx[3]]))
+        @test ypred_mixed == hcat(ypred_good[:, 1], ypred_bad[:, 2], ypred_good[:, 3])
+
+        @test_throws AssertionError predict(mach, (data=X,))
+        VERSION >= v"1.8" &&
+            @test_throws "If specifying an equation index during" predict(
+                mach, (data=X,)
+            )
+        VERSION >= v"1.8" &&
+            @test_throws "If specifying an equation index during" predict(
+                mach, (X=X, idx=1)
+            )
+    end
 end
 
 @testset "Named outputs" begin
````
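A note on the `VERSION >= v"1.8"` guards in these tests: matching a message string with `@test_throws` is only supported from Julia 1.8; earlier versions can match only the exception type. A minimal illustration (the error text here is shortened):

```julia
using Test

fail() = error("If specifying an equation index during prediction, use keys `idx` and `data`.")

@test_throws ErrorException fail()                         # works on all supported Julia versions
VERSION >= v"1.8" && @test_throws "equation index" fail()  # message matching needs Julia >= 1.8
```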
