@@ -10,7 +10,7 @@ md"# MLJ for Data Scientists in Two Hours"
# ╔═╡ 8a6670b8-96a8-4a5d-b795-033f6f2a0674
md"""
An application of the [MLJ
- toolbox](https://alan-turing-institute.github.io/MLJ.jl/dev/) to the
+ toolbox](https://juliaai.github.io/MLJ.jl/dev/) to the
Telco Customer Churn dataset, aimed at practicing data scientists
new to MLJ (Machine Learning in Julia). This tutorial does not
cover exploratory data analysis.
@@ -25,9 +25,9 @@ deep-learning).
# ╔═╡ b04c4790-59e0-42a3-af2a-25235e544a31
md"""
For other MLJ learning resources see the [Learning
- MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/learning_mlj/)
+ MLJ](https://juliaai.github.io/MLJ.jl/dev/learning_mlj/)
section of the
- [manual](https://alan-turing-institute.github.io/MLJ.jl/dev/).
+ [manual](https://juliaai.github.io/MLJ.jl/dev/).
"""

# ╔═╡ 4eb8dff4-c23a-4b41-8af5-148d95ea2900
@@ -106,7 +106,7 @@ used to develop this tutorial. If this is your first time running
the notebook, package instantiation and pre-compilation may take a
minute or so to complete. **This step will fail** if the [correct
Manifest.toml and Project.toml
- files](https://github.com/alan-turing-institute/MLJ.jl/tree/dev/examples/telco)
+ files](https://github.com/JuliaAI/MLJ.jl/tree/dev/examples/telco)
are not in the same directory as this notebook.
"""
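For readers following along outside Pluto, the instantiation step referred to above is the standard Pkg workflow; a minimal sketch (not notebook code, and assuming the two .toml files sit beside the notebook):

```julia
using Pkg
Pkg.activate(@__DIR__)   # directory holding Project.toml and Manifest.toml
Pkg.instantiate()        # install the pinned package versions
```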
@@ -131,7 +131,7 @@ don't fully grasp should become clearer in the Telco study.
# ╔═╡ 33ca287e-8cba-47d1-a0de-1721c1bc2df2
md"""
This section is a condensed adaptation of the [Getting Started
- example](https://alan-turing-institute.github.io/MLJ.jl/dev/getting_started/#Fit-and-predict)
+ example](https://juliaai.github.io/MLJ.jl/dev/getting_started/#Fit-and-predict)
in the MLJ documentation.
"""
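In the spirit of that Getting Started example, a minimal fit/predict sketch (toy model and dataset, not the notebook's):

```julia
using MLJ

X, y = @load_iris                                    # built-in toy dataset
Tree = @load DecisionTreeClassifier pkg=DecisionTree
mach = machine(Tree(), X, y)                         # bind model to data
fit!(mach)                                           # train
predict(mach, X)[1:3]                                # probabilistic predictions
```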
@@ -197,7 +197,7 @@
# ╔═╡ 0f978839-cc95-4c3a-8a29-32f11452654a
md"""
A machine stores some other information enabling [warm
- restart](https://alan-turing-institute.github.io/MLJ.jl/dev/machines/#Warm-restarts)
+ restart](https://juliaai.github.io/MLJ.jl/dev/machines/#Warm-restarts)
for some models, but we won't go into that here. You are allowed to
access and mutate the `model` parameter:
"""
@@ -324,7 +324,7 @@ begin
            return x
        end
    end
-
+
    df0.TotalCharges = fix_blanks(df0.TotalCharges);
end
@@ -424,7 +424,7 @@ md"> Introduces: `@load`, `input_scitype`, `target_scitype`"
# ╔═╡ f97969e2-c15c-42cf-a6fa-eaf14df5d44b
md"""
For tools helping us to identify suitable models, see the [Model
- Search](https://alan-turing-institute.github.io/MLJ.jl/dev/model_search/#model_search)
+ Search](https://juliaai.github.io/MLJ.jl/dev/model_search/#model_search)
section of the manual. We will build a gradient tree-boosting model,
a popular first choice for structured data like we have here. Model
code is contained in a third-party package called
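A hedged sketch of the loading step this hunk introduces (`EvoTrees` is inferred from the `evo_tree_classifier` field appearing later in this diff):

```julia
using MLJ

EvoTreeClassifier = @load EvoTreeClassifier pkg=EvoTrees  # model code lives in EvoTrees.jl
booster = EvoTreeClassifier()

input_scitype(booster)    # scientific types of acceptable input features
target_scitype(booster)   # scientific type of an acceptable target
```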
@@ -497,7 +497,7 @@ pipe = ContinuousEncoder() |> booster
md"""
Note that the component models appear as hyperparameters of
`pipe`. Pipelines are an implementation of a more general [model
- composition](https://alan-turing-institute.github.io/MLJ.jl/dev/composing_models/#Composing-Models)
+ composition](https://juliaai.github.io/MLJ.jl/dev/composing_models/#Composing-Models)
interface provided by MLJ that advanced users may want to learn about.
"""
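A sketch of what "component models appear as hyperparameters" means in practice; the nested field name follows MLJ's snake-case convention and matches the `evo_tree_classifier` reference later in this diff:

```julia
pipe = ContinuousEncoder() |> booster

pipe.evo_tree_classifier                 # the booster, now a nested hyperparameter
pipe.evo_tree_classifier.max_depth = 4   # mutate it through the pipeline
```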
@@ -693,7 +693,7 @@ observation space, for a total of 18 folds) and set
# ╔═╡ 562887bb-b7fb-430f-b61c-748aec38e674
md"""
We choose a `StratifiedCV` resampling strategy; the complete list of options is
- [here](https://alan-turing-institute.github.io/MLJ.jl/dev/evaluating_model_performance/#Built-in-resampling-strategies).
+ [here](https://juliaai.github.io/MLJ.jl/dev/evaluating_model_performance/#Built-in-resampling-strategies).
"""

# ╔═╡ f9be989e-2604-44c2-9727-ed822e4fd85d
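A hedged sketch of how such a strategy plugs into evaluation (fold count, seed, and measures are illustrative; the notebook's own choices may differ):

```julia
e_pipe = evaluate(pipe, X, y,
                  resampling=StratifiedCV(nfolds=6, rng=123),
                  measures=[brier_loss, auc])
```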
@@ -734,7 +734,7 @@ begin
        table = (measure=measure, measurement=measurement)
        return DataFrames.DataFrame(table)
    end
-
+
    const confidence_intervals_basic_model = confidence_intervals(e_pipe)
end
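For context, the per-fold data a helper like the one above works from (field names as documented for MLJ's `PerformanceEvaluation` object):

```julia
e_pipe.measure       # the measures evaluated
e_pipe.measurement   # aggregated value for each measure
e_pipe.per_fold      # per-fold values, from which 95% intervals are derived
```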
@@ -753,7 +753,7 @@ with low feature importance, to speed up later optimization:
# ╔═╡ cdfe840d-4e87-467f-b582-dfcbeb05bcc5
begin
    unimportant_features = filter(:importance => <(0.005), feature_importance_table).feature
-
+
    pipe2 = ContinuousEncoder() |>
        FeatureSelector(features=unimportant_features, ignore=true) |> booster
end
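The `<(0.005)` above is Julia's curried comparison, i.e. the one-argument predicate `x -> x < 0.005`; a toy illustration with a hypothetical table:

```julia
using DataFrames

tbl = DataFrame(feature=[:a, :b], importance=[0.1, 0.001])
filter(:importance => <(0.005), tbl).feature   # returns [:b]
```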
@@ -790,7 +790,7 @@ eg, the neural network models provided by
# ╔═╡ 8fc99d35-d8cc-455f-806e-1bc580dc349d
md"""
First, we select appropriate controls from [this
- list](https://alan-turing-institute.github.io/MLJ.jl/dev/controlling_iterative_models/#Controls-provided):
+ list](https://juliaai.github.io/MLJ.jl/dev/controlling_iterative_models/#Controls-provided):
"""

# ╔═╡ 29f33708-4a82-4acc-9703-288eae064e2a
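Some representative controls from that list, as a sketch (the notebook's actual selection is not shown in this hunk):

```julia
controls = [Step(1),            # train one boosting round at a time
            Patience(5),        # stop after 5 consecutive loss deteriorations
            NumberLimit(1000)]  # hard cap on iterations
```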
@@ -857,7 +857,7 @@ here is the `learning_curve` function, which can be useful when
wanting to visualize the effect of changes to a *single*
hyperparameter (which could be an iteration parameter). See, for
example, [this section of the
- manual](https://alan-turing-institute.github.io/MLJ.jl/dev/learning_curves/)
+ manual](https://juliaai.github.io/MLJ.jl/dev/learning_curves/)
or [this
tutorial](https://github.com/ablaom/MLJTutorial.jl/blob/dev/notebooks/04_tuning/notebook.ipynb).
"""
@@ -898,7 +898,7 @@ show(iterated_pipe, 2)
begin
    p1 = :(model.evo_tree_classifier.η)
    p2 = :(model.evo_tree_classifier.max_depth)
-
+
    r1 = range(iterated_pipe, p1, lower=-2, upper=-0.5, scale=x->10^x)
    r2 = range(iterated_pipe, p2, lower=2, upper=6)
end
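Because of the `scale=x->10^x` keyword, `r1` is traversed on a log scale; a quick way to inspect the values a tuning strategy would consider:

```julia
iterator(r1, 5)   # five candidate values between 10^(-2) and 10^(-0.5)
iterator(r2, 5)   # five integer depths between 2 and 6
```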
@@ -912,7 +912,7 @@ and `upper`.
# ╔═╡ af3023e6-920f-478d-af76-60dddeecbe6c
md"""
Next, we choose an optimization strategy from [this
- list](https://alan-turing-institute.github.io/MLJ.jl/dev/tuning_models/#Tuning-Models):
+ list](https://juliaai.github.io/MLJ.jl/dev/tuning_models/#Tuning-Models):
"""

# ╔═╡ 93c17a9b-b49c-4780-9074-c069a0e97d7e
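A sketch of wrapping the model with a strategy from that list (`RandomSearch` and all keyword values here are illustrative assumptions, not the notebook's settings):

```julia
tuned_iterated_pipe = TunedModel(model=iterated_pipe,
                                 range=[r1, r2],
                                 tuning=RandomSearch(rng=123),
                                 measure=brier_loss,
                                 resampling=StratifiedCV(nfolds=6),
                                 n=25)
```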
@@ -1105,9 +1105,9 @@ md"For comparison, here's the performance for the basic pipeline model"
begin
    mach_basic = machine(pipe, X, y)
    fit!(mach_basic, verbosity=0)
-
+
    ŷ_basic = predict(mach_basic, Xtest);
-
+
    @info("Basic model measurements on test set:",
          brier_loss(ŷ_basic, ytest) |> mean,
          auc(ŷ_basic, ytest),