Changing the anomaly score formula #568
Conversation
```python
sub = pd.DataFrame({"magpsf": magpsf, "sigmapsf": sigmapsf, "jd": jd, "cfid": cfid})
sub = sub.sort_values("jd", ascending=True)
sub = sub.drop_duplicates(subset=['jd'])
```
According to the documentation, this will drop all lines with the same jd, keeping only the first in each group.
At the same time we have a cfid column, and it is valid to have both filters at the same jd. I am not sure that it is technically possible, but still. Don't you think that subset=['jd', 'cfid'] would be a better approach?
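To illustrate the difference between the two subsets, here is a minimal sketch with hypothetical values (the jd and magpsf numbers are made up for the example):

```python
import pandas as pd

# Hypothetical light curve with two filters (cfid 1 and 2) observed at the same jd.
sub = pd.DataFrame({
    "jd": [2459000.5, 2459000.5, 2459001.5],
    "cfid": [1, 2, 1],
    "magpsf": [18.2, 18.9, 18.1],
})

# subset=["jd"] keeps only the first row per jd, silently dropping the
# second filter's measurement at that epoch.
only_jd = sub.drop_duplicates(subset=["jd"])

# subset=["jd", "cfid"] keeps one row per (jd, filter) pair.
jd_and_cfid = sub.drop_duplicates(subset=["jd", "cfid"])
```

With subset=["jd"], the cfid=2 measurement at jd 2459000.5 is lost; with subset=["jd", "cfid"], all three rows survive.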
> it is valid to have both filters at the same jd

For a given object, there must be only one filter for a given jd? And I do not understand why there could be duplicated jd values for an object?
> it is valid to have both filters at the same jd
>
> For a given object, there must be only one filter for a given jd? And I do not understand why there could be duplicated jd values for an object?
Me too :-)
But there are duplicated jd lines with the same filter in the wild. We found this because the lc_features code breaks under these circumstances.
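Such artifacts can be spotted before they break downstream code. A minimal sketch with made-up values, flagging rows that share both jd and cfid:

```python
import pandas as pd

# Hypothetical alert data containing a genuine (jd, cfid) duplicate,
# like the artifacts found in the wild.
sub = pd.DataFrame({
    "jd": [2459000.5, 2459000.5, 2459001.5],
    "cfid": [1, 1, 1],
    "magpsf": [18.2, 18.3, 18.1],
})

# keep=False flags every member of a duplicated (jd, cfid) group,
# not just the later occurrences.
dupes = sub[sub.duplicated(subset=["jd", "cfid"], keep=False)]
```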
Ah, that's true, some artifacts were found. I'm not sure this is the majority, but ok, I understand the logic.
Done
I am a little confused by how get_key works:
```python
def get_key(x: dict, band: int):
    if (
        len(x) != 2
        or x is None
        or any(
            map(  # noqa: W503, C417
                lambda fs: (fs is None or len(fs) == 0), x.values()
            )
        )
    ):
        return pd.Series({k: np.nan for k in MODEL_COLUMNS}, dtype=np.float64)
    elif band in x:
        return pd.Series(x[band])
    else:
        raise IndexError("band {} not found in {}".format(band, x))
```

Imagine two cases. First, x has only the key g. Second, x has g and i. Then we ask for the r key in both scenarios.
Then you will have a table of NaNs in the first case, but an exception in the second. It seems to be inconsistent behavior, doesn't it?
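The inconsistency can be reproduced in isolation. A self-contained sketch (MODEL_COLUMNS and the feature names are stand-ins, and bands are encoded as integers here for illustration):

```python
import numpy as np
import pandas as pd

# Stand-in for the real model feature list (hypothetical names).
MODEL_COLUMNS = ["feature_a", "feature_b"]

def get_key(x: dict, band: int):
    # Logic copied from the function under review.
    # Note: the "x is None" check is unreachable, since len(None) raises first.
    if (
        len(x) != 2
        or x is None
        or any(map(lambda fs: fs is None or len(fs) == 0, x.values()))
    ):
        return pd.Series({k: np.nan for k in MODEL_COLUMNS}, dtype=np.float64)
    elif band in x:
        return pd.Series(x[band])
    else:
        raise IndexError("band {} not found in {}".format(band, x))

# Case 1: only band 1 present, ask for band 2 -> silently returns all-NaN series.
res = get_key({1: {"feature_a": 0.1, "feature_b": 0.2}}, band=2)

# Case 2: bands 1 and 3 present, ask for band 2 -> raises IndexError instead.
raised = False
try:
    get_key({1: {"feature_a": 0.1}, 3: {"feature_a": 0.3}}, band=2)
except IndexError:
    raised = True
```

The same missing band yields NaNs or an exception depending only on how many other bands happen to be present.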
Yes, I agree, it does look strange. This is not my code, so I can't say exactly why it was done this way, but it seems that in the current implementation it is no longer necessary. Fixed it
```python
# Case 4 (both are invalid) is already handled by the zero initialization.
return final_scores
```
It seems that employing a numpy masked array would help reduce the code size here:
```python
scores_g = ma.array(np.transpose(scores_g_raw[-1])[0], mask=mask_g.to_numpy())
scores_r = ma.array(np.transpose(scores_r_raw[-1])[0], mask=mask_r.to_numpy())
final_scores = ma.column_stack([scores_g, scores_r]).min(axis=1).filled(0)
```
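For readers unfamiliar with the pattern, here is a self-contained sketch of the masked-array trick with hypothetical score values (a True mask entry marks an object with no valid data in that filter):

```python
import numpy as np
import numpy.ma as ma

# Hypothetical per-filter anomaly scores for three objects.
scores_g = ma.array([-0.3, 0.1, 0.0], mask=[False, False, True])
scores_r = ma.array([-0.1, 0.0, 0.0], mask=[False, True, True])

# Per-object minimum over the two filters; masked entries are ignored,
# and rows masked in both filters fall back to the fill value.
final_scores = ma.column_stack([scores_g, scores_r]).min(axis=1).filled(0.0)
```

Object 0 gets min(-0.3, -0.1) = -0.3, object 1 keeps its only valid score 0.1, and object 2 (invalid in both filters) receives the fill value 0.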
Done
```python
# Initialize the final score array with zeros.
# This handles the case where data in both filters is NaN by default.
final_scores = np.zeros_like(scores_g, dtype=np.float64)
```
I am not sure that zero is a good default anomaly score. We should be able to somehow distinguish between a true zero score and an unknown score. Note that it is not guaranteed that anomalies will even have negative scores; anomalies only have scores lower than nominal objects.
@pruzhinskaya and I decided to return nan instead to mark an unknown score.
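With the masked-array formulation above, this is a one-token change: fill with nan instead of 0. A minimal sketch with made-up scores:

```python
import numpy as np
import numpy.ma as ma

# Two hypothetical objects: the first valid in both filters,
# the second invalid in both.
scores_g = ma.array([-0.3, 0.0], mask=[False, True])
scores_r = ma.array([-0.1, 0.0], mask=[False, True])

# Filling with nan keeps "unknown" distinguishable from a genuine zero score.
final_scores = ma.column_stack([scores_g, scores_r]).min(axis=1).filled(np.nan)

# Downstream consumers can then filter out unknown scores explicitly.
known = final_scores[~np.isnan(final_scores)]
```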
@JulienPeloton currently, we know that we are unable to calculate the score for some objects. From the broker point of view, does it make any sense to filter such objects before they enter this module? Are you happy with nan as an indicator of an unknown anomaly score? Or is there another option?
Yes, it makes sense to filter such objects before they enter the module. I'm ok with returning a default value that indicates the module was unable to calculate the score. nan is a good indicator.
Done
Thanks @Knispel2 for the code, and @matwey for the review! I have one question to confirm. In the current release, models are about 50MB in size:

```
ls -lth fink_science/data/models/anomaly_detection/
total 258M
-rw-rw-r-- 1 peloton  51M Jul 22 10:58 anomaly_detection_forest_AAD_julien.zip
-rw-rw-r-- 1 peloton  51M Jul 22 10:58 anomaly_detection_forest_AAD_emille_30days.zip
-rw-rw-r-- 1 peloton  52M Jul 22 10:58 anomaly_detection_forest_AAD_emille.zip
-rw-rw-r-- 1 peloton  51M Jul 22 10:58 anomaly_detection_forest_AAD_anais.zip
-rw-rw-r-- 1 peloton  52M Jul 22 10:58 anomaly_detection_forest_AAD.zip
-rw-rw-r-- 1 peloton 2.4M Jul  4 07:47 anomaly_detection_forest_AAD_beta.zip
-rw-rw-r-- 1 peloton  660 Feb  9  2025 g_means.csv
-rw-rw-r-- 1 peloton  665 Feb  9  2025 r_means.csv
-rw-rw-r-- 1 peloton 103K Dec 20  2024 anomaly_detection_forest_AAD_maria.zip
```

while in this PR, model sizes shrink to a couple of MB:

```
ls -lth fink_science/data/models/anomaly_detection/
total 11M
-rw-rw-r-- 1 peloton 1.7M Oct 17 14:49 anomaly_detection_forest_AAD_julien.zip
-rw-rw-r-- 1 peloton 1.7M Oct 17 14:49 anomaly_detection_forest_AAD_emille_30days.zip
-rw-rw-r-- 1 peloton 1.7M Oct 17 14:49 anomaly_detection_forest_AAD_emille.zip
-rw-rw-r-- 1 peloton 1.7M Oct 17 14:49 anomaly_detection_forest_AAD_anais.zip
-rw-rw-r-- 1 peloton 1.7M Oct 17 14:49 anomaly_detection_forest_AAD.zip
-rw-rw-r-- 1 peloton 2.4M Jul  4 07:47 anomaly_detection_forest_AAD_beta.zip
-rw-rw-r-- 1 peloton  660 Feb  9  2025 g_means.csv
-rw-rw-r-- 1 peloton  665 Feb  9  2025 r_means.csv
-rw-rw-r-- 1 peloton 103K Dec 20  2024 anomaly_detection_forest_AAD_maria.zip
```

Expected?
In the previous version, we greatly increased the depth of the trees in the hope that this would have a positive effect. However, we didn't see any improvement, so we returned to the standard depth.
Perfect @Knispel2 -- this makes sense!