Prevent duplicates in the result bdf of `simplify_index` #1385

hanachaari · 2025-03-22T06:55:52Z

Description

The function `simplify_index` does not prevent duplicates in the index of its result.
Changes:

Added two optional parameters to `simplify_index`, extending its functionality without breaking existing code that calls the function:
- `keep_duplicate_column`: specifies the column to filter by.
- `keep_duplicate_value`: the value in the specified column to keep.
Ensured duplicates are either removed (if possible) or an exception is raised.
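The behaviour described above could be sketched roughly as follows. This is a minimal, hypothetical illustration on a plain pandas DataFrame (index named `event_start`, plus a `source` column), not the code in this PR. The helper name `drop_duplicates_by_column` is made up, and `keep_duplicate_value` is typed as `Any` because the column values need not be strings.

```python
from __future__ import annotations

from typing import Any

import pandas as pd


def drop_duplicates_by_column(
    df: pd.DataFrame,
    keep_duplicate_column: str | None = None,
    keep_duplicate_value: Any = None,
) -> pd.DataFrame:
    """Hypothetical helper: resolve duplicate index entries by keeping only the rows
    whose `keep_duplicate_column` equals `keep_duplicate_value`; raise otherwise."""
    if not df.index.duplicated().any():
        return df  # nothing to resolve
    if keep_duplicate_column is not None:
        # Use a plain boolean array so the selection does not depend on label alignment
        mask = (df[keep_duplicate_column] == keep_duplicate_value).to_numpy()
        df = df[mask]
    if df.index.duplicated().any():
        raise ValueError("Duplicates found in index after processing.")
    return df


# Usage: two values for 00:00 (one per source), one value for 00:15
index = pd.DatetimeIndex(
    ["2025-03-22 00:00", "2025-03-22 00:00", "2025-03-22 00:15"], name="event_start"
)
df = pd.DataFrame(
    {"event_value": [1.0, 2.0, 3.0], "source": ["Simulation", "Scheduler", "Simulation"]},
    index=index,
)
result = drop_duplicates_by_column(df, "source", "Simulation")
print(result)  # two rows left, both from "Simulation"
```

Without the filter arguments, the same call on `df` raises a `ValueError`, which matches the "removed if possible, otherwise raise" behaviour.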

How to Test

Steps:
0. Make sure you have pandas version >= 2.2.1.
1. Use the `fix/keep-beliefs-from-Simulation-source-in-evse-power-sensor` branch, as it includes adaptations for this change.
2. Select a project, complete the setup phase, and run the smart charging scenario.
3. Ensure that no warning related to `fill_null_values` appears, unlike in the previous issue.

Related Items

This PR closes #750 and enables merging future PRs in smart-buildings.


Flix6x

This PR explicitly raises an error in case the result contains duplicate indices, which is good, because that should always be an unexpected result.

You also added functionality for whoever calls this function to clean up duplicates, which is nice, but I feel it puts too much responsibility on this function. I suggest simply raising instead:

```python
import logging

if bdf.index.duplicated().any():
    logging.debug(f"bdf with duplicates: {bdf}")
    raise ValueError("Duplicates found in index after processing.")
```

and have the code that calls this function deal with filtering by e.g. a specific source.
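A minimal sketch of that split of responsibilities, assuming a plain pandas DataFrame whose MultiIndex mimics the BeliefsDataFrame levels (`event_start`, `belief_time`, `source`, `cumulative_probability`). Both function names are hypothetical; `simplify_index_sketch` only mirrors the reindex-and-raise step, not the full `simplify_index`.

```python
from __future__ import annotations

import logging

import pandas as pd

LEVELS = ["event_start", "belief_time", "source", "cumulative_probability"]


def simplify_index_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for simplify_index: keep only event_start, raise on duplicates."""
    df = df.copy()
    df.index = df.index.get_level_values("event_start")
    if df.index.duplicated().any():
        logging.debug(f"df with duplicates: {df}")
        raise ValueError("Duplicates found in index after processing.")
    return df


def simplify_for_source(df: pd.DataFrame, source: str) -> pd.DataFrame:
    """Hypothetical caller: filter to a single source first, then simplify."""
    filtered = df[df.index.get_level_values("source") == source]
    return simplify_index_sketch(filtered)


index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2025-03-22 00:00"), pd.Timestamp("2025-03-21 12:00"), "Simulation", 0.5),
        (pd.Timestamp("2025-03-22 00:00"), pd.Timestamp("2025-03-21 12:00"), "Scheduler", 0.5),
        (pd.Timestamp("2025-03-22 00:15"), pd.Timestamp("2025-03-21 12:00"), "Simulation", 0.5),
    ],
    names=LEVELS,
)
df = pd.DataFrame({"event_value": [1.0, 2.0, 3.0]}, index=index)

simplified = simplify_for_source(df, "Simulation")  # one row per event_start
# simplify_index_sketch(df) would raise: 00:00 has a value from both sources
```

The design keeps the simplifying function strict and puts the domain decision (which source to trust) in the caller, which knows why the data was queried.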

Preferably, though, the error message in this PR should contain information about the reason for the duplicates, which will help with debugging what went wrong. Duplicates can arise for three reasons: multiple cumulative probability values per event, multiple belief times/horizons per event, or multiple sources per event. While we still have the BeliefsDataFrame, we could check for these cases using, respectively:

```python
if bdf.lineage.number_of_events == len(bdf):  # we won't end up with duplicate indices -> reindex
    bdf.index = bdf.index.get_level_values("event_start")
elif bdf.lineage.number_of_beliefs < len(bdf):  # points to probabilistic beliefs
    raise ValueError("Duplicate indices: multiple cumulative probability values per event.")
elif bdf.lineage.number_of_events < bdf.lineage.number_of_beliefs and bdf.lineage.number_of_sources == 1:
    raise ValueError("Duplicate indices: multiple belief times/horizons per event.")
elif bdf.lineage.number_of_events < bdf.lineage.number_of_beliefs and bdf.lineage.number_of_sources > 1:
    raise ValueError("Duplicate indices: most likely multiple sources per event (theoretically also multiple belief times/horizons per event combined with a switch of source).")
```
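The same decision tree can be tried out without a FlexMeasures setup. The sketch below recomputes the three counts from a plain pandas MultiIndex instead of using `bdf.lineage`; it assumes that a "belief" is one unique (`event_start`, `belief_time`, `source`) combination, and the function name `explain_duplicates` is hypothetical.

```python
from __future__ import annotations

import pandas as pd

LEVELS = ["event_start", "belief_time", "source", "cumulative_probability"]


def explain_duplicates(df: pd.DataFrame) -> str | None:
    """Return a likely reason why keeping only event_start would create duplicates, or None."""
    idx = df.index
    n_rows = len(df)
    n_events = idx.get_level_values("event_start").nunique()
    # Assumption: one belief = one unique (event_start, belief_time, source) combination
    n_beliefs = len(idx.droplevel("cumulative_probability").unique())
    n_sources = idx.get_level_values("source").nunique()
    if n_events == n_rows:
        return None  # one row per event: safe to reindex by event_start
    if n_beliefs < n_rows:
        return "multiple cumulative probability values per event"
    if n_events < n_beliefs and n_sources == 1:
        return "multiple belief times/horizons per event"
    # Remaining case: more beliefs than events, spread over several sources
    return "most likely multiple sources per event"


def make_df(rows: list[tuple]) -> pd.DataFrame:
    """Build a small frame with the four index levels used above."""
    index = pd.MultiIndex.from_tuples(rows, names=LEVELS)
    return pd.DataFrame({"event_value": [float(i) for i in range(len(rows))]}, index=index)


probabilistic = make_df([("00:00", "t0", "A", p) for p in (0.1, 0.5, 0.9)])
multi_horizon = make_df([("00:00", "t0", "A", 0.5), ("00:00", "t1", "A", 0.5)])
multi_source = make_df([("00:00", "t0", "A", 0.5), ("00:00", "t0", "B", 0.5)])
clean = make_df([("00:00", "t0", "A", 0.5), ("00:15", "t0", "A", 0.5)])

for name, frame in [("probabilistic", probabilistic), ("multi_horizon", multi_horizon),
                    ("multi_source", multi_source), ("clean", clean)]:
    print(name, "->", explain_duplicates(frame))
```

Inside `simplify_index`, the returned reason could then be interpolated into the `ValueError` message.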

Flix6x · 2025-04-12T15:32:40Z

flexmeasures/data/queries/utils.py

+    bdf: tb.BeliefsDataFrame,
+    index_levels_to_columns: list[str] | None = None,
+    keep_duplicate_value: str | None = None,
+    keep_duplicate_column: GenericAsset | GenericAssetType | None = None,


Shouldn't this be a str | None?

I removed it, as you suggested here.

Flix6x · 2025-04-12T15:39:03Z

flexmeasures/data/queries/utils.py

-    bdf: tb.BeliefsDataFrame, index_levels_to_columns: list[str] | None = None
+    bdf: tb.BeliefsDataFrame,
+    index_levels_to_columns: list[str] | None = None,
+    keep_duplicate_value: str | None = None,


I think you are using this new functionality solely to filter by source. It's presented here as if it could be useful for any available column (i.e. "event_value" and the ones passed in index_levels_to_columns), but I don't think that is the case.

In any case, none of the values in any of the columns is a str type, so Any would be more fitting.


hanachaari · 2025-05-14T13:37:57Z

flexmeasures/data/queries/utils.py

@@ -241,6 +251,10 @@ def simplify_index(
                else:
                    raise KeyError(f"Level {col} not found")
    bdf.index = bdf.index.get_level_values("event_start")
+    if bdf.index.duplicated().any():
+       logging.debug(f"bdf with duplicates: {bdf}")
+    #  raise ValueError("Duplicates found in index after processing.")


I didn't raise a ValueError here because duplicate removal is handled by the caller. Logging a message should be sufficient, right?

hanachaari added 2 commits March 22, 2025 07:50

cherry-pick commit 1bbc751 ee2f1a0 from fix/prevent-duplicates-after-…

3dca4fa

…simplify_index

Merge branch 'main' into fix/prevent-duplicates-after-simplify_index_…

7f39982

…func

hanachaari changed the title ~~cherry-pick commit 1bbc7513 ee2f1a0a from fix/prevent-duplicates-afte…~~ Prevent duplicates in the result bdf of simplify_index Mar 22, 2025

hanachaari requested a review from Flix6x March 22, 2025 07:05

Flix6x requested changes Apr 12, 2025


improve debugging: raise info on duplicate indices with possible reaso…

1883473

…ns. To handle duplicates where the simplify_index method is used

hanachaari commented May 14, 2025


hanachaari requested a review from Flix6x May 14, 2025 13:38
