refactor(common): make FrozenDict a subclass of dict #8693
b4435fd to dad9170
__slots__ = ("__view__", "__precomputed_hash__")
__view__: MappingProxyType
class FrozenDict(dict, Mapping[K, V], Hashable):
Hm, can we avoid the subclass?
If someone wrote
def f(mapping: dict): ...
then it would be acceptable to pass an instance of FrozenDict, which cannot be mutated, despite being a dict. That's extremely surprising to me, and I suspect it would be to others.
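To make the concern concrete, here is a minimal sketch. `ReadOnlyDict` is a hypothetical stand-in, not the ibis FrozenDict implementation; it only illustrates how a dict subclass that blocks mutation still satisfies a `dict` annotation and an isinstance check.

```python
class ReadOnlyDict(dict):
    """Hypothetical stand-in for a dict subclass that forbids item assignment."""

    def __setitem__(self, key, value):
        raise TypeError("ReadOnlyDict is immutable")


def f(mapping: dict) -> None:
    # The annotation promises a regular dict, so mutating it looks legitimate.
    mapping["seen"] = True


frozen = ReadOnlyDict(a=1)

# The instance passes the runtime check because it really is a dict subclass.
accepted_as_dict = isinstance(frozen, dict)

# The call type-checks, yet fails at runtime when f tries to mutate.
try:
    f(frozen)
    mutation_failed = False
except TypeError:
    mutation_failed = True
```

A plain dict passed to `f` would be mutated silently, which is the behaviour a caller reading the signature expects.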
Not if we want this to fix the related issues, no. Both are caused by explicit isinstance(x, dict) checks when they probably want Mapping instead.
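The difference between the two checks can be sketched as follows. The helper names `is_dict_strict` and `is_mapping` are hypothetical and are not taken from pandas or ibis; `MappingProxyType` is used here only as an example of a read-only mapping that is not a dict.

```python
from collections.abc import Mapping
from types import MappingProxyType


def is_dict_strict(x) -> bool:
    # Only accepts dict and its subclasses.
    return isinstance(x, dict)


def is_mapping(x) -> bool:
    # Accepts anything implementing or registered with the Mapping ABC.
    return isinstance(x, Mapping)


view = MappingProxyType({"a": 1})

strict_result = is_dict_strict(view)   # the proxy is not a dict subclass
lenient_result = is_mapping(view)      # but it is registered as a Mapping
```

Code that only needs read access generally works with the second check, while the first rejects any mapping type that does not inherit from dict.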
We could hide the dict base class from mypy so mypy at least would error on that case, but for runtime checking this would still be a dict subclass. Not sure if that's worth it.
FWIW I think frozendict is unlikely to creep into user-facing code. We might still want to coerce them to actual dict instances on .execute() to avoid this (as nick attempted to do in his PR). But for internal use only, this fix seems like a definite win, both in performance and in correctness.
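One way the coercion mentioned above could look is sketched here. The `thaw` helper is hypothetical and is not part of ibis; it assumes results are built from mappings, lists, tuples, and scalar values.

```python
from collections.abc import Mapping
from types import MappingProxyType


def thaw(value):
    """Hypothetical helper: recursively convert mappings to plain dicts."""
    if isinstance(value, Mapping):
        return {key: thaw(item) for key, item in value.items()}
    if isinstance(value, list):
        return [thaw(item) for item in value]
    if isinstance(value, tuple):
        return tuple(thaw(item) for item in value)
    # Scalars and unknown types are returned unchanged.
    return value


# MappingProxyType stands in for any read-only mapping in this example.
nested = MappingProxyType({"a": MappingProxyType({"b": [1, MappingProxyType({"c": 2})]})})
plain = thaw(nested)
```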
> Both are caused by explicit isinstance(x, dict)
Are you saying you found a place in the pandas code that is incorrect? If you point me to it, I can file a PR upstream.
Not sure if it's the complete or right fix, but https://github.com/pandas-dev/pandas/blob/aa3e949e2a2b72588186cb1936edb535713aefa0/pandas/io/formats/printing.py#L223 is one part of the repr issue. The execution issue is (likely) in our code, but is also fixed by the cleanup to make frozendicts actual dicts so I didn't bother delving further.
> both in performance and in correctness
Could you add a quick benchmark for this?
I agree with @cpcloud's argument, but I am in favour of this change if benchmarks show a clear performance improvement (I expect it to be better without the custom methods). The correctness is related to hashing which can also be fixed while keeping the dictionary proxy view.
Quickly ran the graph traversal benchmarks which indeed improved:
---------------------------------------------------------------------- benchmark 'test_bfs': 2 tests ----------------------------------------------------------------------
Name (time in ms)                    Min             Max             Mean            StdDev          Median          IQR             Outliers  OPS              Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_bfs (0680_5041894)              2.2367 (1.11)   2.9631 (1.14)   2.3231 (1.08)   0.1120 (1.06)   2.2752 (1.06)   0.0923 (1.0)    25;14     430.4501 (0.93)  171     1
test_bfs (NOW)                       2.0154 (1.0)    2.6029 (1.0)    2.1532 (1.0)    0.1060 (1.0)    2.1370 (1.0)    0.1417 (1.54)   79;6      464.4347 (1.0)   239     1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------- benchmark 'test_dfs': 2 tests ----------------------------------------------------------------------
Name (time in ms)                    Min             Max             Mean            StdDev          Median          IQR             Outliers  OPS              Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_dfs (0680_5041894)              2.2076 (1.11)   2.6657 (1.18)   2.2614 (1.09)   0.0681 (1.0)    2.2378 (1.10)   0.0683 (1.0)    22;13     442.2093 (0.92)  189     1
test_dfs (NOW)                       1.9952 (1.0)    2.2626 (1.0)    2.0778 (1.0)    0.0777 (1.14)   2.0388 (1.0)    0.1393 (2.04)   13;0      481.2894 (1.0)   46      1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------- benchmark 'test_replace_mapping': 2 tests ----------------------------------------------------------------
Name (time in ms)                    Min             Max             Mean            StdDev          Median          IQR             Outliers  OPS              Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_replace_mapping (0680_5041894)  9.3224 (1.12)   28.3315 (1.04)  10.2140 (1.14)  3.1599 (1.62)   9.6292 (1.12)   0.2436 (1.0)    1;3       97.9049 (0.87)   35      1
test_replace_mapping (NOW)           8.3443 (1.0)    27.1707 (1.0)   8.9253 (1.0)    1.9461 (1.0)    8.5792 (1.0)    0.4105 (1.69)   1;1       112.0406 (1.0)   92      1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------- benchmark 'test_replace_pattern': 2 tests ----------------------------------------------------------------
Name (time in ms)                    Min             Max             Mean            StdDev          Median          IQR             Outliers  OPS              Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_replace_pattern (0680_5041894)  12.4369 (1.08)  32.3992 (1.07)  13.5555 (1.11)  3.2492 (1.52)   12.7563 (1.08)  0.7327 (1.60)   2;2       73.7707 (0.90)   62      1
test_replace_pattern (NOW)           11.5567 (1.0)   30.1479 (1.0)   12.1997 (1.0)   2.1385 (1.0)    11.8208 (1.0)   0.4588 (1.0)    1;2       81.9690 (1.0)    74      1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
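For a rough comparison outside the ibis benchmark suite, a micro-benchmark along these lines could be used. Both classes are hypothetical simplifications of the two layouts discussed in this thread (data boxed behind a MappingProxyType versus stored in the dict subclass itself), so the timings only indicate lookup overhead and say nothing about the graph traversal numbers above.

```python
import timeit
from collections.abc import Mapping
from types import MappingProxyType


class BoxedFrozen(Mapping):
    """Hypothetical proxy-based layout: every access goes through Python methods."""

    __slots__ = ("_view",)

    def __init__(self, *args, **kwargs):
        self._view = MappingProxyType(dict(*args, **kwargs))

    def __getitem__(self, key):
        return self._view[key]

    def __iter__(self):
        return iter(self._view)

    def __len__(self):
        return len(self._view)


class SubclassFrozen(dict):
    """Hypothetical dict-subclass layout: lookups use dict's C implementation."""

    def __setitem__(self, key, value):
        raise TypeError("SubclassFrozen is immutable")


data = {f"k{i}": i for i in range(10)}
boxed = BoxedFrozen(data)
subclass = SubclassFrozen(data)

# Time repeated key lookups on each layout; results vary by machine.
boxed_time = timeit.timeit(lambda: boxed["k5"], number=10_000)
subclass_time = timeit.timeit(lambda: subclass["k5"], number=10_000)

print(f"boxed: {boxed_time:.6f}s  subclass: {subclass_time:.6f}s")
```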
This was discovered in ibis-project/ibis#8693
refactor(common): make FrozenDict a subclass of dict
Co-authored-by: Krisztián Szűcs <[email protected]>
This was motivated by the need to work around pandas not repr'ing frozendict elements properly (seen in #8687), but while poking at that I found:
- [...] MappingProxyType which was boxed in a FrozenDict - we can do better by just storing the data in the FrozenDict itself).
- [...] hash(frozendict(a=1, b=2)) != hash(frozendict(b=2, a=1)). This has since been fixed.
Fixes #8687.
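The two points in the description can be illustrated with a small sketch. `FrozenSketch` is hypothetical and is not the code merged in this PR: it stores its data directly in a dict subclass and hashes a frozenset of its items, which is one common way to get an order-independent hash and an assumption about the approach, not a description of the actual fix. It requires all values to be hashable and does not block every mutating method.

```python
class FrozenSketch(dict):
    """Hypothetical immutable dict subclass with an order-independent hash."""

    def __hash__(self):
        # A frozenset ignores insertion order, so equal mappings hash equally.
        return hash(frozenset(self.items()))

    def _immutable(self, *args, **kwargs):
        raise TypeError("FrozenSketch is immutable")

    # Block the common mutating methods (not an exhaustive list).
    __setitem__ = __delitem__ = _immutable
    update = pop = popitem = clear = setdefault = _immutable


first = FrozenSketch(a=1, b=2)
second = FrozenSketch(b=2, a=1)

same_hash = hash(first) == hash(second)

# Equal mappings with equal hashes are interchangeable as dictionary keys.
lookup = {first: "found"}
value = lookup[second]
```

Because the data lives in the dict itself, reads such as `first["a"]` need no extra indirection through a proxy object.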