-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Key-type customization in CanonicalValueStore
and ClangDecl
cleanups
#5743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The `ClangDecl` struct caused some confusion here -- it is embedding extra data into a `CanonicalValueStore` that isn't used for lookups or canonicalization, but is useful to store along side. This changes the `CanonicalValueStore` to support customized key type for `Lookup` so that we can provide the more direct API that only takes the relevant key. This in turn takes advantage of the support for heterogenous keys in the underlying `Set` as long as hashing and equality are consistent. We do need to add support for heterogenous equality comparison with `clang::Decl*`, but that is fairly easily done now that the argument-reversed form isn't needed as well. Lastly, this cleans up the `ClangDecl` customization points to be more idiomatic by using `operator==` and `CarbonHashValue`. While there, I've added comments to make it unambiguous why we can use the pointer value for the underlying `clang::Decl` due to the Clang AST's address-as-identity model. Resolves the immediate TODOs around this type. Future work might involve changing from the current `Add` API to one more like `Map` and `Set`'s API where a callback is used to create the object, but that level of API complexity isn't necessarily motivated yet and can easily be a follow-on if and when its worth doing. The `Add` code paths *are* working with the `inst_id` in order to create an instruction if we are importing the Clang declaration. It is the `Lookup` code paths that never needed to know about the `inst_id` and became more confusing for having to stub it out in the API.
toolchain/sem_ir/clang_decl.h
Outdated
// Equality comparison uses the address-identity property of the Clang AST and | ||
// just compares the `decl` pointers. We canonicalize on these to avoid ever | ||
// having different instructions for the same declaration. | ||
auto operator==(const ClangDecl& rhs) const -> bool { | ||
return decl == rhs.decl; | ||
} | ||
// Support direct comparison with the Clang AST node pointer. | ||
auto operator==(const clang::Decl* rhs_decl) const -> bool { | ||
return decl == rhs_decl; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC the concern raised by @geoffromer correctly, we should make ClangDecl
default comparison include all the fields. I think we should be able to customize the specific store to compare only the decl
field for the storage needs instead of changing ClangDecl
itself.
https://discord.com/channels/655572317891461132/768530752592805919/1384999468293947537
I think that if we can provide the information of not just what is the key of the storage but also how to extract the key from the value.
This is what I've tried to do in a clunky way in #5725, but I guess I missed changing the API appropriately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to leave comments to explain why I don't think this is needed, also discussed some in the Discord...
The instruction ID isn't part of the identity of a declaration, it is just associated data. It's value is completely determined by the declaration pointer. Comparing them both should be redundant.
Is there an added comment that would make this more clear than the current ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The instruction ID isn't part of the identity of a declaration, it is just associated data. It's value is completely determined by the declaration pointer. Comparing them both should be redundant.
But this isn't an invariant of the type itself, it's an invariant of how the type is used. Relying so deeply on that invariant makes ClangDecl
hard to understand in isolation -- it has to be understood in terms of its use as the value type of clang_decls
(and the fact that this is the only place ClangDecl
is used).
Ideally I'd prefer to decouple this type from that invariant, e.g. by teaching ValueStore
how to extract decl
and use it (rather than the whole ClangDecl
) as a lookup and canonicalization key. However, that's probably not worth the time investment right now, and might not be worth the additional complexity in ValueStore
at all.
Failing that, I think we should more explicitly document the coupling with clang_decls
, and I've added a suggestion to that end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this isn't an invariant of the type itself, it's an invariant of how the type is used. Relying so deeply on that invariant makes
ClangDecl
hard to understand in isolation -- it has to be understood in terms of its use as the value type ofclang_decls
(and the fact that this is the only placeClangDecl
is used).
FWIW, I think that is true regardless of the inclusion of inst_id
-- I don't think this struct is reasonable to understand in isolation, I think it is fundamentally an implementation detail of the value store and how we're putting clang::Decl*
s into a value store.
[...] by teaching
ValueStore
how to extractdecl
and use it (rather than the wholeClangDecl
) as a lookup and canonicalization key.
The CarbonHashValue
and operator==
definitions, IMO, are how you teach ValueStore
what to use for canonicalization.
And what this PR does is extend that to also support teaching ValueStore
about this for lookup. I feel like after this PR, we have done what you're suggesting here.
But all of that is in the context of this being coupled to the value store, not some independently useful type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I accepted your comment, but to your point this is really about the core of the ClangDecl
class, so lifted it up to the class comment and reworded it a bit to try to make this more clear and explicit. PTAL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm merging to unblock other work, but if you'd like any further changes to the comments here, happy to make them in follow-up PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment changes look good to me. My only hesitation is that as I understand it, the hash/equality customization will break if ClangDecl
s associated with different CanonicalValueStore
s are intermingled, but the comment doesn't call that out. However, in context that situation may be too unlikely to be worth commenting on.
struct ClangDeclId : public IdBase<ClangDeclId> { | ||
static constexpr llvm::StringLiteral Label = "clang_decl_id"; | ||
|
||
using ValueType = ClangDecl; | ||
|
||
// Use the AST node pointer directly when doing `Lookup` to find an ID. | ||
using KeyType = clang::Decl*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @jonmeow raised a concern that we should change the ID as it could be used in multiple value stores, but perhaps just defining the key type is less of an issue.
https://discord.com/channels/655572317891461132/768530752592805919/1387492664458612938
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to hear Jon's thoughts here -- the ID type already dictates the value type, and even the reference type, so it seemed reasonable to also dictate the key type... Not sure what the looser coupling would look like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bricknerb You were putting value-specific functions here (inside the ID), I think this approach is different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note in part, I expect KeyType
here is something we can deal with by template parameters overriding it. When the hash function of the value is specifically associated with the ID, that starts making it harder to replace -- I tend to think of switching types through template parameters as easier that switching functions discovered through type parameters; adding/swapping a type parameter is easy, but discovering a function through one type when the function is actually for a different type is a bit more of a design issue.
toolchain/sem_ir/clang_decl.h
Outdated
// Equality comparison uses the address-identity property of the Clang AST and | ||
// just compares the `decl` pointers. We canonicalize on these to avoid ever | ||
// having different instructions for the same declaration. | ||
auto operator==(const ClangDecl& rhs) const -> bool { | ||
return decl == rhs.decl; | ||
} | ||
// Support direct comparison with the Clang AST node pointer. | ||
auto operator==(const clang::Decl* rhs_decl) const -> bool { | ||
return decl == rhs_decl; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The instruction ID isn't part of the identity of a declaration, it is just associated data. It's value is completely determined by the declaration pointer. Comparing them both should be redundant.
But this isn't an invariant of the type itself, it's an invariant of how the type is used. Relying so deeply on that invariant makes ClangDecl
hard to understand in isolation -- it has to be understood in terms of its use as the value type of clang_decls
(and the fact that this is the only place ClangDecl
is used).
Ideally I'd prefer to decouple this type from that invariant, e.g. by teaching ValueStore
how to extract decl
and use it (rather than the whole ClangDecl
) as a lookup and canonicalization key. However, that's probably not worth the time investment right now, and might not be worth the additional complexity in ValueStore
at all.
Failing that, I think we should more explicitly document the coupling with clang_decls
, and I've added a suggestion to that end.
Co-authored-by: Geoff Romer <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG
Per #toolchain discussion, I think we'll want to change how ValueStoreTypes works, but if anything that just makes it easier for me to not mind the odd change here. :)
THanks all, merging to unblock other stuff but happy to do any follow-ups particularly on the comments. |
The
ClangDecl
struct caused some confusion here -- it is embedding extra data into aCanonicalValueStore
that isn't used for lookups or canonicalization, but is useful to store along side. This changes theCanonicalValueStore
to support customized key type forLookup
so that we can provide the more direct API that only takes the relevant key.This in turn takes advantage of the support for heterogenous keys in the underlying
Set
as long as hashing and equality are consistent. We do need to add support for heterogenous equality comparison withclang::Decl*
, but that is fairly easily done now that the argument-reversed form isn't needed as well.Lastly, this cleans up the
ClangDecl
customization points to be more idiomatic by usingoperator==
andCarbonHashValue
. While there, I've added comments to make it unambiguous why we can use the pointer value for the underlyingclang::Decl
due to the Clang AST's address-as-identity model.Resolves the immediate TODOs around this type.
Future work might involve changing from the current
Add
API to one more likeMap
andSet
's API where a callback is used to create the object, but that level of API complexity isn't necessarily motivated yet and can easily be a follow-on if and when its worth doing. TheAdd
code paths are working with theinst_id
in order to create an instruction if we are importing the Clang declaration. It is theLookup
code paths that never needed to know about theinst_id
and became more confusing for having to stub it out in the API.