Skip to content

Key-type customization in CanonicalValueStore and ClangDecl cleanups #5743

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 27, 2025

Conversation

chandlerc
Copy link
Contributor

The ClangDecl struct caused some confusion here -- it is embedding extra data into a CanonicalValueStore that isn't used for lookups or canonicalization, but is useful to store along side. This changes the CanonicalValueStore to support customized key type for Lookup so that we can provide the more direct API that only takes the relevant key.

This in turn takes advantage of the support for heterogenous keys in the underlying Set as long as hashing and equality are consistent. We do need to add support for heterogenous equality comparison with clang::Decl*, but that is fairly easily done now that the argument-reversed form isn't needed as well.

Lastly, this cleans up the ClangDecl customization points to be more idiomatic by using operator== and CarbonHashValue. While there, I've added comments to make it unambiguous why we can use the pointer value for the underlying clang::Decl due to the Clang AST's address-as-identity model.

Resolves the immediate TODOs around this type.

Future work might involve changing from the current Add API to one more like Map and Set's API where a callback is used to create the object, but that level of API complexity isn't necessarily motivated yet and can easily be a follow-on if and when its worth doing. The Add code paths are working with the inst_id in order to create an instruction if we are importing the Clang declaration. It is the Lookup code paths that never needed to know about the inst_id and became more confusing for having to stub it out in the API.

The `ClangDecl` struct caused some confusion here -- it is embedding
extra data into a `CanonicalValueStore` that isn't used for lookups or
canonicalization, but is useful to store along side. This changes the
`CanonicalValueStore` to support customized key type for `Lookup` so
that we can provide the more direct API that only takes the relevant
key.

This in turn takes advantage of the support for heterogenous keys in the
underlying `Set` as long as hashing and equality are consistent. We do
need to add support for heterogenous equality comparison with
`clang::Decl*`, but that is fairly easily done now that the
argument-reversed form isn't needed as well.

Lastly, this cleans up the `ClangDecl` customization points to be more
idiomatic by using `operator==` and `CarbonHashValue`. While there, I've
added comments to make it unambiguous why we can use the pointer value
for the underlying `clang::Decl` due to the Clang AST's
address-as-identity model.

Resolves the immediate TODOs around this type.

Future work might involve changing from the current `Add` API to one
more like `Map` and `Set`'s API where a callback is used to create the
object, but that level of API complexity isn't necessarily motivated yet
and can easily be a follow-on if and when its worth doing. The `Add`
code paths *are* working with the `inst_id` in order to create an
instruction if we are importing the Clang declaration. It is the
`Lookup` code paths that never needed to know about the `inst_id` and
became more confusing for having to stub it out in the API.
@github-actions github-actions bot requested a review from danakj June 26, 2025 23:31
Comment on lines 30 to 39
// Equality comparison uses the address-identity property of the Clang AST and
// just compares the `decl` pointers. We canonicalize on these to avoid ever
// having different instructions for the same declaration.
auto operator==(const ClangDecl& rhs) const -> bool {
return decl == rhs.decl;
}
// Support direct comparison with the Clang AST node pointer.
auto operator==(const clang::Decl* rhs_decl) const -> bool {
return decl == rhs_decl;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC the concern raised by @geoffromer correctly, we should make ClangDecl default comparison include all the fields. I think we should be able to customize the specific store to compare only the decl field for the storage needs instead of changing ClangDecl itself.
https://discord.com/channels/655572317891461132/768530752592805919/1384999468293947537
I think that if we can provide the information of not just what is the key of the storage but also how to extract the key from the value.
This is what I've tried to do in a clunky way in #5725, but I guess I missed changing the API appropriately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to leave comments to explain why I don't think this is needed, also discussed some in the Discord...

The instruction ID isn't part of the identity of a declaration, it is just associated data. It's value is completely determined by the declaration pointer. Comparing them both should be redundant.

Is there an added comment that would make this more clear than the current ones?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The instruction ID isn't part of the identity of a declaration, it is just associated data. It's value is completely determined by the declaration pointer. Comparing them both should be redundant.

But this isn't an invariant of the type itself, it's an invariant of how the type is used. Relying so deeply on that invariant makes ClangDecl hard to understand in isolation -- it has to be understood in terms of its use as the value type of clang_decls (and the fact that this is the only place ClangDecl is used).

Ideally I'd prefer to decouple this type from that invariant, e.g. by teaching ValueStore how to extract decl and use it (rather than the whole ClangDecl) as a lookup and canonicalization key. However, that's probably not worth the time investment right now, and might not be worth the additional complexity in ValueStore at all.

Failing that, I think we should more explicitly document the coupling with clang_decls, and I've added a suggestion to that end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this isn't an invariant of the type itself, it's an invariant of how the type is used. Relying so deeply on that invariant makes ClangDecl hard to understand in isolation -- it has to be understood in terms of its use as the value type of clang_decls (and the fact that this is the only place ClangDecl is used).

FWIW, I think that is true regardless of the inclusion of inst_id -- I don't think this struct is reasonable to understand in isolation, I think it is fundamentally an implementation detail of the value store and how we're putting clang::Decl*s into a value store.

[...] by teaching ValueStore how to extract decl and use it (rather than the whole ClangDecl) as a lookup and canonicalization key.

The CarbonHashValue and operator== definitions, IMO, are how you teach ValueStore what to use for canonicalization.

And what this PR does is extend that to also support teaching ValueStore about this for lookup. I feel like after this PR, we have done what you're suggesting here.

But all of that is in the context of this being coupled to the value store, not some independently useful type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I accepted your comment, but to your point this is really about the core of the ClangDecl class, so lifted it up to the class comment and reworded it a bit to try to make this more clear and explicit. PTAL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm merging to unblock other work, but if you'd like any further changes to the comments here, happy to make them in follow-up PRs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment changes look good to me. My only hesitation is that as I understand it, the hash/equality customization will break if ClangDecls associated with different CanonicalValueStores are intermingled, but the comment doesn't call that out. However, in context that situation may be too unlikely to be worth commenting on.

struct ClangDeclId : public IdBase<ClangDeclId> {
static constexpr llvm::StringLiteral Label = "clang_decl_id";

using ValueType = ClangDecl;

// Use the AST node pointer directly when doing `Lookup` to find an ID.
using KeyType = clang::Decl*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @jonmeow raised a concern that we should change the ID as it could be used in multiple value stores, but perhaps just defining the key type is less of an issue.
https://discord.com/channels/655572317891461132/768530752592805919/1387492664458612938

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to hear Jon's thoughts here -- the ID type already dictates the value type, and even the reference type, so it seemed reasonable to also dictate the key type... Not sure what the looser coupling would look like?

Copy link
Contributor

@jonmeow jonmeow Jun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bricknerb You were putting value-specific functions here (inside the ID), I think this approach is different.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note in part, I expect KeyType here is something we can deal with by template parameters overriding it. When the hash function of the value is specifically associated with the ID, that starts making it harder to replace -- I tend to think of switching types through template parameters as easier that switching functions discovered through type parameters; adding/swapping a type parameter is easy, but discovering a function through one type when the function is actually for a different type is a bit more of a design issue.

Comment on lines 30 to 39
// Equality comparison uses the address-identity property of the Clang AST and
// just compares the `decl` pointers. We canonicalize on these to avoid ever
// having different instructions for the same declaration.
auto operator==(const ClangDecl& rhs) const -> bool {
return decl == rhs.decl;
}
// Support direct comparison with the Clang AST node pointer.
auto operator==(const clang::Decl* rhs_decl) const -> bool {
return decl == rhs_decl;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The instruction ID isn't part of the identity of a declaration, it is just associated data. It's value is completely determined by the declaration pointer. Comparing them both should be redundant.

But this isn't an invariant of the type itself, it's an invariant of how the type is used. Relying so deeply on that invariant makes ClangDecl hard to understand in isolation -- it has to be understood in terms of its use as the value type of clang_decls (and the fact that this is the only place ClangDecl is used).

Ideally I'd prefer to decouple this type from that invariant, e.g. by teaching ValueStore how to extract decl and use it (rather than the whole ClangDecl) as a lookup and canonicalization key. However, that's probably not worth the time investment right now, and might not be worth the additional complexity in ValueStore at all.

Failing that, I think we should more explicitly document the coupling with clang_decls, and I've added a suggestion to that end.

Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG

Per #toolchain discussion, I think we'll want to change how ValueStoreTypes works, but if anything that just makes it easier for me to not mind the odd change here. :)

@chandlerc
Copy link
Contributor Author

THanks all, merging to unblock other stuff but happy to do any follow-ups particularly on the comments.

@chandlerc chandlerc added this pull request to the merge queue Jun 27, 2025
Merged via the queue into carbon-language:trunk with commit bba0377 Jun 27, 2025
8 checks passed
@chandlerc chandlerc deleted the clang-decl branch June 27, 2025 20:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants