-
Notifications
You must be signed in to change notification settings - Fork 60
Add imports section in semconv to support references on attribute_groups/metrics/events/entities from custom registries
#769
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
crates/weaver_resolver/data/multi-registry/custom_registry/custom_registry.yaml
Show resolved
Hide resolved
| requirement_level: required | ||
|
|
||
| imports: | ||
| - metric_ref: metric.example_counter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another branch of the slack discussion:
we probably want wildcard imports and cover attribute namespaces rather than groups.
E.g.
imports:
- attribute.db.*
- attribute.server.*
- attribute.error.*
- attribute.http.*
- attribute.url.*syntax allows users to allowlist attributes in bulk without importing everything or explicitly allowing each attribute.
Attribute groups are meaningless - they are grouping mechanism and we move attributes between them a lot.
there are not stability guarantees for attribute groups.
A similar syntax can be useful for bulk-importing metrics, entities, events
imports:
- metric.db.*
- metric.http.*
- metric.system.*
- entity.host.*
- ...Importing each attribute/span/event/metric/entity one-by-one will be very verbose and hard to use. Bulk import seems to be the main use-case.
crates/weaver_semconv/src/group.rs
Outdated
| MetricRef { | ||
| /// The ID of the metric group being referenced in the imported registry. | ||
| metric_ref: String, | ||
| // Additional overridable fields may be added in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
related to https://github.com/open-telemetry/weaver/pull/769/files#r2131292750 (bulk import)
importing without modification makes sense, but it would override the original signal definition without any means to distinguish imported from the original one.
e.g. in service A I report http.client.request.duration with url.template when it talks to internal services B and C.
Service B reports http.client.request.duration without url.template because it's not supported or possible.
How do I document these caveats if I have a singleton http.client.request.duration definition?
the proposal:
importsallow to import without modification - original definitions are fully frozen and immutable. Originaliddescribes only that signal and not its flavor.- supported modifications are done through existing
extendsmechanism or something similar. I.e.
- id: my.service_a.metric.http.client.request.duration
extends: metric.http.client.request.duration
note: this a service a flavor of http.client.request.duration, it records url.template when talking to internal services
attributes:
- ref: url.template
requirement_level:
conditionally_required: SHOULD be recorded when calling service B or C
- id: my.service_b.metric.http.client.request.duration
extends: metric.http.client.request.duration
note: this a service b flavor of http.client.request.duration, it does not include url.template attribute since we can't record it there.I.e. each modified version needs a unique id that can be used to render docs, generate code, build dashboards, run live checks.
We'd need to change what extends does for telemetry signals. Today it just inherits attributes and common group properties, but it can become stricter and require to preserve stability, name, type, etc.
From my point of view, having modifiable, but singleton definition is narrow and confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then we don't need signal-specific refs for import - they won't have signal-specific properties.
I.e. we can have a nicer syntax like
imports:
- metric.db.*
- metric.http.*
- metric.system.*
- entity.*
- ...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on board with this direction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe can should rename attribute id to name?
In any case, if we do
imports:
- attribute.db.*
- metric.db.*it'll be confusing for authors because attribute. prefix doesn't exist anywhere
if we do
imports:
- db.*
- metric.db.*it'll be confusing to authors and readers since it's not obvious that db.* only imports attributes.
I'm trying to create non-ambigous and consistent way for those who read or write conventions, with the
- imports:
- attributes:
- http.*
- db.*
- system.*
- metrics:
- http.*
- db.*
- system.*
- entities:
- hostthey import by over-the-wire identifier they use in queries/dashboards. The id becomes internal semconv impl detail it should be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lquerel - What's the extent of the wildcard capability? Can we have *.db.* for example? Or, for people not using namespace dot notation, db*?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jerbly everything supported by globset is effectively supported by this imports section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- imports:
- attributes:
- http.*
- db.*
- system.*
- metrics:
- http.*
- db.*
- system.*
- entities:
- host
Unless I’m misinterpreting something in your previous messages, I see more or less the same level of confusion with this syntax. For attributes, the list under attributes contains IDs, while the list under metrics contains names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in my suggestion you import by the formal/public name. You never see metric.db.... unless you look into semconv repo, it's internal impl detail. The fact we use different properties to represent id inside semconv is sad.
How do you want to solve 'import attribute' problem otherwise?
refactoring_of_group_spec.patch
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to include this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jerbly What I’m saying is that the glob syntax used to match signal and attribute IDs (or names, depending on the final version) is the same as the one used in the globset crate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that... this comment above was a file level comment for the patch file you've included. Is that a mistake?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, my bad. I need to remove this patch from the repo. That was the major refactoring I did to completely change GroupSpec from a struct to an enum. I was planning to create a separate PR with the content of this patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thanks
crates/weaver_semconv/src/semconv.rs
Outdated
|
|
||
| /// A list of imports referencing groups defined in a dependent registry. | ||
| #[serde(skip_serializing_if = "Option::is_none")] | ||
| pub(crate) imports: Option<Vec<GroupWildcard>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should document what import does:
- it is a way to allowlist external specific semconv in yours (that's my understanding, not sure if it's a common one)
- when importing a group, does it also import all attributes the group depends on and all entities it's associated with? or do I need to explicitly import such attribute before using it in my custom signals?
- is there a way to say "I allow all otel semconv"
imports: ["*"]? - what do I do with imported things after? generate docs? - we need an example :)
crates/weaver_semconv/src/semconv.rs
Outdated
|
|
||
| /// Returns the list of imports in the semantic convention spec. | ||
| #[must_use] | ||
| pub fn imports(&self) -> Option<&[GroupWildcard]> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(random place)
should we warn/fail when someone tries to import a thing that's in the same registry? it doesn't make sense and people would only do this by mistake, we should probably notify them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had thought about giving this kind of user's feedback, but actually it’s not so straightforward with the wildcard system. For example, I might have an import like metric.db.* that imports all the OTEL database metrics, but at the same time have a local metric declaration that matches this wildcard.
So, it seems clear that we should build something robust, except perhaps for imports without wildcards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recall we discussed imports from multiple registries and that we'd need to add registry as the namespace.
E.g. if I have OTel registry and Foo registry, when I import metric.db.... (or db....) I actually implicitly import from otel:: because it's special to us.
So when a collision happens, I should be able to resolve it by adding registry foo::db.....
I don't like the idea of
- imports
- metrics:
db.*importing all db metrics from all registries I have without any means to control it and all the collisions that come with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could allow db.* only if there is no ambiguity, as with references, and when there is ambiguity, require the custom registry author to specify the registry alias, like otel:db.*. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imho the best way to prevent ambiguity is to make people explicitly mention where they import from.
It's ok if we have one registry as a default one or if you can set it.
When I write imports: ['db.foo.bar'] and I have two registries I import from, it should be deterministic where db.foo.bar comes from - it comes from the default one. You can only import metric defined in non-default registry using another_registry:db.foo.bar.
So if you want to import from all registries, write imports: ['*:db.foo.bar'], weaver should not try to be too smart.
i.e. the syntax should minimize the chance of ambiguity, runtime check is the last resort
|
During the semconv tooling SIG, we decided to represent the Right now, the "name" of the persistant identifiers across the various group types is non fully consistent. We have a plan to rename |
imports section in semconv to support references on metrics/events/entities from custom registriesimports section in semconv to support references on attribute_groups/metrics/events/entities from custom registries
|
Are we ready to switch to the auto-generated json schema for use in editors? There is a trade-off that I'm not sure we've agreed on. The manual schema encodes some of the rules which shifts-left the warnings for a better experience for the author. For example:
My concern is that this may be a backwards step for authors using the schema which is highly recommended for a faster in-line experience. Their editor will show no errors and then weaver will complain, leading to a much slower working loop. |
I thought we had decided to switch, but I agree with your argument. I'll roll back this file. |
|
@jerbly I roll backed the JSON schema to the manual version and updated it with the new |
|
@jerbly I also made a small update to |
jerbly
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #769 +/- ##
=======================================
+ Coverage 76.8% 76.9% +0.1%
=======================================
Files 69 69
Lines 5634 5666 +32
=======================================
+ Hits 4328 4359 +31
- Misses 1306 1307 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Co-authored-by: Liudmila Molkova <[email protected]>
[Updated to reflect conclusions of the thread below]
This PR removes one of the current limitations of custom registries. Previously, when the
--include-unreferencedoption was not enabled, Weaver did not include signals from imported registries in the resolved output. This PR introduces the ability to explicitly import specific groups by name or wildcard from an imported registry.A new top-level imports field, composed of four lists (attributes, metrics, events, entities), has been added to the semconv YAML format. This allows references to groups of type
attribute_group,metric,event, andentity. References to spans are not yet supported, but could be added in the future if their names become stable and persistent across registry versions.Support for overriding groups through this reference mechanism could also be introduced in a future PR.
Example of custom registry: