Data model and reporting for OpenTelemetry workstreams #3337
jack-berg wants to merge 17 commits into open-telemetry:main
| steps: | ||
| - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 | ||
| - name: validate workstreams.yml | ||
| run: make validate-workstreams |
TODO: fail the build if workstreams.md has uncommitted changes after re-running generation
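A minimal sketch of how such a check could work, assuming the generator can be wrapped in a callable. The function name `report_is_stale` and the `regenerate` parameter are hypothetical and not part of this PR:

```python
from pathlib import Path
from typing import Callable


def report_is_stale(report_path: Path, regenerate: Callable[[], None]) -> bool:
    """Return True if re-running generation changes the report on disk.

    `regenerate` is a hypothetical callable that rewrites `report_path`;
    how the real generator is invoked is not specified here.
    """
    # Snapshot the committed content (None if the file does not exist yet).
    before = report_path.read_bytes() if report_path.exists() else None
    regenerate()
    after = report_path.read_bytes() if report_path.exists() else None
    # Any difference means the committed report was out of date.
    return before != after
```

In CI, a wrapper could exit non-zero when this returns True, which would fail the build until the regenerated file is committed.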
Maintaining a file like this allows workstreams.yml to reference people only by GitHub username, whereas sigs.yml denormalizes this and repeats each person's name and GitHub username everywhere they are referenced.
The downside is that it can become out of date. I plan on adding a GitHub Action to check for inconsistencies with the equivalent GitHub teams.
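The consistency check could come down to a set comparison per team. The sketch below assumes the team membership has already been fetched somewhere else; `TeamDrift` and `compare_team` are illustrative names, not part of this PR:

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class TeamDrift:
    """Differences between the people file and a GitHub team (hypothetical shape)."""
    missing_from_file: FrozenSet[str]
    missing_from_team: FrozenSet[str]

    @property
    def consistent(self) -> bool:
        return not self.missing_from_file and not self.missing_from_team


def compare_team(file_usernames: Iterable[str], team_usernames: Iterable[str]) -> TeamDrift:
    # GitHub usernames are case-insensitive, so normalize before comparing.
    in_file = {u.lower() for u in file_usernames}
    in_team = {u.lower() for u in team_usernames}
    return TeamDrift(
        missing_from_file=frozenset(in_team - in_file),
        missing_from_team=frozenset(in_file - in_team),
    )
```

For example, `compare_team(["alice", "bob"], ["bob", "carol"])` would report `carol` as missing from the file and `alice` as missing from the team.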
| | Prometheus Interoperability <a id="sig-prometheus" href="#sig-prometheus"><sup>🔗</sup></a> | Every other Wednesday at 08:00 PT | [Google Doc](https://docs.google.com/document/d/1OijO_NZyR7gb7AvUj_KxKbcbsN09XQmq_ExZeDlgEOg) | [#otel-prometheus](https://cloud-native.slack.com/archives/C01LSCJBXDZ) | [calendar-prometheus](https://groups.google.com/a/opentelemetry.io/g/calendar-prometheus) | [David Ashpole](https://github.com/dashpole) | [Pablo Baeyens](https://github.com/mx-psi) | | ||
| | Functions as a Service (FAAS) <a id="sig-faas" href="#sig-faas"><sup>🔗</sup></a> | Every other Thursday at 8:00 PT | [Google Doc](https://docs.google.com/document/d/16dGljp7YY7jgzxydDKsin8DQCCx6HQqdQHorDxTx6SQ) | [#otel-faas](https://cloud-native.slack.com/archives/C04HVBETC9Z) | [calendar-faas](https://groups.google.com/a/opentelemetry.io/g/calendar-faas) | | [Austin Parker](https://github.com/austinlparker) | | ||
| | Profiling <a id="sig-profiles" href="#sig-profiles"><sup>🔗</sup></a> | Every other Thursday at 08:00 PT | [Google Doc](https://docs.google.com/document/d/19UqPPPlGE83N37MhS93uRlxsP1_wGxQ33Qv6CDHaEp0) | [#otel-profiles](https://cloud-native.slack.com/archives/C03J794L0BV) | [calendar-profiling](https://groups.google.com/a/opentelemetry.io/g/calendar-profiling) | [Josh Suereth](https://github.com/jsuereth),<br/>[Tigran Najaryan](https://github.com/tigrannajaryan) | [Morgan McLean](https://github.com/mtwo) | | ||
| | OpenTelemetry on Mainframes <a id="sig-mainframes" href="#sig-mainframes"><sup>🔗</sup></a> | Wednesday at 10:00 PT | [Google Doc](https://docs.google.com/document/d/14p-bpofozTL4n3jy6HZH_TKjoOXvog18G1HBRqq6liE) | [#otel-mainframes](https://cloud-native.slack.com/archives/C05PXDFTCPJ) | [calendar-mainframe](https://groups.google.com/a/opentelemetry.io/g/calendar-mainframe) | [Alolita Sharma](https://github.com/alolita),<br/>[Daniel Dyla](https://github.com/dyladan),<br/>[Morgan McLean](https://github.com/mtwo) | [Morgan McLean](https://github.com/mtwo) | |
This change illustrates how loose our current rules / conventions are, as @alolita and @mtwo are neither spec sponsors nor TC members. We need to formalize the requirements of workstreams. If we want escape hatches to cover unusual situations, we can add allowances, but let's use a standard vocabulary where sponsor means a very specific thing.
| | Specification: Logs <a id="sig-spec-logs" href="#sig-spec-logs"><sup>🔗</sup></a> | Tuesday at 10:00 PT | [Google Doc](https://docs.google.com/document/d/1BKjQWP32FXL9g1cGbyj7DMXV1Uq_RL8_78rWaMBhN0A) | [#otel-spec-logs](https://cloud-native.slack.com/archives/C062HUREGUV) | [calendar-spec-logs](https://groups.google.com/a/opentelemetry.io/g/calendar-spec-logs) | [Ted Young](https://github.com/tedsuo),<br/>[Liudmila Molkova](https://github.com/lmolkova) | [Trask Stalnaker](https://github.com/trask) | | ||
| | Semantic Conventions: General <a id="sig-semantic-conventions" href="#sig-semantic-conventions"><sup>🔗</sup></a> | Monday at 08:00 PT | [Google Doc](https://docs.google.com/document/d/10xG7DNKWRhxNmFGt3yYd3980a9uwS8lMl2LvQL3VNK8) | [#otel-semantic-conventions](https://cloud-native.slack.com/archives/C041APFBYQP) | [calendar-semconv](https://groups.google.com/a/opentelemetry.io/g/calendar-semconv) | [Armin Ruech](https://github.com/arminru),<br/>[Josh Suereth](https://github.com/jsuereth),<br/>[Liudmila Molkova](https://github.com/lmolkova) | [Trask Stalnaker](https://github.com/trask) | | ||
| | Semantic Conventions: System Metrics <a id="sig-system-metrics" href="#sig-system-metrics"><sup>🔗</sup></a> | Thursday at 07:30 PT | [Google Doc](https://docs.google.com/document/d/1p5TH57t43XpxA48onLzX4PIr3g6ydYKCtR_AUlsCnQk) | [#otel-system-metrics](https://cloud-native.slack.com/archives/C05CTFE9U4A) | [calendar-semconv](https://groups.google.com/a/opentelemetry.io/g/calendar-semconv) | [Josh Suereth](https://github.com/jsuereth) | [Pablo Baeyens](https://github.com/mx-psi) | | ||
| | Semantic Conventions: K8s <a id="sig-k8s-semconv-sig" href="#sig-k8s-semconv-sig"><sup>🔗</sup></a> | Every other Tuesday at 07:30 PT | [Google Doc](https://docs.google.com/document/d/17DqFVlLvO43neXXTwlSd1zcKjSRA8P3d0Y444QNwUTQ) | [#otel-k8s-semconv-sig](https://cloud-native.slack.com/archives/C07Q1L0FGKX) | [calendar-semconv](https://groups.google.com/a/opentelemetry.io/g/calendar-semconv) | [Josh Suereth](https://github.com/jsuereth),<br/>[Alexander Wert](https://github.com/AlexanderWert) | [Alolita Sharma](https://github.com/alolita) | |
Another example of loose modeling. @AlexanderWert is a key figure in semantic conventions and k8s, but not a sponsor in the way we conventionally use that term.
@jack-berg This is very cool! I really look forward to diving into this, and thanks for taking a first crack!
I did not find opentelemetry-configuration. Did I miss it?
| # Bounded effort with defined deliverables and a finite timeline. | ||
| # Corresponds to a 'project' in /projects/*.md. | ||
| # Any kind may be associated with a project proposal document. | ||
| - working-group |
Small terminology point, but I don't believe we have working groups? We technically only have SIGs, and have tried to avoid using the term working group except in casual conversation.
But there does seem to be a practical difference between long-term maintenance and implementation SIGs (which historically have not had long-term TC sponsors) and the design SIGs and new implementation SIGs, which need TC attention. Maybe that's something we should make explicit: to conserve TC resources, a goal in setting up a long-term SIG should be that the TC can roll off to focus on other things once they determine that the maintainers understand the design problems and how the community works.
We have something which is not a SIG, and I've heard the term working group used several times in the past. There are 7 working groups listed in workstreams.yml, all sourced from /projects and none of which existed in sigs.yml:
- OpenTelemetry Collector Agentic Workflows
- CI/CD Observability SIG Phase 2
- Collector v1
- OpenTelemetry Ecosystem Explorer
- New Getting Started Documentation and Reference Application
- OTel Blueprints
- Bootstrap Zig Special Interest Group
> Maybe that's something we should make explicit: to conserve TC resources, a goal in setting up a long-term SIG should be that the TC can roll off to focus on other things once they determine that the maintainers understand the design problems and how the community works.
I had a similar idea, but was punting on it for a followup. Something like:
- Working groups have bounded deliverables, finite timelines, and always have a parent SIG
- Since they always have a parent SIG, it's really that parent SIG's leadership which needs to underwrite the working group's activities. Arguably, the parent SIG should approve / reject scope, sponsor, etc. Maybe there are certain cases where the broader community has a say if it's a cross-functional effort affecting other SIGs.
- What working groups really need from the project process is the rails to get a calendar invite, a Slack channel, and a place to gather consensus / contributors.
Historically, many working groups have been nested under the spec general SIG, or semantic conventions. These groups have been led by the TC, so it's natural that their children need the approval / participation of the TC. But why should the TC care if the collector wants to spin off a new group to focus on v1? Or if communications wants to spin off a new group to work on ecosystem explorer?
In this model, which again I'm punting on for a future PR, the TC would only care about sponsoring projects under the SIGs it leads, i.e. the spec and spec-related groups like OTLP, OpAMP, declarative configuration, semantic conventions, etc.
Going back to the collector SIG and the collector v1 working group example: the TC still sponsors the top level collector SIG, and trusts it to create working groups for its own goals as needed, like the collector v1 group.
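One way to read this model: a working group carries no TC sponsor of its own, and its sponsorship is found by walking up to the nearest ancestor that has one. A minimal sketch, assuming hypothetical `parent` and `tc_sponsor` fields and made-up ids and handles (not the actual workstreams.yml schema):

```python
from typing import Dict, Optional

# Hypothetical records: ids, field names, and the sponsor handle are illustrative only.
WORKSTREAMS: Dict[str, dict] = {
    "collector": {"kind": "sig", "parent": None, "tc_sponsor": "some-tc-member"},
    "collector-v1": {"kind": "working-group", "parent": "collector", "tc_sponsor": None},
}


def effective_sponsor(workstream_id: str, workstreams: Dict[str, dict]) -> Optional[str]:
    """Walk up the parent chain until a workstream with a TC sponsor is found."""
    seen = set()
    current: Optional[str] = workstream_id
    while current is not None:
        if current in seen:
            # Guard against malformed data where parents form a loop.
            raise ValueError(f"parent cycle detected at {current!r}")
        seen.add(current)
        record = workstreams[current]
        if record.get("tc_sponsor"):
            return record["tc_sponsor"]
        current = record.get("parent")
    return None
```

Here the working group resolves to the sponsor of its parent SIG, so reporting on TC attention would count the top-level SIG once rather than each group it spins off.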
> Maybe there are certain cases where the broader community has a say if it's a cross-functional effort affecting other SIGs.
This has been the purpose of working groups in Kubernetes. They have mostly been used to deliver complex features that involve many SIGs. We do most of that coordination through the spec, so it is a bit rarer for us to need WGs of that form.
> But why should the TC care if the collector wants to spin off a new group to focus on v1? Or if communications wants to spin off a new group to work on ecosystem explorer?
I fully agree. The changes introduced last year in #2911 aimed to address that (or part of that), allowing SIGs to publicise their active priorities in an OTel Roadmap which is automatically synced from their underlying GH Projects, without having to go through the whole Project Proposal process (that's the purpose of roadmapProjectIDs in sigs.yml). The aim was to provide more clarity on roadmap management as requested by the ToC, while maintaining SIG autonomy and keeping admin streamlined (whether it achieved that, I don't know).
We've used the terminology "project" (which is also confusing, because it clashes with GitHub Projects/Boards) to refer to either a) a new SIG being formed or b) a wide initiative involving multiple SIGs. However, I find this has been a source of confusion. The side effect is that we have "projects" that never complete, with SIGs treating the "project" as a charter when they move into BAU, failing to establish a set of time-bound deliverables. Not every project needs a new SIG. The current project proposal template states that, if a project is driven by an existing SIG, the TC sponsor and GC liaisons should be inherited. However, I don't think this is widely communicated either.
IMO Working Groups sometimes suffered from the same problem, i.e. a working group effectively becoming a "lightweight" SIG, again, without exit criteria. This is perhaps the key issue.
I think the term "project" has the intended meaning; however, because projects are represented as GitHub Projects (i.e. boards), it's also a source of confusion. I personally think a word that represents the time-bound nature of these groups would be beneficial. I'd vote in favour of using "Initiative", instead of "Project".
An Initiative can therefore be created to a) start a new SIG (with time-bound deliverables) or b) focus multiple SIGs under a single set of deliverables (this was the case for OTel Blueprints, for instance). For any other work, I think SIGs should just create GH Projects and make them publicly available as part of the roadmap, as the GC/TC sponsorship is already clear.
In short, I think Working Groups may be confusing due to historical context, and I'd be in favour of a word that represents time-bound more closely.
> We've used the terminology "project" (which is also confusing, because it clashes with GitHub Projects/Boards) to refer to either a) a new SIG being formed or b) a wide initiative involving multiple SIGs.
Naming and definitions are going to be the key output of this data modeling exercise. I suggest we take inspiration from the Kubernetes governance document. Summary of group definitions:
- SIG: permanent until removed by steering committee, with horizontal, vertical, and project variants.
- Subproject: sub delineation within a SIG, with lifecycle managed by SIG.
- Working Group: organized to solve specific problem then dissolve, with emphasis on cross SIG collaboration
If we adopted this vocabulary, SIGs and working groups would likely be subject to the current project proposal process, requiring TC and GC sponsorship, whereas subprojects would not. This would substantially change the TC sponsorship landscape.
> I'd vote in favour of using "Initiative", instead of "Project".
I used the word "initiative" in earlier versions of this branch and like it for describing some bounded unit of work.
> In short, I think Working Groups may be confusing due to historical context, and I'd be in favour of a word that represents time-bound more closely.
I'm fine considering other words, but another way to clear up confusion (even from historical context) is just to write down the definition and point to that definition anytime the word is used going forward.
Thanks. I agree. If we take these definitions:
- SIG: permanent until removed by steering committee, with horizontal, vertical, and project variants.
- Subproject: sub delineation within a SIG, with lifecycle managed by SIG.
- Working Group: organized to solve specific problem then dissolve, with emphasis on cross SIG collaboration
Then we'd expect the creation of a SIG, or a Working Group, to go through the current project management process (i.e. approval, sponsorship, liaison, etc). We would not expect a Subproject to have to go through the same process. However (outside of the scope of this PR) we should be more rigorous with:
1. Making sure projects "complete", i.e. the project is the SIG creation and initial short-term goals, not a SIG charter, and the Working Group has exit criteria. This is already codified in the project management template, but not always followed.
2. In order to help with 1), all workstreams (including subprojects) should report their status to the community in the form of GitHub Project updates. Again, this aligns with what's already in place.
So, in the schema proposed in this PR, roadmapProject will refer to either a new SIG creation and first deliverables (only listed under one SIG), a Working Group (listed under multiple SIGs), or a subproject (listed under one SIG). All those will report progress via the listed GH Project IDs.
@marcalff The SIG was rolled into the spec general SIG with #3297. If you're talking about a reference to the repository, I suspect there are going to be ongoing data quality issues with making sure all repos are properly associated with owning workstreams, but I've gone ahead and added opentelemetry-configuration as a repo to spec general.
Thanks for doing this @jack-berg. I think it's a much better way to represent the relationships between these different concepts. Apart from the terminology discussed in #3337 (comment) I have a couple of questions:
Yes.
Once we establish the vocabulary and definitions, we can update our processes to reflect this. So if we say that there is some workstream which is fully owned and delegated by a SIG, then we would document that these groups don't have to go through the project proposal process and can directly update


At the last GC/TC meeting, I volunteered to work on reporting that makes it clearer what the TC is currently sponsoring, so we can evaluate how new project proposals will impact its attention. While starting to work on this, it became clear that we rely heavily on unstructured conventions. We have sigs.yml encoding some information about SIGs, but it is missing key information about TC sponsorship, projects that we didn't classify as a SIG, and relationships between SIGs.
To execute on this, I found it necessary to build a data model to codify how we think about workstreams.
Key Pieces
- The data model is defined in JSON schema at workstreams.schema.yml.
- workstreams.yml is a lightweight database representing the current state of our workstreams. It supersedes sigs.yml.
- validate-workstreams.py verifies that workstreams.yml conforms to the schema, and enforces various logical validation criteria.
- The SIG table from the README.md is derived from workstreams.yml, as it was previously derived from sigs.yml.
- A new workstreams.md file is generated by generate-workstream-report.py. For now, it includes a mermaid chart which visualizes the workstreams, their relationships, and other metadata, and a pivot table summarizing TC members and sponsorship. The data in this report is subject to gamification and misinterpretation, and there's a preamble warning which aims to address / discourage this.
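The logical validation criteria are not listed here, but the parent rules stated in this description (working groups MUST have a parent, each workstream has at most one parent) suggest checks along the following lines. This is only a sketch: the record shape and the `validate_parents` function are assumptions, not the contents of validate-workstreams.py.

```python
from typing import Dict, List


def validate_parents(workstreams: Dict[str, dict]) -> List[str]:
    """Return human-readable violations of the parent rules; empty means valid.

    Assumes each record is a dict with optional "kind" and "parent" keys,
    keyed by workstream id (a hypothetical in-memory shape).
    """
    errors: List[str] = []
    for ws_id, ws in workstreams.items():
        parent = ws.get("parent")
        # Working groups MUST have a parent; SIGs MAY have one.
        if ws.get("kind") == "working-group" and parent is None:
            errors.append(f"{ws_id}: working groups must have a parent")
        # A parent reference must point at a known workstream id.
        if parent is not None and parent not in workstreams:
            errors.append(f"{ws_id}: unknown parent {parent!r}")
    # With at most one parent per node, a cycle is a parent chain that revisits a node.
    for ws_id in workstreams:
        seen = {ws_id}
        current = workstreams[ws_id].get("parent")
        while current is not None and current in workstreams:
            if current in seen:
                errors.append(f"{ws_id}: parent chain contains a cycle")
                break
            seen.add(current)
            current = workstreams[current].get("parent")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a CI run report every violation at once.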
About the data model
The data model attempts to encode information which is sometimes written down but unstructured, and other times is tribal knowledge we talk about and rely on but have never codified. It's meant to be a starting place that I hope we iterate on over time.
- sigCategory, with values: specification, cross-cutting, implementation. I added this for backwards compatibility, as the SIGs table in the README.md depends on this distinction. But I think it's not a very useful concept and we should consider evolving to something else.
- parent, pointing to the id of another workstream. Working groups MUST have a parent. SIGs MAY have a parent. Workstreams form a hierarchy, i.e. a DAG where each node has at most one parent.
- workstreams.yml, and clarifying whether the proposal is for a SIG or working group.
- people[] property. This captures the key people involved in the effort, including GC liaison, TC sponsor (and sponsorship level), lead, spec sponsor. In the future, I would expand this to include a reference to the team of maintainers, and to list other teams that need to be involved / informed.
- resources[] property. This captures key assets like repositories, meeting note docs, communication channels and more.
Expected followup work
Followup data model possibilities:
Followup data quality issues to address:
cc @open-telemetry/governance-committee, @open-telemetry/technical-committee