@mlenkeit commented Oct 24, 2024

This PR contains a project proposal for an Audit Logging SIG as discussed on Slack.

We are aware that the project proposal still contains several TBDs, especially regarding staffing and timeline, that need to be defined before the SIG can start working.

We will approach other vendors directly with this proposal to identify additional contributors. Of course, anyone who comes across this proposal here on GitHub is invited to contribute.

While we do have some ideas about a potential timeline for the semantic conventions, the OTel API/SDK adjustments, and the Collector adjustments, we would like to align it with other contributors before publishing.

Any feedback from the community on the proposed scope of the SIG is highly appreciated!

Open topics

The following items reference topics from the PR discussion that are still open:


linux-foundation-easycla bot commented Oct 24, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.


Audit Logging is currently not within the scope of OpenTelemetry

- no semantic conventions for audit logs in OTEL
Member

Suggested change
- no semantic conventions for audit logs in OTEL
- There aren't currently any semantic conventions designed specifically for audit logs in OTEL

@mtwo commented Oct 25, 2024

Are there any requirements around signing logs / detecting tampering? I've heard that mentioned before in the context of audit logs, but I don't know how common a requirement it is.


Audit logging describes the capability of capturing audit-trail-relevant events of a system to meet compliance requirements. Such events may originate anywhere from the infrastructure (e.g. a Kubernetes cluster) up to the application level. It is a capability that is particularly relevant for providers of enterprise software.

Unlike regular application logs, audit logs are usually subject to long retention periods and software providers must guarantee their completeness (i.e. guarantee of delivery).
Member

Good points! In addition, here are some things we might want to consider:

  1. Audit logs might be considered a critical part of the business, which could call for a different API design strategy. For example, if the information provided by the caller is invalid, an audit logging API might throw an exception instead of failing silently and moving on.
  2. Audit logs might need to include some sensitive information without redaction due to regulatory requirements (e.g. user identity and client IP address).
  3. The data path could require a higher level of access control or privilege.

@mlenkeit Nov 19, 2024

@reyang thanks for mentioning these points.

The API behavior in particular is something we had thought about initially. However, when we first pitched audit logging on Slack, we received the following comment from Ted Young:

As a rule, the OpenTelemetry API never throws an exception. I understand why you might want this, though it is not present in many audit logging systems, which use regular loggers. So a strong case would have to be made on this particular point.

Based on this initial feedback, we decided to file this SIG proposal without proposing such API changes.

* Sponsors: tbd
* GC liaison: tbd
* Engineers:
  * SAP will provide a prototype in two languages (tbd; likely two of Java, JavaScript, Go)
Member

I think we need prototypes in two parts:

  1. API/SDK - this is where we need three programming languages IIRC.
  2. OTel Collector - higher guarantee on data delivery (completeness, integrity, latency, etc.), data path security.

@mlenkeit Nov 19, 2024

Thanks for pointing this out! It's clear to us, but I'll work on making this clearer in the doc...

@mlenkeit force-pushed the audit-logging-sig-project-proposal branch from c1aca6e to 65ae32e on November 19, 2024 13:44
@mlenkeit

Are there any requirements around signing logs / detecting tampering? I've heard that mentioned before in the context of audit logs, but I don't know how common a requirement it is.

@mtwo as far as I know, immutability of audit logs is a common requirement, although not all audit logging systems/use cases I've seen address it with technical measures; some rely on organizational measures instead. However, given the flexibility of OTel processing pipelines (i.e. different Collector topologies), a technical solution in OTel would be preferable.

@reyang what is your opinion on this?
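On the tamper-detection question, one common technique is to chain records with a keyed hash, so that modifying, reordering, or dropping a record breaks verification. The sketch below is a minimal illustration under stated assumptions, not an OTel feature: `SECRET_KEY`, `sign_record`, `build_chain`, and `verify_chain` are hypothetical names, the event payloads are made up, and key management is out of scope.

```python
import hashlib
import hmac
import json

# Hypothetical shared key; a real deployment would obtain it from a key management system.
SECRET_KEY = b"example-key"


def sign_record(record: dict, prev_mac: str) -> str:
    """HMAC-SHA256 over the previous record's MAC plus this record's canonical JSON."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(SECRET_KEY, prev_mac.encode() + payload, hashlib.sha256).hexdigest()


def build_chain(records: list[dict]) -> list[tuple[dict, str]]:
    """Attach a MAC to each record, linking it to its predecessor."""
    chain, prev_mac = [], ""
    for record in records:
        mac = sign_record(record, prev_mac)
        chain.append((record, mac))
        prev_mac = mac
    return chain


def verify_chain(chain: list[tuple[dict, str]]) -> bool:
    """Recompute every MAC in order; a modified, reordered, or dropped record
    (other than the last one) makes verification fail."""
    prev_mac = ""
    for record, mac in chain:
        if not hmac.compare_digest(sign_record(record, prev_mac), mac):
            return False
        prev_mac = mac
    return True


records = [
    {"event": "UserLoginFailure", "user": "alice"},
    {"event": "DataModification", "object": "order-1"},
]
chain = build_chain(records)
print(verify_chain(chain))  # True

tampered = [({**chain[0][0], "user": "mallory"}, chain[0][1])] + chain[1:]
print(verify_chain(tampered))  # False
```

Note that a chain alone cannot detect truncation at the end of the stream; that would need something extra, such as periodically signed checkpoints or record counts.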


Audit Logging is currently not within the scope of OpenTelemetry

- no semantic conventions for audit logs in OTel
Member

Suggested change
- no semantic conventions for audit logs in OTel
- no semantic conventions for audit logs in OTel

Can you provide some examples of what would be part of such semantic conventions? My knowledge of audit logs is very limited, so examples would help me understand the problem much better.

Author

@svrnm our experience has shown that in order to analyze audit logs at scale, it is important to define an (extensible) event catalog. The event catalog standardizes audit log events across workloads/producers. For example, our internal event catalog currently consists of 50+ such events. Ideally, such a catalog would be part of the semantic conventions.

To make this more tangible, I've added some examples to the appendix of the document:
https://github.com/open-telemetry/community/pull/2409/files#diff-736e6b0ae9ae655b78d9ba007d08592071abb6cc1ef64d7893ff81642c8ec734R115-R192


Another example from the security world is OCSF: https://github.com/ocsf.

Member

Thanks @mlenkeit, that makes it much clearer.

The metadata looks like attributes that would be covered by other semantic conventions (e.g. there is log.record.uid for metadata.id, the timestamp of course, and some of the others, such as k8s.cluster.name for k8sCluster). So I would assume this is more about reusing and extending other domains that are not unique to "audit logs".

For the event and data examples you gave, I would argue that they are not "semantic conventions for audit logs" but "semantic conventions for log types that are typically subject to strict auditing requirements". What I mean by that: when we talk about "semantic conventions for audit logs", I think of a namespace called audit.* that holds attributes specific to the business logic of audit logging, like a signature that helps make the log line tamper-proof, or maybe even metadata about which regulation requires this log to be an "audit log".
In contrast, "semantic conventions for log types that are typically subject to strict auditing requirements" live in their own namespaces: the "UserLoginFailure" example would fall into an "authentication" or "auth" namespace, with "auth.login.method" or "auth.login.failureReason" as potential attributes and event.name set to something like auth.login.failure.

I am just making these up to illustrate the difference; they will probably take a different form eventually. To make a long story short, here is a suggested rephrasing:

Suggested change
- no semantic conventions for audit logs in OTel
- no semantic conventions for audit logs in OTel
- no semantic conventions for log types that are typically subject to strict auditing requirements, like authentication, authorization and data changes

@renewelches thanks for calling out OCSF. If I remember correctly, there have been conversations between OTel and OCSF in the past, cc @lmolkova
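To make the distinction between reused and domain-specific conventions more concrete, here is a minimal Python sketch that maps a catalog event onto log attributes. The input shape (`type`, `metadata.id`, `metadata.k8sCluster`, `data`), the `EVENT_NAMES` mapping, and the `auth.login.*` names are hypothetical and follow the made-up examples in the discussion; only `log.record.uid` and `k8s.cluster.name` are existing OTel semantic convention attributes. The sketch uses `failure_reason` rather than `failureReason` because OTel attribute names use snake_case for multi-word components.

```python
from typing import Any

# Hypothetical mapping from catalog event types to event names.
EVENT_NAMES = {
    "UserLoginFailure": "auth.login.failure",
}


def to_otel_attributes(event: dict[str, Any]) -> dict[str, Any]:
    """Map a (hypothetical) catalog event to flat, OTel-style log attributes."""
    meta = event["metadata"]
    event_name = EVENT_NAMES[event["type"]]
    attributes: dict[str, Any] = {
        # Generic metadata reuses existing semantic conventions.
        "log.record.uid": meta["id"],
        "k8s.cluster.name": meta["k8sCluster"],
        # Event name as suggested in the discussion (illustrative only).
        "event.name": event_name,
    }
    # Hypothetical: event-specific fields go under the event's own namespace,
    # e.g. "auth.login.failure" -> "auth.login.<field>".
    namespace = event_name.rsplit(".", 1)[0]
    for key, value in event["data"].items():
        attributes[f"{namespace}.{key}"] = value
    return attributes


example = {
    "type": "UserLoginFailure",
    "metadata": {"id": "01J0ABC", "k8sCluster": "prod-eu"},
    "data": {"method": "password", "failure_reason": "invalid_credentials"},
}
print(to_otel_attributes(example))
```

Only the domain-specific part (here `auth.login.*`) would need new conventions; the generic metadata is covered by what already exists.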

Author

Regarding metadata, I fully agree: most of these attributes are already covered by semconv. We may still identify additional attributes in SIG meetings, depending on the experience/requirements of other contributors/companies.

I understand how "semantic conventions for audit logs" can be misleading. To me, your suggestion particularly describes logs that are "already there" (e.g. events emitted by a K8s cluster) and can be considered relevant for audit purposes. In enterprise software especially, applications commonly produce logs that are specifically meant to be audit logs (and nothing else). It's important to me that we find wording that covers both of these types.

How about the following?

Suggested change
- no semantic conventions for audit logs in OTel
- no semantic conventions for representing and identifying audit trail-relevant events in OTel (like authentication, authorization or modification of


As mentioned in another comment, this all depends on which attributes may change and which must be immutable. As I understand it, an attribute could be altered by a processor in the Collector, which is something we would want to avoid or prevent for audit logs. If we conclude that we can or should only guarantee immutability for the log itself, then we must live with replication/duplication. Otherwise, we might have to add the constraint that certain attributes must also be immutable.
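One way to picture the second option: if a signature covers only the log body plus a chosen set of protected attributes, Collector processors remain free to add or change other attributes, while any edit to a protected one becomes detectable. The sketch below assumes an HMAC computed at emission time; `SIGNING_KEY`, `PROTECTED_ATTRIBUTES`, `compute_mac`, `sign`, and `verify` are hypothetical and not part of any OTel API, and the attribute values are sample data.

```python
import hashlib
import hmac
import json

# Hypothetical signing key and protected attribute set, for illustration only.
SIGNING_KEY = b"example-key"
PROTECTED_ATTRIBUTES = ("event.name", "user.id")


def compute_mac(body: str, attributes: dict) -> str:
    """MAC over the log body and only the protected attributes (missing ones count as None)."""
    covered = {key: attributes.get(key) for key in PROTECTED_ATTRIBUTES}
    payload = json.dumps({"body": body, "attributes": covered}, sort_keys=True, separators=(",", ":"))
    return hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()


def sign(record: dict) -> dict:
    """Return a copy of the record with a MAC attached at emission time."""
    return {**record, "mac": compute_mac(record["body"], record["attributes"])}


def verify(record: dict) -> bool:
    expected = compute_mac(record["body"], record["attributes"])
    return hmac.compare_digest(expected, record["mac"])


record = sign({
    "body": "login failed",
    "attributes": {"event.name": "auth.login.failure", "user.id": "alice"},
})

# A processor enriching the record with an unprotected attribute keeps it valid.
enriched = {**record, "attributes": {**record["attributes"], "k8s.cluster.name": "prod-eu"}}
print(verify(enriched))  # True

# Changing a protected attribute is detected.
altered = {**record, "attributes": {**record["attributes"], "user.id": "mallory"}}
print(verify(altered))  # False
```

The trade-off is that the protected set must be agreed on up front; anything outside it can still be changed silently, which is where the replication/duplication option would come in.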

Member

+1 to looking into OCSF for security events and borrowing relevant semantic conventions from there.

Contributor

@mlenkeit the Security semconv SIG is also working on introducing security-related attributes to semconv; we currently have discussions around user authentication. I feel that this SIG and the Security SIG might share some tasks/discussions.


The workload emits the event via the OTel API/SDK. It may wait for an acknowledgement of receipt from the Collector before proceeding. If the event is rejected or receipt is not acknowledged in time, the workload or SDK may act accordingly, e.g. retry, roll back a database transaction, or inform the user.
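The acknowledgement flow described above can be sketched as an application-level wrapper. This is a minimal sketch under stated assumptions: `send` is a hypothetical transport callable that returns `True` when the Collector acknowledges receipt, `False` on rejection, and raises `TimeoutError` when no acknowledgement arrives in time; `emit_with_ack`, `AuditDeliveryError`, `delete_order`, and the `order.*` attribute names are illustrative only. Since the OTel API itself never throws (per the comment quoted earlier), the exception here comes from the wrapper, not from OTel.

```python
import sqlite3
import time
from typing import Callable

# Hypothetical transport: True on acknowledgement, False on rejection,
# raises TimeoutError if no acknowledgement arrives within timeout_s.
Sender = Callable[[dict, float], bool]


class AuditDeliveryError(Exception):
    """Raised when an audit event was not acknowledged after all retries."""


def emit_with_ack(send: Sender, event: dict, timeout_s: float = 2.0,
                  max_attempts: int = 3, backoff_s: float = 0.1) -> int:
    """Send an audit event, retrying with exponential backoff; return the successful attempt number."""
    for attempt in range(1, max_attempts + 1):
        try:
            if send(event, timeout_s):
                return attempt
        except TimeoutError:
            pass  # treat a missing acknowledgement like a rejection and retry
        if attempt < max_attempts:
            time.sleep(backoff_s * 2 ** (attempt - 1))
    raise AuditDeliveryError(f"no acknowledgement after {max_attempts} attempts")


def delete_order(conn: sqlite3.Connection, send: Sender, order_id: int) -> None:
    """Delete an order only if the matching audit event is acknowledged."""
    with conn:  # commits on success, rolls back if an exception propagates
        conn.execute("DELETE FROM orders WHERE id = ?", (order_id,))
        emit_with_ack(send, {"event.name": "order.deleted", "order.id": order_id})


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO orders VALUES (1)")
conn.commit()


def rejecting_sender(event: dict, timeout_s: float) -> bool:
    return False  # simulate a Collector that rejects every event


try:
    delete_order(conn, rejecting_sender, 1)
except AuditDeliveryError:
    pass
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1: the delete was rolled back
```

Raising lets the caller decide how to react, e.g. roll back as here or inform the user. A production design would likely also need stable event IDs so that retries can be deduplicated downstream.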

- OTel Collector receives the event:

@svrnm commented Jan 13, 2025

any updates / progress on this?

@mlenkeit

any updates / progress on this?

@svrnm we presented the proposal in the Specification SIG on Dec 3, 2024, which sparked a lively discussion. Together with @reyang, we decided to collect more early feedback from different SIGs, especially on the aspect of delivery guarantees. Over the next three weeks, I'm presenting the topic in a few language-specific SIGs and in the Collector SIG. I'll share updates here afterwards.

@mlenkeit

Status update:

  1. we've presented the proposal in the Collector, Java, and JS SIGs to collect more general feedback, which was rather positive
  2. we are in the process of filling the staffing gaps in the SIG proposal
    • we're trying to fill the open engineering positions from the interested vendors
    • for maintainers/approvers, we'll reach out to community members directly starting with those who engaged in the discussions from 1)

@mlenkeit

We're proposing @hilmarf as the project lead; the SIG proposal has been updated accordingly. We're still working on getting names for additional engineers and maintainers/approvers.

@mlenkeit requested a review from a team on May 20, 2025 10:07
@mlenkeit

Following the discussions at KubeCon London, we've reconsidered our approach and are now proceeding in phases:

  • In Phase 1 (in progress), we are building an end-to-end prototype to refine the challenges and requirements for audit logging in OTel and to showcase potential solutions. This is time-boxed until the end of September 2025. We are set up to run this without a formal OTel project sign-off. We consider this a true proof of concept, i.e. we don't expect the OTel modifications from the PoC to be accepted as-is, and we are prepared to discard them if necessary.
  • In Phase 2, we intend to contribute functional extensions upstream back to OTel. We will work towards signing off this project proposal and either join existing SIGs or form a separate one. The results from Phase 1 should help us in the discussions with the maintainers to make our proposed OTel extensions/changes more tangible.
  • In Phase 3, we plan to work on semantic conventions for audit logging.

We have just started Phase 1 with @hilmarf and additional contributors from our side. While there isn't much yet, all our prototype efforts will be available at apeirora/audit-log-poc-for-otel.

Towards the end of Phase 1, we'll reach out to the respective SIGs to demonstrate the results of the prototype and get additional support for Phase 2 and this project proposal.

We'll update the project proposal with more specific deliverables for Phase 2 as we gain more insights during Phase 1.

@paulmbw commented May 22, 2025

For those interested, I'm building a new startup in this space, specifically LLM observability for audit and compliance. Read more here: https://traceprompt-web.pages.dev/
