Pre-RFC: Tracing module - implement other providers #1919
Replies: 9 comments 1 reply
-
Hey @scottgerring, thanks for opening this issue. This is a really cool idea. In Python, we have something similar for the Metrics utility already (https://docs.powertools.aws.dev/lambda/python/latest/core/metrics/datadog/). With the refactoring of the metrics module in Before we move on to an RFC stage, let me triage this with the team and get back to you with more details (by next week).
-
Hey @phipag thanks for the quick turnaround!
-
Hey @phipag, did you folks have a chat about this? 👼
-
Hi @scottgerring! Long time no see, my friend. I hope life and family are going well. By the way, thank you so much for all the hard work you put into the Java v2 branch before you left AWS. It paved the way for @phipag to get there. Let me give you some additional details about the decisions we’ve made in the past that will drive other decisions in the future.

1/ Powertools adds support for third-party observability providers
Two years ago, we decided to start supporting third-party o11y providers, primarily because we knew customers were using providers other than CloudWatch for a variety of reasons: because it’s part of their foundation, because they want to have their data in one place, and for a thousand other reasons customers have. Our first bet was to add support for sending metrics directly to DataDog in Powertools Python. This decision was made mainly because I had some previous experience with the DataDog Extension/Forwarder, which helped me understand the edge cases and the best ways to implement them. To be honest, I think this was a great implementation, and we see some customers using it and being happy with it.

2/ Datadog, NewRelic, HoneyComb, AppDynamics, and dozens of other providers
I remember a discussion we had back then: OK, now that we’ve added support for DataDog, and customers are going to ask for NewRelic, HoneyComb, AppDynamics, and dozens of other observability tools, how do we support all of these integrations? And who is responsible for adding and maintaining this code? Sure, it’s Powertools code, but we recognize that we may not have enough knowledge about each provider.

3/ Why not a standard
At that time, OpenTelemetry was growing and becoming a standard for some types of workloads, but we must recognize that this new standard was still in the adoption phase: Tracer was the first stable/GA API in OTEL, Metrics had only recently gone GA, and Logs/Events was still in the RC phase. Additionally, customers were still figuring out how to do things the right way with OTEL and Lambda, and o11y providers were still making their OTLP endpoints stable and the recommended way to go.

4/ Looking to the Future
Given this scenario and the changes we have seen in the last 2 years, we are evaluating new opportunities for the future. While we will maintain our standard user experience using CloudWatch + XRAY, why not adopt OTEL in Powertools with the same experience we have today and make it a standard that will implement this protocol and let customers decide where to send this data? If I am not mistaken, this is the way Datadog recommends sending traces, and it may be the recommended way for other providers as well. I am not saying that this is the final decision, or that we are not open to implementing a direct integration with DataDog, but we are considering what is best for the future for our customers.

I would like to take this opportunity to talk to you about some of the challenges with OTEL and Lambda and how you see this experience from the developer side when sending data to DataDog using OTEL + Lambda. Please let me know if you're open to scheduling a meeting to discuss this further.

Leandro
-
Hey @leandrodamascena lovely to hear from you again! It's been a while :) I think your take is fundamentally clear-eyed; if we can do it with OTel and support everything, that's great. The issue with OTel with Java in the past has been that the cold-start impact is significant. I'm not sure if this is still the case or not - but certainly worth validating! Here we'd also want to make sure that it's not introducing CRaC issues, as I believe @phipag is working on making pt-java play well here.
Folks can use OTel/OTLP or they can use the Datadog instrumentation. Chucked a meeting in your calendars - let's chat next week!
-
Absolutely, we already have a PR opened by a contributor #1861. Is there anything specific you have in mind that would introduce CRaC issues?
-
Nope! Just flagging that we should keep an eye on it :D
-
I had a play with this this morning. I think we should:
It seems that X-Ray is migrating to using OTel as its API as well, which makes things more interesting! It looks like you can do auto-instrumentation with Lambda too, which is the same sort of mechanism Datadog can use to make things easy. The destination-specific configuration becomes glue, and working out how to handle this is going to be important:
So, something like this:
I have test code for the API-first way with Datadog and the associated CDK setup that I can share after we chat about what we want to do!
-
Hey @scottgerring thanks for sharing some thoughts, that's something I'm working on at this exact moment and I really appreciate that. We basically have two distinct worlds in OTEL and OTEL+Lambda in general: auto-instrumentation and manual instrumentation. I'm trying to find the right balance between the two worlds and provide a good experience by taking away all the burden of the OTEL API - which is super complex - while also allowing the customer to do whatever customizations they want. Finding that middle ground is hard LOL. Our main idea is not to remove support for XRay or discontinue it, but to create a new provider that will have the same experience but will send data to OTEL. We cannot remove support for XRay because, even though XRay supports data ingestion through the OTLP endpoint, the dependencies are different and the infrastructure changes (it needs a collector and a layer), and this could break existing customers.
I have this code working in Python with auto-instrumentation using this: https://aws-otel.github.io/docs/getting-started/lambda/lambda-python. I use the existing span segment, that is the

    from __future__ import annotations

    import functools
    import logging
    from typing import Any, Callable

    from opentelemetry import trace

    logger = logging.getLogger(__name__)

    # Method of the tracer class; _is_cold_start() is defined elsewhere in the module.
    def capture_lambda_handler(
        self,
        lambda_handler: Callable | None = None,
    ) -> Callable[..., Any]:
        @functools.wraps(lambda_handler)
        def decorate(event, context, **kwargs):
            # Reuse the span that is already active instead of creating a new one.
            span = trace.get_current_span()
            lambda_handler_name = lambda_handler.__name__
            try:
                logger.debug("Calling lambda handler")
                response = lambda_handler(event, context, **kwargs)
                logger.debug("Received lambda handler response successfully")
            except Exception:
                logger.exception(f"Exception received from {lambda_handler_name}")
                raise
            finally:
                # Runs on success and on failure, so the annotation is always set.
                cold_start = _is_cold_start()
                logger.debug("Annotating cold start")
                span.set_attribute(key="ColdStart", value=cold_start)
            return response

        return decorate
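To make the cold-start annotation easier to try without the ADOT layer, here is a minimal, dependency-free sketch of the same pattern. `RecordingSpan` is a hypothetical stand-in for the OpenTelemetry span, and this `_is_cold_start` is an assumed implementation (the thread does not show the real one): a module-level flag that is true only for the first invocation in an execution environment.

```python
import functools
from typing import Any, Callable, Dict

_cold_start = True  # assumption: module-level flag, True until the first invocation


def _is_cold_start() -> bool:
    """Return True on the first call in this process, False on every later call."""
    global _cold_start
    was_cold, _cold_start = _cold_start, False
    return was_cold


class RecordingSpan:
    """Hypothetical stand-in for an OpenTelemetry span; keeps attributes in a dict."""

    def __init__(self) -> None:
        self.attributes: Dict[str, Any] = {}

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


def capture_lambda_handler(span: RecordingSpan) -> Callable:
    """Decorator factory that annotates `span` with ColdStart after each invocation."""

    def decorator(handler: Callable) -> Callable:
        @functools.wraps(handler)
        def wrapper(event, context):
            try:
                return handler(event, context)
            finally:
                # Annotate whether the handler succeeded or raised.
                span.set_attribute("ColdStart", _is_cold_start())

        return wrapper

    return decorator


span = RecordingSpan()


@capture_lambda_handler(span)
def handler(event, context):
    return {"statusCode": 200}


first_response = handler({}, None)
cold_on_first = span.attributes["ColdStart"]
handler({}, None)
cold_on_second = span.attributes["ColdStart"]
```

The first invocation is annotated as a cold start and the second is not, which is the behaviour the decorator above relies on.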
This is hard to say, because the destination matters if customers are not using collectors inside Lambda. I agree that this should be the default experience: Lambda code -> send to collector endpoint -> destination. But this is sometimes not what customers are doing: some export the JSON created by the OTEL SDK and then aggregate it using Kinesis, for example, and others use the exporter directly in their code with a sync HTTP call + batch. I come back here to my initial point of the discussion: auto-instrumentation or manual instrumentation. If we decide to support manual instrumentation - which for me makes 100% sense - we need to allow the customer to configure the tracer instance with the exporter they want, something like this:

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

    # Set up the tracer with Console exporter
    tracer_provider = TracerProvider()
    tracer_provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(tracer_provider)

    # Get a tracer for your module
    tracer = trace.get_tracer(__name__)
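As a rough illustration of the "exporter in the code + batch" case mentioned above, the sketch below buffers spans and hands them to a customer-supplied exporter in batches. It uses only the standard library; `BatchingTracer`, `record`, `flush`, and `max_batch_size` are hypothetical names for illustration, not an existing Powertools or OpenTelemetry API.

```python
from typing import Callable, Dict, List

Span = Dict[str, object]
Exporter = Callable[[List[Span]], None]  # the customer decides where a batch goes


class BatchingTracer:
    """Hypothetical tracer that buffers spans and exports them in batches."""

    def __init__(self, exporter: Exporter, max_batch_size: int = 64) -> None:
        self._exporter = exporter
        self._max_batch_size = max_batch_size
        self._buffer: List[Span] = []

    def record(self, span_name: str, **attributes: object) -> None:
        """Buffer one finished span; export when the batch is full."""
        self._buffer.append({"name": span_name, **attributes})
        if len(self._buffer) >= self._max_batch_size:
            self.flush()

    def flush(self) -> None:
        """Export whatever is buffered, e.g. at the end of an invocation."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._exporter(batch)


# Usage: the "exporter" here just collects batches in a list.
sent: List[List[Span]] = []
tracer = BatchingTracer(exporter=sent.append, max_batch_size=2)
tracer.record("db.query", table="orders")
tracer.record("http.call", host="example.com")  # batch is full, so it is exported
tracer.record("handler")
tracer.flush()  # export the remainder before the invocation ends
```

In Lambda the explicit `flush()` matters, since the execution environment may be frozen right after the handler returns.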
I'm not sure. I was reading the Datadog documentation and it seems like we need to instrument the code to send the traces. But ofc I might be missing something or misunderstanding this. I need to better understand how we can support most providers with the same API/experience.
Both the Datadog layer and the ADOT layer work the same way: they auto-instrument third-party libraries. The ADOT layer also gives us the exceptions, stack traces, and attributes for free, and I imagine Datadog does too.
Our idea is not to change the current tracing module, but to create a new one called powertools-tracing-otlp with specific dependencies, or something like that. This is another point we need to think about. We don't necessarily need to bring these dependencies in if customers are using the ADOT layer, because the layer already provides them, and that way we can help the customer reduce the size of the Lambda package. But yes, customers must have those dependencies.
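One way to cope with dependencies that may or may not be provided by a layer is to check for them at startup without importing them. The sketch below does that with the standard library's `importlib.util.find_spec`; `is_module_available`, `select_tracing_backend`, and the `"otel"` / `"noop"` labels are hypothetical names used only for illustration.

```python
import importlib.util


def is_module_available(name: str) -> bool:
    """Return True if `name` can be imported here, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for missing parent packages or unusual module state.
        return False


def select_tracing_backend() -> str:
    """Pick a backend label depending on whether the OTel API is on the path."""
    # Assumption: the layer (or the deployment package) ships `opentelemetry`.
    return "otel" if is_module_available("opentelemetry") else "noop"


backend = select_tracing_backend()
```

With a check like this, the package could stay small when a layer supplies the libraries, and fail with a clear message (or fall back) when nothing does.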
Nice, this is amazing! We can talk more about this in our meeting. As always, thank you so much for sharing your knowledge and helping us get there. I'm super excited about this integration and we definitely intend to support Datadog from day one.
-
Pre-RFC: Datadog Tracer Provider for Powertools-Java
Hi team! 👋 Before drafting a full RFC, I wanted to gauge interest in adding a Datadog tracer implementation to Powertools for AWS Lambda (Java).
Why?
This would give Powertools+Datadog customers the ability to go beyond Datadog's serverless layer, instrumenting downstream calls and introducing custom spans whilst relying on the ease of use of Powertools for Java.
High-level idea
- A common tracer abstraction (PowertoolsTracer).
- powertools-tracing-core
- powertools-tracing-xray (current impl, default)
- powertools-tracing-datadog (new impl)

Next steps
Would this addition be of interest to the maintainers? Happy to flesh out details if the direction sounds reasonable!
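To make the module split under "High-level idea" more concrete, here is a small sketch of a provider-agnostic tracer with two interchangeable backends. It is written in Python to match the snippets elsewhere in this thread, although the proposal targets Java. Only the name `PowertoolsTracer` and the core/xray/datadog split come from the post; every method name and the in-memory backend are assumptions for illustration.

```python
from abc import ABC, abstractmethod
from contextlib import contextmanager
from typing import Dict, Iterator, List


class PowertoolsTracer(ABC):
    """Provider-agnostic contract (what powertools-tracing-core might hold)."""

    @abstractmethod
    def start_span(self, name: str) -> None:
        ...

    @abstractmethod
    def put_annotation(self, key: str, value: str) -> None:
        ...

    @abstractmethod
    def end_span(self) -> None:
        ...

    @contextmanager
    def span(self, name: str) -> Iterator["PowertoolsTracer"]:
        """Open a span for the duration of a `with` block."""
        self.start_span(name)
        try:
            yield self
        finally:
            self.end_span()


class InMemoryTracer(PowertoolsTracer):
    """Toy backend: records spans in a list instead of calling a vendor SDK."""

    provider = "memory"

    def __init__(self) -> None:
        self.finished: List[Dict] = []
        self._stack: List[Dict] = []

    def start_span(self, name: str) -> None:
        self._stack.append({"provider": self.provider, "name": name, "annotations": {}})

    def put_annotation(self, key: str, value: str) -> None:
        self._stack[-1]["annotations"][key] = value

    def end_span(self) -> None:
        self.finished.append(self._stack.pop())


class XRayTracer(InMemoryTracer):  # stands in for powertools-tracing-xray
    provider = "xray"


class DatadogTracer(InMemoryTracer):  # stands in for powertools-tracing-datadog
    provider = "datadog"


def process_order(tracer: PowertoolsTracer, order_id: str) -> str:
    """Business code depends only on the abstraction, not on a provider."""
    with tracer.span("process_order") as active:
        active.put_annotation("order_id", order_id)
        return f"processed {order_id}"


results = [(process_order(t, "42"), t.finished) for t in (XRayTracer(), DatadogTracer())]
```

The point of the sketch is that `process_order` does not change when the provider does; only the tracer instance passed in differs.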
Scott