Improve testing #61

Open · jackshirazi wants to merge 15 commits into main
Conversation

jackshirazi
Contributor

This PR:

  • Adds a local Elasticsearch instance
  • Starts the Elastic Agent as an OTel Collector with the traces configuration, bound to 0.0.0.0:4318
  • Adds a collector.banana endpoint pointing at localhost:4318 into the cluster so the agents can connect to the collector
  • Changes the Python, Node.js, and Java configurations/images to generate traces from the app
  • Confirms that the traces have propagated through to Elasticsearch

So the test is now a full end-to-end trace test. The test is also set up to run on both amd64 and arm64, though the arm64 option is commented out for the moment.
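The final confirmation step could be sketched roughly like this. This is a minimal sketch, not the PR's actual test code: `has_trace_hits` and the trimmed sample response are hypothetical, and the real check would query the local Elasticsearch instance started earlier.

```shell
# Hypothetical helper: succeed if an Elasticsearch _search response reports
# at least one hit (the JSON shape follows ES's standard _search response).
has_trace_hits() {
  echo "$1" | grep -q '"value":[1-9]'
}

# In the real test the response would come from the cluster, e.g.:
#   response=$(curl -s "http://localhost:9200/traces-*/_search?size=1")
# Here we use a trimmed sample response to illustrate the check.
response='{"hits":{"total":{"value":3,"relation":"eq"}}}'

if has_trace_hits "$response"; then
  echo "traces found in Elasticsearch"
else
  echo "no traces found" >&2
  exit 1
fi
```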

@jackshirazi jackshirazi requested a review from xrmx December 3, 2024 14:30
@jackshirazi jackshirazi requested a review from swiatekm December 13, 2024 11:02
docker run --name es01 --net elastic -p 9200:9200 -m 1GB \
  -e xpack.security.enabled=false \
  -e xpack.security.enrollment.enabled=false \
  -e discovery.type=single-node \
  docker.elastic.co/elasticsearch/elasticsearch:8.16.1 1>es.log 2>&1 &
bash test/operator/utilities/wait_for_es_http_ready.bash http://localhost:9200

- name: Start Collector
Contributor
Could you provide more information about why this Collector instance is needed? Why not use the Operator's endpoint?

Contributor Author
The flow of trace data is app → app agent → collector → ES, so it needs a collector. The operator endpoint is for mutating the K8s definitions, so that's entirely separate; it's not involved in the flow of traces.
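The flow described here could be covered by a minimal collector configuration along these lines. This is a sketch only: the endpoints are assumptions based on this PR's description, and the actual traces configuration run by the elastic agent includes additional processing components.

```yaml
receivers:
  otlp:
    protocols:
      http:
        # matches the 0.0.0.0:4318 binding described in the PR
        endpoint: 0.0.0.0:4318

exporters:
  elasticsearch:
    # assumes the local single-node Elasticsearch started for the test
    endpoints: ["http://localhost:9200"]

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [elasticsearch]
```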

Contributor
I might be missing something, but if you are using the values.yaml file we ship to deploy the Operator, it will deploy a collector on each K8s node with an open OTLP endpoint for traces. Those collectors have the same processing components as the one you are launching manually.

Contributor Author
Ah, gotcha. Yes, that didn't seem to be working in my tests; it was easier for me to spin up a new collector with the traces config. If you think it's straightforward to have it running so that it sends traces, we can do that; I agree it would be better.


If the data already has the right shape, then your collector here could just receive data and forward it to ES, if you really want to have it. It doesn't need to use this full config.

Though if it was up to me, I'd just run ES itself in the K8s cluster and forward the 9200 port.

Contributor Author

The collector does some transforms on the incoming data from the application agents, so I think it needs to be correctly configured for handling incoming trace data.

Contributor

> The collector does some transforms on the incoming data from the application agents, so I think it needs to be correctly configured for handling incoming trace data.

Correct, I would use the collector deployed by the Operator as it is already configured to handle/transform any OTLP traces.

> Though if it was up to me, I'd just run ES itself in the K8s cluster and forward the 9200 port.

That makes sense to me as well; we would just need to maintain one type of deployment infrastructure (K8s/kind).

@theletterf
Collaborator

@AlexanderWert @mlunadia What's the purpose of this PR? It's been around for a while.

@rogercoll
Contributor

@theletterf For context, this repository initially contained the OpenTelemetry K8s configuration (yaml) shared in Kibana's onboarding. Because of that, we added some integration tests. We ended up moving the configuration to the elastic-agent repository, with the EDOT collector as the central piece, to take advantage of the stack-version releases.

@jackshirazi Java auto-instrumentation tests were added in the elastic-agent repo (elastic/elastic-agent#7754). Do you think we can close this?

@jackshirazi
Contributor Author

This is needed for 2 reasons:

  • It's primarily intended to auto-check that the documentation actually works; otherwise we only have manual checks, and that's definitely going to go out of sync eventually
  • We have no other tests of the full set of agents + zero-config that we support, and this is the only repo where it makes sense to do that

It's been on hold because it hasn't been prioritized since I was last able to work on it.

@rogercoll
Contributor

rogercoll commented May 6, 2025

> we have no other tests of the full set of agents+zero-config that we support, and this is the only repo where it makes sense to do that

We recently added some integration tests in the elastic-agent repository that launch a Java app and assert that traces are being written into Elasticsearch, all using the K8s operator configuration used in Kibana (auto-instrumentation). See https://github.com/elastic/elastic-agent/blob/main/testing/integration/otel_helm_test.go#L104 and elastic/elastic-agent#7754

The elastic-agent integration tests could be extended to cover other zero-config languages as well, and the main benefit is that they run for each elastic-agent commit (EDOT collector).

@jackshirazi
Contributor Author

> The elastic-agent's integration tests could be extended to cover other zero-config languages as well and the main benefit is that they are being run for each elastic-agent commit (EDOT collector).

That's for elastic agent/collector validation. We discussed this previously, and that team doesn't want to be responsible for the full end-to-end zero-config operator support of all agents. So while there is a test framework there that could cover the other agents, they (at least to my current knowledge) don't want tests failing because an agent release regresses some operator requirement. If that willingness has changed, we can certainly put the tests there.

@theletterf
Collaborator

@jackshirazi @rogercoll @AlexanderWert

I think the expectation for this repo is that it should only host documentation. Without further context, I'd say that the proper home for this logic is a different utility repository: the entire test directory should be moved elsewhere unless it's required for docs testing, which doesn't seem to be the case.

Still, I'd be happy to hop on a call to get a more nuanced view.

@jackshirazi
Contributor Author

> It's primarily intended to auto-check that the documentation actually works; otherwise we only have manual checks, and that's definitely going to go out of sync eventually

@theletterf
Collaborator

@jackshirazi Thanks. I don't know how that works or what documents are checked. #216 is going to change the shape of the land quite a bit. I'd appreciate it if you could walk me through this, especially if the files are meant to live here.

@rogercoll
Contributor

> That's for elastic agent/collector validation. We discussed this previously, and that team doesn't want to be responsible for the full end-to-end zero-config operator support of all agents. So while there is a test framework there that could add the other agents, they (at least to my current knowledge) don't want tests failing because one of the agents has a release that causes a failure because it regresses some operator requirement. If that willingness has changed, we can certainly put the tests there

I think their main concern is ownership: fixing tests or bumping agent versions should be handled by the team with the relevant expertise. In fact, the ingest-otel-data team became codeowner of the K8s onboarding file, and some integration tests for the infrastructure receivers (and Java zero-config) were added accordingly.

@swiatekm Would it make sense to extend the Helm kube-stack integration tests in the elastic-agent repository to cover all supported SDKs used in the distributed values.yaml? Maybe adding the apm team as codeowners as well?

@swiatekm

swiatekm commented May 7, 2025

From my perspective, tests that reside in the elastic-agent repository and run on every PR in that repository should primarily verify that changes to elastic-agent/EDOT Collector don't break any promises we've made.

If what we want to verify is that some specific set of versions of the component projects can be used to produce data that is compatible with content in Kibana, then I don't think it makes much sense for that to live in the agent repository. If anything, those tests should run on multiple agent versions, including the latest released version on each branch.

@theletterf
Collaborator

@swiatekm @rogercoll Thanks for the additional context. Not sure if that addresses the need of having the code in this repo, though. Jack mentioned documentation testing, but I need more information on how that is supposed to work.

5 participants