Improve testing #61

Open · wants to merge 15 commits into base: main
47 changes: 41 additions & 6 deletions .github/workflows/operator-regression.yml
@@ -12,7 +12,11 @@ permissions:

jobs:
integration-test:
runs-on: ubuntu-latest
strategy:
matrix:
#os: [observability-linux-2-arm64, ubuntu-latest]
os: [ubuntu-latest]
runs-on: ${{ matrix.os }}
steps:
- name: Checkout
uses: actions/checkout@v4
@@ -23,6 +27,33 @@ jobs:
run:
bash test/operator/kind-with-registry.sh

- name: Start Elasticsearch
run: |
docker network create elastic
#Local elasticsearch without SSL on http://localhost:9200
docker run --name es01 --net elastic -p 9200:9200 -m 1GB -e xpack.security.enabled=false -e xpack.security.enrollment.enabled=false -e discovery.type=single-node docker.elastic.co/elasticsearch/elasticsearch:8.16.1 1>es.log 2>&1 &
bash test/operator/utilities/wait_for_es_http_ready.bash http://localhost:9200

- name: Start Collector
Contributor:

Could you provide more information about why this Collector instance is needed? Why not use the Operator's endpoint?

Contributor Author:

The flow of trace data is app -> app agent -> collector -> ES, so it needs a collector. The operator endpoint is for mutating the k8s definitions, so it's entirely separate; it's not involved in the flow of traces.

Contributor:

I might be missing something, but if you are using the values.yaml file we ship to deploy the Operator, it will deploy a collector on each K8s node with an open OTLP endpoint for traces. Those collectors have the same processing components as the one you are launching manually.

Contributor Author:

Ah, gotcha. Yes, that didn't seem to be working in my tests. It was easier for me to spin up a new collector with the traces config. If you think it's straightforward to have it running so that it sends traces, we can do that; I agree it would be better.

If the data already has the right shape, then your collector here could just receive data and forward it to ES, if you really want to have it. It doesn't need to use this full config.

Though if it were up to me, I'd just run ES itself in the K8s cluster and forward the 9200 port.

Contributor Author:
The collector does some transforms on the incoming data from the application agents, so I think it needs to be correctly configured for handling incoming trace data.

Contributor:

> The collector does some transforms on the incoming data from the application agents, so I think it needs to be correctly configured for handling incoming trace data.

Correct, I would use the collector deployed by the Operator as it is already configured to handle/transform any OTLP traces.

> Though if it were up to me, I'd just run ES itself in the K8s cluster and forward the 9200 port.

That makes sense to me as well; we would just need to maintain one type of deployment infrastructure (K8s/kind).

run: |
#Elastic agent from download instructions
curl -L -O https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.16.1-linux-x86_64.tar.gz
tar xzvf elastic-agent-*-linux-x86_64.tar.gz
cd elastic-agent-*-linux-x86_64/
#Use latest collector config for elastic agent
curl -L -O https://raw.githubusercontent.com/elastic/elastic-agent/refs/heads/main/internal/pkg/otel/samples/linux/logs_metrics_traces.yml
#bind to ANY rather than LOOPBACK so that the pods can connect
sed -e 's/http:/http:\n endpoint: 0.0.0.0:4318/' logs_metrics_traces.yml > logs_metrics_traces2.yml
export STORAGE_DIR=/var/tmp
export ELASTIC_ENDPOINT=http://localhost:9200
export ELASTIC_API_KEY=nothing
sudo -E ./elastic-agent otel --config logs_metrics_traces2.yml 1>collector.log 2>&1 &
bash ../test/operator/utilities/wait_for_log_entry.bash collector.log "elastic agent collector" "Everything is ready"
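The sed edit in the step above injects an `endpoint` line after the OTLP receiver's `http:` key so the collector listens on all interfaces rather than loopback. Assuming the structure of the upstream `logs_metrics_traces.yml` sample, the rewritten receiver stanza ends up roughly like:

```yaml
receivers:
  otlp:
    protocols:
      http:
        # bound to ANY rather than LOOPBACK so pods in the kind
        # cluster can reach the collector on the host network
        endpoint: 0.0.0.0:4318
```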

- name: Test Collector
run: |
echo "Nothing here yet"

- name: Create Test Images
run: |
for t in ${AGENT_TESTS[@]}
@@ -36,7 +67,7 @@ jobs:
- name: Set up Helm
uses: azure/setup-helm@v4
with:
version: v3.11.2
version: v3.13.3

- name: Install Operator
run: |
@@ -47,20 +78,24 @@
bash test/operator/wait_for_pod_start.sh opentelemetry-operator-system opentelemetry-operator 2/2 1
kubectl get pods -A

- name: Start And Test Collector Skeleton
run: |
echo "Nothing here yet"

- name: Start Test Images
run: |
HOST_IP=$(hostname -I | awk '{print $1}')
echo "using host ip $HOST_IP"
sed -e "s/REPLACE/$HOST_IP/" test/operator/utilities/endpoint.yml > endpoint.yml
kubectl create namespace banana
kubectl apply -f test/operator/utilities/dummyservice.yml
kubectl apply -f endpoint.yml
for t in ${AGENT_TESTS[@]}
do
if [ "x$t" = "xgo" ]; then CONTAINER_READY="2/2"; else CONTAINER_READY="1/1"; fi
AGENT_START_GREP=`grep -A1 AGENT_HAS_STARTED_IF_YOU_SEE test/operator/$t/test-app.yaml | perl -ne '/value:\s*"(.*)"/ && print "$1\n"'`
echo "Starting pod for $t"
kubectl create -f test/operator/$t/test-app.yaml
bash test/operator/wait_for_pod_start.sh banana $t-test-app $CONTAINER_READY 1
kubectl logs pod/$t-test-app -n banana
bash test/operator/wait_for_agent_start.sh banana $t-test-app "$AGENT_START_GREP"
bash test/operator/utilities/wait_for_es_transaction.bash http://localhost:9200 $t-test-app "kubectl logs pod/$t-test-app -n banana"
kubectl logs pod/$t-test-app -n banana
kubectl delete -f test/operator/$t/test-app.yaml
done
25 changes: 0 additions & 25 deletions test/operator/elastic-instrumentation.yml

This file was deleted.

4 changes: 4 additions & 0 deletions test/operator/java/test-app.yaml
@@ -17,3 +17,7 @@ spec:
value: "true"
- name: AGENT_HAS_STARTED_IF_YOU_SEE
value: "javaagent.tooling.VersionLogger - opentelemetry-javaagent"
- name: OTEL_INSTRUMENTATION_METHODS_INCLUDE
value: "Hello[test]"
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: "http://collector:4318"
3 changes: 2 additions & 1 deletion test/operator/nodejs/Dockerfile
@@ -1,5 +1,6 @@
FROM node

ADD ./app.js .
ADD ./start.bash .

ENTRYPOINT [ "node", "app.js" ]
ENTRYPOINT [ "bash", "start.bash" ]
7 changes: 7 additions & 0 deletions test/operator/nodejs/start.bash
@@ -0,0 +1,7 @@
#!/bin/bash
node app.js &
while :
do
sleep 1
curl -s http://127.0.0.1:8080/ > /dev/null
done
2 changes: 2 additions & 0 deletions test/operator/nodejs/test-app.yaml
@@ -17,3 +17,5 @@ spec:
value: "debug"
- name: AGENT_HAS_STARTED_IF_YOU_SEE
value: "@opentelemetry/instrumentation-http Applying instrumentation patch for nodejs core module"
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: "http://collector:4318"
4 changes: 2 additions & 2 deletions test/operator/python/Dockerfile
@@ -1,9 +1,9 @@
FROM python:3.9-slim-buster
FROM python:3.12

WORKDIR /app

COPY . /app

RUN pip install --no-cache-dir -r requirements.txt

CMD [ "python3", "-m" , "flask", "run"]
CMD [ "bash", "/app/start.bash"]
7 changes: 7 additions & 0 deletions test/operator/python/start.bash
@@ -0,0 +1,7 @@
#!/bin/bash
python3 -m flask run &
while :
do
sleep 1
curl -s http://localhost:5000/ > /dev/null
done
4 changes: 3 additions & 1 deletion test/operator/python/test-app.yaml
@@ -16,4 +16,6 @@ spec:
- name: OTEL_LOG_LEVEL
value: "debug"
- name: AGENT_HAS_STARTED_IF_YOU_SEE
value: "Exception while exporting metrics HTTPConnectionPool"
value: "OpenTelemetry Lambda extension"
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: "http://collector:4318"
9 changes: 9 additions & 0 deletions test/operator/utilities/dummyservice.yml
@@ -0,0 +1,9 @@
apiVersion: v1
kind: Service
metadata:
name: collector
namespace: banana
spec:
clusterIP: None
ports:
- port: 4318
11 changes: 11 additions & 0 deletions test/operator/utilities/endpoint.yml
@@ -0,0 +1,11 @@
apiVersion: v1
kind: Endpoints
metadata:
name: collector
namespace: banana
subsets:
- addresses:
- ip: REPLACE
ports:
- name: collector
port: 4318
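`dummyservice.yml` and `endpoint.yml` together implement the selector-less Service pattern: the Service has no pod selector, so Kubernetes leaves its endpoints unmanaged, and this manually created Endpoints object (same name and namespace as the Service) points the in-cluster DNS name `collector` at the host IP that the workflow substitutes for REPLACE. That is why the test apps can use `http://collector:4318` as their OTLP endpoint. After substitution the object looks like this (a sketch; the IP is hypothetical):

```yaml
apiVersion: v1
kind: Endpoints
metadata:
  name: collector       # must match the Service name so the two pair up
  namespace: banana
subsets:
  - addresses:
      - ip: 10.0.2.15   # hypothetical: the runner's host IP from `hostname -I`
    ports:
      - name: collector
        port: 4318      # OTLP/HTTP port the host-side collector listens on
```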
22 changes: 22 additions & 0 deletions test/operator/utilities/wait_for_es_http_ready.bash
@@ -0,0 +1,22 @@
#!/bin/bash

set -euxo pipefail

MAX_WAIT_SECONDS=120
URL=$1

echo "Waiting up to $MAX_WAIT_SECONDS seconds for the elasticsearch server to be ready by checking $URL"
count=0
while [ $count -lt $MAX_WAIT_SECONDS ]
do
count=`expr $count + 1`
STARTED=$( (curl -m 2 "$URL" || true) | (grep build_hash || true) | wc -l)
if [ $STARTED -ne 0 ]
then
exit 0
fi
sleep 1
done
echo "error: the elasticsearch server failed to be ready within $MAX_WAIT_SECONDS seconds"
curl -v "$URL"
exit 1
28 changes: 28 additions & 0 deletions test/operator/utilities/wait_for_es_transaction.bash
@@ -0,0 +1,28 @@
#!/bin/bash

set -euxo pipefail

MAX_WAIT_SECONDS=120
URL=$1
SERVICE_NAME=$2
KUBECTL_COMMAND=$3

echo "Waiting up to $MAX_WAIT_SECONDS seconds for the elasticsearch server to show a transaction from $SERVICE_NAME by querying $URL"
count=0
while [ $count -lt $MAX_WAIT_SECONDS ]
do
count=`expr $count + 1`
#curl -m 2 "$URL/traces*/_search" -H "Content-Type: application/json" -d '{"query": {"range": {"@timestamp": {"gte": "now-1h","lte": "now"}}}}' > query.output
curl -m 2 "$URL/traces*/_search" -H "Content-Type: application/json" -d "{\"query\": {\"bool\": {\"must\": [{\"range\": {\"@timestamp\": {\"gte\": \"now-1h\",\"lte\": \"now\"}}},{\"match\": {\"resource.attributes.service.name\": \"$SERVICE_NAME\"}}]}}}" > query.output
DETECTED_SERVICE=$(jq '.hits.hits[0]._source.resource.attributes."service.name"' query.output | tr -d '"')
if [ "x$DETECTED_SERVICE" = "x$SERVICE_NAME" ]
then
exit 0
fi
sleep 1
done

echo "error: the elasticsearch server failed to include a transaction with the service name $SERVICE_NAME within $MAX_WAIT_SECONDS seconds"
eval $KUBECTL_COMMAND
cat query.output | jq
exit 1
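For readability, the bool query the curl above sends is, pretty-printed (with `$SERVICE_NAME` still unexpanded):

```json
{
  "query": {
    "bool": {
      "must": [
        { "range": { "@timestamp": { "gte": "now-1h", "lte": "now" } } },
        { "match": { "resource.attributes.service.name": "$SERVICE_NAME" } }
      ]
    }
  }
}
```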
24 changes: 24 additions & 0 deletions test/operator/utilities/wait_for_log_entry.bash
@@ -0,0 +1,24 @@
#!/bin/bash

set -euxo pipefail

MAX_WAIT_SECONDS=120
LOGNAME=$1
SERVICENAME=$2
GREP=$3

echo "Waiting up to $MAX_WAIT_SECONDS seconds for the $SERVICENAME to be ready"
count=0
while [ $count -lt $MAX_WAIT_SECONDS ]
do
count=`expr $count + 1`
STARTED=$( (grep -i "$GREP" $LOGNAME || true) | wc -l)
if [ $STARTED -ne 0 ]
then
exit 0
fi
sleep 1
done
echo "error: the $SERVICENAME failed to be ready within $MAX_WAIT_SECONDS seconds"
tail -n 100 $LOGNAME
exit 1
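All three `wait_for_*` helpers share the same poll-until-ready shape: increment a counter once per second, run a probe, exit on success, and dump diagnostics on timeout. The pattern can be factored into a generic function like this (a sketch with hypothetical names, not part of the PR):

```shell
#!/bin/bash
# Generic poll-until-ready loop: run a probe command once per second
# until it succeeds or max_seconds elapses.
wait_for() {
  local max_seconds=$1; shift
  local count=0
  while [ "$count" -lt "$max_seconds" ]; do
    count=$((count + 1))
    if "$@" >/dev/null 2>&1; then
      return 0   # probe succeeded
    fi
    sleep 1
  done
  return 1       # timed out
}

# Example: wait up to 5 seconds for a marker file to appear.
touch /tmp/ready.marker
wait_for 5 test -f /tmp/ready.marker && echo "ready"
```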