@@ -23,17 +23,66 @@ in your ``values.yaml``:
 
 .. code-block:: yaml
 
-    tracing_sampling_rate: 0.01
+    envoy:
+      tracing_sampling_rate: 0.01
 
     opentelemetry-collector:
       enabled: true
+      config:
+        exporters:
+          otlp:
+            endpoint: http://supersonic-tempo:4317
+          otlphttp:
+            endpoint: http://supersonic-tempo:4318
+          prometheusremotewrite:
+            endpoint: http://supersonic-prometheus-server:9090/api/v1/write
 
     tempo:
       enabled: true
+      tempo:
+        metricsGenerator:
+          enabled: true
+          remoteWriteUrl: http://supersonic-prometheus-server:9090/api/v1/write
+
+.. note::
+
+    In the example above, endpoints and remote write URLs are configured to point to
+    the Prometheus server and Grafana Tempo services, which will most likely
+    have names like ``<release-name>-prometheus-server`` and ``<release-name>-tempo``.
 
 The ``tracing_sampling_rate`` parameter controls how frequently requests are
 traced. A value of ``0.01`` means that one in 100 requests will be traced.
 
+Additionally, you will need to enable tracing in Triton Inference Server, which is
+done by passing additional flags to the ``tritonserver`` command. The following example
+shows how tracing is configured for the CMS SuperSONIC instance:
+
+.. code-block:: bash
+
+    /opt/tritonserver/bin/tritonserver \
+      --model-repository=/cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_0_pre7/external/el9_amd64_gcc12/data/RecoBTag/Combined/data/models/ \
+      --model-repository=/cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_0_pre7/external/el9_amd64_gcc12/data/RecoTauTag/TrainingFiles/data/DeepTauIdSONIC/ \
+      --model-repository=/cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_0_pre7/external/el9_amd64_gcc12/data/RecoMET/METPUSubtraction/data/models/ \
+      --model-repository=/cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_0_pre7/external/el9_amd64_gcc12/data/RecoEgamma/EgammaPhotonProducers/data/models/ \
+      --trace-config mode=opentelemetry \
+      --trace-config=opentelemetry,resource=pod_name=$(hostname) \
+      --trace-config opentelemetry,url=supersonic-opentelemetry-collector:4318/v1/traces \
+      --trace-config rate=100 \
+      --trace-config level=TIMESTAMPS \
+      --trace-config count=-1 \
+      --allow-gpu-metrics=true \
+      --log-verbose=0 \
+      --strict-model-config=false \
+      --exit-timeout-secs=60
+
+.. note::
+
+    In the example above, the URL should point to the OpenTelemetry Collector service,
+    which will most likely be named ``<release-name>-opentelemetry-collector``.
+
+    For tracing in Triton, the rate is the inverse of the ``tracing_sampling_rate``
+    parameter in the Envoy Proxy configuration: ``rate=100`` means 1% of requests will be traced.
+
 .. warning::
 
     Triton Inference Server supports OpenTelemetry tracing only in versions 24.x or later.
@@ -43,7 +92,36 @@ Displaying Tracing Data in Grafana
 
 If Grafana is enabled in your ``values.yaml``, you can display the tracing data
 in the Grafana dashboard. In order to achieve this, Grafana needs to have a
-Tempo datasource configured.
+Tempo datasource configured:
+
+.. code-block:: yaml
+
+    grafana:
+      enabled: true
+      datasources:
+        datasources.yaml:
+          datasources:
+            - name: prometheus
+              type: prometheus
+              access: proxy
+              isDefault: true
+              url: http://supersonic-prometheus-server:9090
+              jsonData:
+                timeInterval: "5s"
+                tlsSkipVerify: true
+            - name: tempo
+              type: tempo
+              url: http://supersonic-tempo:3100
+              access: proxy
+              isDefault: false
+              basicAuth: false
+              jsonData:
+                timeInterval: "5s"
+                tlsSkipVerify: true
+                serviceMap:
+                  datasourceUid: "prometheus"
+                nodeGraph:
+                  enabled: true
 
 If OpenTelemetry Collector and Tempo are enabled, the default Grafana dashboard
 will include an interactive server map, where you can study tracing data in detail
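
The docs changed here state that Triton's ``rate`` is the inverse of Envoy's ``tracing_sampling_rate``. A minimal sketch of that conversion follows, so the two settings can be kept in sync. The helper name ``sampling_to_triton_rate`` is hypothetical and is not part of SuperSONIC, Envoy, or Triton. It assumes the Envoy value is a fraction in the range (0, 1] and rounds to the nearest integer.

```shell
#!/bin/sh
# Hypothetical helper (not part of SuperSONIC): convert an Envoy sampling
# fraction such as 0.01 into the matching Triton "rate" value, i.e. the N
# in "trace one out of every N requests".
sampling_to_triton_rate() {
    awk -v f="$1" 'BEGIN {
        # Reject values outside (0, 1]; they have no meaningful inverse here.
        if (f <= 0 || f > 1) { exit 1 }
        # Round 1/f to the nearest integer.
        printf "%d\n", int(1 / f + 0.5)
    }'
}

# Example: the 0.01 fraction used for Envoy corresponds to rate=100 in Triton.
sampling_to_triton_rate 0.01
```

The result could then be passed as ``--trace-config rate=<value>`` when starting ``tritonserver``, so that both layers sample roughly the same share of requests.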