feat: add agent config options for internal telemetry and health check, enable use of status cmd in kubernetes #182

Merged: 2 commits, Apr 15, 2025

Collaborator

@obs-gh-mattcotter obs-gh-mattcotter commented Mar 27, 2025

Description

Add agent config options for internal telemetry and health check, and enable use of the agent status command in Kubernetes. Also upgrade the internal telemetry config, since our existing config for metrics is no longer supported: https://github.com/open-telemetry/opentelemetry-collector/releases/tag/v0.123.0

Ex:

$ kubectl -n observe exec --stdin --tty observe-agent-node-logs-metrics-agent-brjwr -- /observe-agent --observe-config=/observe-agent-conf/observe-agent.yaml status
Defaulted container "node-logs-metrics" out of: node-logs-metrics, kube-cluster-info (init)
================
Agent
================

  Host Info
  ================
  HostID: 45220b39-2c66-47bf-984d-9d0bbae88222
  Hostname: observe-agent-node-logs-metrics-agent-brjwr
  BootTime: 2025-04-10T20:05:46Z
  Uptime: 20h32m12s
  OS: linux
  Platform: alpine
  PlatformFamily: alpine
  PlatformVersion: 3.21.3
  KernelArch: aarch64
  KernelVersion: 6.10.14-linuxkit

  Agent Metrics
  ================
  ExporterQueueSize: 0
  CPUSeconds: 1.19s
  MemoryUsed: 132.86328MB
  TotalSysMemory: 60.27076MB
  Uptime: 17.37295s
  AvgServerResponseTime: 0ms
  AvgClientResponseTime: 0ms

    Logs Stats
    ================
    ReceiverAcceptedCount: 18
    ReceiverRefusedCount: 0
    ExporterSentCount: 10
    ExporterSendFailedCount: 0

    Metrics Stats
    ================
    ReceiverAcceptedCount: 601
    ReceiverRefusedCount: 0
    ExporterSentCount: 601
    ExporterSendFailedCount: 0

    Traces Stats
    ================
    ReceiverAcceptedCount: 0
    ReceiverRefusedCount: 0
    ExporterSentCount: 0
    ExporterSendFailedCount: 0

  Agent Health
  ================
  Status: Running
  TotalRefusedCount: 0
  TotalSendFailedCount: 0

)

const (
TelemetryEndpointFlag = "telemetry-endpoint"
Collaborator

I think the more useful use case here is providing overrides in the config.yaml that we generate in the Helm chart. For example, we could add telemetry_endpoint: "{{ template "config.local_host"}}:8888" and it would work correctly. Given that, I'd prefer we stick to snake_case naming instead of kebab-case, and maybe nest these under something like endpoints::telemetry_endpoint.

Collaborator Author

Good point, I agree! I will update this PR after my config refactor to simplify the rebasing.
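
To make the naming suggestion above concrete, here is a minimal Go sketch of nested, snake_case endpoint options with fallback defaults. The struct names, the health_check_endpoint key, and the default addresses (localhost:8888 and localhost:13133) are assumptions for illustration only; the option names actually merged in this PR may differ.

```go
package main

import "fmt"

// EndpointsConfig is a hypothetical shape for the nested, snake_case options
// discussed in the review. The yaml keys are illustrative, not the merged names.
type EndpointsConfig struct {
	TelemetryEndpoint   string `yaml:"telemetry_endpoint"`
	HealthCheckEndpoint string `yaml:"health_check_endpoint"`
}

// AgentConfig nests the endpoint options under a single "endpoints" key.
type AgentConfig struct {
	Endpoints EndpointsConfig `yaml:"endpoints"`
}

// Assumed defaults for this sketch only.
const (
	defaultTelemetryEndpoint   = "localhost:8888"
	defaultHealthCheckEndpoint = "localhost:13133"
)

// withDefaults returns a copy of the config with empty endpoints filled in,
// so an override in the generated config.yaml wins over the built-in default.
func (c AgentConfig) withDefaults() AgentConfig {
	if c.Endpoints.TelemetryEndpoint == "" {
		c.Endpoints.TelemetryEndpoint = defaultTelemetryEndpoint
	}
	if c.Endpoints.HealthCheckEndpoint == "" {
		c.Endpoints.HealthCheckEndpoint = defaultHealthCheckEndpoint
	}
	return c
}

func main() {
	// Only the telemetry endpoint is overridden; the health check falls back.
	override := AgentConfig{Endpoints: EndpointsConfig{TelemetryEndpoint: "10.0.0.7:8888"}}
	cfg := override.withDefaults()
	fmt.Println(cfg.Endpoints.TelemetryEndpoint)
	fmt.Println(cfg.Endpoints.HealthCheckEndpoint)
}
```

Keeping the options in the config file rather than in command-line flags means the Helm chart can template them once and both the agent and the status command read the same values.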

@obs-gh-mattcotter obs-gh-mattcotter changed the title from "feat: add flags to set non-default endpoints for status command to enable use in kubernetes" to "feat: add agent config options for internal telemetry and health check, enable use of status cmd in kubernetes" on Apr 11, 2025
},
}

func printAllConfigsIndividually(configFilePaths []string) error {
Collaborator Author

This is all code motion; I put these methods in util since the config package depends on the start package, and having these methods available in start for debugging is very useful.

-func GetAgentStatusFromHealthcheck(baseURL string) (AgentStatus, error) {
-	URL := fmt.Sprintf("%s/status", baseURL)
+func GetAgentStatusFromHealthcheck(baseURL string, path string) (AgentStatus, error) {
+	baseURL = util.ReplaceEnvString(baseURL)
Collaborator Author

This is to handle our default k8s use of ${env:MY_POD_IP}
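
As a rough illustration of the substitution described above, the sketch below expands ${env:NAME} references in a base URL before joining it with a path. The functions expandEnvRefs and statusURL are hypothetical stand-ins, not the actual util.ReplaceEnvString implementation, and the port 13133 and the choice to leave unset variables untouched are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
	"strings"
)

// envRef matches collector-style references such as ${env:MY_POD_IP}.
var envRef = regexp.MustCompile(`\$\{env:([A-Za-z_][A-Za-z0-9_]*)\}`)

// expandEnvRefs is a hypothetical stand-in for util.ReplaceEnvString. It
// replaces each ${env:NAME} with the value of NAME and leaves references to
// unset variables untouched (an assumption made for this sketch).
func expandEnvRefs(s string) string {
	return envRef.ReplaceAllStringFunc(s, func(match string) string {
		name := envRef.FindStringSubmatch(match)[1]
		if value, ok := os.LookupEnv(name); ok {
			return value
		}
		return match
	})
}

// statusURL expands the base URL and joins it to path with exactly one slash.
func statusURL(baseURL, path string) string {
	base := strings.TrimRight(expandEnvRefs(baseURL), "/")
	return base + "/" + strings.TrimLeft(path, "/")
}

func main() {
	// In Kubernetes the pod IP would be injected by the pod spec; it is set
	// by hand here so the example runs anywhere.
	os.Setenv("MY_POD_IP", "10.0.0.7")
	fmt.Println(statusURL("http://${env:MY_POD_IP}:13133", "/status"))
	// prints "http://10.0.0.7:13133/status"
}
```

Expanding the reference in the status command is what lets it reuse the same endpoint string the collector config uses, instead of requiring a second, pre-resolved value.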

@@ -119,53 +147,53 @@ func GetAgentMetricsFromEndpoint(baseURL string) (*AgentMetrics, error) {
 if v.Type.String() == io_prometheus_client.MetricType_HISTOGRAM.String() {
 	met := v.Metric[0]
 	switch name := *v.Name; name {
-	case "otelcol_http_client_duration":
+	case "http_client_duration_milliseconds":
Collaborator Author

These metric names changed, possibly to be in line with Prometheus conventions after the config upgrade. I checked the output from the Prometheus endpoint and verified the metric names, and I watched the numbers update when calling the status command.

Collaborator Author

I verified that the names after this update match what our Prometheus exporter now collects.
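
For readers who want to repeat that check by hand, the sketch below derives an average from a histogram's _sum and _count samples in Prometheus text exposition output. It uses only the standard library rather than the io_prometheus_client types from the diff, assumes samples carry no trailing timestamp, and assumes the status command's averages are computed as sum divided by count. The sample values are made up.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// histogramAverage scans Prometheus text exposition output and returns
// sum/count for the named histogram, adding up every label set. It assumes
// sample lines have no trailing timestamp.
func histogramAverage(exposition, name string) (float64, bool) {
	var sum, count float64
	scanner := bufio.NewScanner(strings.NewReader(exposition))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		// The metric name ends at the first '{' (labels) or space (no labels).
		end := strings.IndexAny(line, "{ ")
		if end < 0 {
			continue
		}
		metric := line[:end]
		// The sample value is the last space-separated token on the line.
		value, err := strconv.ParseFloat(line[strings.LastIndex(line, " ")+1:], 64)
		if err != nil {
			continue
		}
		switch metric {
		case name + "_sum":
			sum += value
		case name + "_count":
			count += value
		}
	}
	if count == 0 {
		return 0, false
	}
	return sum / count, true
}

func main() {
	// Illustrative sample only; real output has many more series.
	sample := `# TYPE http_client_duration_milliseconds histogram
http_client_duration_milliseconds_bucket{le="+Inf"} 8
http_client_duration_milliseconds_sum 240
http_client_duration_milliseconds_count 8
`
	if avg, ok := histogramAverage(sample, "http_client_duration_milliseconds"); ok {
		fmt.Printf("%.1fms\n", avg) // prints "30.0ms"
	}
}
```

If the old name otelcol_http_client_duration is passed against output from the upgraded config, the function reports no samples, which is the symptom the rename in this diff addresses.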

"gopkg.in/yaml.v3"
)

func PrintAllConfigsIndividually(configFilePaths []string) error {
Collaborator Author

Here's where the config methods moved to. Again, no changes, just code motion.

@obs-gh-mattcotter obs-gh-mattcotter merged commit f3f56e7 into main Apr 15, 2025
8 checks passed
@obs-gh-mattcotter obs-gh-mattcotter deleted the mc/config-fix branch April 15, 2025 15:36