Bug: Data race in PrometheusWriter gauges map #1194

@abhijeetsharma200

Describe the bug

I found a data race in the PrometheusWriter sink involving the gauges map. The DefineMetrics method, which is called during configuration reloads, initializes and writes to the gauges map without holding a lock. Concurrently, the Collect method (invoked on every Prometheus scrape) calls MetricStoreMessageToPromMetrics, which reads from the same gauges map. Nothing synchronizes the write with the read, so the race detector reports a data race.

To Reproduce

Run the following test with the race detector enabled (go test -race). It uses sync.WaitGroup.Go, so it needs Go 1.25 or later:

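func TestGaugesMap_RaceCondition(t *testing.T) {
	// 1. Initialize PrometheusWriter; stop early if construction fails,
	// otherwise the calls below would run against a nil writer.
	promw, err := NewPrometheusWriter(testutil.TestContext, "127.0.0.1:0/pgwatch")
	if err != nil {
		t.Fatalf("NewPrometheusWriter: %v", err)
	}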

	// 2. Register a metric so Write() actually puts data into the map
	_ = promw.SyncMetric("race_db", "test_metric", AddOp)

	// 3. Pre-fill cache so Collect has something to do
	_ = promw.Write(metrics.MeasurementEnvelope{
		DBName:     "race_db",
		MetricName: "test_metric",
		Data: metrics.Measurements{
			{
				metrics.EpochColumnName: time.Now().UnixNano(),
				"value":                 int64(100),
			},
		},
	})

	var wg sync.WaitGroup
	done := make(chan struct{})

	// --- The Config Reloader (Simulating configuration updates) ---
	wg.Go(func() {
		for {
			select {
			case <-done:
				return
			default:
				// Call the REAL DefineMetrics method (Writes to gauges map)
				_ = promw.DefineMetrics(&metrics.Metrics{
					MetricDefs: metrics.MetricDefs{
						"test_metric": {Gauges: []string{"value"}},
					},
				})
			}
		}
	})

	// --- The Collector (Simulating Prometheus Scrapes) ---
	wg.Go(func() {
		// Prometheus provides a channel to receive metrics
		ch := make(chan prometheus.Metric, 10000)

		// Scrape 50 times (more than enough to trigger a race in a tight loop)
		for range 50 {
			// Call the REAL Collect method (Reads from gauges map)
			promw.Collect(ch)

			// Drain the channel so it doesn't block
		drainLoop:
			for {
				select {
				case <-ch:
				default:
					break drainLoop
				}
			}
		}
		close(done) // Tell the reloader to stop
	})

	wg.Wait()
}

Expected behavior

Concurrent calls to DefineMetrics() and Collect() should be thread-safe. Access to the gauges map should be protected by a mutex.

Actual behavior

WARNING: DATA RACE
Read at 0x00c00016f5c8 by goroutine 31:
  command-line-arguments.(*PrometheusWriter).MetricStoreMessageToPromMetrics()
      internal/sinks/prometheus.go:206

Previous write at 0x00c00016f5c8 by goroutine 30:
  command-line-arguments.(*PrometheusWriter).DefineMetrics()
      internal/sinks/prometheus.go:102

testing.go:1617: race detected during execution of test
