@rootranjan
Fixes #4631

Description:

Reduce false positives in DatadogToken detector by filtering out legitimate code identifiers, checksums, encrypted data, and test values that match the detector pattern.

Changes:

  • Add filter to exclude letters-only matches (no digits)
  • Add filter to exclude repeated characters (test/placeholder values)
  • Add filter to exclude NPM integrity hashes (sha512-...== patterns)
  • Add filter to exclude Go module checksums (h1:...= patterns)
  • Add filter to exclude URL-encoded paths (%3A patterns)
  • Add filter to exclude SOPS-encrypted data (ENC[AES256_GCM,data:...] patterns)
  • Add filter to exclude base64-encoded certificates (caBundle patterns)
  • Fix lint errors by properly handling res.Body.Close() errors

Together, these filters cut false positives from code identifiers, checksums, encrypted data, and test values, while still detecting real Datadog API and Application keys, which contain digits and have higher entropy.
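The last item in the change list, handling res.Body.Close() errors, is a common lint fix. The PR text does not show how it was done, so the following is only a minimal sketch of one usual approach: close the body in a deferred call and fold any close error into the function's named return value. The names closeAndJoin, statusOf, fakeBody, and errCloseFailed are hypothetical and exist only for this illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
)

// errCloseFailed is a hypothetical sentinel used only by the demo below.
var errCloseFailed = errors.New("close failed")

// closeAndJoin closes c and, if closing fails, joins that error into *errp
// so the caller's named return value reports it instead of dropping it.
func closeAndJoin(c io.Closer, errp *error) {
	if cerr := c.Close(); cerr != nil {
		*errp = errors.Join(*errp, fmt.Errorf("closing response body: %w", cerr))
	}
}

// statusOf shows the intended call site: the deferred close can still
// update err after the return statement has run.
func statusOf(client *http.Client, req *http.Request) (status int, err error) {
	res, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer closeAndJoin(res.Body, &err)
	return res.StatusCode, nil
}

// fakeBody stands in for a response body so the demo needs no network.
type fakeBody struct{ closeErr error }

func (f fakeBody) Close() error { return f.closeErr }

func main() {
	var err error
	closeAndJoin(fakeBody{closeErr: errCloseFailed}, &err)
	fmt.Println(err) // closing response body: close failed
}
```

errors.Join requires Go 1.20 or later; on older toolchains the close error could simply be assigned when err is nil.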

Problem:
The DatadogToken detector was flagging any 32-character or 40-character alphanumeric string near the keywords "datadog" or "dd" as a potential secret, including:

  • URL-encoded service names in paths (e.g., service%3Amy-app-service-name)
  • NPM package integrity hashes (e.g., substrings from sha512-...== patterns)
  • Go module checksums (e.g., substrings from h1:...= patterns)
  • SOPS-encrypted data (e.g., substrings from ENC[AES256_GCM,data:...] patterns)
  • Test/placeholder values (e.g., 11111111111111111111111111111111)
  • Base64-encoded certificates (e.g., substrings from caBundle fields)

Solution:
Added an isLikelyFalsePositive() helper function that applies the following filters:

  1. Letters-only filter - Excludes strings with no digits (service names/identifiers)
  2. Repeated characters filter - Excludes test/placeholder values like 11111111111111111111111111111111
  3. NPM integrity hash filter - Detects sha512-...== patterns, as found in package-lock.json files
  4. Go module checksum filter - Detects h1:...= patterns, as found in go.sum files
  5. URL-encoded path filter - Detects %3A patterns and URL structures
  6. SOPS-encrypted data filter - Detects ENC[AES256_GCM,data:...] patterns
  7. Base64 certificate filter - Detects caBundle and certificate-related fields
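The first two filters need no surrounding context, so they are the easiest to illustrate. The sketch below is not the PR's isLikelyFalsePositive() code; it only shows what a letters-only check and a repeated-character check could look like. The function names hasDigit, isRepeatedChar, and looksLikePlaceholder are hypothetical.

```go
package main

import "fmt"

// hasDigit reports whether s contains at least one ASCII digit.
func hasDigit(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] >= '0' && s[i] <= '9' {
			return true
		}
	}
	return false
}

// isRepeatedChar reports whether s consists of one byte repeated,
// as in placeholder values such as "1111...1".
func isRepeatedChar(s string) bool {
	if s == "" {
		return false
	}
	for i := 1; i < len(s); i++ {
		if s[i] != s[0] {
			return false
		}
	}
	return true
}

// looksLikePlaceholder combines the two context-free checks: a match is
// rejected if it has no digits or is a single repeated character.
func looksLikePlaceholder(match string) bool {
	return !hasDigit(match) || isRepeatedChar(match)
}

func main() {
	fmt.Println(looksLikePlaceholder("abcdefghijklmnopqrstuvwxyzabcdef")) // true (letters only)
	fmt.Println(looksLikePlaceholder("11111111111111111111111111111111")) // true (repeated character)
	fmt.Println(looksLikePlaceholder("a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6")) // false (mixed, kept)
}
```

Both checks are cheap string scans, which is why it is reasonable to run them before any context extraction or verification request.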

Implementation Details:

  • Modified FromData() to use FindAllStringSubmatchIndex() to get match positions for context extraction
  • Added context-aware filtering that checks surrounding text (±200 chars for most patterns, ±2000 chars for certificates) to detect patterns
  • Filters are applied before processing matches to avoid unnecessary verification calls
  • Each filter function extracts context around the match and checks for specific patterns (e.g., sha512-, h1:, %3A, ENC[, caBundle)
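The context-aware step described above can be sketched as follows. This is an illustration under stated assumptions, not the detector's code: keyPat is a simplified stand-in for the real regex, and contextAround, candidates, and the markers list are hypothetical names. The 200-character radius is taken from the description; the certificate filter would use 2000 instead.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// keyPat is an illustrative pattern, NOT the detector's actual regex.
var keyPat = regexp.MustCompile(`\b([a-zA-Z0-9]{32})\b`)

// contextAround returns the bytes of data from start-radius to end+radius,
// clamped to the bounds of data. It slices by byte, which is sufficient
// for substring checks on ASCII markers.
func contextAround(data string, start, end, radius int) string {
	lo := start - radius
	if lo < 0 {
		lo = 0
	}
	hi := end + radius
	if hi > len(data) {
		hi = len(data)
	}
	return data[lo:hi]
}

// candidates returns the matches whose surrounding context contains none
// of the known false-positive markers.
func candidates(data string) []string {
	markers := []string{"sha512-", "h1:", "%3A", "ENC["}
	var out []string
	for _, idx := range keyPat.FindAllStringSubmatchIndex(data, -1) {
		start, end := idx[2], idx[3] // byte offsets of capture group 1
		ctx := contextAround(data, start, end, 200)
		skip := false
		for _, m := range markers {
			if strings.Contains(ctx, m) {
				skip = true
				break
			}
		}
		if !skip {
			out = append(out, data[start:end])
		}
	}
	return out
}

func main() {
	fmt.Println(candidates(`dd_api_key = "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6"`))
	fmt.Println(candidates(`"integrity": "sha512-a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6+Zm9v=="`))
}
```

The first call keeps the match, while the second drops it because "sha512-" appears within the context window. Filtering at this point means a dropped match never reaches the verification request.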

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint; this requires golangci-lint)?

@rootranjan requested a review from a team on December 31, 2025 13:49
@rootranjan requested a review from a team as a code owner on December 31, 2025 13:49
Development

Successfully merging this pull request may close these issues.

DatadogToken detector produces false positives for checksums, encrypted data, and service names
