feat(spans): Scrub random strings in resource spans #2614

Merged
jjbayer merged 17 commits into master from
feat/spans-resource-random-strings
Oct 19, 2023
@jjbayer
Member

@jjbayer jjbayer commented Oct 17, 2023

This PR attempts to fix some flaws in resource span scrubbing:

  1. The domains of chrome-extension:// URLs are random strings, so scrub them.
  2. Keep the scheme, but scrub subdomains (instead of https://cdn.domain.com, write https://*.domain.com).
  3. Replace path segments that contain special characters (=, %, ...) with *.
  4. If a path segment has more than 25 characters after regex scrubbing, assume it is an identifier and replace it with *.
  5. If a path segment has only alphabetic characters and contains uppercase characters, assume it is an identifier and replace it with *.

See test cases for examples.
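
As a rough illustration of rules 3 to 5 only, here is a minimal sketch of how a per-segment check might look. It is not the code from this PR: `looks_like_identifier` and `scrub_path` are hypothetical names, the set of special characters is limited to the two named above, and the regex scrubbing that precedes rule 4 is skipped, so it will not reproduce every test case in the PR.

```rust
/// Hypothetical helper (not from the PR): decides whether a single path
/// segment should be replaced with `*` according to rules 3 to 5.
fn looks_like_identifier(segment: &str) -> bool {
    // Rule 3: the segment contains a special character.
    // Assumption: only '=' and '%' are checked here.
    let has_special = segment.chars().any(|c| matches!(c, '=' | '%'));

    // Rule 4: the segment has more than 25 characters.
    // Assumption: no prior regex scrubbing is applied in this sketch.
    let too_long = segment.chars().count() > 25;

    // Rule 5: only alphabetic characters, at least one of them uppercase.
    let all_alpha =
        !segment.is_empty() && segment.chars().all(|c| c.is_ascii_alphabetic());
    let has_upper = segment.chars().any(|c| c.is_ascii_uppercase());

    has_special || too_long || (all_alpha && has_upper)
}

/// Hypothetical helper (not from the PR): applies the check to every
/// `/`-separated segment of a path and joins the result again.
fn scrub_path(path: &str) -> String {
    path.split('/')
        .map(|seg| if looks_like_identifier(seg) { "*" } else { seg })
        .collect::<Vec<_>>()
        .join("/")
}

fn main() {
    println!("{}", scrub_path("/static/FSYL/app.js"));
    println!("{}", scrub_path("/assets/a=b/main.css"));
}
```

With these assumptions, `FSYL` is scrubbed by rule 5 and `a=b` by rule 3, while `static`, `assets`, `app.js` and `main.css` are kept.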

@jjbayer jjbayer changed the base branch from master to ref/spans-no-clustering October 17, 2023 16:19
Base automatically changed from ref/spans-no-clustering to master October 18, 2023 07:45
@jjbayer jjbayer marked this pull request as ready for review October 19, 2023 08:15
@jjbayer jjbayer requested review from a team, DominikB2014 and phacops October 19, 2023 08:15
Co-authored-by: Oleksandr <1931331+olksdr@users.noreply.github.com>
resource_script_random_path_only,
"/ERs-sUsu3/wd4/LyMTWg/Ot1Om4m8cu3p7a/QkJWAQ/FSYL/GBlxb3kB",
"resource.script",
"/*/*/*/*/*/*/*"
Contributor

Is this string valuable?

Member Author

No, but the complexity of this PR has already ballooned, so I would like to keep it as-is. If it turns out that we produce high cardinality because of variable-length patterns such as /*, /*/*, ..., we can always reconsider.

Co-authored-by: Iker Barriocanal <32816711+iker-barriocanal@users.noreply.github.com>
@jjbayer jjbayer enabled auto-merge (squash) October 19, 2023 12:59
@jjbayer jjbayer merged commit 0602533 into master Oct 19, 2023
@jjbayer jjbayer deleted the feat/spans-resource-random-strings branch October 19, 2023 14:02
@DominikB2014
Contributor

👍
