Add longest_common_prefix implementation, documentation and tests #24891

Merged
steveburnett merged 1 commit into prestodb:master from Leziak:prefix on Apr 9, 2025

@Leziak Leziak commented Apr 8, 2025

Contributor

Description

Add longest_common_prefix function

Motivation and Context

Feature requested by me (😃). I'm currently doing data analysis at Meta that involves numerous string comparisons (string similarity). I found the built-in levenshtein_distance function really useful for this, and noticed that there is also a hamming_distance function. For my analysis, I need functions for finding the longest common prefix, substring, and suffix, and ideally also the Jaro-Winkler distance. For posterity, I thought it would be handy to have these implemented alongside the Levenshtein and Hamming distance functions. So I went for it: this diff contains the longest common prefix function, plus tests.

Impact

Added a longest_common_prefix function
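
For readers unfamiliar with the operation, here is a minimal standalone sketch of a longest-common-prefix computation in plain Java. It is not the code from this PR (the actual implementation is not shown in this conversation); the class name `LongestCommonPrefixSketch` and the choice to compare by Unicode code point on `java.lang.String` are assumptions made for illustration.

```java
// Hypothetical illustration only; not the implementation merged in this PR.
public final class LongestCommonPrefixSketch {
    private LongestCommonPrefixSketch() {}

    /** Returns the longest prefix shared by both strings, compared one code point at a time. */
    public static String longestCommonPrefix(String left, String right) {
        int limit = Math.min(left.length(), right.length());
        int i = 0;
        while (i < limit) {
            int cpLeft = left.codePointAt(i);
            int cpRight = right.codePointAt(i);
            if (cpLeft != cpRight) {
                break; // first mismatch ends the shared prefix
            }
            // Advance by 1 or 2 UTF-16 units so a surrogate pair is never split.
            i += Character.charCount(cpLeft);
        }
        return left.substring(0, i);
    }

    public static void main(String[] args) {
        System.out.println(longestCommonPrefix("presto", "prefix")); // prints "pre"
    }
}
```

Two strings with no shared leading characters yield the empty string in this sketch; how the merged function treats NULL or invalid input is not stated here.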

Test Plan

Tested manually on the TPCH catalog via presto-cli, then wrote tests that include non-ASCII characters (following the approach of the Levenshtein distance function tests).
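
One reason non-ASCII test cases are worth having can be sketched in plain Java. The example contrasts a naive prefix that compares UTF-16 code units with one that compares whole code points. Both helpers and the class name `PrefixNonAsciiCheck` are hypothetical and are not taken from the PR's test suite, which works on Presto's own string representation.

```java
// Hypothetical illustration of why non-ASCII inputs are worth testing.
public final class PrefixNonAsciiCheck {
    private PrefixNonAsciiCheck() {}

    /** Naive variant: compares UTF-16 code units, so it can cut a surrogate pair in half. */
    public static String prefixByCodeUnit(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        int i = 0;
        while (i < limit && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return a.substring(0, i);
    }

    /** Code-point variant: compares whole Unicode code points. */
    public static String prefixByCodePoint(String a, String b) {
        int[] x = a.codePoints().toArray();
        int[] y = b.codePoints().toArray();
        int limit = Math.min(x.length, y.length);
        int n = 0;
        while (n < limit && x[n] == y[n]) {
            n++;
        }
        return new String(x, 0, n);
    }

    public static void main(String[] args) {
        // U+1F600 and U+1F601 share the same high surrogate in UTF-16.
        String left = "\uD83D\uDE00x";
        String right = "\uD83D\uDE01x";
        System.out.println(prefixByCodeUnit(left, right).length());  // prints 1 (a lone surrogate)
        System.out.println(prefixByCodePoint(left, right).length()); // prints 0
    }
}
```

The code-unit variant returns half of a character for these inputs, which is the kind of defect that ASCII-only tests would never surface.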

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

@Leziak Leziak requested review from a team, elharo and steveburnett as code owners April 8, 2025 17:50
@Leziak Leziak requested a review from jaystarshot April 8, 2025 17:50

Leziak commented Apr 8, 2025

Contributor Author

This is a carbon copy of #24651, just squashed into one commit - I decided to redo the entire pull request because of a fubar situation with my local branch, sorry for the inconvenience!

@steveburnett steveburnett left a comment

Contributor

LGTM! (docs)

Pull branch, local doc build, looks good. Thanks!

Leziak commented Apr 9, 2025

Contributor Author

How do I merge this PR? I don't see any such option despite the changes being approved and the tests passing...

@steveburnett

Contributor

Merging based on an approving committer review, and my approving doc review.

@steveburnett steveburnett merged commit aac1155 into prestodb:master Apr 9, 2025
@ZacBlanco ZacBlanco mentioned this pull request May 29, 2025
21 tasks
