Add longest_common_prefix implementation, documentation and tests #24891
Merged
This is a carbon copy of #24651, just squashed into one commit. I decided to redo the entire pull request because my local branch ended up in a FUBAR state. Sorry for the inconvenience!
steveburnett
approved these changes
Apr 8, 2025
steveburnett
left a comment
LGTM! (docs)
Pull branch, local doc build, looks good. Thanks!
jaystarshot
approved these changes
Apr 9, 2025
Merging based on an approving committer review, and my approving doc review.
This was referenced May 30, 2025

Description
Add longest_common_prefix function
Motivation and Context
Feature requested by myself (😃): I'm currently doing some data analysis at Meta which involves numerous string comparisons (i.e. string similarity). I found the levenshtein_distance built-in function really useful for this, and found that there is also a hamming_distance function. For my data analysis, I need functions for finding the longest common prefix, substring and suffix, and ideally also the Jaro-Winkler distance. For posterity, I thought it would be really handy to have these implemented as built-ins, just like the Levenshtein distance and the Hamming distance. So I went for it: this diff specifically contains the longest common prefix function, plus tests.
Impact
Added a longest_common_prefix function
Test Plan
Tested manually on the TPCH catalog via presto-cli, then wrote tests that include non-ASCII characters (taking direction from the Levenshtein distance function tests).
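For readers unfamiliar with the operation, here is a minimal Python sketch of what a longest common prefix computation does. It is illustrative only and is not the code from this pull request; the function body, the code-point-by-code-point comparison, and the sample inputs are all assumptions made for the example.

```python
def longest_common_prefix(left: str, right: str) -> str:
    """Return the longest prefix shared by both strings.

    Hypothetical sketch: Python strings are sequences of Unicode code
    points, so non-ASCII characters are compared one code point at a time.
    """
    length = 0
    # zip stops at the end of the shorter string, so the prefix can never
    # be longer than either input.
    for a, b in zip(left, right):
        if a != b:
            break
        length += 1
    return left[:length]


if __name__ == "__main__":
    print(longest_common_prefix("presto", "prestissimo"))
    print(longest_common_prefix("héllo", "hélium"))
```

The same kinds of cases (empty input, no shared prefix, non-ASCII characters) are the ones a test suite for such a function would most likely want to cover.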
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.