27 December 2021

How to find palindromes quickly

The challenge

Write a script to create a Palindromic Tree for a given string: in other words to find all the unique palindromic substrings of the string.

My solution

I found the analysis in the referenced papers a little hard to follow, but maybe that was down to having eaten too much over Christmas.  So I developed my own algorithm which at least gives the right answer.

I generated all possible substrings from the given string. So for a string of length n there are n substrings which are 1 character long, n-1 which are 2 characters long and so on up to 1 string which is n characters long. So there are n * (n + 1) / 2 substrings in all. For example, if the string is 'perl' the 10 substrings are p pe per perl e er erl r rl l.

Now I looped over these and checked (a) is it palindromic and (b) have I seen it before.  If the answer is (a) yes and (b) no, I record it as a valid item for the answer.

The slightly tricky bit is the 'is it palindromic' question, because that's going to be asked n * (n + 1) / 2 times.  My first attempt at the is_palindromic function reversed the supplied substring and compared it with the original.  That works, but isn't very fast.

But since most substrings of real words are not palindromic, I reckoned it would pay to optimise the detection of those. I did it like this:

while ($string =~ s|^(.)(.*)(.)$|$2|g) {
return 0 if $1 ne $3;
}
return 1;

The s||| construct isolates the first and last characters as $1 and $3, and the middle bit as $2.  If $1 and $3 differ, the string can't be a palindrome and we can immediately  return false.  If $1 and $3 are the same, the while loop repeats looking at what were originally the second and second last characters, and so on.  

If the original string is a palindrome and has an even number of characters, the last successful iteration of the while will result in $2 being an empty string, which will then fail to match the s||| the next time and the function will return true. If it is a palindrome and has an odd number of characters then the last $2 will be a single character which will again fail to match the s||| because $1 and $3 have to be distinct characters, and any single character is of course a palindrome.

So that's very neat in Perl and pretty efficient: compared to reversing the whole string and comparing it with the original it takes about 85% less time for my 1000 character test string.

No comments:

Post a Comment