The challenge
Write a script to create a Palindromic Tree for a given string: in other words to find all the unique palindromic substrings of the string.
My solution
I found the analysis in the referenced papers a little hard to follow, but maybe that was down to having eaten too much over Christmas. So I developed my own algorithm which at least gives the right answer.
I generated all possible substrings from the given string. So for a string of length n there are n substrings which are 1 character long, n-1 which are 2 characters long and so on up to 1 string which is n characters long. So there are n * (n + 1) / 2 substrings in all. For example, if the string is 'perl' the 10 substrings are p pe per perl e er erl r rl l.
Now I looped over these and checked (a) is it palindromic and (b) have I seen it before. If the answer is (a) yes and (b) no, I record it as a valid item for the answer.
The slightly tricky bit is the 'is it palindromic' question, because that's going to be asked n * (n + 1) / 2 times. My first attempt at the is_palindromic function reversed the supplied substring and compared it with the original. That works, but isn't very fast.
But since most substrings of real words are not palindromic, I reckoned it would pay to optimise the detection of those. I did it like this:
while ($string =~ s|^(.)(.*)(.)$|$2|g) {
return 0 if $1 ne $3;
}
return 1;
The s||| construct isolates the first and last characters as $1 and $3, and the middle bit as $2. If $1 and $3 differ, the string can't be a palindrome and we can immediately return false. If $1 and $3 are the same, the while loop repeats looking at what were originally the second and second last characters, and so on.
If the original string is a palindrome and has an even number of characters, the last successful iteration of the while will result in $2 being an empty string, which will then fail to match the s||| the next time and the function will return true. If it is a palindrome and has an odd number of characters then the last $2 will be a single character which will again fail to match the s||| because $1 and $3 have to be distinct characters, and any single character is of course a palindrome.
So that's very neat in Perl and pretty efficient: compared to reversing the whole string and comparing it with the original it takes about 85% less time for my 1000 character test string.
No comments:
Post a Comment