27 December 2021

How to find palindromes quickly

The challenge

Write a script to create a Palindromic Tree for a given string: in other words to find all the unique palindromic substrings of the string.

My solution

I found the analysis in the referenced papers a little hard to follow, but maybe that was down to having eaten too much over Christmas.  So I developed my own algorithm which at least gives the right answer.

I generated all possible substrings from the given string. So for a string of length n there are n substrings which are 1 character long, n-1 which are 2 characters long and so on up to 1 string which is n characters long. So there are n * (n + 1) / 2 substrings in all. For example, if the string is 'perl' the 10 substrings are p pe per perl e er erl r rl l.

Now I looped over these and checked (a) is it palindromic and (b) have I seen it before.  If the answer is (a) yes and (b) no, I record it as a valid item for the answer.

The slightly tricky bit is the 'is it palindromic' question, because that's going to be asked n * (n + 1) / 2 times.  My first attempt at the is_palindromic function reversed the supplied substring and compared it with the original.  That works, but isn't very fast.

But since most substrings of real words are not palindromic, I reckoned it would pay to optimise the detection of those. I did it like this:

while ($string =~ s|^(.)(.*)(.)$|$2|g) {
return 0 if $1 ne $3;
}
return 1;

The s||| construct isolates the first and last characters as $1 and $3, and the middle bit as $2.  If $1 and $3 differ, the string can't be a palindrome and we can immediately  return false.  If $1 and $3 are the same, the while loop repeats looking at what were originally the second and second last characters, and so on.  

If the original string is a palindrome and has an even number of characters, the last successful iteration of the while will result in $2 being an empty string, which will then fail to match the s||| the next time and the function will return true. If it is a palindrome and has an odd number of characters then the last $2 will be a single character which will again fail to match the s||| because $1 and $3 have to be distinct characters, and any single character is of course a palindrome.

So that's very neat in Perl and pretty efficient: compared to reversing the whole string and comparing it with the original it takes about 85% less time for my 1000 character test string.

20 December 2021

Illuminating the Ulam

Perl Weekly Challenge 144, task 2 states:

The challenge

Write a script to generate Ulam Sequence having at least 10 Ulam numbers where $u and $v are the first 2 Ulam numbers.

The first two numbers in an Ulam sequence are the lesser (U1) and the greater (U2) of $u and $v.

Then for n > 2, Un is defined to be the smallest integer that is the sum of two distinct earlier terms in exactly one way, and larger than all earlier terms.

The solution

So how do we go about this?  If we know U1 to Un, how do we calculate Un+1?

The initial thought is that we need to find the sums of all possible pairs, Uj and Uk, chosen from U1 to Un. Clearly, there are n(n - 1) such pairs, and we are looking for the one that:

- exceeds Un,
- is unique (ie the sum is not the same as any of the other sums),
- is the lowest such number.

We can cut the number of pairs to consider by half by insisting that Uj is less than Uk.  So our nested loops look some thing like:

for j = 1 to n - 1 {
    for k = j + 1 to n {
        consider Uj + Uk
    }
}

Note that we have also eliminated the cases where j = k, as required.  But what does 'consider' need to do?

The easy first step is to discard any Uj + Uk that is less than Un.  We can then can consider whether this Uj + Uk is unique, so we need to keep a tally of the times that sum has come up, and I think the easiest way to do that is simply to use an array, say 'times', and to increment times[Uj + Uk] for each pair of values. 

By the end of our nested loops, then, we have a sparse array 'times', with the populated elements being the number of times we have made that index from a pair of the values of U1 to Un.  So if 10 has come up twice - say 6 + 4 and 3 + 7, times[10] will be 2, and if 11 has come up only once, say 5 + 6, then times[11]  will be 1.

Now all we have to do is to search up from times[Un + 1] for the first element with the value 1, and assign that to U[n + 1]. 

The time to do this increases exponentially with the number of elements. On my (quite slow) Raspberry Pi computer I can do 100 elements in milliseconds, but 500 takes 4 seconds and 1000 takes 37 seconds.

14 December 2021

Being stealthy is a fourfold property

PWC 143 task 2 involves stealthy numbers.

A positive integer n is stealthy, if there exist positive integers a, b, c, d such that a * b = c * d = n and a + b = c + d + 1.

The stealthy numbers less than 100 are: 4, 12, 24, 36, 40, 60, 72 and 84.  Looking further ahead it appears likely that there is an infinite number of such numbers.  

Some stealthy numbers have more than one solution for a + b = c + d + 1, for example:

n = 7200
72 + 100 = 75 + 96 + 1
75 + 96   = 80 + 90 + 1

n = 2520
35 + 72 = 36 + 70 + 1
40 + 63 = 42 + 60 + 1
42 + 60 = 45 + 56 + 1

All stealthy numbers are multiples of 4. The reason is this:

Consider a + b = c + d + 1.

Clearly either a + b or c + d must be an odd number, since they differ by 1. If a is odd, b must be even (so that a + b is odd), and c and d must either be both odd or both even (so that c + d is even).

But we can rule out c + d being both odd, because c * d would then be odd and a * b would be even, so they cannot both equal n as is required.

So, of a, b, c and d, one must be odd and the other 3 must be even.  We know that n = a * b = c * d, and either a * b or c * d is the product of two even numbers - two multiples of 2 - so n must be a multiple of 4.


08 December 2021

Clarity versus brevity

PWC 142 task 1: You are given positive integers, $m and $n. Write a script to find total count of divisors of $m having last digit $n.

Tasks such as this always raise a dilemma.  Do I write something ingenious and compact, or should I write code which someone else (or me in 10 years) can easily understand.

My submission falls in the second category. I simply loop $j from 1 to $m, checking whether $m / $j is an integer, in which case $j is a divisor. I add that to my list of divisors, and if it ends in $n, to my list of divisors ending in $n.

This is - in once sense - quite inefficient, because once $j exceeds $m/2 the only remaining divisor is $m itself so I've wasted more-or-less half my time.

I could get round that by looping $j from 1 to int($m/2), and every time I found a divisor $d I could add $d and $m/$d to my list of divisors. But then I would have to sort the list, or possibly make two lists, reverse one of them and concatenate, remembering that if $m is even, $m/2 will be in my list twice.

Even faster would be to check only for prime divisors, and then there would be quite a limited set of combinations of those which would need to be tested for last-digit matching.

So I went with the simple version, and even my little Raspberry Pi easily deals with, for example, 1048575 in a couple of seconds.

No doubt someone - probably many people - will have come up with a more elegant solution, and of course if my solution had been very slow I would have done that too.

One of the failings of Perl code which is often mentioned is its maintainability.  I've been at both ends of that - writing code which I know will need to be maintained to meet changing business needs long after I'm gone, and being tasked with maintaining code that some long-gone person wrote.

I know which approach I prefer.


PWC 142 task 2: Write a script to implement SleepSort.

This is of course a bit of a joke, but you can do it in Perl, of course.  I found that delaying $n seconds worked, but much less, for example $n/10 seconds did not: the Linux scheduler clearly doesn't use a fifo queue.

There is a slightly less jokey way of sorting in Perl without actually sorting:

@list = (1, 9, 76, 3, 99, 4, 22);

for $j  (@list) {
$sorted[$j] = 1;
$max = $j if $j > $max;
}

for $j  (0 .. $max) {
say qq[$j] if $sorted[$j];
}



01 December 2021

Some prime thoughts on divisors

Perl Weekly Challenge 141, task 1, reads:

Write a script to find lowest 10 positive integers having exactly 8 divisors.

The solution to this is easy enough in that all 10 numbers are less than 100.

But what happens if we vary the number of divisors: find the lowest integer (k) having exactly n divisors?  The answers are as follows:

  • n=>k
  • 2 => 2 (1, 2)
  • 3 => 4   (1, 2, 4)
  • 4 => 6  (1, 2, 3, 6)
  • 5 => 16  (1, 2, 4, 8, 16)
  • 6 => 12  (1, 2, 3, 4, 6, 12)
  • 7 => 64  (1, 2, 4, 8, 16, 32, 64)
  • 8 => 24  (1, 2, 3, 4, 6, 8, 12, 24)
  • 9 => 36  ( 1, 2, 3, 4, 6, 9, 12, 18, 36)
  • 10 => 48  (1, 2, 3, 4, 6, 8, 12, 16, 24, 48)
  • 11 => 1024  (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024)
  • 12 => 60  (1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, 60)
  • 13 => 4096  (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096)
  • 14 => 192  (1, 2, 3, 4, 6, 8, 12, 16, 24, 32, 48, 64, 96, 192)
  • 15 => 144 (1, 2, 3, 4, 6, 8, 9, 12, 16, 18, 24, 36, 48, 72, 144)
  • 16 => 120 (1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 20, 24, 30, 40, 60, 120)
    17 => 65536  (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536)
  • 18 => 180 (1, 2, 3, 4, 5, 6, 9, 10, 12, 15, 18, 20, 30, 36, 45, 60, 90, 180)
  • 19 => 262144  (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072, 262144)
  • 20 => 240 (1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 16, 20, 24, 30, 40, 48, 60, 80, 120, 240)


So it appears that for non-prime n, the smallest integer having n divisors is a reasonably small number.  But for primes, the smallest such integer appears always to be 2**(n-1), which is usually a much larger number than the surrounding non-primes.

So:
3 => 4 = 2**2
5 => 16 = 2**4
7 => 64 = 2**6
11 => 1024 = 2**10
13 => 4096 = 2**12
17 => 65536 = 2**16
19 => 262144 = 2**18

It is also the case that the second-smallest prime integer having n divisors is 3**(n-1) and the third lowest is 5**(n-1) and so on.

It is clear that a power of 2 will always be divisible by any other power of 2 up to and including itself, and that it will have no other divisors. So it's understandable, for example, that 262144 = 2**18 has 19 divisors - 2**0 up to 2**18. 

But why does no smaller number also have 19 divisors? I have not been able to come up with a convincing explanation.