26 April 2022

Checksums and early encryption

ISBN-13

The ISBN checksum, or indeed any checksum, is there to provide an easy check for the plausibility of a multi-digit sequence.  The same sort of technique is used for credit card numbers.

The idea is that the commonest errors are (1) the alteration of a single digit or (2) the swapping of 2 consecutive digits, and both of these will usually change the check digit. 

The check digit can obviously take only one of 10 different values and thus there is a 10% chance that a wrong set of digits will give the same checksum. However, by choosing different multipliers for consecutive digits the probability that swapping two consecutive digits yields the same checksum is much reduced - at the expense, of course, of slightly increasing the chance that a random error will result in an unchanged check digit.

The algorithm is pretty straightforward to code, and even in my rather wordy style is only 5 lines long.

Wheatstone-Playfair encryption

This is an interesting example of something which is relatively easy to describe, but requires rather more code than might at first appear.

My solution builds the 5x5 matrix as a 2D array, and simultaneously builds an index to it such that $locate{$letter}->[0] and $locate{$letter}->[1] give the coordinates of the given letter. Once that is done, the encrypt and decrypt functions are easily coded.

As one of the examples I used 'the quick brown fox ...' as the key to show that my technique works even if all the letters of the alphabet occur in the key, which is required if the bottom right cell of the matrix is not to be 'z'.

18 April 2022

Abecedarian words and pangrams

Abecedarian words   ...

... are words whose letters are in alphabetical order. There are, maybe surprisingly, few of them: from the supplied dictionary of 39172 words only 339 (0.86%) are abecedarian, and none is more than 6 letters long.

But if we consider any pair of random letters, c1 and c2, the probability that c1 alphabetically precedes c2 and the probability that c2 precedes c1 are clearly the same. The only other option is that they are the same letter, and the probability of that is 1 in 26. So the probability that c2 is the same as or comes alphabetically after c1 is 13.5 in 26, which is close to 52% - not bad odds you might think, and indeed there quite a few two-letter words in the list (some, it has to be said, are dubiously valid as words).

Now you might think that a 3-letter word would have a 52% x 52% = 27% chance of being abecedarian.  Actually, it's less than that because if you think of a 3-letter word w1w2w3 where w1w2 is abecedarian, w2 is therefore in the range w1-Z whereas a randomly selected w3 is in A-Z. So w3 has a less than 52% chance of making w1w2w3 abecedarian.

How much less I leave to those with younger brains than me, and of course these theoretical numbers make the demonstrably false assumption that words consist of random letters and that every letter is equally likely to appear in a word, but the fact that there are few abecedarian words of more than 6 letters is quite explicable.

Pangrams

The approach I used to creating pangrams was to choose a random word from the dictionary as my pangram's first word.  I then chose more random words from the dictionary, and added each of them to my pangram if they contained a letter I didn't already have. And once I had all 26 letters I stopped.

This works well, and my submission delivers a sample of 10 results.

I then wondered how close I could get to the shortest (fewest words or letters). I ran through my algorithm 10 000 times and it's clear that it can get down to 8 words and 60ish letters. The best I did after just a few tries is:

tributary shipwrecked mollifying gavels gazes frequents jerky exhaling
8 words, 63 letters

The quick brown fox pangram has 9 words but only 35 letters, so I thought a better algorithm might be one that prefers shorter words. Limiting the word length to 5 gave:

rowdy clix bang newt quick huffs zest ohm video opera jell
11 words, 48 letters

I think that's not bad for a fairly simplistic algorithm (though I'm a little dubious about clix.)


14 April 2022

Four is magic and Equilibrium Indices

Four is magic

Not much to say about this, really. Someone will have a one-line answer, but as usual I prefer readability to conciseness.

It's a little hard to see what practical use this might be!

Equilibrium indices

If we were guaranteed that the numbers were sorted in ascending order we could do a binary chop to get to the answer, but we're not, so I can see little alternative to starting at the second member of the array and working through to the second last.  It's the second, because the statement of the problem suggest that the two subarrays are not null.

I suppose you save a few nanoseconds by doing something like this:

$left_sum = $array[0];
$right_sum = sum of $array[2] to the end;

and then loop over the array elements, adding the element to $left_sum, and subtracting it from $right_sum, but really, that's a bit messy and for any reasonable array it won't take long just to recreate $left_sum and $right_sum each time. So that's what I did.

I noted that It doesn't say positive or non-zero integers, so we'd better allow for zero and negative ones, which means that there may not be a unique answer (eg 4, 5, 0, 0, 4, 5 or -3, 3, 0, -4, 4, 0, -5, 5).