18 April 2022

Abecedarian words and pangrams

Abecedarian words   ...

... are words whose letters are in alphabetical order. There are, perhaps surprisingly, few of them: of the 39172 words in the supplied dictionary only 339 (0.87%) are abecedarian, and none is longer than 6 letters.
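A minimal Python sketch of the test (not the submitted code). `is_abecedarian` and `abecedarian_words` are illustrative names. Two details are assumptions: repeated letters (as in "all") count as being in order, and case is ignored.

```python
def is_abecedarian(word: str) -> bool:
    """True if the letters of `word` never go backwards in the alphabet.

    Assumption: repeated letters ("all") count as in order; case is ignored.
    """
    w = word.lower()
    # Compare each letter with the one that follows it.
    return all(a <= b for a, b in zip(w, w[1:]))


def abecedarian_words(words):
    """Keep only the abecedarian words from an iterable of words."""
    return [w for w in words if is_abecedarian(w)]


# A small illustrative list, not the supplied dictionary.
sample = ["almost", "biopsy", "chintz", "ghost", "zebra", "all", "be"]
found = abecedarian_words(sample)
print(found)  # ['almost', 'biopsy', 'chintz', 'ghost', 'all', 'be']
print(f"{len(found)} of {len(sample)} are abecedarian")
```

Run over the full word list, `len(found) / len(words)` gives the percentage and `max(found, key=len)` one of the longest matches.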

Consider any pair of random letters, c1 and c2. The probability that c1 alphabetically precedes c2 is clearly the same as the probability that c2 precedes c1. The only other possibility is that they are the same letter, and the probability of that is 1 in 26, which leaves 25 in 26 to be split equally between the two orderings: 12.5 in 26 each. So the probability that c2 is the same as or comes alphabetically after c1 is 12.5 + 1 = 13.5 in 26, which is close to 52%. Not bad odds, you might think, and indeed there are quite a few two-letter words in the list (some, it has to be said, of dubious validity as words).
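The 13.5-in-26 figure can be confirmed by listing all 26 × 26 = 676 ordered pairs of letters. A short Python check, assuming (as the argument does) that every letter is equally likely:

```python
from fractions import Fraction
from string import ascii_lowercase

# Classify every ordered pair (c1, c2) of letters a-z.
before = same = after = 0
for c1 in ascii_lowercase:
    for c2 in ascii_lowercase:
        if c1 < c2:
            before += 1
        elif c1 == c2:
            same += 1
        else:
            after += 1

total = len(ascii_lowercase) ** 2        # 676 ordered pairs
p_ok = Fraction(before + same, total)    # c2 is the same as or after c1
print(before, same, after)               # 325 26 325
print(p_ok, f"{float(p_ok):.3%}")        # 27/52 51.923%
```

27/52 is the same as 13.5 in 26.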

Now you might think that a 3-letter word would have a 52% x 52% = 27% chance of being abecedarian. Actually, it's less than that. Think of a 3-letter word w1w2w3 where w1w2 is abecedarian: w2 is then in the range w1-Z, so it tends to fall towards the end of the alphabet, whereas a randomly selected w3 can be anywhere in A-Z. So w3 has less than a 52% chance of keeping w1w2w3 abecedarian.

How much less I leave to those with younger brains than mine. Of course, these theoretical numbers rest on the demonstrably false assumptions that words consist of random letters and that every letter is equally likely to appear in a word, but they make it easy to see why abecedarian words of more than 6 letters are so rare.
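Under the same (unrealistic) random-letter assumption, the sums can in fact be done exactly. A string of n letters in non-decreasing order is completely determined by how many times each of the 26 letters occurs in it, so by the standard stars-and-bars count there are C(n + 25, n) such strings out of 26^n. This Python sketch (function names are illustrative) computes that and cross-checks it by brute force for short lengths:

```python
from itertools import product
from math import comb
from string import ascii_lowercase


def p_in_order(n: int, alphabet: int = 26) -> float:
    """Probability that n independent, uniformly random letters are in
    non-decreasing order: comb(n + alphabet - 1, n) / alphabet**n."""
    return comb(n + alphabet - 1, n) / alphabet ** n


def p_in_order_brute(n: int) -> float:
    """The same probability, found by checking all 26**n strings."""
    hits = sum(
        1
        for s in product(ascii_lowercase, repeat=n)
        if all(a <= b for a, b in zip(s, s[1:]))
    )
    return hits / 26 ** n


# Cross-check the formula against brute force for short strings.
for n in (2, 3):
    assert abs(p_in_order(n) - p_in_order_brute(n)) < 1e-12

p2, p3 = p_in_order(2), p_in_order(3)
print(f"naive 3-letter estimate: {p2 * p2:.1%}")  # 27.0%
print(f"actual 3-letter chance:  {p3:.1%}")       # 18.6%
print(f"chance w3 keeps order:   {p3 / p2:.1%}")  # 35.9%
for n in range(4, 8):
    print(f"{n} letters: {p_in_order(n):.2%}")
```

For 4 to 7 letters this gives roughly 5.2%, 1.2%, 0.24% and 0.04%, so under these assumptions an abecedarian 7-letter string turns up only about 4 times in 10 000.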

Pangrams

My approach to creating pangrams was to choose a random word from the dictionary as my pangram's first word. I then chose further random words from the dictionary, adding each one to my pangram if it contained a letter I didn't already have, and stopped once I had all 26 letters.
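A minimal Python sketch of that random greedy approach (not the submitted code). `greedy_pangram` is an illustrative name, and two details are assumptions: words are drawn without replacement by shuffling, so the loop always ends, and the function returns None if the word list can't cover the alphabet.

```python
import random
from string import ascii_lowercase


def greedy_pangram(words, rng=random):
    """Draw words in random order, keeping each one that supplies at least
    one letter not yet covered; stop as soon as all 26 letters are covered.

    `words` must be a list (random.sample needs a sequence). Returns None
    if the words can't cover the whole alphabet.
    """
    needed = set(ascii_lowercase)
    pangram = []
    for word in rng.sample(words, len(words)):
        letters = set(word.lower())
        if letters & needed:          # contributes at least one new letter
            pangram.append(word)
            needed -= letters
            if not needed:            # all 26 letters covered
                return pangram
    return None


# Illustrative word list; the real run would use the whole dictionary.
words = ["tributary", "shipwrecked", "mollifying", "gavels", "gazes",
         "frequents", "jerky", "exhaling", "apple", "zoo"]
print(" ".join(greedy_pangram(words, random.Random(1))))
```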

This works well, and my submission delivers a sample of 10 results.

I then wondered how close I could get to the shortest possible pangram (fewest words or fewest letters). I ran my algorithm 10 000 times, and it's clear that it can get down to 8 words and around 60 letters. The best I got after just a few tries was:

tributary shipwrecked mollifying gavels gazes frequents jerky exhaling
8 words, 63 letters
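Repeating the construction and keeping the best result is a simple loop around it. In this Python sketch `shortest_of` and its `runs` and `seed` parameters are illustrative names, and ranking by fewest words first, then fewest letters, is an assumption about what "best" means.

```python
import random
from string import ascii_lowercase


def greedy_pangram(words, rng):
    """One random greedy pangram, or None if `words` can't cover a-z."""
    needed = set(ascii_lowercase)
    pangram = []
    for word in rng.sample(words, len(words)):
        letters = set(word.lower())
        if letters & needed:
            pangram.append(word)
            needed -= letters
            if not needed:
                return pangram
    return None


def shortest_of(words, runs=10_000, seed=None):
    """Repeat the greedy construction and keep the best pangram found.

    Assumption: "best" means fewest words, with fewest letters as tie-break.
    """
    rng = random.Random(seed)
    best_key, best = None, None
    for _ in range(runs):
        p = greedy_pangram(words, rng)
        if p is None:
            return None  # this word list can never cover the alphabet
        key = (len(p), sum(len(w) for w in p))
        if best_key is None or key < best_key:
            best_key, best = key, p
    return best


# Illustrative list; in practice this would be the whole dictionary.
words = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog",
         "tributary", "shipwrecked", "mollifying", "frequents", "exhaling"]
best = shortest_of(words, runs=1_000, seed=1)
print(" ".join(best), len(best), sum(len(w) for w in best))
```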

The "quick brown fox" pangram (The quick brown fox jumps over the lazy dog) has 9 words but only 35 letters, so I thought a better algorithm might be one that prefers shorter words. Limiting the word length to 5 letters gave:

rowdy clix bang newt quick huffs zest ohm video opera jell
11 words, 48 letters
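Limiting the word length only needs a filter in front of the same random greedy loop. A Python sketch, with `short_word_pangram` and `max_len` as illustrative names; the sample list is the 11 words of the result above plus two longer ones, so it is certain to cover the alphabet.

```python
import random
from string import ascii_lowercase


def short_word_pangram(words, max_len=5, rng=random):
    """Random greedy pangram using only words of at most `max_len` letters.

    Returns None if the short words alone can't cover the alphabet.
    """
    pool = [w for w in words if len(w) <= max_len]   # drop long words
    needed = set(ascii_lowercase)
    pangram = []
    for word in rng.sample(pool, len(pool)):
        letters = set(word.lower())
        if letters & needed:          # contributes at least one new letter
            pangram.append(word)
            needed -= letters
            if not needed:
                return pangram
    return None


words = ["rowdy", "clix", "bang", "newt", "quick", "huffs", "zest", "ohm",
         "video", "opera", "jell", "shipwrecked", "tributary"]
result = short_word_pangram(words, max_len=5, rng=random.Random(2))
print(" ".join(result), len(result), sum(len(w) for w in result))
```

A hard cap is the bluntest way of preferring short words; the cap trades the chance of finding rare letters in long words against a lower letter count.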

I think 11 words and 48 letters is not bad for a fairly simplistic algorithm (though I'm a little dubious about clix).
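The word and letter counts quoted for the two pangrams can be double-checked with a small Python helper (`pangram_stats` is an illustrative name) that reports whether a phrase uses every letter, how many words it has, and how many letters:

```python
from string import ascii_lowercase


def pangram_stats(phrase: str):
    """Return (uses all 26 letters?, word count, letter count) for a phrase."""
    letters = [c for c in phrase.lower() if c in ascii_lowercase]
    return set(ascii_lowercase) <= set(letters), len(phrase.split()), len(letters)


for phrase in (
    "tributary shipwrecked mollifying gavels gazes frequents jerky exhaling",
    "rowdy clix bang newt quick huffs zest ohm video opera jell",
    "The quick brown fox jumps over the lazy dog",
):
    print(pangram_stats(phrase))
# (True, 8, 63)
# (True, 11, 48)
# (True, 9, 35)
```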

