10 October 2022

Merge like a zip and Unidecode

Merge like a zip


Travelling in New Zealand we noticed that when two lanes of traffic merge into one, the road sign says 'Merge like a zip'. And that's what's needed here.

We are given two lists @a and @b of the same length and asked to create a subroutine zip(@a, @b) that returns a zipped list, eg 1,2,3 + a,b,c = 1,a,2,b,3,c.

The first thing to remember is that Perl passes subroutine arguments as a single array of scalars.  We are assured that @a and @b have the same number of elements, so within zip, @_ is the concatenation of @a and @b.  If $n is the number of elements in @a (or @b) we then simply have to return an array of 

$_[0], $_[0 + $n], $_[1], $_[1 + $n], ... $_[$n - 1], $_[2 * $n - 1]

which is an easy one-liner.

Note though that doing it in one line means using $_[$_], which takes a moment's thought: the sub arguments are in the array @_, while the implied iterator in a 'for' statement modifier is the scalar $_.
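As a minimal sketch of that idea (not necessarily the exact code submitted for the challenge), the sub below assumes it is always called with two lists of equal length, so that $n is half the size of @_. The variable name @zipped is just illustrative.

```perl
use strict;
use warnings;

sub zip {
    # @_ is @a followed by @b, so each list has half the elements
    my $n = @_ / 2;
    my @zipped;

    # $_ is the loop index from the 'for' modifier; @_ holds the arguments
    push @zipped, $_[$_], $_[$_ + $n] for 0 .. $n - 1;
    return @zipped;
}

my @a = (1, 2, 3);
my @b = qw(a b c);
print join(',', zip(@a, @b)), "\n";   # prints 1,a,2,b,3,c
```

The 'push ... for' line is the one-liner; everything else is scaffolding. If the lists were of different lengths this would silently pair the wrong elements, which is why the same-length guarantee matters.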

Unidecode

We are given a string with characters which are Latin alphabet characters (a-zA-Z) with diacritic marks, such as Ã or ô. We are to create a subroutine makeover($str) that replaces these characters with the unmarked equivalents.

This is an interesting challenge in that there is no way (that I know) of investigating the shape of a character and identifying that Ã is actually represented in print as A with a tilde above it.

One possibility would be to go through the Unicode code charts and manually create a translation, eg $plain{'Ã'} = 'A'. But that would be painful, because aside from the ones we probably know about from French and German, there are dozens more.

Fortunately:

  • they all have Unicode names starting with LATIN CAPITAL LETTER x or LATIN SMALL LETTER x, and
  • there is (of course!) a Perl module which will return the name for a given character.

So here are the guts of what we need:

use charnames ();   # loads charnames::viacode

# look up the Unicode name; viacode returns undef for unnamed code points
$name = charnames::viacode(ord($char)) // '';
$result = '';

# check if it is a modified latin letter
if ($name =~ m|^LATIN CAPITAL LETTER (.)|) {
   $result .= $1;

} elsif ($name =~ m|^LATIN SMALL LETTER (.)|) {
   $result .= lc($1);

# or if not just copy it to output
} else {
   $result .= $char;
}
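For completeness, here is one way the fragment above might be wrapped into the makeover($str) sub the task asks for. This is a sketch with one deliberate difference: it only strips characters whose name has the form LATIN ... LETTER x WITH ..., which is my own stricter assumption, so that letters such as Æ (LATIN CAPITAL LETTER AE) or ß (LATIN SMALL LETTER SHARP S) are copied unchanged rather than reduced to their first name letter.

```perl
use strict;
use warnings;
use utf8;            # this source file contains non-ASCII literals
use charnames ();    # loads charnames::viacode

sub makeover {
    my ($str) = @_;
    my $result = '';

    for my $char (split //, $str) {
        # viacode returns undef for code points with no name
        my $name = charnames::viacode(ord $char) // '';

        # assumption: only names like 'LATIN SMALL LETTER O WITH CIRCUMFLEX'
        if ($name =~ m|^LATIN (CAPITAL\|SMALL) LETTER ([A-Z]) WITH |) {
            $result .= $1 eq 'CAPITAL' ? $2 : lc($2);

        # anything else is copied to the output as is
        } else {
            $result .= $char;
        }
    }
    return $result;
}

binmode STDOUT, ':encoding(UTF-8)';
print makeover('ÃÊÍÒÙ âêíòù'), "\n";   # prints AEIOU aeiou
```

Unmarked letters, digits, spaces and punctuation all fall through to the else branch, so ordinary text passes through untouched.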



