31 October 2022

The smallest greater and the shortest slice

The smallest greater

Our first task this week is as follows: You are given an array of characters (a..z) and a target character. Write a script to find out the smallest character in the given array lexicographically greater than the target character.

The best way to approach this seems to me to sort the array, and then look for the first character in the sorted array that follows the target character alphabetically. As the array is sorted this is bound to be the smallest such character.

This works for all of Mohammad's examples except the last, where the array is 't, g, a, l' and the target character is 'v'.  So I would say there is no answer, though Mohammad's example says the answer is 'v', which isn't in the array.

And the shortest slice

The second task says that we are given an array of two or more non-negative integers and are asked to write a script to find out the smallest slice of the array having the same degree as the array itself. We are told that the degree of an array is the maximum frequency of an element in the array.

It seems sensible to start by writing a subroutine to return the degree of an array.

We know that the slice can't be any shorter than the degree of the original array, so that's a good start. Let's begin at that size and work up, and then in an inner loop check every slice of that size, and then the next size up and so on until we find a slice with the same degree as the original array.

And that's what I did. There is a slight wrinkle, in that the answer may not be unique. For example 1, 1, 1, 2, 2, 2 has two slices - 1, 1, 1 and 2, 2, 2 - each of degree 3.  I chose to output all the slices meeting the criteria, so my inner (position) loop does not immediately terminate if it finds a solution, but it does stop the outer (size) loop from going further.

27 October 2022

Divisible pairs and reduce to nothing

Divisible pairs

Task 1 supplies us with a list of integers and asks us to count the pairs selected from the list whose sums are divisible by a supplied integer.

The obvious way to do this is to generate all the pairs and check their divisibility by the given divisor. All the examples given include only unique positive integers. The task is not significantly different if zero and negative integers are included, but if the same integer appears more than once (eg list = 1, 2, 3, 3, divisor = 4) do we count distinct pairings (1 and the first 3, 1 and the second 3)  or merely distinct pairs (just 1 and 3).

Reduce to nothing

Task 2 as stated is impossible. Applying the supplied rules to any pair of starting x and y will never result in both x and y reaching zero, because:

  • Each operation reduces either x and y and leaves the other unchanged
  • So the values before the last operation would have to be (0, n) or (n, 0) where n > 0
  • But that isn't possible because subtracting 0 from n cannot result in 0.
Inspection of the examples suggests that what was intended is that the process finishes when either of x and y reaches zero, and indeed if the rules are applied in the order specified, it will be x that does so.

But is it the case that all starting values of x and y will result in x == 0?  The answer is yes, because:
  • After each operation, x + y always reduces, but cannot become negative
  • There cannot therefore be more than x + y - 1 steps, even if each step only reduces x + y by 1
  • Unless x or y is zero, it is always possible to make another step
The maximum number of steps (x + y - 1) will be required if the starting value of x or y is 1 and the minimum number (1) if the starting values of x and y are equal.

19 October 2022

Foo meets Bar and the wizardly trio

Will Foo and Bar meet?

Once upon a time, Foo and Bar went on holiday, separately, to the same city. We know the day each of them arrived and left, and are asked how many days they had fun together.

When I was a lad, the IBM System/360 series computers were the latest thing. Amazingly, they were the first machines I used that had access to the current date and time, and equally amazingly, that data wasn't accessible to Fortran code, though I think it could be retrieved in Cobol. 



I prevailed on someone to write a Fortran-accessible subroutine in assembler, and annoyed the computer staff having it on binary coded punched cards which only their newest card reader could cope with.

But I digress. The date on these machines was held as a 5 digit number such as 65365, which meant the 365th day of 1965, ie 31 December 1965. And annoying though that was at the time, it comes in useful this week. We're told that our pair's holidays both started and ended in the same year, so let's convert their start and end dates to days since the start of the year numbers, ie 1 (1 January) to 365 (31 December).

Now we can easily see that the overlap is the later of their start dates to the earlier of their end dates, and its duration is simply the difference between these two day numbers, plus 1 - and if that is less than zero then - bad luck, Foo and Bar.

Magical triplets

In task 2, we are asked to write a script to find the triplet (a, b, c) from a given list that satisfies the following rules.

1. a + b > c
2. b + c > a
3. a + c > b
4. a + b + c is maximum.
5. If we end up with more than one triplet having the maximum then we pick the triplet where a >= b >= c.

My first approach to this was to use Algorithm::Combinatorics to get all the possible triplets, filter out the ones that satisfy rules 1, 2 and 3, and find which of these maximises a + b + c.  In my submission that is Method 1 and it certainly works.

However, there is an easier way - my Method 2:

  1. Sort the supplied list into reverse order
  2. Suppose the first three items are now m, n and o.
  3. If o is greater than m - n, then we have the solution - output it and stop
  4. If not, discard the first item in the list and if there are still at least 3 items in the list go back to step 2.
  5. If you get here, there is no solution.

Why does this work?

In a reverse sorted list, the first three numbers are the three largest in the list, and clearly their sum is the largest possible triplet sum, so that's rule 5 taken care of.

For any three reverse sorted numbers m, n and o, rules 1 and 3 will always be satisfied because any sum involving m is already larger than o, so we only need to do test 2. If that succeeds, we have the solution, because no subsequently found triplet will have a larger sum.

If it fails, then no solution can contain m, because if n + o doesn't exceed m then no other pair will. So we can discard m and work on the next triplet (n, o and p) in the same way.


10 October 2022

Merge like a zip and Unidecode

Merge like a zip


Travelling in New Zealand we noticed that when two lanes of traffic merge into one, the road sign says 'Merge like a zip'. And that's what's needed here.

We are given two lists @a and @b of same length and asked to create a subroutine zip(@a, @b) that returns a zipped list, eg 1,2,3 + a,b,c = 1,a,2,b,3,c.

The first thing to remember is that Perl passes subroutine arguments as a single array of scalars.  We are assured that @a and @b have the same number of elements, so within zip, @_ is the concatenation of @a and @b.  If $n is the number of elements in @a (or @b) we then simply have to return an array of 

$_[0], $[0 + $n], $_[1], $_[1 + $n], ... $_[$n - 1], $[2 * $n - 1]

which is an easy one-liner.

Note though that doing it one line means using $_[$_], which requires a moment's thought to remember that the sub arguments are in the array @_ and the implied iterator in a 'for' command modifier is the scalar $_.

Unidecode

We are given a string with characters which are Latin alphabet characters (a-zA-Z) with diacritic marks, such as Ã or ô. We are to create a subroutine makeover($str) that replaces these characters with the unmarked equivalents.

This is an interesting challenge in that there is no way (that I know) of investigating the shape of a character and identifying that Ã is actually represented in print as A with a tilde above it.

One possibility would be to go through the Unicode code pages and manually create a translation, eg $plain{'Ã'} = 'A'. But that would be painful, because aside from the ones we probably know about from French and German, there are dozens more that exist.

Fortunately:

  • they all have Unicode names starting with LATIN CAPITAL LETTER x or LATIN SMALL LETTER x, and
  • there is (of course!) a Perl module which will return the name for a given character.
So here are the guts of what we need:

$name = charnames::viacode(ord($char));
$result = '';

# check if it is a modified latin letter
if ($name =~ m|LATIN CAPITAL LETTER (.)|) {
   $result .= $1;

} elsif ($name =~ m|LATIN SMALL LETTER (.)|) {
   $result .= lc($1);

# or if not just copy it to output
} else {
   $result .= $char;
}




03 October 2022

Manipulating characters

Reformatting MAC addresses

We are asked to take a MAC address stated as 3 dot-separated groups of 4 characters, and to reformat it as 6 colon-separated groups of 2 characters.  For example:

1234.5678.9abc -> 12:34:56:78:9a:bc

I'm not great believer in one-liners as they are often hard to understand, but in this case I feel that

say $test =~ m|^(..)(..).(..)(..).(..)(..)$| ? 
    qq[\nInput:  $test\nOutput: $1:$2:$3:$4:$5:$6]: 'Invalid format';

does the trick, including formatting the output as Mohammad specifies.

It could equally be done using substr() which we're told is faster, but it's hard to think of a real-life use case where it would make a difference.

Masking characters

Task 2 is stated as 'You are given a list of codes in any random format. Write a script to mask the first four characters (a-z,0-9) and keep the rest as it is.'  

Examination of the examples suggests that we are given an arbitrary string of characters, and are required to change the first 4 instances of [a-z0-9] to 'x'.

Again a solution using a regular expression can do the work in a single line:

s|^(.*?)[a-z0-9](.*?)[a-z0-9](.*?)[a-z0-9](.*?)[a-z0-9]|$1x$2x$3x$4x|

That's a little hard to read, but essentially it breaks the string into the four characters of interest and the (possibly empty) substrings in between them, and then joins the latter up with 'x's.

For this to work I am taking the hard line that any valid input to the task has at least 4 characters matching [a-z0-9]. If that assertion fails, then I think the single regular expression has to be replaced with a loop over 4 attempts to replace one character with 'x'.

But what if one of the first four [a-z0-9] characters in the string is already 'x'? Are we to ignore it or process it:

axbcdef -> xxxxxef or axbcdef -> xxxxdef ?

The first of these fails the task criteria because it has masked the fifth [a-z0-9] character in violation of 'mask the first four and keep the rest'. The second of these fails the criteria because the second character in the string is 'x' before and after and is therefore not masked, violating 'mask the first four'.