2008 Winter Scripting Games

Jan Dubois' Solution to Advanced Perl Event 2: Skating on Thin Ice

Event 1 Solution

About the Author

Jan Dubois is the tech lead for all Perl technologies at ActiveState. He is a member of the core Perl5-Porters group that maintains the Perl language implementation. Jan wrote the Win32::OLE Automation support for Perl while he was still working in the financial industry. He has a Masters degree in Physics from the University of Hamburg, Germany. Jan now lives in Vancouver in Beautiful British Columbia, home of the 2010 Winter Olympics.

*

Skating on Thin Ice


I've just arrived now at the skating rink, where the second event is already in full swing.

The challenge

In this event we need to read the “skaters.txt” file containing the names and judging scores of all competitors. The lowest and highest score for each athlete will be dropped, and the average of the remaining scores is the final result for each skater. We then award Gold, Silver, and Bronze medals to the athletes with the 3 highest scores.

How we are going to solve it

We are going to read the results file line-by-line, computing the final scores as we go and storing the scores in a hash table keyed by the skater's name. Once we've read the whole file we sort the skaters by score and use the highest 3 scores to award the medals:

use 5.010;
use strict;
use warnings;
use List::Util qw(sum);

my %score;
open(my $fh, "<", "C:/Scripts/skaters.txt") or die;
while (<$fh>) {
    chomp;
    my($skater, @scores) = split /,/;
    @scores = (sort {$a <=> $b} @scores)[1..5];
    $score{$skater} = sum(@scores) / 5;
}

my @ranking = sort {$score{$a} <=> $score{$b}} keys %score;
foreach my $metal (qw(Gold Silver Bronze)) {
    my $skater = pop @ranking;
    say "$metal medal: $skater, $score{$skater}";
}

Much of the code should look familiar from event 1, so I'll only point out some constructs not encountered before:

use 5.010;

This line tells Perl that the script requires Perl 5.10 features and will not run on older versions. It is much better to stop right away with a sensible error message than to display a syntax error later in the script. This line is even useful when we are running under 5.10, as it enables the additional keywords in 5.10 that would otherwise be hidden for backward compatibility reasons (in this instance we are using the new say() function later in the script).

my($skater, @scores) = split /,/;

This statement splits the current record (in $_) by each comma. The first field is the athlete's name and will be stored in $skater; the remaining fields are the individual scores by each judge. They are all being absorbed into @scores.

@scores = (sort {$a <=> $b} @scores)[1..5];

Now we sort the scores numerically, using the “{$a <=> $b}” sort expression. The default sort algorithm uses string comparison (the cmp operator) and is therefore not suitable here because the score 9 would end up being higher than the score 89. Of course I don't expect anybody to score below 10 anyways, but it is always better for an algorithm to be correct for edge cases, too. And who knows, maybe one skater will get a score of 100 from one of the judges?

After sorting the scores, we put them into a list by surrounding the sort expression with parentheses. This allows us to take a list slice with the index range [1..5]. This effectively drops the elements 0 and 6 from the list, which will be the lowest and highest scores respectively.

$score{$skater} = sum(@scores) / 5;

Now we just have to average the remaining scores. We could write this out manually (there are just 5 elements in the array), but the List::Util module already contains a nice sum() function that will do it for us. And it is fast too, since it is written in C. You should definitely check out List::Util if you are not already familiar with it. While you are at it, don’t forget about the companion module Scalar::Util either.

my @ranking = sort {$score{$a} <=> $score{$b}} keys %score;

Once we have calculated all the athletes' scores we can create a ranking by sorting the skaters' names by their scores. This is straight-forward, as the names are the keys of the %score hash, and all we have to do is write a little custom sort function. The first entry in @ranking now contains the name of the skater with the lowest score, and the last entry, well, you get the idea…

foreach my $metal (qw(Gold Silver Bronze)) {
    my $skater = pop @ranking;
    say "$metal medal: $skater, $score{$skater}";
}

Awarding the medals is straight-forward too: we loop through the medals one-by-one, each time removing the top-ranking skater from our results list and printing the medal type, skater name and final score.

This loop is using the new say() function added in Perl 5.10. It is very similar to the print() function, except that it is shorter to type, and it automatically adds a newline at the end.

Time to kill

It looks like a couple of skaters still have to perform, so we have some time to improve our little script a bit. The following things come to my mind:

The script should be more robust. There is a lot of stress on the judges writing down the scores, and the volunteers entering them into the computer. We should check the data for internal consistency:

Every skater should have exactly 7 scores

All scores should be integers between 0 and 100 inclusive.

There should be only one record for each skater.

There are other sources of errors too, like transposed digits (a judge writing 68 when she meant to write 86), but those will hopefully be caught by the dropping of the best and worst scores.

The script should be able to handle different numbers of judges, as long as each skater has the same number of scores. For example in a junior high school competition we might only have 3 or 4 judges.

We should be able to deal with multiple medal winners having the same final score. For example, we might have 2 Gold medal winners and no Silver medal because the 2 top skaters were tied for the best score.

We should be able to award different sets of medals. For a kids competition we may want every contestant to go home with some kind of medal.

use 5.010;
use strict;
use warnings;
use List::Util qw(sum);

my($judges, %score);
open(my $fh, "<", "C:/Scripts/skaters.txt") or die;
while (<$fh>) {
    chomp;
    my($skater, @scores) = split /,/;

    $judges //= @scores;
    die "Wrong number of scores in '$_'\n" unless $judges == @scores;

    die "Invalid score in '$_'\n"
        unless grep(/^(0|[1-9]\d?|100)$/, @scores) == $judges;

    die "Duplicate record for $skater\n" if defined $score{$skater};

    if (@scores > 2) {
        # drop highest & lowest score
        @scores = sort {$a <=> $b} @scores;
        shift @scores; pop @scores;
    }
    $score{$skater} = sum(@scores) / @scores;
}

my @ranking = sort {$score{$b} <=> $score{$a}} keys %score;
my @medals = qw(Gold Silver Bronze);

while (@ranking && @medals) {
    my $medal = "$medals[0] medal";
    my $score = $score{$ranking[0]};
    # all skaters with the same score get the same type of medal
    while (@ranking && $score{$ranking[0]} == $score) {
        my $skater = shift @ranking;
        say "$medal: $skater, $score";
        shift @medals;
    }
}

Here are the changes implementing our additional features in detail:

    $judges //= @scores;
    die "Wrong number of scores in '$_'\n" unless $judges ==     @scores;

The first time through the loop we store the number of scores in the $judges variable. The “defined or” operator makes sure that once set, the $judges variable will not be assigned to again. For each skater (including the first one itself) we then test that the number of scores is the same as the one for the first one.

    die "Invalid score in '$_'\n"
        unless grep(/^(0|[1-9]\d?|100)$/, @scores) == $judges;

This statement tests each individual score against the regexp /^(0|[1-9]\d?|100)$/. It means the score has to be either 0, a digit from 1 to 9 followed by any digit, or the number 100. This is the same as the set of integer numbers between 0 and 100 inclusive while disallowing leading zeros.

The number of scores matching this regexp must be the same as the number of judges; otherwise we know now that at least one of the scores didn't match.

    die "Duplicate record for $skater\n" if defined     $score{$skater};

If we already have stored a final score for a specific skater name, then we know this is a duplicate entry and want to recheck the data file before handing out the medals.

    if (@scores > 2) {
        # drop highest & lowest score
        @scores = sort {$a <=> $b} @scores;
        shift @scores; pop @scores;
    }
    $score{$skater} = sum(@scores) / @scores;

We obviously don't want to drop any scores if we have only one or two judges. But once we have 3 or more, we again want to drop the first and last entry of the numerically sorted scores.

my @ranking = sort {$score{$b} <=> $score{$a}} keys %score;
my @medals = qw(Gold Silver Bronze);

We create a ranking by sorting the skaters' names by score, this time in descending order to make it easier to refer to the highest ranking skater with $ranking[0]. The list of medal types is also stored explicitly in an array to allow us to easily plop in a different list. For example, for the kids competition we may want to give a Bronze medal to everyone who didn't win either Gold or Silver, so all kids go home happy:

my @medals = (qw(Gold Silver), qw(Bronze) x @ranking);

Yes, I know that this is 2 medals more than we have skaters, but nobody said we have to hand out all of them.

We are going to award the same type of medal to all athletes with the same score. While doing that we keep on removing medals from @medal for each additional winner, so if we hand out 2 Gold medals, we’ll skip the Silver and continue with Bronze for the third athlete. This continues until we run out of either skaters or medals, whichever comes first:

while (@ranking && @medals) {
    my $medal = "$medals[0] medal";
    my $score = $score{$ranking[0]};
    # all skaters with the same score get the same type of medal
    while (@ranking && $score{$ranking[0]} == $score) {
        my $skater = shift @ranking;
        say "$medal: $skater, $score";
        shift @medals;
    }
}

The skating competition has ended now, so let's run our script and announce the winners:

Gold medal: Guido Chuffart, 88.2
Silver medal: Jack Creasey, 85.8
Bronze medal: Cecilia Cornejo, 85.4

As promised for the Scripting Games competition, there were no ties for the first 3 places. So just for testing I invented a couple of extra medals to see if the “skaters.txt” dataset contains any entries with identical scores a little further down the list:

my @medals = qw(Diamond Platinum Gold Silver Bronze Nickel Copper Tin);

Running the script with the extended medal set produces:

Diamond medal: Guido Chuffart, 88.2
Platinum medal: Jack Creasey, 85.8
Gold medal: Cecilia Cornejo, 85.4
Silver medal: Chase Carpenter, 85
Silver medal: Martin Chisholm, 85
Nickel medal: Christian Cletus, 84.8
Copper medal: Bjarke Rust Christensen, 83.2
Tin medal: Andy Carothers, 83

We indeed end up with 2 Silver medals and no Bronze, so everything seems to be working as expected.

With the awards ceremony completed, the first day of the 2008 Winter Scripting Games is winding down. I'm heading home now to watch the recordings of today’s events in the PowerShell division of the Games on my media center. If you have any ideas how to improve my script further, or just want to give me some general feedback on the events at the Games today, please let me know at jand@activestate.com. I hope I’ll see you all in the third event.


Top of pageTop of page