2008 Winter Scripting Games

Jan Dubois' Solution to Advanced Perl Event 3: Instant (Runoff) Winner

Event 1 Solution

About the Author

Jan Dubois is the tech lead for all Perl technologies at ActiveState. He is a member of the core Perl5-Porters group that maintains the Perl language implementation. Jan wrote the Win32::OLE Automation support for Perl while he was still working in the financial industry. He has a Masters degree in Physics from the University of Hamburg, Germany. Jan now lives in Vancouver in Beautiful British Columbia, home of the 2010 Winter Olympics.

*

Instant (Runoff) Winner


Welcome to a new day in the 2008 Winter Scripting Games. Today is Election Day, and we need to count the votes in the third event of the advanced division.

The challenge

We need to read votes from a text file and determine the candidate with a majority of the votes using the instant runoff algorithm, which means after each round of vote counting we need to check if we have a winner (at least 50% plus 1 vote). If we don’t, then we eliminate the candidate with the lowest vote count and start over.

How we are going to solve it

use 5.010;
use strict;
use warnings;
use List::Util qw(sum);

open(my $fh, "<", "C:/Scripts/votes.txt") or die;
chomp(my @votes = <$fh>);

# Start with a full list of all candidates
my %count = map {$_ => 0} map split(/,/), @votes;

while () {
    # Count the votes for each remaining candidate
    $count{$_} = 0 for keys %count;
    for (@votes) {
        if (my($first) = grep defined $count{$_}, split /,/) {
            $count{$first}++;
        }
    }
    # Do we have a tie for first place?
    my @counts = values %count;
    unless (grep $_ != $counts[0], @counts) {
        local $" = ", ";
        say "Tie between @{[sort keys %count]}.";
        exit;
    }
    # Do we have a winner?
    foreach my $name (keys %count) {
        my $percent = 100 * $count{$name} / sum(@counts);
        next unless $percent > 50;
        printf "The winner is $name with %.2f%% of the vote.\n", $percent;
        exit;
    }
    # Eliminate the candidate with the least number of votes
    my $loser = (sort {$count{$a} <=> $count{$b}} keys %count)[0];
    delete $count{$loser};
}
die "You should not have been able to come here.  Now you die!\n";

In this event we may need the data from the external file multiple times, so instead of reading it over and over we read it once and store it in an array:

chomp(my @votes = <$fh>);

Reading from a filehandle in list context will return all the lines at once. We want to get rid of all the newline characters at the end of each line; fortunately chomp() takes a list of arguments and will chomp each one in turn.

Initially I thought I should validate all the ballots, somewhat like this:

my $vote0 = join(',', sort split /,/, $votes[0]);
die if grep $_ ne $vote0, map(join(',', sort split /,/), @votes);

My assumption was that each ballot has to list the same names, just in different order. You would have to rank every candidate, and you would not be allowed to specify additional names, leave names out, or vote multiple times for the same candidate. So after sorting all names alphabetically, all ballots should look the same.

Surprisingly it turned out that this was not the case. Further investigation showed a number of votes looked like this:

Ken Myer,Pilar Ackerman,Jonathan Haas,Pilar Ackerman

They contain “Pilar Ackerman” twice and don’t include “Syed Abbas” at all. A quick check with the Games organizers told me that these entries should be treated as valid votes. This means that the total number of votes may not be constant between different rounds of voting, because some ballots may run out of candidates before the winner is selected. We need to keep that in mind.

Let’s start by creating a hash table of all candidates running in this vote. Initially I was going to create it from the first ballot only, but now it looks like we need to retrieve the names from all ballots because the first one may be missing some names:

# Start with a full list of all candidates
my %count = map {$_ => 0} map split(/,/), @votes;

This statement maps each ballot into a list of names, and then maps each name again to the 2-element list “("name" => 0)”. The combined list of all these small mappings is then assigned to %count, initializing the hash. It has the same effect as this assignment:

my %count = (
    "Ken Myer" => 0,
    "Jonathan Haas"  => 0,
    "Pilar Ackerman" => 0,
    "Syed Abbas"     => 0,
    "Ken Myer"       => 0,
    "Jonathan Haas"  => 0,
    "Pilar Ackerman" => 0,
    "Syed Abbas"     => 0,
    # etc. etc.
);

We are initializing each candidate multiple times, but that doesn’t really hurt with the current size of our votes file. %count is going to do double duty today: It servers both as a list of candidates still in the running, and it will contain the number of votes for each candidate in the current round.

Let the counting begin:

    while () {
        # Count the votes for each remaining candidate
        $count{$_} = 0 for keys %count;

The while() statement with an empty condition behaves exactly the same as one with a condition that is always true, e.g. “while (1)”: it creates an endless loop. The only ways to break out of the loop are to “exit” the program, calling “die” to throw an exception, or to execute a “last” statement.

Inside the loop we reset all values in the %count hash back to 0 as they may still contain the vote counts of the previous round.

    for (@votes) {
        if (my($first) = grep defined $count{$_}, split /,/) {
            $count{$first}++;
        }
    }

We need to loop through all the votes and save the first name on each ballot that has not yet been eliminated. Because the values in %count start out at 0 we also need to use the defined() function on $count{$_} to see if the entry exists.

In this statement the parentheses around $first in “my($first) = grep …” are very important. They put the assignment into list context, so that the grep() returns the list of all values that match the condition. In scalar context it would just return the number of matches, and we wouldn’t be able to tell who the first match was.

    # Do we have a tie for first place?
    my @counts = values %count;
    unless (grep $_ != $counts[0], @counts) {
        local $" = ", ";
        say "Tie between @{[sort keys %count]}.";
        exit;
    }

While the Scripting Guys promised that there would be no ties, I decided to check for it anyway. I always feel better if my scripts are more robust, and it takes only a few lines of code. We copy the totals for all the candidates into the @counts array, and if there are no candidates that have a different number of votes than the first one, then we have a tie between all the remaining candidates. Note that the tie can be between more than just 2 persons: we have 1200 votes, so if every candidate ended up with 300 votes, we would have a 4-way tie.

The interesting part in the snippet above is the “@{[sort keys %count]}” expression inside the string. In general @$foo (the same as @{$foo}) interpolates an array reference. The code above creates an anonymous array reference on the fly using []. Inside these brackets we can now write an arbitrary Perl expression that will be evaluated in list context, and each element of this expression will be interpolated into the string in turn.

Interpolated array elements are separated by the $" special variable, which is normally set to a single space. We want the candidate names separated by a comma, so we set it to ", ". It is always good practice to use the local() function when you modify a global variable to make sure it returns to its previous value once execution leaves the current scope. It doesn’t really matter here, as we immediately exit the program after printing the message, but it doesn’t hurt to stick to best practices.

If we had candidates Pilar Ackerman, Jonathan Haas and Ken Myer tied with the same number of votes, the message would say:

Tie between Jonathan Haas, Ken Myer, Pilar Ackerman

If we don’t have a tie, do we maybe have a winner already?

    # Do we have a winner?
    foreach my $name (keys %count) {
        my $percent = 100 * $count{$name} / sum(@counts);
        next unless $percent > 50;
        printf "The winner is $name with %.2f%% of the vote.\n",     $percent;
        exit;
    }

We check each candidate still remaining in the running to see if they have more than 50% of the vote. Note that we have to divide by sum(@counts) and not by sum(@votes) because some ballots may no longer participate in the current round. @counts contains the actual number of votes that have been cast for each candidate in this round.

If we don’t have a winner, then it is time for elimination:

    # Eliminate the candidate with the least number of votes
    my $loser = (sort {$count{$a} <=> $count{$b}} keys %count)[0];
    delete $count{$loser};
}

This is straight forward again: sort all candidates by vote count and pick the one with the fewest votes. Then remove that name from the %count hash. That way this candidate will no longer be able to receive any votes in the subsequent rounds.There is no exception handler set up, and there is no “last” statement inside the endless while() loop. Therefore it should be impossible to reach any statements after it.

die "You should not have been able to come here.  Now you die!\n";

To verify that my script works correctly I run it inside the Komodo debugger and set a few breakpoints at strategic locations. After adding the %count, @counts, sum(@counts) and @votes watch expressions I can see that the correct candidate is eliminated each round, and that everything else looks like I expected it to:

Event 1 Solution


Everything looks good to me. The final result is:

The winner is Pilar Ackerman with 50.17% of the vote.

One thing I’ve been wondering about is that it is also possible to have ties for the lowest vote count. How should that be handled? Should all candidates tied for the last spot be eliminated together? Should one be chosen at random (which is what the script essentially does)? Or are there some other tie-breaker rules?

After thinking about this for a little while I have convinced myself that it doesn’t matter. You will always end up with the same winner, or the same set of candidates tied for first place, regardless in which order you eliminate the others. Or can you come up with a sample dataset in which a different order of elimination of tied losers can result in different outcomes for the winner? Let me know at jand@activestate.com.


Top of pageTop of page