2008 Winter Scripting Games

Jan Dubois' Solution to Advanced Perl Event 6: You Call That a Strong Password?

Event 1 Solution

About the Author

Jan Dubois is the tech lead for all Perl technologies at ActiveState. He is a member of the core Perl5-Porters group that maintains the Perl language implementation. Jan wrote the Win32::OLE Automation support for Perl while he was still working in the financial industry. He has a Masters degree in Physics from the University of Hamburg, Germany. Jan now lives in Vancouver in Beautiful British Columbia, home of the 2010 Winter Olympics.

*

Prime Time

It’s prime time at the sixth event of the 2008 Winter Scripting Games and we are calculating prime numbers… what a coincidence!

The challenge

We have to write a script that calculates and displays all prime numbers between 1 and 200.

How we are going to solve it

The instructions already gave it away: the standard mechanism to calculate tables of (small) prime numbers is the Sieve of Eratosthenes:

use 5.010;
use strict;
use warnings;

use integer;
use constant MAX => 200;

my @prime = (undef, undef, 2..MAX);
undef $prime[2*$_] for 2..int(MAX/2);

for (my $i = 3; $i*$i <= MAX; $i += 2) {
    next unless defined $prime[$i];
    for (my $j = $i*$i; $j <= MAX; $j += 2*$i) {
        undef $prime[$j];
    }
}
say for grep defined, @prime;

I found this slightly too boring, so I added some gratuitous optimizations over the original algorithm. They will make no difference for such a small set of numbers at all, but hey, why not:

use integer;

All numeric operations in this script only use integers, so tell Perl not to bother with upgrading variables to floating point.

my @prime = (undef, undef, 2..MAX);
undef $prime[2*$_] for 2..int(MAX/2);

We start out by setting up @prime as the list of prime number candidates. Each element either contains the number itself, if it is still a candidate, or undef if it has been crossed off the list already. We start out with 0 and 1 already removed, as we know that those are not primes. Then we eliminate all multiples of the number 2. That means that all remaining candidates will be odd (not even).

for (my $i = 3; $i*$i <= MAX; $i += 2) {
    next unless defined $prime[$i];
    for (my $j = $i*$i; $j <= MAX; $j += 2*$i) {
        undef $prime[$j];
    }
}

We walk through @primes, starting at 3. We can increment this index by 2 each time through the loop because we know all even numbers have already been eliminated. And we know that we can stop as soon as $i*$i exceeds the maximum number we are interested in (200) because all multiples of smaller primes have already been eliminated.

If $i is not a prime itself then we don’t need to bother any further, as all multiples of it will have been crossed of the list already because they are also multiples of its prime factors.

If it is prime, then we cross of $i*$i, $i*($i+2), $i*($i+4) etc. Again, we know that all multiples less than $i*$i have already been eliminated. And we skip $i*($i+1), $i*($i+3) etc. because we know these products will be even, and all even numbers are already gone too.

say for grep defined, @prime;

Once the sieve has finished we just print all the numbers that are still left in @prime:

2
3
5
7
11
13
17
…
193
197
199

Disqualified

One thing about prime numbers is that they don’t really change. Unless number theory mathematicians decide to change the definition of prime numbers (extremely unlikely, to say the least), the elements of the set of prime numbers remain the same. They have been calculated countless times already, so it seems quite pointless to continue calculating them over and over. It looks like other people agree with this, and have published lists of prime numbers up to some reasonable limit. So why don’t we just download those pre-calculated numbers and print them:

use 5.010;
use strict;
use warnings;

use LWP::Simple qw(get);

my $html = get("http://www.prime-numbers.org/prime-number-0-5000.htm");
my($primes) = $html =~ /<pre>\n([\d\s]+)/;
say for grep $_ <= 200, split /\s+/, $primes;

The LWP set of modules is extremely useful for all kinds of web programming. The get() function in LWP::Simple is a handy shortcut that hides all the nitty-gritty details of sending HTTP requests, handling responses etc. It just returns the raw content of the response (or undef when there is an error).

The www.prime-numbers.org list does not seem to be available in raw text format but only marked-up in HTML. Fortunately the numbers are inside the only <pre> block on the page, and within this block everything is separated by whitespace.

Therefore. there is no need to use some heavier tool like HTML::Parser to extract the data; we just use a simple regexp to extract the <pre> block and split the result on whitespace. Since the page contains all prime numbers less than 5000 we also need to filter out the ones below 200 and then just print them.

Unfortunately this script would have been disqualified (I assume), because it did not calculate the prime numbers, it just downloaded them. I still wanted to include it in my commentary though to show how easy it is to fetch and scrape web pages with simple Perl scripts.

How about this one?

Ok, so we may not be allowed to just fetch the numbers from somewhere else, but why do we have to implement the sieve algorithm ourselves? Surely there is a module for it already. There doesn’t seem to be anything included in ActivePerl, so let’s search on the “Comprehensive Perl Archive Network” (CPAN). The easiest way to install CPAN modules into ActivePerl is by using the “Perl Package Manager” (PPM). Let’s look for packages with the substring “prime” in the name or abstract:

Event 6 Solution


The Math-Prime-TiedArray package looks like an ideal candidate. All it takes to install it is clicking the “Add” button followed by the “Run” button. Or we can do all this from the command line:

C:>ppm search prime
Downloading ActiveState Package Repository packlist...not modified
1: Math-Coprimes
   Calculate if two or more numbers are coprimes (relatively primes)
   Version: 0.1

2: Math-Prime-Simple
   Calculate prime numbers
   Version: 0.12

3: Math-Prime-TiedArray
   Simulate an infinite array of prime numbers
   Version: 0.04

C:>ppm install 3
Downloading Math-Prime-TiedArray-0.04...done
Unpacking Math-Prime-TiedArray-0.04...done
Generating HTML for Math-Prime-TiedArray-0.04...done
Updating files in site area...done
   2 files installed

Unfortunately the contest rules specify that the script should not require any additional modules except for what is bundled with ActivePerl. The only exception mentioned was for the Date::Calc module. But maybe we can sneak the installation of the additional module by the judges by putting it into the script itself:

use 5.010;
use strict;
use warnings;

BEGIN {
    return if eval { require Math::Prime::TiedArray };
    `ppm install Math-Prime-TiedArray >nul`;
}

use Math::Prime::TiedArray;

The BEGIN block is compiled and executed as soon as it has been parsed completely (up to the closing “}”). Inside we first try to load Math::Prime::TiedArray directly, in case it is already installed. We do this inside an eval() block to catch the error if it isn’t installed. In that case we execute the “ppm install …” command, redirecting all output to the “nul” device to avoid giving any hints to the person running the script that we may be doing something else beyond calculating prime numbers.

Once the BEGIN block has been executed we can use() the Math::Prime::TiedArray module normally. Perl will remember if the previous require() was successful and only load the module if it hasn’t been loaded yet (in case you are wondering, this information is recorded in the %INC hash).

tie my @primes, "Math::Prime::TiedArray";

As the name implies Math::Prime::TiedArray uses a tied array to implement an array containing all prime numbers. The tie() mechanism is very powerful: it allows you to implement all the semantics of a scalar, array, hash or file handle in a module. That way accessing a keyed record in a database can be made to look just like indexing into a hash.

The tie() function above associates our @primes array with the array implementation in Math::Prime::TiedArray. This does not mean that this statement won’t return before the heat-death of the universe because it now has to compute an infinite number of primes. Instead, the computation is delayed until we actually access the array elements. If an element has already been computed then it will be returned immediately. Otherwise the module will calculate the missing primes up to this element and also caches them in case they are needed again.

say while ($_ = shift(@primes)) && ($_ <= 200);

Now that we have an array containing an infinite supply of prime numbers (at least until we exhaust the precision of Perl numbers), we simply remove them from the front of the array one at a time until we have printed all prime numbers less than 200. The complete script looks like this:

use 5.010;
use strict;
use warnings;

BEGIN {
    return if eval { require Math::Prime::TiedArray };
    `ppm install Math-Prime-TiedArray >nul`;
}

use Math::Prime::TiedArray;
tie my @primes, "Math::Prime::TiedArray";
say while ($_ = shift(@primes)) && ($_ <= 200);

So, what do you think, does the installation of an additional module inside the script itself violate the rules of the Scripting Games? Let me know at jand@activestate.com.

PS: At the end of my commentary on the instant-runoff election in event 3 I asked “can you come up with a sample dataset in which a different order of elimination of tied losers can result in different outcomes for the winner”. Yitzchak Scott-Thoennes did:

Given these numbers of ballots with these votes:

34 x A,B,C
33 x B,A,C
33 x C,B,A

First round has B and C tied for last place:

A 34, B 33, C 33.

Eliminating C gives second round:

A 34, B 66

Eliminating B gives second round:

A 67, C 33

Thanks Yitzchak! Looks like I should have tried a little harder. Note that there are actual tie-breaking rules in place for some real elections using instant-runoff voting; it is good to know that they sometimes can make a difference in the outcome.


Top of pageTop of page