2008 Winter Scripting Games

Jan Dubois' Solution to Advanced Perl Event 8: Making Beautiful Music

Event 1 Solution

About the Author

Jan Dubois is the tech lead for all Perl technologies at ActiveState. He is a member of the core Perl5-Porters group that maintains the Perl language implementation. Jan wrote the Win32::OLE Automation support for Perl while he was still working in the financial industry. He has a Masters degree in Physics from the University of Hamburg, Germany. Jan now lives in Vancouver in Beautiful British Columbia, home of the 2010 Winter Olympics.

*

Making Beautiful Music

There is music in the air. And in Event 8 of the 2008 Winter Scripting Games, the organizers want a copy of your playlist on CD!

The challenge

We have to pick songs from a list and try to fill a CD with them. We cannot pick more than 2 songs from the same artist, and we should squeeze at least 75 minutes of music on the CD. If we run over 80 minutes, then our collection won’t fit and we lose the competition.

How we are going to solve it

This challenge is a variant of the bin packing problem. These problems are known to be NP-hard, which is geek-speak for saying that it isn’t generally possible to find the best solution to them. It is, however, possible to come up with algorithms that approximate a good solution in a reasonable amount of time (the value of “reasonable” depends on how good the approximate solution needs to be).

Any play list with a running time between 75 and 80 minutes is a valid solution for this event. This looks like a pretty wide range. I browsed through the provided song list and noticed that there are plenty of songs with less than 5 minutes playing time. Therefore I decided to try a simple algorithm at first.

Start by choosing the longest-running song still available. Once we get close to the lower mark of 75 minutes we should still have plenty of songs left that will push us over this mark without overshooting the 80 minutes limit:

use strict;
use warnings;

use constant MAX_TIME => 80*60;

my(%time);
open(my $fh, "<", "C:/Scripts/songlist.csv") or die;
while (<$fh>) {
    my($artist,$title,$min,$sec) = /^(.*?),(.*?),(\d+):(\d+)/;
    $time{"$artist\t$title"} = $min*60 + $sec;
}

my %cd;
my $time = 0;
while () {
    # Sort songs by playing time, discarding all that won't fit anymore
    my @sorted = sort {$time{$b} <=> $time{$a}} keys %time;
    shift @sorted while @sorted && $time + $time{$sorted[0]} > MAX_TIME;
    last unless @sorted;

    # Add song to CD and remove from pool
    my $song = shift @sorted;
    $time += $time{$song};
    $cd{$song} = delete $time{$song};

    # Remove songs from pool if we already have 2 songs by this artist
    my($artist) = $song =~ /^(.*?)\t/;
    if (grep(/^$artist\t/, keys %cd) == 2) {
        delete $time{$_} for grep /^$artist\t/, keys %time;
    }
}

########################################################################

sub mmss { sprintf "%d:%02d", int($_[0]/60), $_[0]%60 }
print "$_\t", mmss($cd{$_}), "\n" for sort keys %cd;
printf "\nTotal music time: %s\n", mmss($time);

We start off by reading the song list from a file:

my(%time);
open(my $fh, "<", "C:/Scripts/songlist.csv") or die $!;

David Golden gave me some feedback on the earlier commentaries: “I personally think it's good style to die with at least "$!" so there's some feedback as to what happened.”

The $! variable represents the C runtime “errno” variable, which contains the error code after a failed library call (its value is not defined after a successful call). In numeric context $! is just the error number, but when used as a string it automagically transforms into a full error message.

So if someone had deleted the songlist.csv file we would see:

No such file or directory at ae8a.pl line 7.

This is indeed much nicer than the generic message:

Died at ae8a.pl line 7.

We next read the file line by line, storing each song into the %time hash:

while (<$fh>) {
    my($artist,$title,$min,$sec) = /^(.*?),(.*?),(\d+):(\d+)/;
    $time{"$artist\t$title"} = $min*60 + $sec;
}

The keys in the hash are the artist name and song title, separated by a tab character. This is the format used to output songs in the contest description. The values of the hash entries are the running times of the songs in seconds.

my %cd;
my $time = 0;
while () {
    # Sort songs by playing time, discarding all that won't fit anymore
    my @sorted = sort {$time{$b} <=> $time{$a}} keys %time;
    shift @sorted while @sorted && $time + $time{$sorted[0]} > MAX_TIME;
    last unless @sorted;

We are going to move the selected songs over into a second hash %cd and also keep a running total of the accumulated playing time in $time.

Each time through the loop we sort all remaining songs in the pool by playing time. Then we drop all songs that won’t fit on our CD anymore. We are done when there are no more songs left to choose from. Let’s just hope that by that time we have at least 75 minutes of music already.

    # Add song to CD and remove from pool
    my $song = shift @sorted;
    $time += $time{$song};
    $cd{$song} = delete $time{$song};

If we find a song that fits then we add it to our CD, delete it from the pool of available songs, and add its running time to our total in $time.

    # Remove songs from pool if we already have 2 songs by this artist
    my($artist) = $song =~ /^(.*?)\t/;
    if (grep(/^$artist\t/, keys %cd) == 2) {
        delete $time{$_} for grep /^$artist\t/, keys %time;
    }

We extract the artist name from the song description (artist name and song title separated by a tab character). If we find that our CD now contains 2 songs by this artist then we remove all remaining songs by the same artist from the pool.

After this we loop, sorting the remaining songs by playing time again etc.

sub mmss { sprintf "%d:%02d", int($_[0]/60), $_[0]%60 }
print "$_\t", mmss($cd{$_}), "\n" for sort keys %cd;
printf "\nTotal music time: %s\n", mmss($time);

After we drop out of the loop, because the pool ran dry, it’s time to print our playlist; we also print the total music time to verify that we have indeed at least 75 and no more than 80 minutes of music. Let’s see if this approach worked:

D:\tmp\ScriptingGames>perl ae8a.pl
Badfinger       Carry On Til Tomorrow   4:49
Cracker Eurotrash Girl  8:03
Dire Straits    So Far Away     5:12
Dire Straits    Ticket to Heaven        4:25
Don McLean      American Pie    8:35
Donovan Catch the Wind  5:02
Donovan Season of the Witch     4:55
George Harrison What is Life?   4:27
Nick Cave and the Bad Seeds     Red Right Hand  4:48
Nirvana Heart-Shaped Box        4:41
REM     Losing My Religion      4:28
REM     Man on the Moon 5:13
Robert Palmer   Addicted to Love        5:18
The Animals     House of the Rising Sun 4:31
The Beatles     Helter Skelter  4:29

Total music time: 78:56

We are just 1 minute and 4 seconds short of the maximum allowed playing time, so the selection part seems to be working alright. The output, formatted as shown in the contest rules, looks pretty ugly though.

I know that the Perl Package Manager (PPM) prints nice-looking tables, so I did some quick spelunking in the sources and found that it uses the ActiveState::Table module. I decided to keep the music selection part of the script, but rewrite the output part:

########################################################################

use ActiveState::Table ();

sub mmss { sprintf "%d:%02d", int($_[0]/60), $_[0]%60 }
sub artist { $_[0] =~ /^(?:The )?(.*)/ && $1 }

my $table = ActiveState::Table->new;
foreach my $song (sort {artist($a) cmp artist($b)} keys %cd) {
    my($artist, $title) = split /\t/, $song;
    $table->add_row({
        Artist  => $artist,
        Title   => $title,
        "mm:ss" => mmss($cd{$song}),
    });
}
print $table->as_box(
    align        => {"mm:ss" => "right"},
    box_chars    => -t STDOUT ? "dos" : "ascii",
    show_trailer => 0,
);
printf "\nTotal music time: %s\n", mmss($time);

I also didn’t like how “The Animals” and “The Beatles” were both sorted at the end under “T”. I think they should be sorted under “A” and “B” respectively, so I added a custom sort function that ignores any /^The / prefix in the artist name field.

print $table->as_box(
    align        => {"mm:ss" => "right"},
    box_chars    => -t STDOUT ? "dos" : "ascii",
    show_trailer => 0,
);

The expression “-t STDOUT” checks if STDOUT is a real console window or just any old file handle. When we are printing to a console, then we want to use the fancy box drawing characters from the OEM character set. But if we print to a file, then we restrict ourselves to ASCII characters for drawing the box borders:

Event 8 Solution


This looks so much nicer already. As a third alternative I re-wrote the output part once more to create an Excel spreadsheet for the playlist:

########################################################################

sub mmss { sprintf "%d:%02d", int($_[0]/60), $_[0]%60 }
sub artist { $_[0] =~ /^(?:The )?(.*)/ && $1 }

use Win32::OLE qw(with);
use Win32::OLE::Const ('Microsoft Excel .* Object Library');

my $excel = Win32::OLE->new("Excel.Application");
my $book  = $excel->Workbooks->Add();
my $sheet = $book->Worksheets(1);

my $rows = 3 + keys %cd;
my @songs = sort {artist($a) cmp artist($b)} keys %cd;
$sheet->Range("A1:C$rows")->{Value} = [
    [qw(Artist Title mm:ss)],
    map([split(/\t/, $_), "'".mmss($cd{$_})], @songs),
    [],
    ["Total music time: ".mmss($time)],
];

$sheet->Rows(1)->Font->{Bold} = 1;
$sheet->Rows($rows)->Font->{Italic} = 1;
$sheet->Columns("C")->{HorizontalAlignment} = xlRight;
$sheet->Columns("A:C")->AutoFit;
with($sheet->Range("A$rows:C$rows"),
     MergeCells => 1,
     HorizontalAlignment => xlCenter,
 );
$excel->{Visible} = 1;

We are going to use some Excel-specific constants later in the code, so let’s start by importing them from the Excel type library into Perl:

use Win32::OLE::Const ('Microsoft Excel .* Object Library');

my $excel = Win32::OLE->new("Excel.Application");
my $book  = $excel->Workbooks->Add();
my $sheet = $book->Worksheets(1);

The type library name is actually interpreted as a regular expression, so the “.*” pattern in the middle of the name makes sure we are not dependent on a specific version of Excel. Win32::OLE::Const will always choose the latest version of the type library in case multiple versions are registered.

Loading Excel, creating a new workbook,and selecting the first worksheet are all simple automation calls using the Excel object model.

my $rows = 3 + keys %cd;
my @songs = sort {artist($a) cmp artist($b)} keys %cd;
$sheet->Range("A1:C$rows")->{Value} = [
    [qw(Artist Title mm:ss)],
    map([split(/\t/, $_), "'".mmss($cd{$_})], @songs),
    [],
    ["Total music time: ".mmss($time)],
];

Our total data block will contain a header line, one line for each song, a blank line, and a summary line with the total music time.

We build up the table using Perl data structures: a reference to an array of lines, where each line is in turn a reference to an array containing the values for each row in that line: [[@row1], [@row2], …].

Excel doesn’t have an internal format to display a time duration in the mm::ss format; it only knows about dates. Therefore we need to format the running times in Perl and send them to Excel as strings. The initial apostrophe is necessary to prevent Excel from interpreting the value as an hh:mm time value for some time in the current day. Unfortunately this means that we won’t be able to do any number crunching with the running times inside Excel.

We can simply assign this big list of lists data structure to the Value property of a Range object in Excel. Win32::OLE will internally translate everything into a properly-sized SAFEARRAY of VARIANT data structures that Excel will understand.

$sheet->Rows(1)->Font->{Bold} = 1;
$sheet->Rows($rows)->Font->{Italic} = 1;
$sheet->Columns("C")->{HorizontalAlignment} = xlRight;
$sheet->Columns("A:C")->AutoFit;
with($sheet->Range("A$rows:C$rows"),
     MergeCells => 1,
     HorizontalAlignment => xlCenter,
 );
$excel->{Visible} = 1;

Now it is time to do some prettification of the results: make the header row bold, the total running time row italic, and right-align the playing time column. All columns should auto-adjust their widths to fit the data, and finally we want to center the total running time over the total width of the table:

Event 8 Solution


I really wanted to play some more with the packing algorithm, seeing if I could get even better results by adding some randomization and heuristics. But this commentary is already too long, so I leave that for another day. Let me know if your solution got closer to 80 minutes at jand@activestate.com.


Top of pageTop of page