— Philip Pullman, The Subtle Knife (Scholastic, 1997; USA, Knopf, 1997)
In contrast, the abstract visualization of book texts is neither a large nor a mature field of study, but there are notable and inspirational examples. The following sections describe some of them.
Data visualizations fall into two overlapping camps: exploration and communication. Larkin’s 1914-1918 Dispensational Charts are about communicating scripture and prophecy from The Bible. They diagram the structure of each topic (e.g. “The Heavens” or “The Second Coming”) and use flow, representational images, and references back to Bible passages to illuminate each topic.
The seminal work of abstract exploratory visualization of book texts is Brad Paley’s “TextArc”. TextArc is a screen-based application, designed and implemented by Paley, that takes a text and displays it twice: first line by line, in a tiny font, around the edge of a giant ellipse; and second word by word, with each word anchored by invisible springs to the sentences in which it occurs. Common words (so-called ‘stop words’) are removed, and the remaining words are rendered so that more frequent words use a larger font and are drawn on top of any less frequent words sharing the same screen area. TextArc can be used to explore any text, but Paley often demonstrates it using Alice in Wonderland, where the word “Alice” sits in big letters at the centre: because it occurs throughout the book, it is pulled equally from all sides of the ellipse. TextArc has many other features, including an elegant dynamic path that sweeps through the work as the text is read.
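The spring layout can be sketched in a few lines of Python. This is our own simplification, not Paley’s code: it assumes springs of equal stiffness and zero rest length, in which case a word tied to several fixed sentence anchors comes to rest at their centroid. The function name `textarc_layout` and the tiny stop-word list are hypothetical.

```python
import math
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list; real tools use far longer ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "it", "was", "she", "her", "at"}


def textarc_layout(sentences, rx=1.0, ry=0.6):
    """Place sentence i on an ellipse and each non-stop word at the centroid of
    the sentences it occurs in (the rest point of equal, zero-length springs).
    Returns {word: (x, y, count)}; count would drive font size and draw order."""
    n = len(sentences)
    # Sentence anchors start at 12 o'clock and run clockwise (y axis pointing up).
    anchors = [(rx * math.sin(2 * math.pi * i / n), ry * math.cos(2 * math.pi * i / n))
               for i in range(n)]

    occurrences = defaultdict(list)  # word -> indices of sentences containing it
    for i, sentence in enumerate(sentences):
        for word in re.findall(r"[a-z']+", sentence.lower()):
            if word not in STOP_WORDS:
                occurrences[word].append(i)

    layout = {}
    for word, idxs in occurrences.items():
        x = sum(anchors[i][0] for i in idxs) / len(idxs)
        y = sum(anchors[i][1] for i in idxs) / len(idxs)
        layout[word] = (x, y, len(idxs))
    return layout


sentences = ["Alice sat by the river.", "The rabbit ran past Alice.",
             "Alice followed the rabbit.", "Down fell Alice."]
# Draw rarer words first so that frequent words end up on top.
for word, (x, y, count) in sorted(textarc_layout(sentences).items(), key=lambda kv: kv[1][2]):
    print(f"{word:10s} size={count} at ({x:+.2f}, {y:+.2f})")
```

In the toy example “alice” appears in every sentence, so it lands at the origin, the centre of the ellipse, while words confined to a few sentences are pulled towards their part of the rim.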
TextArc was conceived as a tool to help academics and other readers analyse texts. Another outlet proved to be the sale of high-quality printouts as beautiful mementos of readers’ favourite texts. The application of book visualization to academic literary studies has been continued in work such as Plaisant et al.’s “Exploring Erotics in Emily Dickinson’s Correspondence”.
Partly because of the widespread availability of electronic versions of the text, partly because of its cultural significance, and partly because of the huge number of people who care about it, The Bible has proved an intriguing source of visualizations.
While on NYU’s Interactive Telecommunications Program, Anh Dang built “Gospel Spectrum”, an interactive visualization exploring the gospel accounts of Christ’s life. Each episode in Christ’s life is represented as a coloured bar: the colours represent the different gospels, and the lengths represent the number of verses each gospel spends on that episode. The resulting visualization shows how Christ’s life unfolds through the gospels: which gospels concentrate on which parts of his life, and when the gospels come together to record an episode.
Started at Central Saint Martins College of Art and Design, Becker’s “In Translation” shows the structural similarities and differences between translations of the Tower of Babel story into different languages, for example showing the position allocated to each letter combination. “In Translation” both reinforces the message of the Tower of Babel story, by highlighting the differences between human languages, and cuts across it, by showing their structural similarities.
Chris Harrison’s visualizations of The Bible follow two paths. First, Harrison took a set of cross references within The Bible, compiled by Lutheran pastor Christoph Romhild, and displayed the links visually, resulting in a beautiful picture that shows which chapters contain the most cross references while also impressing the viewer with their sheer number. The second set takes the proper nouns in The Bible and overlays them as a tag cloud; but rather than discarding the positions at which the nouns occur in the text, each noun is placed at its ‘centre of mass’, that is, at the average position of its occurrences.
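Both of Harrison’s ideas reduce to simple computations over positions in the text. The Python sketch below is our own illustration, not Harrison’s code: it assumes chapters are spaced evenly along a horizontal axis, draws each cross reference as a semicircle centred on the midpoint of its two chapters, counts the references touching each chapter, and reads ‘centre of mass’ as the mean relative position of a name’s occurrences. All function names are hypothetical.

```python
from collections import Counter


def arc_geometry(links, n_chapters, width=1.0):
    """For each (source, target) chapter pair, return (centre_x, radius) of a
    semicircle over an axis of the given width, with chapters evenly spaced."""
    step = width / max(n_chapters - 1, 1)
    arcs = []
    for a, b in links:
        xa, xb = a * step, b * step
        arcs.append(((xa + xb) / 2, abs(xb - xa) / 2))
    return arcs


def busiest_chapters(links, top=3):
    """Chapters touched by the most cross references, as (chapter, count)."""
    counts = Counter()
    for a, b in links:
        counts[a] += 1
        counts[b] += 1
    return counts.most_common(top)


def centre_of_mass(tokens, names):
    """Mean relative position (0 = first token, 1 = last) of each name."""
    positions = {name: [] for name in names}
    last = max(len(tokens) - 1, 1)
    for i, token in enumerate(tokens):
        if token in positions:
            positions[token].append(i / last)
    return {name: sum(p) / len(p) for name, p in positions.items() if p}


links = [(0, 4), (1, 2), (0, 2)]  # toy cross references between 5 chapters
print(arc_geometry(links, n_chapters=5))
print(busiest_chapters(links, top=2))
print(centre_of_mass(["Moses", "went", "Aaron", "Moses"], ["Moses", "Aaron"]))
```

A tag-cloud renderer would then draw each name at its centre-of-mass position, scaled by how often it occurs.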
The last Bible visualization we’ll touch on is Steinweber and Koller’s “Similar Diversity”. Like Harrison, Steinweber and Koller use arc diagrams and other visual features, but rather than exploring the structure within The Bible, Similar Diversity shows the similarities and differences between the holy books of different religions.
Before describing our own visual explorations of the text of Pullman’s His Dark Materials trilogy, we draw attention to four other interesting book visualization projects, because of the different features they make use of.
In her CSM MACD project “Romancing Dimensions”, Ebany Spencer attempts to use purely visual notation systems to retell Edwin Abbott Abbott’s story “Flatland”. Though entirely paper-based, Spencer’s work uses three dimensions: paper cut-outs lift some of her timeline representations of the story out from the background plane.
Tim Walter’s textour (in German) uses time and animation to show the structural elements of a book accruing as data are added or filtered.
Stephanie Posavec’s beautiful visualizations of Jack Kerouac’s “On the Road” (and some other contrasting novels) are not the result of computer analysis but of careful, loving, and painstaking analysis of the text by hand. Posavec produces several kinds of visualization, from the spider-like Posavec diagrams, which map the sentence lengths an author uses (a line runs for the length of the first sentence, then turns ninety degrees and runs for the length of the second sentence, and so on), through to the elegant, flower-like ‘literary organism’ structures.
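The turning-line construction is simple enough to state as code. This Python sketch is our own reading of the rule described above, assuming that sentence length is measured in words and that every turn is clockwise; Posavec worked by hand, and the crude regular-expression sentence splitter is only a stand-in for her careful reading. The function names are hypothetical.

```python
import re


def sentence_lengths(text):
    """Split on runs of ., ! or ? and count the words in each sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def sentence_path(lengths):
    """Vertices of a line that runs for each sentence's length, then turns
    ninety degrees clockwise before the next sentence."""
    directions = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # up, right, down, left (y up)
    x, y, heading = 0, 0, 0
    points = [(x, y)]
    for length in lengths:
        dx, dy = directions[heading]
        x, y = x + dx * length, y + dy * length
        points.append((x, y))
        heading = (heading + 1) % 4  # turn ninety degrees clockwise
    return points


text = "The road ran on. We drove all night and into the next day! Then we stopped."
print(sentence_lengths(text))
print(sentence_path(sentence_lengths(text)))
```

Long sentences produce long straight runs and short ones tight corners, which is what gives the finished drawings their spidery texture.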
Many Eyes is a social visualization site. It is social in many ways: users upload data sets that are immediately shared with all the other Many Eyes members; anyone can use any of the provided visualization tools to visualize the data sets; and these visualizations can be shared and discussed on the Many Eyes site, or embedded in blog posts to foster conversation and analysis beyond the site. Many Eyes was conceived, designed, and built by IBM Research’s Visual Communications Lab. It was originally expected that most of the data sets and visualizations would be numeric, so the visualization tools were tailored towards quantitative data. In fact, the creators were taken aback by the number of textual data sets uploaded, notably including The Bible and political speeches, and they have written about the text-based visualizations they designed and added in response [WV08].
Our work focuses on the abstract visualization of children’s book series, in particular the trilogy “His Dark Materials” by Philip Pullman. Pullman’s trilogy is made up of three novels: “Northern Lights” (called “The Golden Compass” in the USA and in the film adaptation), “The Subtle Knife”, and “The Amber Spyglass”. We chose this genre partly out of personal passion and partly because of the range of potentially enthusiastic readers. The best children’s book series (especially before they are completed) are read and discussed by both child and adult readers, and many of these readers develop their own theories, which they share with their friends and with other readers online. Academic interest is similarly piqued, leading to conferences and journals dedicated to the study of children’s literature.
We started with Linda sketching out how some visualizations might look (without using actual data).
In the first of these sketches, Linda looked at the distribution of words (e.g. characters’ names) throughout the text, using connecting arcs (among other ideas) to give a sense of the rhythm of related characters through the text.
The second set of sketches explores character word plots: what form they might take, and which visual dimensions they would give us, either to plot different data or to reinforce existing data.
Third, Linda tackled the notion of themes, and her sketches show how we might plot the progression of themes through the books. These are the sketches we have had least success turning into working visualizations, since they rely on a more sophisticated notion of theme than individual word positions can provide.
The last series of visualization sketches Linda produced looked at the text itself. Instead of drawing structures based on the relationships between words, we looked at drawing the structures with the words themselves. This proved quite playful. I had wanted the visualizations to remain legible as text, but some of the sketches jump to the opposite pole, for example rendering only the words of interest and leaving the surrounding text as measured space.
The two ideas we built up into working visualizations are the flower-like structures showing the words that occur near the characters’ names (or other given words), and renderings of the whole text with the character names of interest highlighted with colours and arcs.
The first of these ideas that we implemented was the character flower. Figure 1 shows the character flower for the word Lyra. At the centre of the flower is the word “lyra” itself, surrounded by a ‘lifebelt’ which shows the occurrences of “lyra” through the series, starting from the 12 o’clock position, with each occurrence drawn as a thin red line.
From the crowded red lines we can see that “lyra” is a frequently occurring word, as we would expect, but also that the second and third books contain episodes in which she is not mentioned. Moving outwards, each ‘bud’ represents a word. Here we are looking at all the words that immediately follow “lyra” in a sentence. These words are arranged in order of how often they appear after “lyra”, and the size of each bud reflects the overall frequency of the word (i.e. the number of chapters it occurs in, regardless of whether it follows “lyra”). The final measure is the distance from the centre at which the bud is drawn. This reflects the probability that, when the word occurs, it occurs after “lyra”. So the two buds placed near the centre at the start are words that occur frequently after “lyra” and are unlikely to occur elsewhere. Indeed, the two words are Lyra’s surnames: Silvertongue and Belacqua. Other words drawn towards the centre are evocative of Lyra’s personality: “joyfully”, “quelled”, “exulted”, “definitely”, “judged”, “raided”, … but two stand out as anomalous: “blushed” and “obediently”. Clicking on a bud brings up the sentences in which the word follows “lyra”. From these sentences we find that the terms are used when Lyra is in disguise. In some respects this shows that the visualization works: the anomalies are indeed anomalies, but they were placed there consciously by Pullman rather than subconsciously.
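The measures behind a character flower can be computed directly from tokenized chapters. The Python sketch below is illustrative rather than a transcription of our implementation: it assumes the text is already split into chapters, sentences, and lower-case words, and the function names, as well as the linear mapping from probability to bud distance in `bud_radius`, are hypothetical.

```python
from collections import Counter


def character_flower(chapters, target):
    """chapters: list of chapters, each a list of sentences, each a list of
    lower-case word tokens. Returns (lifebelt_angles, buds)."""
    total_words = sum(len(sentence) for chapter in chapters for sentence in chapter)
    angles = []               # lifebelt: degrees around the ring from 12 o'clock
    follows = Counter()       # word -> times it directly follows target
    overall = Counter()       # word -> occurrences anywhere in the text
    chapter_freq = Counter()  # word -> number of chapters containing it
    position = 0
    for chapter in chapters:
        seen = set()
        for sentence in chapter:
            for i, word in enumerate(sentence):
                overall[word] += 1
                seen.add(word)
                if word == target:
                    angles.append(360.0 * position / total_words)
                    if i + 1 < len(sentence):  # only followers in the same sentence
                        follows[sentence[i + 1]] += 1
                position += 1
        chapter_freq.update(seen)

    buds = []
    for word, n_after in follows.most_common():  # order around the flower
        buds.append({
            "word": word,
            "after": n_after,
            "size": chapter_freq[word],          # bud size: chapter frequency
            "p_after": n_after / overall[word],  # P(word follows target | word occurs)
        })
    return angles, buds


def bud_radius(p_after, r_min=1.0, r_max=3.0):
    """Hypothetical mapping: the more exclusively a word follows the target,
    the closer to the centre its bud is drawn."""
    return r_min + (r_max - r_min) * (1.0 - p_after)


chapters = [
    [["lyra", "belacqua", "ran"], ["the", "bear", "ran"]],
    [["lyra", "ran", "home"], ["lyra", "belacqua", "laughed"]],
]
angles, buds = character_flower(chapters, "lyra")
print(angles)
for bud in buds:
    print(bud["word"], bud["size"], round(bud_radius(bud["p_after"]), 2))
```

In the toy data “belacqua” only ever follows “lyra”, so it is drawn closest to the centre, mirroring the surnames in Figure 1. Passing a possessive such as “lyra’s” as `target` gives the kind of flower shown in Figure 2, provided the tokenizer keeps the apostrophe.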
Characters’ names can also be used in the possessive, e.g. “lyra’s”, and the character flower in Figure 2 shows the words that follow “lyra’s”. These are mostly body parts (legs, arms, hair, etc.), and this pattern is borne out in Pullman’s writing about the other characters.
The whole-text visualizations show the entire text of the three volumes that make up the trilogy. We were interested to see the rhythm of the characters’ occurrences across the whole text, especially for two related characters. Figure 3 shows a fragment of the entire trilogy, with linked coloured disks over occurrences of Lyra’s and Will’s names. We can quickly see simple facts, like Will’s absence from the first book, and more curious aspects, like the periods of the second book in which neither of them is mentioned (presumably the sections focused on Mary Malone, Lord Asriel, or Mrs Coulter). Printed out, this diagram is many feet long, and the text itself is (just) readable. This combination of text-level detail and global pattern is particularly interesting. I was hoping that this visualization would highlight a poetic choice that spans the trilogy. Tolstoy starts and ends “Anna Karenina” at a railway station, and Pullman purposefully opens the first book with the word “Lyra” and ends the last book with the word “Lyra”. This should stand out, since the visualization should start and end with a coloured disk, but it does not: Pullman precedes the start of the first book with a quotation from Milton’s “Paradise Lost”, which stops the poetic symmetry from showing in the visualization.
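The data behind this kind of whole-text rendering can be sketched in Python: the token positions to mark with disks, the arcs joining successive occurrences, the longest stretch in which neither character appears, and a check for the start-and-end symmetry. This is our own illustration with hypothetical function names. Tokens are matched case-sensitively, which matters for a name like Will that is also a common English word.

```python
def occurrence_links(tokens, names):
    """Per name: token indices to mark with disks, and the arcs that join
    each occurrence to the next one."""
    result = {}
    for name in names:
        hits = [i for i, token in enumerate(tokens) if token == name]
        result[name] = {"disks": hits, "arcs": list(zip(hits, hits[1:]))}
    return result


def longest_absence(tokens, names):
    """(length, start) of the longest run of tokens mentioning none of names."""
    best_len, best_start, run_start = 0, None, 0
    for i, token in enumerate(list(tokens) + [None]):  # None flushes the final run
        if token is None or token in names:
            if i - run_start > best_len:
                best_len, best_start = i - run_start, run_start
            run_start = i + 1
    return best_len, best_start


def bookended_by(tokens, name):
    """True if the text both starts and ends with the given name."""
    return bool(tokens) and tokens[0] == name and tokens[-1] == name


text = ["Lyra", "ran", "Will", "saw", "Mary", "worked", "alone", "Lyra"]
print(occurrence_links(text, ["Lyra", "Will"]))
print(longest_absence(text, {"Lyra", "Will"}))
print(bookended_by(text, "Lyra"))                               # True
print(bookended_by(["Into", "this", "wild"] + text, "Lyra"))    # False: epigraph first
```

The last line reproduces the effect described above in miniature: prepending a few epigraph tokens is enough to break the symmetry, so a renderer would need to skip front matter to show it.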
The initial sketches were made in Adobe Illustrator. Having chosen our two initial candidates for implementation, we prototyped them in Processing, a language aimed at designers who are new to programming. Later these prototypes were reworked in C# and WPF. The texts themselves were taken from the publisher’s Quark documents, saved as plain text, broken down into chapters, sentences, and words in C#, and stored in a SQL Server 2008 database.
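A pipeline of this kind might look like the Python sketch below. It substitutes the standard-library `sqlite3` module for SQL Server 2008 and regular expressions for whatever splitting rules the C# code used, and it assumes, hypothetically, that each chapter begins with a line starting with `CHAPTER`; real text exported from Quark documents would need its own rules. The `words` table layout and the `load_text` function are likewise illustrative.

```python
import re
import sqlite3


def load_text(conn, text):
    """Split plain text into chapters, sentences and words and store one row
    per word. Assumes (hypothetically) chapters start with a 'CHAPTER' line."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS words (
            chapter INTEGER, sentence INTEGER, word_no INTEGER, token TEXT);
    """)
    chapters = [c for c in re.split(r"^CHAPTER.*$", text, flags=re.MULTILINE) if c.strip()]
    rows = []
    for c_no, chapter in enumerate(chapters, start=1):
        # Break after sentence-final punctuation followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", chapter.strip()) if s]
        for s_no, sentence in enumerate(sentences, start=1):
            # Keep straight and curly apostrophes so possessives stay one token.
            for w_no, word in enumerate(re.findall(r"[\w’']+", sentence), start=1):
                rows.append((c_no, s_no, w_no, word))
    conn.executemany("INSERT INTO words VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
sample = "CHAPTER ONE\nLyra ran. Pan followed!\nCHAPTER TWO\nWill waited. Lyra’s hand shook."
print(load_text(conn, sample), "words stored")
print(conn.execute("SELECT chapter, sentence, word_no FROM words WHERE token = ?",
                   ("Lyra’s",)).fetchall())
```

Storing one row per word with its chapter, sentence, and position is enough to answer the queries the two visualizations need, such as which words follow a name within a sentence or in how many chapters a word appears.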