A stream of dancing lights, for all the world like the shimmering curtains of the aurora, blazed across the screen. They took up patterns that were held for a moment only to break apart and form again, in different shapes, or different colours; they looped and swayed, they sprayed apart, they burst into showers of radiance that suddenly swerved this way or that like a flock of birds changing direction in the sky. And as Lyra watched, she felt the same sense, as of trembling on the brink of understanding, that she remembered from the time when she was beginning to read the alethiometer.
— Philip Pullman, The Subtle Knife (Scholastic, 1997; USA, Knopf, 1997)
Digital technologies have repeatedly redefined the paper world of books. Digital printing has overhauled publishing processes, and the internet has revolutionised the way audiences and authors connect to share their enthusiasm and criticism. Now the digitization of books themselves, whether for searching, browsing, and reading on a computer screen through services like Google Books or for reading on dedicated devices like Amazon’s Kindle or the Sony Reader, is threatening the established order.
But for this project we side-step these issues and concentrate instead on how the analytical power and display capabilities of computers may be used to enhance our understanding of book texts. We use the term “book texts” rather than the word “books” as we are not trying to build computer systems that might understand books, but rather we use the computer’s ability to treat books as an abstract sequence of words as the starting point for new analytical tools.
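As an illustration only, and not the code used in this project, the idea of treating a book as an abstract sequence of words can be sketched in a few lines of Python. The function names `tokenize` and `word_positions` and the sample sentence are hypothetical, chosen for this sketch. The tokenizer is deliberately naive: it assumes English text and handles only straight apostrophes.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and return its words in reading order.

    Assumption: a word is a run of ASCII letters, optionally followed by
    a straight apostrophe and more letters (e.g. "lyra's").
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_positions(words, target):
    """Return the relative positions (0.0 up to, but not including, 1.0)
    at which `target` occurs in the word sequence."""
    n = len(words)
    return [i / n for i, w in enumerate(words) if w == target]


# Hypothetical sample sentence, not a quotation from the novels.
sample = "Lyra watched the lights. The lights looped and swayed, and Lyra watched."
words = tokenize(sample)

counts = Counter(words)                    # how often each word occurs
positions = word_positions(words, "lyra")  # where in the text "lyra" occurs
```

Counts and positions of this kind are plausible raw material for a visualization; interpreting them is left to the reader, not the computer.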
Who would use such tools? Anyone with an interest in books, be they authors, readers, publishers, agents, critics, or academics, may find such tools useful, but we have designed our visualizations with fans and academic readers in mind. These readers form theories about the books that stand alongside the author’s own understanding, and we hope that the abstract visualizations provided may help such an endeavour.
Background and Related Work
The statistical analysis of texts is an important area of work and is used widely in information retrieval (e.g. web search). It is also a mature area of research in its own right, and has been used for tasks ranging from authorship attribution to the chronological ordering of works. For example, in a letter published in 1882, Augustus De Morgan speculated about using statistical techniques to explore authorship questions around St Paul’s Epistles and the Epistle to the Hebrews [Lea76], while more recently Jockers, Witten, and Criddle used sophisticated statistical techniques to reassess the authorship of the Book of Mormon [JWC08].
In contrast, the abstract visualization of book texts is neither a large nor a mature field of study, but there are notable and inspirational examples, several of which appear in the reference list below.
Our work focuses on the abstract visualization of children’s book series, and in particular the trilogy “His Dark Materials” by Philip Pullman. Pullman’s trilogy is made up of the three novels “Northern Lights” (called “The Golden Compass” in the USA and in the movie adaptation), “The Subtle Knife”, and “The Amber Spyglass”. We chose this genre partly through personal passion and partly because of the range of potential enthusiastic readers. The best children’s book series (especially before they are completed) are read and discussed by child and adult readers, and many of these readers develop their own theories, which they share with their friends and with other readers online. Similarly, academic interest is piqued, and there are conferences and journals dedicated to the study of children’s literature.
There are several directions we’d like to take this work in now.
We need to take these visualizations out of the research lab and engage both the fans and the academics who are theorising about Pullman’s works. We should engage them with these tools and establish if the tools are useful, how they might be improved, and what other visualizations may be of value to the community.
Throughout this work we took the view that computers are not adept at understanding books, and should essentially just count words and draw the results for people to interpret. However, advances in machine learning, and especially toolkits that enable machine learning techniques to be applied quickly to new domains, have led us to explore applying Infer.NET to the analysis phase of the visualization.
Inevitably, building visualizations leads to 1,001 other ideas as to how the data may be visualized. We would like to add the ability to pivot (e.g. for one flower’s bud to open another flower side by side). We would also like to add animations, so that the dynamic movement between visualizations, or as a visualization is formed, is part of the semantics of the visualization itself.
The visualizations we made are not available for public use – either online or through downloading. This is partly because we have not spent time looking at the rights implications and partly because we have not engineered the code to the quality level required for public use. It would be great to get this to a level where people can try the visualizations we built for themselves without us present.
It would be interesting to apply this work to other children’s book series, to see whether the characteristic patterns revealed in the visualizations differ from author to author. We might also move from a reader’s perspective to a learner’s perspective and choose books that often appear on high-school syllabuses. But most intriguing would be to build visualizations that contrast the content and style of different authors’ work.
- [Bec07] Linda Becker, 2007 “In Translation” http://lindabecker.net/in-translation/
- [Dan05] Anh Dang, 2005 “Gospel Spectrum” http://thirteensquares.com/gospelspectrum/
- [Har08] Chris Harrison, 2008 “Visualizing the Bible” http://www.chrisharrison.net/projects/bibleviz/
- [JWC08] Jockers, Witten, and Criddle, 2008 “Reassessing authorship of the Book of Mormon using delta and nearest shrunken centroid classification” in “Literary and Linguistic Computing” http://llc.oxfordjournals.org/cgi/content/abstract/23/4/465
- [Lar14] Clarence Larkin, 1914–1918 “Dispensational Charts” http://www.preservedwords.com/charts.htm
- [Lea76] Peter Lea, “The Style is the Man”, unpublished lecture notes and slides, University of York, 1976
- [Pal02] W Bradford Paley, 2002 “TextArc” http://textarc.org/
- [Pos06] Stefanie Posavec, 2006 “Writing Without Words” http://www.itsbeenreal.co.uk/index.php?/wwwords//about-this-project/
- [PRYAKSCL06] Plaisant, Rose, Yu, Auvil, Kirschenbaum, Nell Smith, Clement, and Lord, 2006 “Exploring erotics in Emily Dickinson’s correspondence with text mining and visual interfaces” http://portal.acm.org/citation.cfm?id=1141753.1141781
- [Sha08] Ebany Spencer, 2008 “Romancing Dimensions” http://www.ebanyshae.com/page11.htm
- [SK07] Philipp Steinweber and Andreas Koller, 2007 “Similar Diversity” http://similardiversity.net/project/
- [Wal08] Tim Walter, 2008 “textour” http://www.timwalter.de/portfolio/textour/
- [Wat02] Martin Wattenberg, 2002 “Arc Diagrams: Visualizing Structure in Strings” http://portal.acm.org/citation.cfm?id=857733 http://www.research.ibm.com/visual/papers/arc-diagrams.pdf
- [WV08] Martin Wattenberg and Fernanda B. Viégas, 2008 “The Word Tree: An Interactive Visual Concordance” http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4658133 http://www.research.ibm.com/visual/papers/wordtree_final2.pdf
This work was done as a collaboration between Linda Becker and Tim Regan during Linda’s internship at Microsoft Research’s Cambridge Lab in the summer of 2008. The work would not have been possible without the generous, thought-provoking, and supportive help of Pullman’s publishers, especially Marion Lloyd and Claire Tagg at Scholastic, Pullman’s agent Caradoc King, and Philip Pullman himself.