Correspondence analysis and ancillary data analysis and visualization methods, such as hierarchical clustering, constitute a powerful and versatile platform for data mining and machine learning. As we describe, this framework supports semantic analysis of textual data well, and also the analysis of change or anomaly. We begin with the prescripts and principles discussed by Robert McKee in his book Story: Substance, Structure, Style, and the Principles of Screenwriting, Methuen, 1999. For McKee, the filmscript is the “sensory surface” of the underlying semantics, and we show how McKee’s work can be applied to the movie Casablanca (we also refer to the CSI television series). Turning to collaborative and interactive work, we describe how this data mining platform was used to support collective novel writing in an English creative writing class. As an example of use in social media, we discuss the analysis of semantics and change in blog rolls.
Finally, for the analysis of semantics and change in wider social environments, we apply the same algorithms to data from 1998-2004 on the terrible Colombian civil violence.
F. Murtagh, Correspondence Analysis and Data Coding with R and Java, Chapman and Hall/CRC Press, 2005.
F. Murtagh, A. Ganz and J. Reddington, “New methods of analysis of narrative and semantics in support of interactivity”, Entertainment Computing, 2, 115-121, 2011.
F. Murtagh, M. Spagat and J.A. Restrepo, “Ultrametric wavelet regression of multivariate time series: Application to Colombian conflict analysis”, IEEE Transactions on Systems, Man, and Cybernetics–Part A: Systems and Humans, 41, 254-263, 2011.
F. Murtagh, “The Correspondence Analysis platform for uncovering deep structure in data and information”, Sixth Boole Lecture, Computer Journal, 53 (3), 304-315, 2010.
F. Murtagh, A. Ganz, S. McKie, J. Mothe and K. Englmeier, “Tag clouds for displaying semantics: The case of filmscripts”, Information Visualization Journal, 9, 253-262, 2010.
F. Murtagh, A. Ganz and S. McKie, “The structure of narrative: the case of film scripts”, Pattern Recognition, 42, 302-312, 2009. (See discussion in Z. Merali, “Here’s looking at you, kid. Software promises to identify blockbuster scripts.”, Nature, 453, p. 708, 4 June 2008.)