Measuring User Motivation from Server Log Files
Rodney Fuller 1, 2
Internet Business Solutions
Bellcore
RRC 1B-180, 444 Hoes Lane
Piscataway, NJ, 08854 USA
fuller@ctt.bellcore.com
Johannes J. de Graaff 2
Department of Information Systems
Delft University of Technology
Julianalaan 132
NL-2628 BL Delft, The Netherlands
J.J.deGraaff@TWI.TUDelft.NL
Abstract
Estimating user interest and motivation by just counting page requests (or "hits") in a World Wide Web server log
provides a distorted metric of user activity. This metric is unreliable because the path-dependent nature
of hyperlink usability treats index and navigational aid pages as equal to the goal, because differences in
web browsers can determine how effectively users can perceive content and navigational alternatives, and
because the poorly designed structure and content of the documents themselves can inhibit users from finding
what they are looking for. This paper proposes that measures of how much time users spend looking at a page
are better estimates of user interest than page hits, provided that simple human factors principles have been applied. An extended example of how this method might be used to collect and analyze data is also included. The types of decisions that can be made by authors and system administrators based on a time-based metric of user interest are summarized.
Introduction
As web servers become more numerous and the content they offer
becomes more homogeneous, the need for a means of monitoring user
interest is more apparent. Nielsen
(1995) argues
for the development of "an economic model of information use, by which
I refer not to payment to the information providers ..., but to optimizing
the use of resources." Measuring user interest improves the
quality and delivery of information services to the end user by providing
tools for the system administrator and author to determine the
value of the data they are serving. Currently, the most common method of
monitoring user interest is to count the number of page accesses. There are numerous problems with using metrics like the number of
access requests made to a web server as the indicator of user interest.
One of these problems is that in a non-indexed hyperspace
users are forced to follow paths that they or others have previously
forged--with each jump being recorded as a request from some server.
To assume that "hits" reflect user interest is to assume that users are
as interested in not finding the information they need (the "misses," and
"false alarms") as they are in finding what they are looking for. For example,
if an author has constructed a site in such a way that one must go through
documents A, B, C, D, and E to get to F then the log will show just as many
"hits" on pages A through E as there are on F--even if the user only wanted to
see page F. The second problem with
using hits as a metric of user interest is that any inequality in a user's
ability to access content--because of network bottlenecks, user-terminated
transfers, excessive delays in downloading, and server errors--will bias the resulting record of user activity. The third reason
why "hits" should not be used as a measure of user interest is
that the users might be constrained in their ability to view content because
the author failed to consider the human factors related to navigation
and ease of understanding in the design of the documents.
The advantage gained by being able to measure user interest from web server log files is that both administrators and authors can use such metrics to allocate scarce resources related to the value of the information they serve or author. The ability to monitor user interest also facilitates the commercialization of the web by allowing the owners of server logs to collect information of strategic importance to the products and services they offer. This paper discusses how one can use web server log
files to monitor the time spent by users on pages, and how, if one
constructs the content of those pages following simple human factors principles,
this measure can provide a more valid measure of user interest than simply counting the number of page "hits."
Why Counting Page "Hits" is Not an Adequate Measure
A simple count of the number of web browsers requesting a data transfer
from a web server is not enough to explain the information browser's
interest because it does not account for
the user's ability to access information, how effectively the information is organized and structured for comprehension,
and the appropriateness of the information to the user. It is not possible to use unobtrusive
measures to determine the appropriateness of the information;
one must survey users to determine whether they find the information useful.
But if authors and system administrators can account for users' access to web pages, and control the design and structure of those pages, then one can
infer that the time spent by a user viewing information is an indicator of
that user's interest.
Two examples illustrate the problems that using page hits causes:
In the first example, a poorly designed URL and a limited announcement
led to a less-than-expected reaction to a web broadcasting event.
This example illustrates two key principles: web resources must be "positioned"
both cognitively (an easily remembered URL) and virtually (the ability to have
others interested in the same topics refer to the site), because both
influence the number of people accessing the site. This
variability in access invalidates the use of page "hits" as the sole metric
of user interest because one cannot assume that all the "traffic" running
through a site is equally motivated, nor that all motivated users can find
the site.
In June of 1995 the National Science Foundation and the
Los Alamos National Laboratory sponsored a conference intended to set the
social research agenda for the National Information Infrastructure (NII). The
proceedings of all the sessions were
summarized, and broadcast on the World Wide Web by a team of NSF student fellows. People
not at the conference were able to visit the conference web site within an hour after any presentation and review what was said and comment on it. The
announcement of this effort was sent by the conference organizers via email to
distribution lists that consisted mainly of the conference participants, and the URL
for this information was http://info-server.lanl.gov:52271/usr/u096272/SFC/sfchome.html.
Compared to the effort put into placing the content up on the web, the response from the
web community was disappointing. Two of the reasons why there was a lack of
interest by the web community in this conference were that the announcement was limited and the URL was impossible to remember without reference to a file or adding it to a
hotlist. This year the conference URL (as well as the summary of last year's conference) can be found at http://www.lanl.gov/SFC/--a much more meaningful and memorable URL.
The second example that illustrates how counting page hits is not a
measure of user interest can be seen in the records of visits maintained by
keepers of index pages. What does it mean when these sites report that their
index pages have received many times the number of hits compared to the
content to which the index refers? Are web users more motivated to click on
links than they are to browse content? If one were to blindly accept requests
from a server as the measure of user interest then one must also accept that
internet directories like YAHOO, which have no content other than pointers to
other sites, are much more important to users than the content they point
to because they receive more visits. The same logic would conclude that the
"Yellow Pages" (a topical guide to businesses and services that have a phone)
is more important to the phone user than the product or service they seek.
Obviously, one must accept that counting page hits is not enough, and find
other metrics for determining user interest.
One such metric is the amount of time that users spend looking at a
page. In developmental psychology similar techniques have been used to
measure how inarticulate subjects (infants) respond to new stimuli--with the
stimuli that they look at the longest being judged the most interesting (Coren, Ward, &
Enns, 1994). In this study we will investigate if the amount
of time spent by users viewing a web page is a better metric of user interest
than simply counting page hits.
The hypothesis is that if
the page design of the web pages you are monitoring is consistent
then, after controlling for the technical issues, any changes in the
amount of time that people spend looking at a page represent a
measure of a user's motivation toward the content of that page. The
key question that needs to be answered before this hypothesis can be tested
is: What is meant by consistent document design?
Consistent Document Design
The purpose of using human factors principles in the design of
documents is to minimize the amount of time that people spend searching for the
information they need. One example is the design of many utility
bills (Nielsen 1993)--where the amount owed is usually on the top page,
printed in the boldest and largest type on the
return receipt. The details regarding how that number was
calculated are usually hidden on the subsequent pages.
The design of the bill allows the customer to see what
they need to do with the document and yet allows further investigation
(see also Norman 1988, and Tufte 1983). The same
principles can be applied to the design of web pages to minimize the amount
of time that users spend looking for the information they need.
The key human factors issues that must be considered in the design of
a web site are: navigation within a document, navigation between documents,
memorability, and ease of comprehension. The following are examples
of how we designed documents that consider these four factors in a "virtual
conference," created to give student volunteers an idea of the types of
activities that occur, and what would be expected of them as volunteers.
This virtual conference is an ongoing research project of the authors
and is supported by the special interest group on computer-human
interaction (SIGCHI) of the Association for Computing Machinery (or ACM).
The objective of creating a virtual conference was to let students get an idea
of the types of activities
that occur at a SIGCHI sponsored event and determine if they wanted to
participate or attend.
Giving users a sense of the alternatives available to them is the
focus of designing effective navigation guides within a document. The content
in our virtual conference corresponds to the spatial, temporal, and topical
structure of the real event--the annual ACM Conference on Human Factors in
Computing Systems (CHI). One page was created for each activity or event that occurs at
the conference, and indexes were created that allowed users to
navigate metaphorically in this space via a spatial map, a time schedule, and interpersonal relationships. In addition, each page
was structured following the same content template--menus at the top, a
summary of the event, more detailed information, images, and navigation tools
to other pages. This design lets the user determine interest in
the topic, go directly to their preferred media for browsing the event, or
change to another topic with at most two links. The important issues to
remember include listing the subheadings within the page, allowing for easy
navigation within the page, being consistent in the placement of tools and
information, and allowing users to browse information while downloading
occurs.

Figure 1. The spatial map navigation tool.
Assisting users in navigating between documents involves giving the
user a sense of where they are in the document hyperspace. The most
important aspect of this is to use common metaphors that give feedback to the
user regarding where they have been and what else is available. We created
three metaphoric guides to the "virtual conference":
A spatial map (see Figure 1), a temporal schedule (see Figure 2), and interpersonal guides (see Figure 3). Users could choose which metaphor they preferred
to browse the conference with--and we monitored the differences in performance
on each (measured as the time needed to download navigation guides and go to
another page). It was assumed that one of these guides would allow users
quicker access to the underlying content, but this was not the case. More
first-time users of the virtual conference (those people with unique domain
name server addresses using the guides for the first time) preferred the map
(N=253, mean time on page=22.87 s, SD=17.53 s)
to either the schedule (N=110, mean=23.68 s, SD=17.98 s) or the
interpersonal metaphor (N=136, mean=21.65 s, SD=20.73 s), but users were not
significantly faster with any one metaphor than with the others.
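The text reports only that the differences between guides were not significant and does not name the test. As an illustration only, using a technique we are substituting that is not necessarily the authors', Welch's two-sample t statistic can be computed directly from the reported means, standard deviations, and counts. The function and variable names are hypothetical.

```python
from math import sqrt


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom,
    computed from summary statistics only."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# (mean seconds, SD seconds, N) as reported for first-time users of each guide
guides = {
    "map": (22.87, 17.53, 253),
    "schedule": (23.68, 17.98, 110),
    "interpersonal": (21.65, 20.73, 136),
}

for a, b in [("map", "schedule"), ("map", "interpersonal"), ("schedule", "interpersonal")]:
    t, df = welch_t(*guides[a], *guides[b])
    print(f"{a} vs {b}: t = {t:.2f}, df = {df:.0f}")
```

All three |t| values come out below 1. That is well short of the roughly 1.97 needed for p < .05 at these degrees of freedom, which is consistent with the reported result.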

Figure 2. The time schedule navigation tool.

Figure 3. The interpersonal guide navigation tool.
Memorability includes using page names and URLs that make
the resource easy to find. The ability to integrate the "virtual conference"
onto the web site for the professional society that sponsors the annual CHI
conference was an important aspect of capturing
users that already have some interest in the content. Because of the path
dependent nature of content access (content linked to other content) users of
the web must encounter content where they expect it--or they will not find
it. A good search engine might help this problem--but in many situations
a user must know what they
want before they can find it with a search engine. This dilemma--that one
must know what they want before they know that they want it--limits the user
of the search engine to finding only what they expect and does not take
advantage of the semantic architecture of the World Wide Web. By integrating
our "virtual conference" into the flow of users seeking information on the field
of human-computer interaction, everyone who remembered that www.acm.com/sigchi/
was where information regarding human factors and computing is located could
also browse our web site. We also selected a name for our virtual conference
that parodied the real conference--CHI'00 (KH'i naught).
The last factor that must be considered is ease of comprehension. The
best illustrations of how neglecting this factor can inhibit user performance are
pages that ignore these human factors of document design, such as
some of those featured in the web's "halls of shame." These pages use background and
text combinations that don't contrast enough to make rapid visual
identification possible, image maps that don't give the user feedback
regarding where they have been, multiple font sizes and styles which limit the
speed at which users can read text, poorly designed icons or instructions, and
poorly worded text. The users of these pages can spend a significant
amount of time trying to understand which alternatives they are being presented with rather than deciding if they have found the content they were searching
for.
The key user activities that need to be supported by consistency in
document design are those related to comprehension and navigation. The
document designer and system administrator must determine which user
activities relate to their content because the specific activities that need
to be supported might vary depending on the interface, content, or system.
If each and every page is a new adventure for the user, then some of
the time they spend browsing each page will be wasted trying to
determine where the author has hidden the information. A
page designed for clarity and consistency, on the other hand, will minimize
this problem.
The time spent by users on a page can be thought of as the
aggregate of the time spent accessing the page, the time spent finding
content and the time spent determining how to exit the page to the next
desired activity. If system administrators and the authors of documents are
consistent in designing for ease of comprehension and navigation, and the
technical problems of network variability and caching are resolved, then one
can infer that the amount of time that users spend viewing a page is an
indicator of their interest. Thus, if one can control for network variability
and the cache problem then the amount of time spent looking at the page is a
rough measure of the user's motivation to view the content (Kamba, Bharat, & Albers 1995; Morita and Shinoda 1994).
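The decomposition of page time described above can be written as a simple sum. The symbols are introduced here for illustration and do not appear in the original analysis:

```latex
t_{\text{page}} = t_{\text{access}} + t_{\text{find}} + t_{\text{exit}}
```

Here t_access is the transfer time that the network estimate described under Technical Problems is meant to remove. The term t_find is the time spent finding content, and t_exit is the time spent deciding how to leave the page. Consistent design aims to keep the navigational part of t_find and t_exit similar from page to page. On the paper's argument, the differences that remain in t_page then track user interest.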
Technical Problems:
When measuring user interest for particular pages both the
variability of network performance and the use of a local cache can
disrupt the measurements of time spent on a page--as measured by server
log files.
Network variability
The transmission of files over a network from a remote site depends
on the capabilities of the intervening network. Without an estimate
of the network speed it is impossible to know how quickly files are being
transferred at the client's request. At this time there is no one place that
one can go to benchmark the speed of "the internet" at various times of the day--
and if the amount of time users spend on a page is to be a metric of their
interest then some means of cheaply and easily estimating network variability
is needed. One way to estimate network load is by breaking the page into
smaller segments and using these segments to benchmark the file transfer
process--for example, by having images associated with every page you wish to
monitor you can determine how long it takes to transfer the text and then the
image. But images cause problems because they take a long time to transfer
and this increases user frustration. This time can be
disproportionately large compared to the time spent looking at the information
on the page, particularly if the user is located behind a slow modem link.
And while some browsers allow the user to view the text of a page before the
images finish transferring this feature is not available on all browsers.
Before one can estimate network variability one must resolve the image
transfer problem.
By using small "thumbnail" images as abstracts of the larger images, with each
thumbnail roughly equal in size to the document being transferred,
one can estimate the network speed between server and client. If the
page and the thumbnail image are nearly equal in size, then one can estimate the network delay by
taking the time stamp of the request for the first image on the page and subtracting from it
the time stamp of the request for the page itself. This allows one to make a fair
comparison between different people accessing a server, adjusting for
individual network capabilities.
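A minimal sketch of the per-request speed estimate follows. It assumes that a browser requests the first inline image only after the HTML page has arrived, so the gap between the two logged requests approximates the time needed to transfer the page. The function name and the one-second floor are our own choices, not the authors'.

```python
from datetime import datetime, timedelta


def estimate_throughput(page_requested: datetime,
                        first_image_requested: datetime,
                        page_bytes: int) -> float:
    """Rough bytes-per-second estimate for one client at one moment.

    Assumes the gap between the page request and the first image request
    is the time it took to deliver the page to the client.
    """
    gap = (first_image_requested - page_requested).total_seconds()
    # Server logs record whole seconds, so a gap of 0 means "under a second".
    # Clamping to 1 s (an assumption) avoids dividing by zero.
    gap = max(gap, 1.0)
    return page_bytes / gap


t0 = datetime(1995, 4, 11, 10, 0, 0)
print(estimate_throughput(t0, t0 + timedelta(seconds=2), 6900))  # 3450.0
```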
Use of cache systems
The use of cache systems, either by WWW clients, or by proxy
servers, makes it much harder to account for a user's time. A cache will
store the pages, and hand them to clients who request them without leaving a
trace in the server logfiles. While this improves user efficiency, it does
pose problems for measuring the time spent looking at pages.
One way to deal with this problem is to make all pages
non-cacheable, for instance by telling the client the pages were
modified "now". However, even by using these techniques it isn't
guaranteed that the client will re-request the pages. Kamba, Bharat, and
Albers (1995) have suggested using extensible applets to monitor the number
of user micro-requests (like clicks on a scroll bar) as a means of monitoring
where users spend time within a document--thus circumventing the cache problem.
Currently, this problem has no clear resolution.
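One way to tell clients that a page was modified "now" is through HTTP response headers. The helper below is a hypothetical sketch (the name `no_cache_headers` is ours) that uses only Python's standard library to format the dates. As the text notes, none of these headers can force a client or proxy to re-request the page.

```python
import time
from email.utils import formatdate
from typing import Dict, Optional


def no_cache_headers(now: Optional[float] = None) -> Dict[str, str]:
    """Hypothetical helper: response headers saying the page was modified
    "now" and is already stale, so a well-behaved cache should re-request it."""
    stamp = formatdate(time.time() if now is None else now, usegmt=True)
    return {
        "Date": stamp,
        "Last-Modified": stamp,       # the page was modified "now"
        "Expires": stamp,             # ...and expires the moment it is sent
        "Pragma": "no-cache",         # HTTP/1.0 directive; defined for requests, honored by some caches
        "Cache-Control": "no-cache",  # HTTP/1.1 equivalent (postdates this paper)
    }


for name, value in no_cache_headers().items():
    print(f"{name}: {value}")
```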
In our analysis we have explicitly taken into account the fact
that people might look at pages still in their cache and we only considered
users (log file entries with unique domain name server addresses) requesting
pages for the first time. In
this way we know that that page was not accessed from a proxy server or a
local disk. A second problem is that users might pause in their navigation
of the site to visit previously accessed pages. Though we could
not measure the time spent looking at pages in the cache, we can
assume that some users did visit their cache. If we assume that any time that
a user spent looking at their cache would delay their request for a new page
we can correct for viewing a cache by imposing an upper limit on time allowed
in viewing pages. By imposing an upper limit of three standard deviations
above the average time spent looking at all pages that are not navigation
guides we have controlled for those people who spend significant time visiting
pages in their cache before returning to request a new page from the server.
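The upper limit described above can be sketched in a few lines. The cutoff is computed from a reference set of page times (in the text, times on pages that are not navigation guides) and then applied as a filter. The function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not name the estimator.

```python
from statistics import mean, stdev
from typing import List


def cache_cutoff(reference_times: List[float]) -> float:
    """Upper limit: mean plus three (sample) standard deviations."""
    return mean(reference_times) + 3 * stdev(reference_times)


def drop_cache_pauses(times: List[float], cutoff: float) -> List[float]:
    """Discard page times long enough to suggest the user was viewing
    cached pages before requesting a new page from the server."""
    return [t for t in times if t <= cutoff]


# Hypothetical data: twenty 10-second views and one 1000-second pause.
times = [10.0] * 20 + [1000.0]
limit = cache_cutoff(times)
print(len(drop_cache_pauses(times, limit)))  # 20
```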
Method:
To ensure that the users could not leave the virtual conference and return
later no links outside of the site were included. This
prohibited users from browsing outside of the conference where our
server log could not monitor activity. We also standardized the
metaphors of interaction used by the browsers: where the specific
topics of each page were listed at the top of each page, and numerous
links to the top of the page were interspersed within the page (see Figure 4). Links
to the main navigation tools (the map, the schedule, and the interpersonal
guides) were also listed directly under the page menu at the top of the
page. These metaphors were also made explicit at the bottom of each page (see Figure 5).

Figure 4. An example of the menu structures at the top of a page.

Figure 5. An example of the navigation choices at the bottom of every page.
The log files from both of the development sites were obtained for the
six months following the formal public announcement (April 11, 1995).
Logs at one of the development sites (www.cgs.edu) were only kept for three months
following this announcement, but approximately the same number of page requests were logged by each server.
The log files were combined and sorted by date and time of access.
All instances of first access by each unique domain name server (DNS) address
were marked. The log file was then sorted by domain name server address, and
the time difference in seconds was calculated from all sequential html page
requests by users with the same DNS address. There are three issues that can
confound the above calculation of time spent viewing a page: the
cache problem, the network efficiency problem, and the neglect of human factors.
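The log processing described above can be sketched as follows. This sketch rests on several assumptions. It assumes the logs are in the NCSA Common Log Format, since the paper does not state the format. It treats any path ending in ".html" as an HTML page, and it counts a page only the first time each host requests it. All names and the sample log lines are hypothetical. The last page of each visit has no following request, so it gets no time.

```python
import re
from collections import defaultdict
from datetime import datetime
from typing import Iterable, Iterator, List, Tuple

# NCSA Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(?:GET|POST|HEAD) (\S+)[^"]*"')


def parse(lines: Iterable[str]) -> Iterator[Tuple[str, datetime, str]]:
    """Yield (host, timestamp, path) for each well-formed log line."""
    for line in lines:
        m = CLF.match(line)
        if m:
            stamp = datetime.strptime(m.group(2), "%d/%b/%Y:%H:%M:%S %z")
            yield m.group(1), stamp, m.group(3)


def page_times(lines: Iterable[str]) -> List[Tuple[str, str, float]]:
    """Seconds between sequential HTML requests from the same host,
    keeping only pages the host is requesting for the first time."""
    by_host = defaultdict(list)
    # Sort by time first; grouping by host then preserves time order.
    for host, ts, path in sorted(parse(lines), key=lambda r: r[1]):
        if path.endswith(".html"):
            by_host[host].append((ts, path))
    result = []
    for host, visits in by_host.items():
        seen = set()
        for (ts, path), (next_ts, _) in zip(visits, visits[1:]):
            if path not in seen:  # first access only (the cache problem)
                result.append((host, path, (next_ts - ts).total_seconds()))
            seen.add(path)
    return result


log = [
    'a.example.edu - - [11/Apr/1995:10:00:00 -0400] "GET /chi00/index.html HTTP/1.0" 200 5000',
    'a.example.edu - - [11/Apr/1995:10:00:02 -0400] "GET /chi00/thumb.gif HTTP/1.0" 200 1700',
    'a.example.edu - - [11/Apr/1995:10:00:30 -0400] "GET /chi00/map.html HTTP/1.0" 200 6900',
    'b.example.nl - - [11/Apr/1995:10:00:10 -0400] "GET /chi00/index.html HTTP/1.0" 200 5000',
    'a.example.edu - - [11/Apr/1995:10:01:00 -0400] "GET /chi00/index.html HTTP/1.0" 200 5000',
    'a.example.edu - - [11/Apr/1995:10:01:20 -0400] "GET /chi00/papers.html HTTP/1.0" 200 6000',
]
for row in page_times(log):
    print(row)
```

The revisit to index.html at 10:01:00 is skipped, because a second request for the same page may reflect cached browsing rather than a fresh view.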
The cache problem is that most browsers allow the user to cache recently
visited pages in local memory and thus do one of two things: use either the
"previous" or "back" button to reload a previous page from local memory (which
does not generate a server request) or use the menu that lists the recent
history of page visits. These problems were addressed by only evaluating users looking at a page for the first
time, and by cutting off all log entries that showed that users spent more
than three standard deviations above the average time spent on a page.
The second issue is network efficiency. The time needed to download the same
page depends on factors such as bandwidth, server load, the speed of both the
local machine and the local network, and intervening network load, so some means of
estimating the efficiency must be obtained for each server request before
one can make a valid estimate of the time spent actually browsing the content
of a page. Since network efficiency depends on local and
global network demand, it is impossible to get an exact measure of each user's
capability at the time of each request. One way to easily estimate this
capability is to send any given request in segments and monitor the time it
takes each of the segments to arrive. This can be done by making
the first image downloaded after the HTML page a small thumbnail image that
is approximately the same size as the HTML page. Using this method one can
correct for network efficiency at the time of each server request
(regardless of where the user is located): the number of seconds
between requests for HTML pages is taken as the amount of time spent on a page, and
the number of seconds between the HTML page and the first image file is subtracted
as the adjustment for network efficiency.
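The adjustment described above reduces to a subtraction of two time gaps. In the minimal sketch below, the function name is hypothetical and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta


def adjusted_page_time(page_requested: datetime,
                       first_image_requested: datetime,
                       next_page_requested: datetime) -> float:
    """Seconds on a page, corrected for network efficiency: the gap to the
    next HTML request minus the page-to-first-image gap."""
    raw = (next_page_requested - page_requested).total_seconds()
    network = (first_image_requested - page_requested).total_seconds()
    return raw - network


t0 = datetime(1995, 4, 11, 10, 0, 0)
print(adjusted_page_time(t0, t0 + timedelta(seconds=3), t0 + timedelta(seconds=45)))  # 42.0
```

Algebraically, the result equals the time from the first image request to the next page request. Under this method, the clock for viewing effectively starts once the page has arrived.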
In the CHI'00 project we estimated
network efficiency using the following method: Each HTML page has at least
one image associated with it, and an estimate of the network speed during
that specific server request was generated by calculating the time needed to
download the image after the page had been requested. All inline graphics
used in the estimate of network speed are thumbnails of approximately the
same size (mean=1.7K, SD=0.44K) as the HTML pages (mean=6.9K, SD=2.6K).
Even pages that had no images associated with them
had a dummy image added. The problem of scaling down the images (mean=55.3K, SD=27.7K) to high-quality thumbnails of approximately the same size as
the HTML pages was solved using software provided by Bellcore for the Peter F. Drucker digital library project.
The third group of issues that must be accounted for before one can use
time spent on a page by a web browser as a metric of interest are those
related to the human factors of visualizing information. By incorporating human
factors into the design of web page content, you try to minimize the amount of
unproductive time that a user will spend browsing, searching, or navigating
the content. Some of the issues involved are making sure that backgrounds
contrast with the text so that it is clearly visible, that the font
is large enough to be readable, that the pages are laid out so
that users can easily identify and navigate between alternatives, that the navigation metaphors
are easily understood, and that users can see where they have been.
Grouping Log Entries
From the users' DNS address we obtained a simple demographic estimate
of user nationality (e.g. all DNS names that end with .edu were considered to
be U.S. academic sites, and log entries that ended in .nl were thought to
originate from users in the Netherlands). Information regarding the
demographics of access and the pages visited were dummy coded into variables
representing users' locality (North American or European; we did not get enough
visitors from Asia to permit analysis) and, where possible, type of work (i.e.,
all .com addresses into U.S. "industry", .edu into "academics", and
all .gov, .mil, .net, and .org sites into "others"). Pages were dummy coded
to their corresponding "track" in the conference--papers, panels, plenaries,
and short papers were coded as "technical program," while design briefings,
doctoral consortium, organizational overviews, tutorials, videos, and
workshops were coded as "applied program." Page visits to the demos,
exhibits, internet, and interactive experience were coded as "hands on
program." One of the reasons why this was done was to test the assumption
that people from different locations or careers would view the conference
experience differently--as is widely suspected by conference organizers.
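The coding rules above can be sketched as a small function. The text names only .nl as a European suffix, so the European and North American suffix sets below are assumptions chosen for illustration, as are the function and key names. Hosts that match no listed suffix, such as numeric IP addresses, get zeros for every location and job variable.

```python
from typing import Dict

# Assumed suffix sets (illustrative only; the text names only .nl).
NORTH_AMERICA = {"com", "edu", "gov", "mil", "net", "org", "us", "ca"}
EUROPE = {"nl", "uk", "de", "fr", "be", "se"}

# Job coding as described in the text.
JOB = {"com": "industry", "edu": "academics",
       "gov": "others", "mil": "others", "net": "others", "org": "others"}

# Page tracks mapped to conference programs as described in the text.
PROGRAM = {
    **dict.fromkeys(["papers", "panels", "plenaries", "short papers"], "technical"),
    **dict.fromkeys(["design briefings", "doctoral consortium", "organizational overviews",
                     "tutorials", "videos", "workshops"], "applied"),
    **dict.fromkeys(["demos", "exhibits", "internet", "interactive experience"], "hands_on"),
}


def dummy_code(host: str, track: str) -> Dict[str, int]:
    """Return 0/1 dummy variables for one log entry."""
    tld = host.rsplit(".", 1)[-1].lower()
    job = JOB.get(tld)
    program = PROGRAM.get(track)
    return {
        "north_america": int(tld in NORTH_AMERICA),
        "europe": int(tld in EUROPE),
        "industry": int(job == "industry"),
        "academics": int(job == "academics"),
        "other_job": int(job == "others"),
        "technical": int(program == "technical"),
        "applied": int(program == "applied"),
        "hands_on": int(program == "hands_on"),
    }


print(dummy_code("cs.tudelft.nl", "papers"))
```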
Results:
The variables were analyzed in a linear, multiple regression analysis
because the main metric of time spent viewing a page was thought to be the
result of a linear combination of performance issues (ability to navigate,
comprehend, and access content) and user motivation (which could be linear or
non-linear). Regression analysis offers the best tools for analyzing a
complex environment while controlling for underlying factors (Cohen and Cohen
1983). This method of analysis allows the researchers to incrementally add a
variable, or set of variables, to determine what is gained by adding that
dimension of the users' behavior. The resulting multiple correlation
coefficient, R, is a measure of how strongly the variables in the equation
are correlated with the dependent variable, and ranges from 0 (no correlation)
to 1 (a perfect correlation). An estimate of the amount of
variability that this equation explains (or how "useful" it is to know this
inter-relationship exists) can be obtained by squaring the regression
coefficient (R squared). Because multiple regression offers all of these
perspectives, while still controlling for statistical error, it is preferred
for this type of analysis.
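Readers who want to check incremental tests like those reported below can use the standard F test for the change in R squared when m predictors are added to a model. The symbols are ours, and the paper does not state the exact formula its software used. Here N is the number of log entries and k is the number of predictors in the larger model:

```latex
F_{\text{change}} = \frac{\left(R^2_{\text{full}} - R^2_{\text{reduced}}\right) / m}{\left(1 - R^2_{\text{full}}\right) / (N - k - 1)},
\qquad \text{df} = (m,\; N - k - 1)
```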
A total of 1,732 log entries were analyzed, using time spent on a page
(in seconds) as the dependent variable (see Table 1 for descriptive statistics).
Separate regressions were run on the effect of job and location on how time
was spent in the various areas of the conference. The simple correlations for
all the variables are presented in Table 2. As shown in Table 2, users from
"industry" spent their time in the "virtual conference" in a way that was
significantly different than expected (r= .07, p=.002), while "academics"
spent their time in an opposite and equally unexpected way (r= -.065, p=.004).
Table 2 also shows that time spent in the "technical program" contributed
significantly toward longer visit times (r= .129, p=.001), whereas the
"hands on" areas of the conference did not reach significance (r= .036, p=.069)
and time in the "applied" areas was significantly associated with shorter
visits (r= -.040, p=.048). Table 2 also shows that no instances of
multicollinearity occur among these variables.
Location             Average time (s)     N     SD (s)
-------------------------------------------------------
North America             33.96          935    24.81
  Industry                36.36          417    25.61
  Academia                30.59          452    23.09
  Other                   32.95           60    21.31
Europe                    31.27          587    24.67
Program:
-------------------------------------------------------
Technical                 41.31          237    23.36
Applied                   20.91           11    13.64
Hands On                  36.59          115    23.61
-------------------------------------------------------
Table 1: Descriptive statistics for variables used in the analysis.
                     time     NorthA   Europe   TechPrg  Applied
----------------------------------------------------------------
North America        .030
Europe              -.058**
Technical Program    .129**  -.013    -.002
Applied             -.040*   -.014     .020    -.032
HandsOn              .036     .037    -.039    -.106**  -.021
----------------------------------------------------------------
* p less than or equal to .05, ** p less than or equal to .01
Table 2. Simple correlations between variables used in analysis.
A regression of time spent by location showed that while being from
North America did not significantly change how users allocated their time,
visitors from Europe did spend significantly less time in CHI'00 than other
users (R=.063, F change=5.3, df=2, 1729, p=.021). While this difference
is significant it is probably not important, because the R squared value
for this difference (.063 squared, or about .004) indicates that knowing that a user is from Europe instead
of North America explains less than one percent of the variability that exists in how people
spent their time. Adding the "technical program" to this equation shows a
highly significant difference (R=.144, F change=29.4, df=3, 1728, p=.0001),
as does the addition of the "hands on" areas of the conference (R=.155, F
change = 3.87, df=5, 1726, p=.049). What this analysis shows is that
the main differences in how people spend time at a virtual conference on human
factors and computers are based on the content of the conference, and not on
the nationality of the people attending.
A separate analysis shows that different jobs have a
significant effect on how one spends time at a virtual conference. Users
with DNS addresses that end in ".com" looked at pages significantly longer
than other types of users (R=.070, F change=8.54, df=1, 1730, p=.0035)
while users in academia and other careers did not show significant differences
in how they spent their time. This increased interest shown by "industry"
visitors of our virtual conference on human factors corresponds with the
increasing number of actual conference visitors surveyed at the 1994
conference (Schofield, Lynch, Tauber, Curtis, Fuller & Roberts 1995).
Conclusions:
To be effective in a knowledge- and information-based society,
individuals need tools that allow them to collect, manipulate, and
distribute the products of their own or others' work. Included in this
lifecycle of information use is the need to gain access to a large
variety of files, search diverse resources, collect and summarize the
information found, and finally redistribute this information. The
problem for authors and maintainers of such distributed resources is
that there is no way to measure the utility of information in such a
process: How do you demonstrate that the information you have placed
on the web is being accessed in a significant way? How do you make
decisions about allocating scarce resources without some metric of value
to the user? The current method for determining interest is to count
the number of "hits" that a page has received. This is inadequate because
the server log records "hits" not only on the page of interest but
also on every other page the user visited in getting there. What is
needed is a path-independent measure of user interest, and a time-based
measure of interest would be such a measure.
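A time-based measure can be derived from an ordinary server log by treating the gap between one page request and the same client's next page request as the time spent on the first page. The sketch below assumes the Common Log Format, treats each host as one user (proxies and shared machines break this), counts only successful GET requests for pages ending in ".html" or "/" (a hypothetical filter that skips inline images), and uses a hypothetical 30-minute cut-off beyond which the user is assumed to have left. The last page of each visit receives no time, and network transfer time is not subtracted.

```python
import re
from collections import defaultdict
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+')

SESSION_TIMEOUT = 30 * 60  # seconds; hypothetical cut-off for "user has left"

def parse(line):
    """Return (host, timestamp, path) for a successful GET request, else None."""
    m = CLF.match(line)
    if not m:
        return None
    host, stamp, method, path, status = m.groups()
    if method != "GET" or status != "200":
        return None
    return host, datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z"), path

def viewing_times(lines, is_page=lambda path: path.endswith((".html", "/"))):
    """Total seconds per page, measured as the gap to the same host's next page request."""
    by_host = defaultdict(list)
    for line in lines:
        record = parse(line)
        if record and is_page(record[2]):
            host, when, path = record
            by_host[host].append((when, path))
    totals = defaultdict(float)
    for requests in by_host.values():
        requests.sort()  # chronological order for this host
        for (start, page), (end, _) in zip(requests, requests[1:]):
            gap = (end - start).total_seconds()
            if gap <= SESSION_TIMEOUT:
                totals[page] += gap
    return dict(totals)
```

Because the time is attributed to the page actually being viewed, index pages passed through on the way to a goal accumulate only the few seconds spent on them, unlike a hit count.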
When a user accesses a page on the World Wide Web, they could
be doing one of two things: searching for the information they need or
processing the information they need. Both of these activities take
up user time, and any measure of user interest based on time will
confound them. One way to minimize this confusion is
to adopt a consistent style of document design. If your users know
how to navigate your pages to find what they need quickly, then the
time spent on various pages will mainly reflect their processing of
content rather than their search for it. Other ways of minimizing the time
that users spend looking for the content they need are to provide
an information overview (or content summary), to apply human factors
principles to improve user comprehension of the information accessed,
to offer search engines that go beyond keyword search,
and to provide easily understood navigational guides to the content.
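One rough check on how much searching contaminates a time-based measure is to label index and navigation pages by hand and report what share of the measured time falls on them. The sketch below assumes per-page totals such as those produced from the log, and a hand-made set of navigation pages (both hypothetical inputs). A large navigation share would suggest that users are spending much of their time looking for content rather than processing it.

```python
def split_navigation_time(totals, navigation_pages):
    """Split per-page viewing time into navigation and content totals.

    totals:           dict mapping page path -> total seconds viewed
    navigation_pages: set of paths labeled by the author as index or navigation pages
    """
    nav = sum(t for page, t in totals.items() if page in navigation_pages)
    content = sum(t for page, t in totals.items() if page not in navigation_pages)
    share = nav / (nav + content) if nav + content else 0.0
    return {"navigation": nav, "content": content, "navigation_share": share}
```

Tracking this share before and after a redesign is one way to see whether consistent layout and navigational guides actually shortened the search phase.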
The experiment above demonstrates a method for using such measures to
confirm behavior trends in a distributed and diverse community. By measuring
the time users spent on each page, we were able to monitor the
areas of the conference that our users found most interesting and to predict
how they would spend their time. The same method can be used to predict
how motivated users are toward other content, provided that authors and system
administrators use consistent document design, provide navigation
guides to their users, and construct their pages so that an estimate of
network variability can be extracted from the log. If this is done, the
system administrator can use this measure to predict which content their
users find important and helpful.
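One possible way to make network delay visible in the log, offered here only as a hypothetical illustration and not as the method used in the experiment, is to embed a small marker image in each page. The browser requests the image only after the page text has arrived, so the gap between the page request and the marker request gives a per-view estimate of transfer delay that can be subtracted from the measured viewing interval. The sketch assumes those three timestamps have already been matched for one view; if the marker was served from a browser cache it never appears in the log, and the raw interval is used instead.

```python
from datetime import datetime
from typing import Optional

def adjusted_view_seconds(page_request: datetime,
                          marker_request: Optional[datetime],
                          next_page_request: datetime) -> float:
    """Viewing time with an estimate of network delay removed.

    page_request:      time the HTML page was requested
    marker_request:    time the page's hypothetical inline marker image was
                       requested, or None if it never appeared in the log
    next_page_request: time the same host requested its next page
    """
    total = (next_page_request - page_request).total_seconds()
    if marker_request is None:
        return total  # no delay estimate available: fall back to the raw interval
    delay = (marker_request - page_request).total_seconds()
    return max(total - delay, 0.0)  # never report negative viewing time
```

The estimate is coarse: it captures the delay before the marker was fetched, not the time to load any larger images on the page.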
How Can Measures of User Motivation Help the Web Administrator and Author?
The most common way to measure interest in the W3
is to look at the number of hits that one's pages receive. This tally
is inadequate as a metric for deciding how to allocate attention or
resources according to the topics that users find most interesting or important.
By using a time-based metric of user
interest, you can better determine which content users find most helpful and
which they find most frustrating, and thus make better decisions about the
allocation of scarce resources. Some of the situations where a measure of
user motivation can be used are:
- Within a large FAQ file, which topic do users spend the most time
looking at? Should this section be revised because it is difficult? For
example, do users spend less time on that section than on one of equivalent
design and length? Is that because they are frustrated, or because they are
finding what they need faster?
- In running an on-line help service, where an analysis of which features and functions users find most frustrating would assist in redesigning the service, a measure of motivation is
more informative than a list of the most frequently accessed
files, especially since those files are likely to be the indexes that
help users find the help they need.
- When serving pages that invite users to do something, a
measure of the time spent looking at those pages is a way to monitor the
"seductiveness" or attractiveness of such activities (Kamba, Bharat & Albers 1995). For example, if
you are advertising jobs via a web server and you find that users are
spending less time viewing some job pages than a comparable test group of
pages, you can infer that there is something about those job descriptions
that is less desirable than the others.
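The FAQ and job-page comparisons above both ask whether a page holds readers for less time than comparable pages. A minimal sketch, assuming mean viewing seconds and word counts are available for each section: it normalizes time by length and flags sections whose seconds-per-word rate falls well below the median of their peers. The 0.5 threshold is a hypothetical choice, and, as noted above, a flagged section may be either frustrating or unusually easy, so the flag only marks where to look.

```python
from statistics import median

def flag_short_views(sections, threshold=0.5):
    """Flag sections read much faster than comparable ones.

    sections:  dict mapping section name -> (mean viewing seconds, word count)
    threshold: hypothetical cut-off; a section is flagged when its
               seconds-per-word rate is below this fraction of the median rate
    """
    rates = {name: secs / words for name, (secs, words) in sections.items() if words > 0}
    typical = median(rates.values())
    return sorted(name for name, rate in rates.items() if rate < threshold * typical)
```

For instance, with sections of 400, 500 and 400 words viewed for 120, 100 and 20 seconds, only the last one is flagged.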
Notes
1. Partial support for this project was given by the Peter F. Drucker Executive Management Department, Claremont Graduate School, Claremont, CA USA and the Centrum voor Wiskunde en Informatica, Amsterdam, the Netherlands.
2. Partial support for this project was given to both authors by the Association for Computing Machinery (ACM) and the Special Interest Group on Computer-Human Interaction (SIGCHI).
3. Special thanks to Bob Allen, Richard Miller, Catherine Hanson, and Bob Root (all of Bellcore) for their reviews, and to Steven Pemberton, Astrid Kerssens, Eddy Boeve and Guido van Rossum (of CWI and General Design) for their feedback on an early version of this paper.
References
Cohen, J. & Cohen, P. (1983). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. Hillsdale, New Jersey: Lawrence Erlbaum Associates.
Coren, S., Ward, L. M. & Enns, J. T. (1994). Sensation and Perception (fourth edition). San Diego, California: John Wiley and Sons.
Kamba, T., Bharat, K. & Albers, M. C. (1995). The Krakatoa Chronicle: an interactive, personalized newspaper on the Web. Proceedings of the Fourth International World Wide Web Conference, Boston, Massachusetts, pp. 159-170.
Morita, M. & Shinoda, Y. (1994). Information filtering based on user behavior analysis and best match text retrieval. Proceedings of SIGIR '94.
Nielsen, J. (1995). Alertbox Report. Sun Microsystems internal document, July 1995. URL=http://www.sun.com/950701/columns/alertbox/.
Nielsen, J. (1993). Usability Engineering. London: John Wiley and Sons.
Norman, D. A. (1988). The Psychology of Everyday Things. New York: Basic Books.
Schofield, K. M., Lynch, G., Tauber, M., Curtis, B., Fuller, R. & Roberts, T. (1995). CHI conference user feedback session. Proceedings of the 1995 Conference on Human Factors in Computing Systems. ACM Press. URL=http://www.acm.org/sigchi/chi95/Electronic/documnts/panels/kms_bdy.htm.
Tufte, E. R. (1983). The Visual Display of Quantitative Information. Cheshire, Connecticut: Graphics Press.