Journal of Government Information 29 (2002) 365–370
International Information Update
Decisions, decisions, decisions: standards for evaluating
international statistics resources
Amy West*
Government Publications Library, 10 Wilson Library, 309-19th Avenue South, University of Minnesota,
Minneapolis, MN 55455-0414, USA
Available online 29 December 2003
1. Introduction
No institution ever has as much money as it could use for collection development.
Librarians routinely evaluate new print publications and make purchasing decisions as to
what best serves their clientele. However, because of ever-evolving software, new
products, and new delivery methods, librarians are less comfortable evaluating electronic
statistical publications. Increasing demand for these products and limited resources make
this evaluation process critical.
The bulk of this article will focus on resources that are not freely available
because the choices are more important and often mean weighing one product against
another. There is so much variation in construction and delivery of international
statistical resources that articulating a single set of standards by which to judge them
would be inappropriate. Instead, potential users should apply a variable list of
standards, some of which apply to all resources and some of which apply to only
a few. This article describes such a combination of standards, noting resources that
exemplify them.
Briefly, the standards groups are: (1) those independent of the resource; (2) those
dependent on the resource itself; (3) those dependent on the resource in context of other
related resources; and (4) those dependent on the resource in context of the customer’s
budget. At the end is a select list of resources annotated with their best features and
biggest shortcomings.
1352-0237/$ – see front matter © 2003 Elsevier Inc. All rights reserved.
doi:10.1016/j.jgi.2003.11.005
* E-mail address: [email protected].
2. Group 1 standards
2.1. Documentation
There should be two separate types of documentation for any resource. The first is
documentation of the interface software, with clear installation instructions (for resources
delivered via tangible media), a help file and contact information should problems in
installation or operation occur. Generally, most resources have enough of this type of
documentation to suffice. The second type of documentation covers the data itself. It should
indicate where the numbers came from, whether and how they were modified, what calculations
are then used to generate the statistics, and explanations of all symbols used in tables.
World Development Indicators (WDI) has good documentation. On the CD there are
separate files for the index of indicators, acronyms and abbreviations, a bibliography, the
groups of economies, the primary data documentation, and the statistical methods. The
documentation for the UNSTATS database takes advantage of hyperlinks to link individual
definitions of indicators with their original source, so that users can view all the indicators
contributed by a given source or all the sources for a given indicator.
2.2. Is it compatible with the local computing environment?
When a resource is delivered via a tangible medium, then it will have to work with the
local operating system while a security program is running. This isn’t too much of a problem
these days. However, there have been products designed to interface with the hard drive of the
computer in such a way as to make the hard drive accessible by users. On a library’s
networked public workstation this would conflict with security standards.
3. Group 2 standards
Buyers may choose from so many resources with so many different uses, target audiences,
and methods of delivery, that it would be pointless not to use standards that are specific to each
resource. Broadly, this means testing them to see if they live up to their advertising. For example,
if a producer says the benefit of a given product is that said product will provide remote access
via Internet delivery, then the test of the product should be ‘‘Will this really be usable by an end
user using a standard dial-up connection to the Internet?’’ Even if the end-user has a 56k modem
and the producer has the biggest, fastest server in the world, a standard connection travels on
phone lines and phone lines transmit at 28.8 kbps. In essence, this means one should ask how
many graphics are used, how large they are, whether there is behind the scenes programming
that is invoked every time a page is loaded, and how many clicks it takes to get to the statistics.
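The questions above can be turned into a rough estimate. The following is a minimal sketch, assuming the phone line carries roughly 28.8 kilobits per second and ignoring latency and server-side processing time. The page weights and click counts are hypothetical values chosen for illustration, not measurements of any product discussed here.

```python
def seconds_to_reach_data(page_bytes: int, clicks: int, line_bits_per_second: int) -> float:
    """Estimate transfer time when every click loads one page of page_bytes."""
    return clicks * page_bytes * 8 / line_bits_per_second


LINE_BPS = 28_800  # assumed effective dial-up throughput (28.8 kbps)

# Hypothetical interfaces: (bytes per page, clicks needed to reach the statistics).
interfaces = {
    "plain HTML, few clicks": (15_000, 3),
    "graphics-heavy, many clicks": (300_000, 5),
}

for label, (page_bytes, clicks) in interfaces.items():
    wait = seconds_to_reach_data(page_bytes, clicks, LINE_BPS)
    print(f"{label}: about {wait:.1f} seconds before the user sees any data")
```

Even with invented numbers, the comparison shows why page weight and click count matter more to a dial-up user than the speed of the producer's server.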
UNSTATS is an excellent example of how to provide true remote access. Its interface is
very simple and has minimal graphics. The interface pages, as delivered to the end-user, are
straightforward HTML and involve no scripting apart from what may be used to initially
generate and load the page. As a result, it is very fast.
Resources are also routinely called ‘‘easy to use.’’ ‘‘Easy’’ is subjective, but some
illustrative examples are available. SourceOECD makes good use of web design standards.
The link to the Statistics section is easily spotted on the home page. When clicked, the end-
user’s eye will be drawn to the menu down the left which contains links to broad subjects, e.g.,
Agriculture. Given that end-users typically think in broad terms, this makes for a good match.
The end-user thinks, ‘‘I want stuff on agriculture. Oh, there is agriculture.’’ Then the user
clicks on Agriculture. SourceOECD also uses graphics to help orient the end-user, such as
smiley faces to indicate whether the end user’s institution has access to a given database.
SourceOECD’s implementation of the Beyond 20/20 browser is also well done. Because the
end-user’s operational options are always in view in a menu on the left-hand side of the
screen, it is easy to change variables, time periods, countries, or output options.
If a resource is supposed to allow users to download or save the information they’ve looked
up, take a look at the file formats users can choose from. Microsoft Excel is the most widely
used spreadsheet in the world and there should be a format compatible with it available to the
end-user. One effective instance of this is the WISTAT CD-ROM from the United Nations.
Users can interact with the statistics using Beyond 20/20, but there is a separate directory that
contains Excel formats of all the tables in the database. Users who already know what they
need can go straight to the Excel files, save a copy, and head back to their office to work while
users who do not know what they need can browse with Beyond 20/20.
Ideally, producers would include character delimited ASCII text file formats in case the
end-user’s file is too big for a disk or if the end-user wants to use her file with some software
other than Excel or because the end-user has another need for a nonproprietary file format.
WDI and the World Bank Africa Database use the same software and both offer users the
option of saving as ASCII text, Excel, SAS, and more.
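To illustrate why a character-delimited text format is useful, here is a minimal sketch using Python's standard csv module. It is not the export routine of any product named above. The column names and values are hypothetical, and the tab delimiter is only one possible choice.

```python
import csv
import io


def to_delimited_text(header, rows, delimiter="\t"):
    """Serialize a table as character-delimited plain text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical table; the values are invented for illustration.
header = ["country", "year", "value"]
rows = [["Country A", 1999, 12.5], ["Country B", 1999, 7.0]]

text = to_delimited_text(header, rows)
print(text)

# Any program that can split lines on the delimiter can read the file back.
parsed = list(csv.reader(io.StringIO(text), delimiter="\t"))
print(parsed)
```

Because the output is plain text, it can be opened in a spreadsheet, a statistical package, or a text editor without any proprietary software.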
4. Group 3 standards
The third group of standards depends on the resource in context. What about it makes it
worth having: content, querying software, and/or data structure?
This standard can be the hardest to judge. Most resources for international statistics start
with the ‘‘same’’ sources, i.e., they start with data gathered by other international
intergovernmental organizations. In the absence of a summary comparing sources, potential
buyers have to go to the resources and try to compare them on a series-by-series basis. This is
virtually impossible due to the massive size of most resources, the limited time available to
buyers and, most importantly, structural differences in databases that mask similarities
between sources. The UNSTATS and International Financial Statistics (IFS) databases are
a good example of this.
When UNSTATS was introduced, its producers highlighted in particular its inclusion of
IMF data otherwise only available on the IFS CD. For potential buyers of UNSTATS who
already bought IFS, it was then important to determine the extent of overlap because IFS is
more expensive than UNSTATS. If UNSTATS provided enough data from the IMF, then
buyers might decide to discontinue the IFS subscription.
The IFS database is a two-dimensional table in which every row represents a ‘‘series’’
defined as a set of statistics for a given country over a period of years. There is a minimum
of 30,000 rows in the IFS database. The number of observations would then be 30,000
times about 50 years plus an unknown number of quarterly and monthly periods, i.e., a
minimum of 1,500,000 observations. The maximum is harder to calculate. Not every
country will have a row for every statistic and IFS treats aggregated groups of countries as
if they were individual nations. Also, there are several odd series names that are probably
typographical errors, but which inflate the number of rows and thereby the number of series
and observations.
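The minimum figure follows from simple multiplication, sketched below using only the two numbers given in the text. Quarterly and monthly periods are left out because their count is unknown, so the result is a floor rather than an estimate.

```python
MIN_SERIES = 30_000   # minimum number of rows (series) in the IFS database
ANNUAL_PERIODS = 50   # "about 50 years" of annual observations per series

# Lower bound only: quarterly and monthly observations would add to this.
min_observations = MIN_SERIES * ANNUAL_PERIODS
print(f"At least {min_observations:,} observations")
```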
On the surface, UNSTATS appears to provide just under 100 series (with the attendant
larger number of observations). That implies that very little of the IFS database is captured by
UNSTATS. However, it turns out that the database structure underlying UNSTATS is multi-
dimensional, not two-dimensional. That means that all of the rows that would belong to, say,
capital account credit, and which would be counted individually in IFS as described above,
are collapsed in UNSTATS. In UNSTATS there will be a series, like capital account credit,
which has multiple dimensions including time and place. Thus, while there is a content
difference between IFS and UNSTATS, it is not as extreme as it might seem nor is it small
enough to justify dropping an IFS subscription.
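The structural difference can be sketched in a few lines. This is an illustration only, not the actual schema of either database. The country names and values are invented, and the only indicator name taken from the text is capital account credit.

```python
from collections import defaultdict

# Two-dimensional layout: each (indicator, country) pair is its own row,
# and each row is counted as a separate series.
flat_rows = [
    ("capital account credit", "Country A", {1999: 1.0, 2000: 1.2}),
    ("capital account credit", "Country B", {1999: 0.4, 2000: 0.5}),
    ("capital account debit", "Country A", {1999: 0.9, 2000: 1.1}),
]


def collapse(rows):
    """Group rows by indicator, keeping country and year as dimensions."""
    cube = defaultdict(dict)
    for indicator, country, values_by_year in rows:
        for year, value in values_by_year.items():
            cube[indicator][(country, year)] = value
    return dict(cube)


cube = collapse(flat_rows)
print(len(flat_rows), "series when every row is counted")
print(len(cube), "series when rows are collapsed by indicator")
```

The same observations are present in both layouts. Only the number of things called a series changes, which is why comparing raw series counts between the two databases is misleading.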
5. Group 4 standards
Given all of the above, is a resource worth the cost or not? The answer is, of course, ‘‘it
depends.’’ Certainly, any resource that is cheap will get considered and in all honesty will
probably get judged less stringently simply because the financial stakes aren’t as high.
Conversely, any really expensive resource, even if it appears to be really, really good, could
be dismissed out of hand. WDI on CD-ROM is very reasonably priced, works well, has lots of
content, and is fairly easy to use. WDI Online, to the extent that it performs as well as the free
Data Query on the World Bank web site, looks to be significantly better. It integrates
documentation, effectively exploits hyperlinks, and does not overdo the graphics. However,
compared with the cost of both the network license for the CD and for a similar web delivered
service such as UNSTATS, the cost is astronomical.
6. Conclusion
In an imperfect world where buyers have limited income, they must critically assess any
resource that provides access to international statistics. Some of the standards for
assessment will be applicable across the board, some will be specific to the resource and some
will be specific to the financial state of the buyer.
One test that all the producers of resources discussed above pass with flying colors is
responsiveness to customers. They have each taken critical comments constructively and
moved to address them, and their users have appreciated it.
Table of Resources

Resource name: Eurostat (web site)
URL: http://europa.eu.int/comm/eurostat/Public/datashop/print-catalogue/EN?catalogue=Eurostat
Best feature: Unique content
Biggest shortcoming: Almost none of it is free

Resource name: FAOSTAT (web and CD)
URL: http://apps.fao.org/
Best feature: Lengthy time series, unique content
Biggest shortcoming: User doesn’t find out web downloads aren’t free until after trying to download

Resource name: Census Bureau International Database (web and downloadable software)
URL: http://www.census.gov/ipc/www/idbnew.html
Best feature: Lengthy time series of demographic data
Biggest shortcoming: Labeling and descriptions on web site confusing

Resource name: Foreign Labor Statistics (web)
URL: http://www.bls.gov/fls/home.htm
Best feature: Excellent documentation, public data query clearly directs user with numbered steps
Biggest shortcoming: Public Data Query does have a download option, but it is not explicitly described that way and users could easily end up doing more work than necessary

Resource name: International Financial Statistics (CD)
URL: http://www.imf.org/external/pubs/pubs/dload/pubcat.pdf
Best feature: Unique content, extremely timely, lengthy time series, lots of series, low maintenance
Biggest shortcoming: Interface initially confusing to users

Resource name: LABORSTA (web)
URL: http://laborsta.ilo.org/
Best feature: Free, lengthy time series, includes worker injury and strike statistics
Biggest shortcoming: Interface uses frames which don’t meet accessibility standards

Resource name: SourceOECD (web)
URL: http://www.sourceoecd.org/
Best feature: Provides trade by commodity by country by year; Beyond 20/20 implementation is excellent
Biggest shortcoming: Too many graphics, too long to load each page, too many clicks to get to data, down too often

Resource name: UNSTATS (web)
URL: http://unstats.un.org/unsd/cdb/cdb_help/cdb_quick_start.asp
Best feature: Fast, tells user coverage for series as a whole and for each country in each series
Biggest shortcoming: Putting a link to the Advanced Data Selection on every screen falsely implies a context-sensitive function; user will not expect to have to start over from scratch

Resource name: UN Demographic Yearbook Historical Supplement (CD)
Best feature: 50-year time series in many formats, including raw data and sample SPSS data dictionaries
Biggest shortcoming: Overly complex frames interface that squeezes target information into a very small frame

Resource name: UNESCO Statistics (web site)
URL: http://www.uis.unesco.org/ev_en.php?ID=2867_201&ID2=DO_TOPIC
Best feature: Freely available, stable, easy to use, clear directions for downloading
Biggest shortcoming: Limited statistics as compared with other sources that draw on UNESCO data

Resource name: WISTAT (CD)
URL: http://unstats.un.org/unsd/demographic/gender/wistat/
Best feature: Unique content that’s hard to come by
Biggest shortcoming: Beyond 20/20 software can be difficult to use on a public workstation that has other titles also using Beyond 20/20

Resource name: World Bank Africa Database (CD)
Best feature: Unique content that’s hard to come by, uses the same software as World Development Indicators
Biggest shortcoming: Not as much documentation as on World Development Indicators

Resource name: World Development Indicators (CD)
URL: http://www.worldbank.org/data/wdi/cdrom/
Best feature: 40 years of a huge number of series drawn from many different sources
Biggest shortcoming: Software is a little clunky, initial results display is confusing