
Journal of Government Information 29 (2002) 365–370

International Information Update

Decisions, decisions, decisions: standards for evaluating international statistics resources

Amy West*

Government Publications Library, 10 Wilson Library, 309-19th Avenue South, University of Minnesota, Minneapolis, MN 55455-0414, USA

Available online 29 December 2003

1. Introduction

No institution ever has as much money as it could use for collection development. Librarians routinely evaluate new print publications and make purchasing decisions as to what best serves their clientele. However, because of ever-evolving software, new products, and new delivery methods, librarians are less comfortable evaluating electronic statistical publications. Increasing demand for these products and limited resources make this evaluation process critical.

The bulk of this article will focus on resources that are not freely available because the choices are more important and often mean weighing one product against another. There is so much variation in construction and delivery of international statistical resources that articulating a single set of standards by which to judge them would be inappropriate. Instead, potential users should apply a variable list of standards, some of which apply to all resources and some of which apply to only a few. This article describes such a combination of standards, noting resources that exemplify them.

Briefly, the standards groups are: (1) those independent of the resource; (2) those dependent on the resource itself; (3) those dependent on the resource in context of other related resources; and (4) those dependent on the resource in context of the customer’s budget. At the end is a select list of resources annotated with their best features and biggest shortcomings.

1352-0237/$ – see front matter © 2003 Elsevier Inc. All rights reserved.

doi:10.1016/j.jgi.2003.11.005

* E-mail address: [email protected].


2. Group 1 standards

2.1. Documentation

There should be two separate types of documentation for any resource. The first is documentation of the interface software, with clear installation instructions (for resources delivered via tangible media), a help file, and contact information should problems in installation or operation occur. Most resources have enough of this type of documentation. The second type of documentation covers the data itself. It should indicate where the numbers came from, whether and how they were modified, what calculations were used to generate the statistics, and what every symbol used in the tables means.

World Development Indicators (WDI) has good documentation. On the CD there are separate files for the index of indicators, acronyms and abbreviations, a bibliography, the groups of economies, the primary data documentation, and the statistical methods. The documentation for the UNSTATS database takes advantage of hyperlinks to link individual definitions of indicators with their original source, so that users can view all the indicators contributed by a given source or all the sources for a given indicator.

2.2. Is it compatible with the local computing environment?

When a resource is delivered via a tangible medium, it has to work with the local operating system while a security program is running. This is rarely a problem these days. However, there have been products designed to interface with the computer’s hard drive in such a way as to make the hard drive accessible to users. On a library’s networked public workstation this would conflict with security standards.

3. Group 2 standards

Buyers may choose from so many resources, with so many different uses, target audiences, and methods of delivery, that it would be pointless not to use standards specific to each resource. Broadly, this means testing resources to see whether they live up to their advertising. For example, if a producer says the benefit of a given product is remote access via Internet delivery, then the test of the product should be “Will this really be usable by an end-user with a standard dial-up connection to the Internet?” Even if the end-user has a 56k modem and the producer has the biggest, fastest server in the world, a standard connection travels on phone lines, and phone lines transmit at 28.8 kbps. In essence, this means one should ask how many graphics are used, how large they are, whether there is behind-the-scenes programming that is invoked every time a page is loaded, and how many clicks it takes to get to the statistics.
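The effect of page weight on a dial-up user can be estimated with simple arithmetic. The sketch below is only an illustration: it assumes the line rate quoted above is 28.8 kilobits per second, ignores latency and protocol overhead, and uses two hypothetical page sizes that are not measurements of any resource discussed here.

```python
def transfer_seconds(page_bytes: int, line_rate_bps: float = 28_800) -> float:
    """Return the seconds needed to move page_bytes over a link of line_rate_bps.

    Assumes an ideal link: no latency, no protocol overhead, no compression.
    """
    return page_bytes * 8 / line_rate_bps


# Hypothetical page weights, chosen only for illustration.
text_only_page = 10_000        # roughly 10 KB of plain HTML
graphics_heavy_page = 150_000  # roughly 150 KB of HTML plus images

print(f"text-only page:      {transfer_seconds(text_only_page):.1f} s")
print(f"graphics-heavy page: {transfer_seconds(graphics_heavy_page):.1f} s")
```

Under these assumptions the plain page arrives in under three seconds while the graphics-heavy page takes over forty, and that delay is repeated for every click needed to reach the statistics.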

UNSTATS is an excellent example of how to provide true remote access. Its interface is very simple and has minimal graphics. The interface pages, as delivered to the end-user, are straightforward HTML and involve no scripting apart from what may be used to initially generate and load the page. As a result, it is very fast.


Resources are also routinely called “easy to use.” “Easy” is subjective, but some illustrative examples are available. SourceOECD makes good use of web design standards. The link to the Statistics section is easily spotted on the home page. Once it is clicked, the end-user’s eye is drawn to the menu down the left, which contains links to broad subjects, e.g., Agriculture. Given that end-users typically think in broad terms, this makes for a good match. The end-user thinks, “I want stuff on agriculture. Oh, there is agriculture.” Then the user clicks on Agriculture. SourceOECD also uses graphics to help orient the end-user, such as smiley faces to indicate whether the end-user’s institution has access to a given database. SourceOECD’s implementation of the Beyond 20/20 browser is also well done. Because the end-user’s operational options are always in view in a menu on the left-hand side of the screen, it is easy to change variables, time periods, countries, or output options.

If a resource is supposed to allow users to download or save the information they’ve looked up, check the file formats users can choose from. Microsoft Excel is the most widely used spreadsheet in the world, and a format compatible with it should be available to the end-user. One effective instance of this is the WISTAT CD-ROM from the United Nations. Users can interact with the statistics using Beyond 20/20, but there is a separate directory that contains Excel versions of all the tables in the database. Users who already know what they need can go straight to the Excel files, save a copy, and head back to their office to work, while users who do not know what they need can browse with Beyond 20/20.

Ideally, producers would also include character-delimited ASCII text file formats, in case the end-user’s file is too big for a disk, the end-user wants to use the file with software other than Excel, or the end-user has another need for a nonproprietary file format. WDI and the World Bank Africa Database use the same software and both offer users the option of saving as ASCII text, Excel, SAS, and more.
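To show why a character-delimited text export matters, the sketch below writes a tiny table as tab-delimited ASCII and reads it back using nothing but Python’s standard library. The table contents, column names, and helper function names are hypothetical placeholders and are not taken from any of the products named above.

```python
import csv
import io

# Hypothetical table; the values are placeholders, not real statistics.
rows = [
    ["country", "year", "value"],
    ["Country A", "2000", "1.5"],
    ["Country B", "2000", "2.5"],
]


def to_delimited(table, delimiter="\t"):
    """Serialize a list of rows as character-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(table)
    return buffer.getvalue()


def from_delimited(text, delimiter="\t"):
    """Parse character-delimited text back into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


exported = to_delimited(rows)
restored = from_delimited(exported)

print(exported)
print(restored == rows)
```

Because the export is plain text, any spreadsheet, statistical package, or script can read it, which is the practical meaning of “nonproprietary” here.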

4. Group 3 standards

The third group of standards depends on the resource in context: what about it makes it worth having? Is it the content, the querying software, the data structure, or some combination?

This standard can be the hardest to judge. Most resources for international statistics start with the “same” sources, i.e., they start with data gathered by other international intergovernmental organizations. In the absence of a summary comparing sources, potential buyers have to go to the resources and try to compare them on a series-by-series basis. This is virtually impossible due to the massive size of most resources, the limited time available to buyers and, most importantly, structural differences in databases that mask similarities between sources. The UNSTATS and International Financial Statistics (IFS) databases are a good example of this.

When UNSTATS was introduced, its producers highlighted in particular its inclusion of IMF data otherwise available only on the IFS CD. For potential buyers of UNSTATS who had already bought IFS, it was therefore important to determine the extent of overlap, because IFS is more expensive than UNSTATS. If UNSTATS provided enough data from the IMF, then buyers might decide to discontinue the IFS subscription.


The IFS database is a two-dimensional table in which every row represents a “series,” defined as a set of statistics for a given country over a period of years. There is a minimum of 30,000 rows in the IFS database. The number of observations would then be 30,000 times about 50 years, plus an unknown number of quarterly and monthly periods, i.e., a minimum of 1,500,000 observations. The maximum is harder to calculate. Not every country will have a row for every statistic, and IFS treats aggregated groups of countries as if they were individual nations. Also, there are several odd series names that are probably typographical errors but that inflate the number of rows and thereby the number of series and observations.
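The lower bound above is a single multiplication, and writing it out shows why the upper bound is so much harder to pin down. In this sketch the 30,000 rows and 50 years come from the text; the `periods_per_year` parameter and the quarterly scenario are hypothetical, added only to show how quickly the count grows once sub-annual data are included.

```python
MIN_SERIES = 30_000  # minimum number of rows (series) cited in the text
ANNUAL_YEARS = 50    # "about 50 years" of coverage


def observation_count(series: int = MIN_SERIES,
                      years: int = ANNUAL_YEARS,
                      periods_per_year: int = 1) -> int:
    """Count observations if every series had a value for every period."""
    return series * years * periods_per_year


annual_minimum = observation_count()
# Hypothetical scenario: every series also reported quarterly.
quarterly_scenario = observation_count(periods_per_year=4)

print(annual_minimum)
print(quarterly_scenario)
```

The first figure reproduces the 1,500,000 minimum. The second is not a claim about IFS; it only illustrates that the true total depends on how many series carry quarterly or monthly values, which is unknown.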

On the surface, UNSTATS appears to provide just under 100 series (with the attendant larger number of observations). That implies that very little of the IFS database is captured by UNSTATS. However, it turns out that the database structure underlying UNSTATS is multi-dimensional, not two-dimensional. That means that all of the rows that would belong to, say, capital account credit, and which would be counted individually in IFS as described above, are collapsed in UNSTATS. In UNSTATS there will be a single series, like capital account credit, which has multiple dimensions including time and place. Thus, while there is a content difference between IFS and UNSTATS, it is not as extreme as it might seem, nor is it small enough to justify dropping an IFS subscription.
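The difference between the two ways of counting series can be made concrete with a toy data set. The sketch below is an assumption-laden illustration, not a description of either database’s actual schema: the rows, country names, and values are hypothetical, and the `collapse` function is a made-up helper.

```python
from collections import defaultdict

# Hypothetical flat rows in the two-dimensional style: one row per
# (indicator, country) pair, each holding values keyed by year.
flat_rows = [
    ("Capital account credit", "Country A", {2000: 1.0, 2001: 2.0}),
    ("Capital account credit", "Country B", {2000: 3.0}),
    ("Capital account debit", "Country A", {2000: 4.0}),
]


def collapse(rows):
    """Group flat rows into one series per indicator, indexed by (country, year)."""
    cube = defaultdict(dict)
    for indicator, country, by_year in rows:
        for year, value in by_year.items():
            cube[indicator][(country, year)] = value
    return dict(cube)


cube = collapse(flat_rows)

flat_series_count = len(flat_rows)   # every row counts as a series
collapsed_series_count = len(cube)   # one series per indicator
observations = sum(len(cells) for cells in cube.values())

print(flat_series_count, collapsed_series_count, observations)
```

The same four observations appear as three series under the flat count and two under the collapsed count, so a comparison of series totals alone says little about how much content two databases share.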

5. Group 4 standards

Given all of the above, is a resource worth the cost or not? The answer is, of course, “it depends.” Certainly, any resource that is cheap will get considered and, in all honesty, will probably get judged less stringently simply because the financial stakes aren’t as high. Conversely, any really expensive resource, even if it appears to be very good, could be dismissed out of hand. WDI on CD-ROM is very reasonably priced, works well, has lots of content, and is fairly easy to use. WDI Online, to the extent that it performs as well as the free Data Query on the World Bank web site, looks to be significantly better. It integrates documentation, effectively exploits hyperlinks, and does not overdo the graphics. However, compared with the cost of both the network license for the CD and a similar web-delivered service such as UNSTATS, the cost is astronomical.

6. Conclusion

In an imperfect world where buyers have limited funds, they must critically assess any resource that provides access to international statistics. Some of the standards for assessment will be applicable across the board, some will be specific to the resource, and some will be specific to the financial state of the buyer.

One test that all the producers of resources discussed above pass with flying colors is responsiveness to customers. Each has taken critical comments constructively and moved to address them, and their users have appreciated it.

Table of Resources

Resource name: Eurostat (web site)
http://europa.eu.int/comm/eurostat/Public/datashop/print-catalogue/EN?catalogue=Eurostat
Best feature: Unique content
Biggest shortcoming: Almost none of it is free

Resource name: FAOSTAT (web and CD)
http://apps.fao.org/
Best feature: Lengthy time series, unique content
Biggest shortcoming: User doesn’t find out web downloads aren’t free until after trying to download

Resource name: Census Bureau International Database (web and downloadable software)
http://www.census.gov/ipc/www/idbnew.html
Best feature: Lengthy time series of demographic data
Biggest shortcoming: Labeling and descriptions on web site confusing

Resource name: Foreign Labor Statistics (web)
http://www.bls.gov/fls/home.htm
Best feature: Excellent documentation, public data query clearly directs user with numbered steps
Biggest shortcoming: Public Data Query does have a download option, but it is not explicitly described that way and users could easily end up doing more work than necessary

Resource name: International Financial Statistics (CD)
http://www.imf.org/external/pubs/pubs/dload/pubcat.pdf
Best feature: Unique content, extremely timely, lengthy time series, lots of series, low maintenance
Biggest shortcoming: Interface initially confusing to users

Resource name: LABORSTA (web)
http://laborsta.ilo.org/
Best feature: Free, lengthy time series, includes worker injury and strike statistics
Biggest shortcoming: Interface uses frames which don’t meet accessibility standards

Resource name: SourceOECD (web)
http://www.sourceoecd.org/
Best feature: Provides trade by commodity by country by year; Beyond 20/20 implementation is excellent
Biggest shortcoming: Too many graphics, too long to load each page, too many clicks to get to data, down too often

Resource name: UNSTATS (web)
http://unstats.un.org/unsd/cdb/cdb_help/cdb_quick_start.asp
Best feature: Fast, tells user coverage for series as a whole and for each country in each series
Biggest shortcoming: Putting a link to the Advanced Data Selection on every screen falsely implies a context-sensitive function; user will not expect to have to start over from scratch

Resource name: UN Demographic Yearbook Historical Supplement (CD)
Best feature: 50-year time series in many formats, including raw data and sample SPSS data dictionaries
Biggest shortcoming: Overly complex frames interface that squeezes target information into a very small frame

Resource name: UNESCO Statistics (web site)
http://www.uis.unesco.org/ev_en.php?ID=2867_201&ID2=DO_TOPIC
Best feature: Freely available, stable, easy to use, clear directions for downloading
Biggest shortcoming: Limited statistics as compared with other sources that draw on UNESCO data

Resource name: WISTAT (CD)
http://unstats.un.org/unsd/demographic/gender/wistat/
Best feature: Unique content that’s hard to come by
Biggest shortcoming: Beyond 20/20 software can be difficult to use on a public workstation that has other titles also using Beyond 20/20

Resource name: World Bank Africa Database (CD)
Best feature: Unique content that’s hard to come by, uses the same software as World Development Indicators
Biggest shortcoming: Not as much documentation as on World Development Indicators

Resource name: World Development Indicators (CD)
http://www.worldbank.org/data/wdi/cdrom/
Best feature: 40 years of a huge number of series drawn from many different sources
Biggest shortcoming: Software is a little clunky, initial results display is confusing
