Data services in Canada: who, where, when, and why

Data services in Canada: what,
when, why, and who
Presentation to:
ACRL Sociology Librarians Discussion Group
By Laine G.M. Ruus, University of Toronto
June 21, 2003
Overview:
• What are data?
• When are data services and how are they
different?
• Why are data services?
• Who provides data services?
• Where are data services?
What are data?
wisdom
knowledge
information (statistics)
data
Data and statistics are not equals
• Statistics are:
– Two classes of statistics
• Descriptive statistics: summaries of common
characteristics of the raw data units
(one-way tables, two-way tables … multi-way tables)
• Inferential statistics: measure strength and
direction of relationships among characteristics of
raw data units
– Traditional libraries deal mainly with descriptive
statistics and knowledge
Data and statistics are not equals (cont’d)
• Data are:
– the raw materials from which statistics are generated
– ideally, available at the level at which the data were
originally collected
– need to be manipulated with statistical software in order
to be comprehensible
– data libraries, data archives, and other data services deal
mainly with raw data
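As a minimal sketch of the distinction between data and descriptive statistics, the following Python snippet tabulates a handful of raw records into a one-way and a two-way table. The records, variable names, and values are invented purely for illustration and do not come from any real survey.

```python
from collections import Counter

# Hypothetical raw data: one record per respondent (invented values).
records = [
    {"age": "15-34", "sex": "F"},
    {"age": "15-34", "sex": "M"},
    {"age": "35+", "sex": "F"},
    {"age": "35+", "sex": "F"},
    {"age": "15-34", "sex": "F"},
]

# One-way table: frequency of each age group.
one_way = Counter(r["age"] for r in records)

# Two-way table: frequency of each (age, sex) combination.
two_way = Counter((r["age"], r["sex"]) for r in records)

print(dict(one_way))
print(dict(two_way))
```

The raw records can be re-tabulated in many different ways, whereas a published table fixes one particular summary.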
Data
Metadata
Descriptive statistics:
• from raw data with 15 characteristics
(variables) one can produce
– 15 x 14 = 210 two-way tables
e.g. age by sex
– 15 x 14 x 13 = 2,730 three-way tables
e.g. age by sex by province/state
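The arithmetic on this slide can be checked with a short Python sketch. It assumes the slide counts ordered selections of variables (so "age by sex" and "sex by age" are counted separately). The unordered counts are shown alongside for comparison; they are an addition here, not part of the original slide.

```python
import math

n_variables = 15  # number of characteristics (variables) in the raw data

# Ordered selections, matching the slide's 15 x 14 and 15 x 14 x 13.
two_way_ordered = math.perm(n_variables, 2)
three_way_ordered = math.perm(n_variables, 3)

# Unordered selections, if transposed tables count as the same table
# (an assumption added for comparison, not from the slide).
two_way_unordered = math.comb(n_variables, 2)
three_way_unordered = math.comb(n_variables, 3)

print(two_way_ordered, three_way_ordered)
print(two_way_unordered, three_way_unordered)
```

Either way, the number of possible tables grows quickly with the number of variables, which is why no publication can pre-tabulate them all.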
Importance of training methods by age:
[Chart: T&E, On-line, and Web-based training methods, for ages 15-34 yrs and 35+ yrs; vertical axis 0 to 80]
The mind, in short, works on the data it receives very much
as a sculptor works on his block of stone. In a sense the
statue stood there from eternity. But there were a thousand
different ones beside it, and the sculptor alone is to thank for
having extricated this one from the rest.
Source: William James (1842-1910), U.S. psychologist, philosopher,
Principles of Psychology, ch. 9 (1890)
When are data services?
Two main classes of usage:
• Reference uses
– Looking for pre-digested descriptive statistics
and evaluation of those statistics (knowledge)
– Intend to report statistics ‘as-is’
• Research uses
– Looking for data, or large quantities of low-level
aggregate (descriptive) statistics
– Intend to do own statistical analysis
The research process and data services
The researcher
• Finding the right data
• Acquiring the data
• Analyzing the data
• Publishing the research
The data service
• Data location or
identification
• Data acquisition
• Metadata creation &
management
• Custom tabulations
• Data migration
• Data interpretation
• Advice on data citation
• Data preservation
How are data services different from
other library procedures?
Data location:
• No union catalogue of data files
• No ‘books in print’, no jobbers
• Acquisition depends on who the producer is
• No tradition of citing data/statistical sources in
bibliographies, which makes identification difficult
Data acquisition:
• Commoditization of information in an
information-based economy makes raw materials
of information expensive
• Uncertainty among data producers about data
ownership/usage issues (antiquated copyright acts)
• Silo approach to data/statistics access means most
data are restricted use
• Data licenses require data services staff to act as
data police
Usage silos in Canada:
• Discrete research projects, e.g. CCRI
• Research Data Centres (RDCs)/remote job submission
programs – access to microdata, research proposals are
vetted
• Data Liberation Initiative (DLI) – additional fee-based
descriptive statistics, microdata and map products,
institutional memberships
• Depository Services Program (DSP) – access to some
fee-based descriptive statistics, walk-in users of DSP libraries
• “the public” – access to limited free descriptive statistics
on selected web sites
Data usage:
• Not the level of material with which librarians normally
work
• Libraries assume user is literate, concentrate on access
• We cannot assume the user is numerate, must concentrate
on training and interpretation (access is being solved)
– requires understanding of research process, of statistics creation
process, and knowledge of special software
• Involvement in publishing process (esp. metadata, data
editions)
• Little training available, and a steep learning curve
Data acknowledgement (citations):
• New vocabulary, same old concept
• Few citation manuals cover it
• Few publishers/editors require it
Who provides data services?
Two major models of data services:
• Local data service units
– USA and Canada
– Usually in academic institutions
– A few large multi-institutional, specialized
centres (e.g. ICPSR, Roper, ARDA etc.)
• National data archives
– Europe and elsewhere
Who provides data services in academic
institutions:
[Chart: Libraries vs. Non-libraries, for Canada, United States, and Elsewhere; vertical axis 0 to 100]
Why are data services?
• Because data are the ideal multi-site, multi-user
system
• And because faculty members don’t talk to each
other
• Therefore, need to rationalize acquisitions
• And need to rationalize post-processing (metadata
creation) and other user support services
• And build up expertise
• And centralize data policing
What’s happening at present?
• Emerging standard for metadata formats
(DDI/DTD)
• Microdata extractor interfaces
• GIS applications and interfaces (aggregate data)
• Producers increasingly marketing directly to users
Predictions for the future
• We are going to lose a lot of history (data)
• Users will become more numerate and have more
GIS-sense
• As we get access to more statistics, and less data,
decisions will be increasingly founded on bad
information
• Data and GIS services will be busier than ever!
The bottom line:
• Centralized services increase the need for
local services
• The most exciting field in libraries today
• Everyone should have one