CSU’s EPA-FUNDED PROGRAM ON “APPLYING SPATIAL AND
METADATA: A LEGACY FOR OUR
GRANDCHILDREN
N. Scott Urquhart
STARMAP Program Director
Department of Statistics
Colorado State University
# 1
DISCLAIMERS
The work reported here today was developed under the STAR
Research Assistance Agreement CR-829095 awarded by the U.S.
Environmental Protection Agency (EPA) to Colorado State
University. This presentation has not been formally reviewed by
EPA. The views expressed here are solely those of the author and of
STARMAP, the program he represents. EPA does not endorse any
products or commercial services mentioned in this presentation.
The people of CEER-GOM have heard parts of this presentation; my
apologies for the repetition. That earlier presentation at Ocean Springs,
MS (3/26/02) led to the invitation for this talk.
# 2
CONTEXT FOR COMMENTS
SPACE-TIME AQUATIC RESOURCES
MODELING AND ANALYSIS PROGRAM
= STARMAP
STARMAP IS FUNDED BY EPA’s STAR PROGRAM,
AS ARE ALL OF THE EaGLes PROGRAMS
(==> “SIBLING” PROGRAMS)
STARMAP IS TO USE EMAP AS A DATA SOURCE
AND CONTEXT
NSU = STARMAP PROGRAM DIRECTOR @ CSU
10 YEARS OF COLLABORATION WITH EMAP
40+ YEARS AS STATISTICIAN WORKING WITH ECOLOGISTS
# 3
AN IMPORTANT LESSON
YOU DO NOT KNOW WHAT YOUR DATA
WILL BE USED FOR 20 YEARS FROM NOW
BY THE TIME THE VARIOUS EaGLes
PROGRAMS ARE COMPLETE, WE, AS
TAXPAYERS, WILL HAVE INVESTED > $40M
IN THE VARIOUS STUDIES
THE RESULTING DATA NEED TO BE MADE
RESPONSIBLY AND READILY AVAILABLE
TO FUTURE GENERATIONS
# 4
YOU DO NOT KNOW WHAT YOUR DATA
WILL BE USED FOR 20 YEARS FROM NOW
POPULAR PERSPECTIVE: WE “KNOW” LOTS
ABOUT THE “ENVIRONMENT”
REALITY: GOOD AQUATIC DATA IS SCARCE
SPATIALLY EXTENSIVE
OVER A REASONABLE TIME SPAN
WELL DOCUMENTED PROCEDURES
WELL TRAINED CREWS
CAREFULLY EXECUTED STUDIES
DATA PUBLICLY AVAILABLE
# 5
THE VALUE OF “METADATA”
DATA
WITHOUT CONTEXT ARE NUMBERS
NEARLY WORTHLESS TO OTHERS
How many file cabinets full of data are in your park offices?
DATA WITH CONTEXT IS INFORMATION
CAN BE VALUABLE TO OTHERS
CONTEXT IS CALLED METADATA
# 6
VERY DISCOURAGING EXPERIENCE
WITH HISTORICAL DATA
THREE HISTORICAL DATA SETS
NUTRIENTS IN NORTHEAST LAKES
Larsen, D. P., N. S. Urquhart and D. Kugler (1995). Regional scale
trend monitoring of indicators of trophic condition of lakes. Water
Resources Bulletin 31:117-140.
E. COLI IN A RIVER BASIN IN OREGON
NUTRIENTS IN LAKES & STREAMS IN EPA
REGION 10
EMAP SURFACE WATERS
I THOUGHT THIS WAS WELL DOCUMENTED!
# 7
SO WHAT IS METADATA?
BEST DEF’N SEEMS TO BE ORGANIZED
“DATA ABOUT DATA”
VERY DIVERSE VIEWS ABOUT WHAT IT SHOULD
CONTAIN:
LIBRARIANS
W3 GROUP: DEFINING FEATURES OF THE WORLD WIDE
WEB { title, description, publication date, author }
CENSUS-BUREAU TYPES, WORLDWIDE
GEOGRAPHIC DATA STANDARDS
EPA’s STORET
# 8
WHAT IS METADATA GOOD FOR?
A librarian would probably answer
Discovery
Managing the resource (ownership & responsibility)
ARCHIVING
AUTHENTICATING - QA/QC - UNCHANGING
GROWING
This statistician answers
For correctly analyzing data in the future
Not discovery, but correct utilization
Paths to related documents based on the same dataset
# 9
METADATA COMPONENTS IMPORTANT
TO A PERSON ANALYZING THE DATA
NAME OF DATASET
DEFINITION OF RESPONSES EVALUATED
MOTIVATING FACTORS
INTERNAL FEATURES OF DATASET
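As an illustration only, the four components above could be kept together in one structured record that travels with the data. The minimal Python sketch below assumes that approach; the class name `AnalystMetadata`, its field names, the dataset name `EXAMPLE-STREAMS-2002`, and the completeness check are all hypothetical, not part of any EPA or EMAP standard.

```python
from dataclasses import dataclass, field, fields


@dataclass
class AnalystMetadata:
    """Hypothetical record holding the four components an analyst needs."""

    dataset_name: str = ""
    # response variable -> definition, or path to the protocol that defines it
    response_definitions: dict[str, str] = field(default_factory=dict)
    # study objectives, plus why and how the sites were selected
    motivating_factors: str = ""
    # sub-dataset name -> variables it contains
    internal_features: dict[str, list[str]] = field(default_factory=dict)

    def missing_components(self) -> list[str]:
        """Return the names of components that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


meta = AnalystMetadata(dataset_name="EXAMPLE-STREAMS-2002")  # hypothetical name
print(meta.missing_components())
```

An empty result only means every component has been filled in; it says nothing about whether the content is adequate.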
# 10
IMPORTANT METADATA COMPONENT:
DATASET NAME
IS THIS REALLY IMPORTANT?
YES!
IMPORTANT FINDINGS FROM A DATASET WILL
BE PUBLISHED.
WE NEED TO ADOPT A CONVENTION THAT THE DATASET
NAME IS A KEYWORD IN EVERY PUBLICATION BASED ON IT.
Name needs to be permanent and consistently used
THEN FUTURE INVESTIGATORS CAN USE
STANDARD SEARCH TOOLS TO FIND INFORMATION
EXTRACTED FROM EACH DATASET.
MUCH LONGER-LIVED THAN WEB LINKS
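A minimal sketch of why a permanent, consistently used name matters, assuming publications are indexed by keyword lists. The `Publication` class, the function `publications_using`, the dataset name `EXAMPLE-STREAMS-2002`, and the sample records are hypothetical; real bibliographic search tools differ in detail.

```python
from dataclasses import dataclass, field


@dataclass
class Publication:
    """Hypothetical bibliographic record; fields chosen for illustration only."""

    title: str
    year: int
    keywords: list[str] = field(default_factory=list)


def publications_using(dataset_name: str, pubs: list[Publication]) -> list[Publication]:
    """Return publications that list the dataset name as a keyword.

    Matching ignores case but is otherwise exact, so a renamed or
    abbreviated dataset name silently drops out of the results.
    """
    target = dataset_name.casefold()
    return [p for p in pubs if any(k.casefold() == target for k in p.keywords)]


pubs = [
    Publication("Stream condition summary", 2003, ["EXAMPLE-STREAMS-2002", "benthos"]),
    Publication("Lake trophic trends", 1995, ["trophic state", "lakes"]),
    Publication("Riparian habitat indices", 2004, ["example-streams-2002"]),
    Publication("Nutrient gradients", 2005, ["Example Streams"]),  # inconsistent name: missed
]
matches = publications_using("EXAMPLE-STREAMS-2002", pubs)
print([p.year for p in matches])
```

The 2005 record is missed only because its author used a variant of the name, which is the failure the convention is meant to prevent.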
# 11
IMPORTANT METADATA COMPONENT:
DATASET NAME
{ continued }
Filtering criteria for the data on which a publication is based
Name of existing named subset
Geographic/temporal subset
Response subset
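One way to record these filtering criteria, so a later analyst can rebuild the exact subset behind a publication, is sketched below. It assumes the full dataset is available as rows identified by site and visit date; `SubsetSpec`, the row fields (`state`, `subset`, `ntl`, `ptl`), and the sample values are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SubsetSpec:
    """Hypothetical record of how a publication's data were filtered."""

    dataset_name: str            # permanent dataset name (the keyword)
    named_subset: Optional[str]  # an existing named subset, or None
    states: frozenset            # geographic subset
    start: date                  # temporal subset, inclusive bounds
    end: date
    responses: tuple             # response variables analyzed

    def apply(self, rows):
        """Rebuild the analyzed subset from rows of the full dataset."""
        out = []
        for r in rows:
            if self.named_subset is not None and r.get("subset") != self.named_subset:
                continue
            if r["state"] not in self.states:
                continue
            if not (self.start <= r["visit_date"] <= self.end):
                continue
            keep = ("site_id", "visit_date", *self.responses)
            out.append({k: r[k] for k in keep})
        return out


rows = [
    {"site_id": "S01", "visit_date": date(2001, 7, 10), "state": "OR", "ntl": 410.0, "ptl": 22.0},
    {"site_id": "S02", "visit_date": date(2002, 8, 2), "state": "WA", "ntl": 150.0, "ptl": 9.0},
    {"site_id": "S03", "visit_date": date(2003, 6, 21), "state": "OR", "ntl": 95.0, "ptl": 4.0},
]
spec = SubsetSpec(
    dataset_name="EXAMPLE-STREAMS-2002",
    named_subset=None,
    states=frozenset({"OR"}),
    start=date(2001, 1, 1),
    end=date(2002, 12, 31),
    responses=("ntl",),
)
print(spec.apply(rows))
```

Storing a record like `spec` alongside the publication turns the three kinds of filter on this slide into something that can be re-run rather than re-guessed.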
# 12
IMPORTANT METADATA COMPONENT:
DEFINITION OF RESPONSES EVALUATED
USE IT TO DOCUMENT
SITE SELECTION AND LOCATION
FIELD PROTOCOLS FOR GATHERING
DATA & MATERIAL
Peck DV, Lazorchak JM, Klemm DJ, editors. 2001. EMAP Surface Waters:
Western Pilot Study field operations manual for wadeable streams. Corvallis
(OR): U.S. Environmental Protection Agency, Office of Research and
Development. 275 p.
http://www.epa.gov/emap/html/pubs/index.html
LABORATORY METHODS
QUALITY ASSURANCE/QUALITY CONTROL
# 13
IMPORTANT METADATA COMPONENT:
MOTIVATING FACTORS
WHAT WERE THE STUDY OBJECTIVES?
Scale = one page (perhaps a lot more in this context)
Specific objectives
Narrative on their origin
WHY & HOW WERE THE SITES SELECTED?
From some population of sites (restrictions)
Purposefully
Good idea: make the whole study plan accessible
# 14
IMPORTANT METADATA COMPONENT:
INTERNAL FEATURES OF DATASET
LARGE DATASETS OFTEN CONSIST OF MANY
SUB-DATASETS
E.G.: THE EMAP MAHA DATA COLLECTION CONSISTS
OF 42 SAS DATASETS
UNIQUE SITE IDENTIFICATION: TOGETHER WITH THE DATE
OF THE SITE VISIT, IT UNIQUELY IDENTIFIES THE DATA.
Why was this subset of the data constructed?
Who knows more about it?
Which responses are in which data sets?
Check that a value appearing in several data sets is the same in each
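The checks implied above can be partly automated. The sketch assumes each sub-dataset has been loaded as a list of Python dicts with `site_id` and `visit_date` fields; the function names, field names, and values are hypothetical, and exact equality is a deliberate simplification (floating-point values from different processing steps may need a tolerance).

```python
from collections import defaultdict

KEY_FIELDS = ("site_id", "visit_date")  # assumed key: site plus date of visit


def check_keys(name, rows):
    """Report (site_id, visit_date) keys that occur more than once in one sub-dataset."""
    counts = defaultdict(int)
    for r in rows:
        counts[tuple(r[f] for f in KEY_FIELDS)] += 1
    return [f"{name}: duplicate key {k}" for k, n in counts.items() if n > 1]


def variable_index(datasets):
    """Map each variable to the sorted names of the sub-datasets that contain it."""
    idx = defaultdict(set)
    for name, rows in datasets.items():
        for r in rows:
            for var in r:
                if var not in KEY_FIELDS:
                    idx[var].add(name)
    return {var: sorted(names) for var, names in idx.items()}


def check_shared_values(datasets):
    """Report duplicate keys and variables whose values disagree across sub-datasets."""
    seen = {}  # (key, variable) -> (sub-dataset name, first value seen)
    problems = []
    for name, rows in datasets.items():
        problems.extend(check_keys(name, rows))
        for r in rows:
            key = tuple(r[f] for f in KEY_FIELDS)
            for var, val in r.items():
                if var in KEY_FIELDS:
                    continue
                first_name, first_val = seen.setdefault((key, var), (name, val))
                if first_val != val:
                    problems.append(
                        f"{var} at {key}: {first_name}={first_val!r} vs {name}={val!r}"
                    )
    return problems


datasets = {
    "chemistry": [{"site_id": "S01", "visit_date": "2002-07-10", "elev_m": 812, "ph": 7.4}],
    "habitat": [{"site_id": "S01", "visit_date": "2002-07-10", "elev_m": 821, "substrate": "gravel"}],
}
print(variable_index(datasets))
for p in check_shared_values(datasets):
    print(p)
```

`variable_index` answers "which responses are in which data sets?", while `check_shared_values` flags the elevation recorded differently in two sub-datasets for the same visit.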
# 15
IMPORTANT METADATA COMPONENT:
INTERNAL FEATURES OF DATASET
(continued)
Data dictionary
Usable paths to definitions of variables
METHODS USED TO DEAL WITH
NONDETECTS, MISSING OR LOST DATA, ETC
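A data dictionary is most useful to a later analyst when each entry also records how nondetects and missing values were handled. The sketch below uses one common convention, substituting half the detection limit, purely for illustration; that substitution is widely used but also widely criticized, and methods designed for censored data are often preferable. `VariableEntry`, `clean_value`, the rule names, and the `ptl` entry are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VariableEntry:
    """One hypothetical data-dictionary entry."""

    name: str
    definition: str                       # text, or a path to the defining protocol
    units: str
    detection_limit: Optional[float] = None
    nondetect_rule: str = "undocumented"  # hypothetical rule names: "half_dl", "dl", "zero"
    missing_code: Optional[float] = None  # sentinel used for missing or lost samples


def clean_value(entry: VariableEntry, raw: Optional[float], nondetect: bool) -> Optional[float]:
    """Apply the entry's documented rules to one raw value.

    Missing or lost data become None, so they are never confused with a
    measurement or with a substituted nondetect.
    """
    if raw is None or (entry.missing_code is not None and raw == entry.missing_code):
        return None
    if not nondetect:
        return raw
    if entry.detection_limit is None:
        raise ValueError(f"{entry.name}: nondetect flagged but no detection limit recorded")
    substitutes = {
        "half_dl": entry.detection_limit / 2,
        "dl": entry.detection_limit,
        "zero": 0.0,
    }
    if entry.nondetect_rule not in substitutes:
        raise ValueError(f"{entry.name}: no nondetect method documented")
    return substitutes[entry.nondetect_rule]


ptl = VariableEntry(
    name="ptl",
    definition="Total phosphorus; see laboratory methods manual",
    units="ug/L",
    detection_limit=2.0,
    nondetect_rule="half_dl",
    missing_code=-9.0,
)
print(clean_value(ptl, 14.0, nondetect=False))  # measured value is kept
print(clean_value(ptl, 2.0, nondetect=True))    # nondetect -> half the detection limit
print(clean_value(ptl, -9.0, nondetect=False))  # missing-value sentinel -> None
```

Raising an error when no method is documented is a design choice: it forces the gap named on this slide to be filled rather than silently guessed.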
# 16
Acknowledgement: Nancy Chaffin, Metadata Librarian,
Morgan Library, Colorado State University
THANK YOU FOR YOUR ATTENTION
QUESTIONS and/or COMMENTS ARE WELCOME
# 17