CSU’s EPA-FUNDED PROGRAM ON “APPLYING SPATIAL AND

METADATA: A LEGACY FOR OUR
GRANDCHILDREN
N. Scott Urquhart
STARMAP Program Director
Department of Statistics
Colorado State University
# 1
DISCLAIMERS
 The work reported here today was developed under the STAR
Research Assistance Agreement CR-829095 awarded by the U.S.
Environmental Protection Agency (EPA) to Colorado State
University. This presentation has not been formally reviewed by
EPA. The views expressed here are solely those of the author and of
STARMAP, the program he represents. EPA does not endorse any
products or commercial services mentioned in this presentation.
 The people of CEER-GOM have heard parts of this presentation.
Sorry. That presentation at Ocean Springs, MS (3/26/02) led to an
invitation for this talk.
# 2
CONTEXT FOR COMMENTS
 SPACE-TIME AQUATIC RESOURCES
MODELING AND ANALYSIS PROGRAM
= STARMAP
 STARMAP IS FUNDED BY EPA’s STAR PROGRAM,
AS ARE ALL OF THE EaGLes PROGRAMS
(==> “SIBLING” PROGRAMS)
 STARMAP IS TO USE EMAP AS A DATA SOURCE
AND CONTEXT
 NSU = STARMAP PROGRAM DIRECTOR @ CSU
 10 YEARS OF COLLABORATION WITH EMAP
 40+ YEARS AS STATISTICIAN WORKING WITH ECOLOGISTS
# 3
AN IMPORTANT LESSON
 YOU DO NOT KNOW WHAT YOUR DATA
WILL BE USED FOR 20 YEARS FROM NOW
 BY THE TIME THE VARIOUS EaGLes
PROGRAMS ARE COMPLETE, WE, AS
TAXPAYERS, WILL HAVE INVESTED > $40M
IN THE VARIOUS STUDIES
 THE RESULTING DATA NEEDS TO BE
RESPONSIBLY AND READILY AVAILABLE
TO FUTURE GENERATIONS
# 4
YOU DO NOT KNOW WHAT YOUR DATA
WILL BE USED FOR 20 YEARS FROM NOW
 POPULAR PERSPECTIVE - WE “KNOW” LOTS
ABOUT THE “ENVIRONMENT”
 REALITY: GOOD AQUATIC DATA IS SCARCE
 SPATIALLY EXTENSIVE
 OVER A REASONABLE TIME SPAN
 WELL DOCUMENTED PROCEDURES
 WELL TRAINED CREWS
 CAREFULLY EXECUTED STUDIES
 DATA PUBLICLY AVAILABLE
# 5
THE VALUE OF “METADATA”
 DATA WITHOUT CONTEXT ARE NUMBERS
 NEARLY WORTHLESS TO OTHERS
 How many file cabinets full of data are in your park offices?
 DATA WITH CONTEXT IS INFORMATION
 CAN BE VALUABLE TO OTHERS
 CONTEXT IS CALLED METADATA
# 6
VERY DISCOURAGING EXPERIENCE
WITH HISTORIC DATA
 THREE HISTORIC DATA SETS
 NUTRIENTS IN NORTHEAST LAKES
 Larsen, D. P., N. S. Urquhart and D. Kugler (1995). Regional scale
trend monitoring of indicators of trophic condition of lakes. Water
Resources Bulletin 31:117 - 140.
 E. COLI IN A RIVER BASIN IN OREGON
 NUTRIENTS IN LAKES & STREAMS IN EPA
REGION 10
 EMAP SURFACE WATERS
 I THOUGHT THIS WAS WELL DOCUMENTED!
# 7
SO WHAT IS METADATA?
 BEST DEF’N SEEMS TO BE ORGANIZED
“DATA ABOUT DATA”
 VERY DIVERSE VIEWS ABOUT WHAT IT SHOULD
CONTAIN:
 LIBRARIANS
 W3 GROUP - DEFINING FEATURES OF THE WORLD WIDE
WEB { title, description, publication date and author }
 CENSUS-BUREAU TYPES, WORLDWIDE
 GEOGRAPHIC DATA STANDARDS
 EPA’s STORET
# 8
WHAT IS METADATA GOOD FOR?
 A Librarian probably would answer
 Discovery
 Managing the resource (Ownership & responsibility)
 ARCHIVING
 AUTHENTICATING - QA/QC - UNCHANGING
 GROWING
 This statistician answers
 For correctly analyzing data in the future
 Not discovery, but correct utilization
 Paths to related documents based on the same dataset
# 9
METADATA COMPONENTS IMPORTANT
TO A PERSON ANALYZING THE DATA
 NAME OF DATASET
 DEFINITION OF RESPONSES EVALUATED
 MOTIVATING FACTORS
 INTERNAL FEATURES OF DATASET
# 10
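The four components listed above can be pictured as one small record per dataset. The following is a minimal sketch, not a standard: the class name `DatasetMetadata`, its field names, and the example values are all hypothetical and chosen only to mirror the list on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical record for the four components an analyst needs."""

    name: str  # permanent dataset name, also usable as a publication keyword
    response_definitions: dict = field(default_factory=dict)  # response -> how it was measured
    motivating_factors: str = ""  # study objectives and why/how sites were selected
    internal_features: dict = field(default_factory=dict)  # sub-dataset -> what it holds

    def missing_components(self):
        """Return the names of components that have been left empty."""
        checks = {
            "name": self.name.strip(),
            "response_definitions": self.response_definitions,
            "motivating_factors": self.motivating_factors.strip(),
            "internal_features": self.internal_features,
        }
        return [component for component, value in checks.items() if not value]


# Illustrative values only; they are not taken from any real metadata file.
record = DatasetMetadata(
    name="EXAMPLE-LAKES-1995",
    response_definitions={"total_p": "total phosphorus, ug/L, method in QA plan"},
    motivating_factors="Estimate regional trend in lake trophic condition.",
)
print(record.missing_components())
```

A check like `missing_components` is one way to flag, before archiving, which parts of the context a future analyst would not be able to recover.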
IMPORTANT METADATA COMPONENT:
DATASET NAME
 IS THIS REALLY IMPORTANT?
 YES!
 IMPORTANT FINDINGS FROM A DATASET WILL
BE PUBLISHED.
 WE NEED TO ADOPT A CONVENTION THAT THE DATASET
NAME IS A KEYWORD.
 Name needs to be permanent and consistently used
 THEN FUTURE INVESTIGATORS CAN USE
STANDARD SEARCH TOOLS TO FIND INFORMATION
EXTRACTED FROM EACH DATASET.
 MUCH LONGER LIVED THAN WEB LINKS
# 11
IMPORTANT METADATA COMPONENT:
DATASET NAME
{ continued }
 Filtering criteria for data on which publication is based
 Name of existing named subset
 Geographic/temporal subset
 Response subset
# 12
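One way to record the filtering criteria named above is a small structure that travels with the publication and produces a single searchable line. This is only a sketch under assumptions: `SubsetCriteria`, its fields, the `key=value` layout, and the example dataset name are hypothetical, not a convention the talk prescribes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SubsetCriteria:
    """Hypothetical description of the data a publication was based on."""

    dataset_name: str                        # the permanent dataset name
    named_subset: Optional[str] = None       # name of an existing named subset
    region: Optional[str] = None             # geographic restriction
    years: Optional[Tuple[int, int]] = None  # temporal restriction, inclusive
    responses: Tuple[str, ...] = ()          # response subset

    def keyword_line(self) -> str:
        """Build one searchable line for a keyword or methods section."""
        parts = [f"dataset={self.dataset_name}"]
        if self.named_subset:
            parts.append(f"subset={self.named_subset}")
        if self.region:
            parts.append(f"region={self.region}")
        if self.years:
            parts.append(f"years={self.years[0]}-{self.years[1]}")
        if self.responses:
            parts.append("responses=" + ",".join(self.responses))
        return "; ".join(parts)


# Illustrative values only.
criteria = SubsetCriteria(
    dataset_name="EXAMPLE-LAKES-1995",
    region="Northeast lakes",
    years=(1991, 1994),
    responses=("total_p", "chlorophyll_a"),
)
print(criteria.keyword_line())
```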
IMPORTANT METADATA COMPONENT:
DEFINITION OF RESPONSES EVALUATED
 USE IT TO DOCUMENT
 SITE SELECTION AND LOCATION
 FIELD PROTOCOLS FOR GATHERING DATA & MATERIAL
 Peck DV, Lazorchak JM, Klemm DJ, editors. 2001. EMAP Surface Waters:
Western Pilot Study field operations manual for wadeable streams. Corvallis
(OR): U.S. Environmental Protection Agency, Office of Research and
Development. 275 p.
 http://www.epa.gov/emap/html/pubs/index.html
 LABORATORY METHODS
 QUALITY ASSURANCE/QUALITY CONTROL
# 13
IMPORTANT METADATA COMPONENT:
MOTIVATING FACTORS
 WHAT WERE THE STUDY OBJECTIVES?
 Scale = one page (perhaps a lot more in this context);
 Specific objectives
 Narrative on their origin
 WHY & HOW WERE THE SITES SELECTED?
 From some population of sites (restrictions)
 Purposefully
 Good idea - accessibility of whole study plan
# 14
IMPORTANT METADATA COMPONENT:
INTERNAL FEATURES OF DATASET
 LARGE DATASETS OFTEN CONSIST OF MANY
SUB DATA SETS
 E.G.: EMAP MAHA DATA COLLECTION CONSISTS
OF 42 SAS DATASETS
 UNIQUE SITE IDENTIFICATION; WITH THE DATE OF THE SITE
VISIT, THE DATA IS UNIQUELY IDENTIFIED.
 Why was this subset of the data constructed?
 Who knows more about it?
 Which responses are in which data sets?
 Be careful that values are the same in each data set
# 15
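Two of the points above lend themselves to automated checks: that site identifier plus visit date uniquely identifies a record, and that a response stored in more than one sub-dataset carries the same value in each. The sketch below assumes each sub-dataset has been read into a list of dictionaries; the column names `site_id`, `visit_date`, and `total_p`, and all values, are made up for illustration.

```python
from collections import Counter

KEY_FIELDS = ("site_id", "visit_date")  # assumed key: site plus date of visit


def duplicate_keys(rows, key_fields=KEY_FIELDS):
    """Return keys that occur more than once, i.e. where the key is not unique."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def conflicting_values(a_rows, b_rows, response, key_fields=KEY_FIELDS):
    """Return (key, value_in_a, value_in_b) where two sub-datasets disagree."""
    a_values = {
        tuple(row[f] for f in key_fields): row[response]
        for row in a_rows
        if response in row
    }
    conflicts = []
    for row in b_rows:
        key = tuple(row[f] for f in key_fields)
        if response in row and key in a_values and a_values[key] != row[response]:
            conflicts.append((key, a_values[key], row[response]))
    return conflicts


# Illustrative records only.
chemistry = [
    {"site_id": "S001", "visit_date": "1994-07-12", "total_p": 14.0},
    {"site_id": "S001", "visit_date": "1994-08-20", "total_p": 11.5},
    {"site_id": "S002", "visit_date": "1994-07-15", "total_p": 30.2},
]
summary = [
    {"site_id": "S001", "visit_date": "1994-07-12", "total_p": 14.0},
    {"site_id": "S002", "visit_date": "1994-07-15", "total_p": 32.0},
]

print(duplicate_keys(chemistry))
print(conflicting_values(chemistry, summary, "total_p"))
```

Exact equality is used for the comparison here; values that pass through different rounding or unit conversions in different sub-datasets would need a tolerance instead.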
IMPORTANT METADATA COMPONENT:
INTERNAL FEATURES OF DATASET
(continued)
 Data dictionary
 Usable paths to definition of variables
 METHODS USED TO DEAL WITH
 NONDETECTS, MISSING OR LOST DATA, ETC
# 16
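A data dictionary that also records how nondetects and missing values were handled might look like the sketch below. The required fields, variable names, and rules shown are assumptions made for illustration; they do not describe how any EMAP dataset actually treats nondetects.

```python
# Fields every variable's entry is expected to carry (an assumed convention).
REQUIRED_FIELDS = ("definition", "units", "nondetect_rule", "missing_code")

# Illustrative dictionary; entries are invented for the example.
DATA_DICTIONARY = {
    "total_p": {
        "definition": "Total phosphorus in a surface-water grab sample",
        "units": "ug/L",
        "nondetect_rule": "stored as half the detection limit, flagged ND",
        "missing_code": "NA",
    },
    "secchi_depth": {
        "definition": "Secchi disk transparency",
        "units": "m",
        # nondetect_rule and missing_code deliberately left out
    },
}


def dictionary_gaps(columns, dictionary):
    """Map each column to the required fields its dictionary entry lacks."""
    gaps = {}
    for column in columns:
        entry = dictionary.get(column)
        if entry is None:
            # Column has no entry at all, so every field is missing.
            gaps[column] = list(REQUIRED_FIELDS)
        else:
            absent = [f for f in REQUIRED_FIELDS if not entry.get(f)]
            if absent:
                gaps[column] = absent
    return gaps


print(dictionary_gaps(["total_p", "secchi_depth", "chl_a"], DATA_DICTIONARY))
```

Running a check of this kind against the column names of each sub-dataset gives a quick list of variables whose definitions or handling rules a future analyst would have to guess.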
Acknowledgement: Nancy Chaffin, Metadata Librarian,
Morgan Library, Colorado State University
THANK YOU FOR YOUR ATTENTION
QUESTIONS and/or COMMENTS ARE WELCOME
# 17