Three Core Technologies: Meta Data, Components, and UML

Managing Information Technology Seminar
Metadata Registries
Jim Carpenter
Bureau of Labor Statistics
June 9, 1998
Full Disclosure
• Personal experience
– recent work: workshop & literature study
– future?: L8 (Data Representation Committee of
ANSI, TAG of ISO/IEC/JTC1/SC14)
• BLS experience
– Cathy Dippo, Asst. Commissioner (ORE)
• US experience
– Environmental Protection Agency, Census
– Others: HHS, DOD, ...
• European & Australian experience
– UN Economic Commission for Europe, ...
Meta
• Pronunciations
– Metah
– Maytah
– Meetah
• Root Meanings
– Greek: with, after
– Free On-line Dictionary of Computing:
one level of description up.
» metamodel
» metalanguage
» metaphilosophy
Definitions
ANSI X3.285:1998
• Data: a representation of fact, idea or
instruction in a formalized manner
suitable for communication,
interpretation, or processing by humans
or by machines.
• Metadata: Data used to describe the
meaning or characteristics of data.
• US Bureau of Census: Descriptive information or
documentation that facilitates data sharing and
understanding over the lifetime of the data.
Example
• Data (instances)
– 3000, 2000, 1000, 5000, 50, 1000, 20
• Metadata
– Name: Landing Incident
– Definition: distance to nearest plane at time of landing
– Units: feet
– Method of measurement: estimate of air traffic controller
– Method of sample selection: casually selected observations during Bob’s visit
– Possible values: {0, 10, 20, 30, 50, 100, 500, 1000, 2000, 5000}
What does the metadata say about the quality of the data?
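One way to approach this question is to compare the data instances with the possible values the metadata declares. The following is a minimal sketch in Python, assuming only the numbers shown on the slide; the function name `values_outside_domain` is hypothetical and not part of any standard.

```python
# Data instances and possible values copied from the example slide.
DATA = [3000, 2000, 1000, 5000, 50, 1000, 20]
POSSIBLE_VALUES = {0, 10, 20, 30, 50, 100, 500, 1000, 2000, 5000}


def values_outside_domain(data, possible_values):
    """Return the data instances that the metadata does not list as possible."""
    return [value for value in data if value not in possible_values]


# 3000 is the only instance missing from the declared value set.
outliers = values_outside_domain(DATA, POSSIBLE_VALUES)  # -> [3000]
```

The check flags 3000, which suggests that either the recorded value or the documented value set is wrong. The metadata also indicates that the values are coarse estimates from a casually selected sample, which limits what the data can support.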
Problems with Data
• Finding the right data
• Unknown source and purpose
• Misleading and inconsistent names and
definitions
• Inconsistent rules for obtaining data
• Unknown possible values
– range, units, names, concepts
• Who to ask?
Metadata Registry
To solve data problems for:
• Customers
• Producers
• System Designers
• System Managers
• Software Components
Users of Metadata
[Diagram: Customer, Producer, System Manager, and Designer as users of metadata; the legend distinguishes Interface and Component symbols]
Metadata Efforts by Subject Area
• Geo-spatial (maps)
• Library archives (books)
• Business (establishments)
• Population (households)
• Chemicals
• Pharmaceuticals
• Health Care Providers and Payers
• Pollution sources
• Software component interfaces
Web sites: http://www.ulb.ac.be/ceese/
http://www.lbl.gov/~olken/epa.html
Classification Schemes
(applied to data element components)
• Ontology - building on primitive concepts
• Taxonomy - partitioning by attributes
• Thesaurus - relational networks of words
• Key Words - word attributes
Metadata Standards
• ISO/IEC Joint Technical Committee
– ISO/IEC 11179: Information Technology - Specification and
standardization of data elements
» Part 1: framework
» Part 2: classification for data elements
» Part 3: basic attributes of data elements
» Part 4: rules and guidelines for data definitions
» Part 5: naming and identification principles
» Part 6: registration of data elements
• ANSI (a Technical Advisory Group of ISO/IEC JTC1)
– NCITS (formerly X3 Committee)
– X3.285:1998 Metamodel for the Management of Shareable
Data
– ISO/IEC fast track
ANSI X3.285-1998
Metamodel for the Management of
Shareable Data
• Promote sharing of metadata for
– understanding (meaning, representation, identification)
– discovery
– harmonization
– reuse
– analysis
• Provide a common base for metadata registries
– management structure
– components for interchange
A Data Registry
Contains characteristics of data to
• clearly describe data
• inventory data
• analyze data
• classify data
Data Registry Basic Structure
[Diagram boxes: Data Element; Data Element Concept; Object Class; Property; Data Element Representation]
• Data Element: A unit of data for which the definition, identification, representation, and permissible values are specified by means of a set of attributes.
• Data Element Concept: An idea that can be represented in the form of a data element, described independently of any particular representation.
• Object Class: A set of ideas, abstractions, or things in the real world that can be identified with explicit boundaries and meaning and whose properties and behavior follow the same rules.
• Property: The human perception of a single characteristic of an object class in the real world. It has no particular associated means of representation by which the property can be communicated.
• Data Element Representation: The part of a data element having a value domain, datatype, and other representational specifications.
Data Element Class
UML Class Diagram
Data Element
• Data Element Concept
• Representation
Data Element Concept
• Object Class
• Property
Note: neither class has any
exposed methods
Components of Data Element
UML Class Diagram
[Diagram classes: Data Element, Data Element Concept, Representation, Object Class, Property Class, Data Type, Range, Permissible Value. Multiplicity labels shown on the associations: 1:1 (five times), 0:1, 1:N.]
Data Element as an Association
UML Class Diagram
[Diagram classes: Data Element, Data Element Concept, Object Class, Property Class, Representation]
Example
• Data Element
– Data Element Concept
» Object Class: Flower
» Property: Color
– Representation: String:{red | blue}
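The flower example can be written out as a small data structure. This is a minimal sketch, assuming Python dataclasses as the modeling tool; the class and field names (`DataElementConcept`, `property_name`, `permits`, and so on) are illustrative choices, not names defined by X3.285. The `permits` helper is added only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class DataElementConcept:
    object_class: str    # the kind of thing described, e.g. "Flower"
    property_name: str   # the characteristic described, e.g. "Color"


@dataclass(frozen=True)
class Representation:
    datatype: str                       # e.g. "String"
    permissible_values: FrozenSet[str]  # e.g. {"red", "blue"}

    def permits(self, value: str) -> bool:
        """Illustrative helper: is the value in the value domain?"""
        return value in self.permissible_values


@dataclass(frozen=True)
class DataElement:
    concept: DataElementConcept
    representation: Representation


# The example from the slide: the color of a flower, as a string.
flower_color = DataElement(
    concept=DataElementConcept(object_class="Flower", property_name="Color"),
    representation=Representation("String", frozenset({"red", "blue"})),
)
```

Keeping the concept separate from the representation means the same concept (flower color) could, in principle, be paired with a different representation, such as a numeric code, without changing its meaning.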
Metamodel
• Like any other data model,
except that ...
• It describes the structure of data that is about data
– Is a conceptual data model
– No mapping to an implementation of a data registry
• A conformant data registry may be a stand-alone
data product, or implemented as part of an IRDS,
or any other information repository or database.
X3.285 Metamodel Regions
• Stewardship
• Data Element Administration
• Data Element Concept Administration
• Naming & Identification
• Conceptual & Value Domain Administration
• Classification
Stewardship Region
(as a UML class diagram)
[Diagram contents as recovered from the slide]
• Classes (attributes):
– record (creation date, change, note, unresolved issue, origin)
– contact person (name, title, mail address)
– reference document (type, label)
– organization (label, mail address)
– Registration Authority (identifier, documentation language)
– registrar (label)
– Administered Component
• Association classes:
– component record
– component reference document
– component submitting organization (contact person : contact person)
– component responsible organization (contact person : contact person)
– component registration authority
– reference document organization
• Role names: describer, describee, reference, provider, provided, submitter, submission, administrator, administered item, registrar, registered item
• Multiplicity labels: 0..*, 1..*, 1..1
• Attributes whose owning class is not recoverable from the slide: version, identifier, registration status, administration status
Stewardship Region of X3.285:
Data Element Management Model
[Diagram contents as recovered from the slide]
• Classes (attributes):
– Record (creation date, change, note, unresolved issue, origin)
– Reference Document (type, label)
– Organization (label, mail address)
– Submitting Organization (contact person)
– Responsible Organization (contact person)
– Registration Authority (identifier, documentation language)
– Component Registration Authority
– Registrar (label)
– Contact Person (name, title, mail address)
– Administered Component
• Role names: +reference, +describee, +submission, +submitter, +administered item, +administrator, +registered item, +registrar
• Multiplicity labels: 0..*, 1..*, 1
• Attributes whose owning class is not recoverable from the slide: version, identifier, registration status, administration status
Other X3.285 Metamodel Regions
• See the full text document at
– ftp://sdct-sunsrv1.ncsl.nist.gov/x3l8/x3l8docs/x3.285/docs/
Some Basic Principles
(Object Roles in ORM)
• The components to be administered are:
– classification schemes
– data elements
– domain values
– classified components
– rules
• An administered component must
– have only one record
– have a unique identifier
– have one registration status and one administrative status
– be administered by only one Responsible Organization
– be submitted by only one Submitting Organization
– be registered by at least one Registration Authority, which
» must have a unique identifier
» must be registered by exactly one registration authority
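The rules for an administered component can be sketched as a simple consistency check. This is only an illustration in Python, under the assumption that organizations and authorities are identified by plain strings; `AdministeredComponent` and `check_registry` are hypothetical names, not an API from the standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AdministeredComponent:
    # Single-valued fields model the "only one" rules by construction.
    identifier: str
    registration_status: str
    administrative_status: str
    responsible_organization: str
    submitting_organization: str
    # "At least one" Registration Authority is checked in check_registry.
    registration_authorities: List[str] = field(default_factory=list)


def check_registry(components):
    """Return human-readable violations of the principles listed above."""
    problems = []
    seen_identifiers = set()
    for component in components:
        if component.identifier in seen_identifiers:
            problems.append(f"{component.identifier}: identifier is not unique")
        seen_identifiers.add(component.identifier)
        if not component.registration_authorities:
            problems.append(f"{component.identifier}: no Registration Authority")
    return problems
```

Cardinality rules of the "exactly one" kind are carried by the record structure itself, so only uniqueness and the "at least one" rule need an explicit check.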
Principles, cont.
• A data element concept must
– convey exactly one property
– have exactly one conceptual domain
– be the perception of exactly one object class
• A data element must
– be derived via at most one rule
– derive meaning from exactly one data element concept
– be represented by at least one example
– be represented with exactly one value domain
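The data element rules map naturally onto field types: "exactly one" becomes a required field, "at most one" an optional field, and "at least one" a non-empty list. A minimal sketch, assuming Python and plain strings for every part; `DataElementRecord` is a hypothetical name used only for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataElementRecord:
    concept: str                            # exactly one data element concept
    value_domain: str                       # exactly one value domain
    derivation_rule: Optional[str] = None   # at most one rule
    examples: List[str] = field(default_factory=list)  # at least one example

    def __post_init__(self):
        # The "at least one example" rule cannot be expressed by the type alone.
        if not self.examples:
            raise ValueError("a data element needs at least one example")


# Hypothetical record reusing the flower color example.
flower_color_record = DataElementRecord(
    concept="Flower Color",
    value_domain="{red | blue}",
    examples=["red"],
)
```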
Metadata Registration Principles
• Non-exclusive registration: Every organization
may be a Registration Authority.
• Data sharing registration: Data may be shared
intra- or inter-organizationally.
• Economically enforced registration: Utility
determines longevity and usefulness.
• Flexible registration: Metadata may be
registered at different levels of quality.
Notable Activities
• Environmental Protection Agency
– http://www.epa.gov/edr/
• Australian Institute of Health & Welfare
– http://www.aihw.gov.au
• W3C - XML
– http://www.w3.org/Metadata/Overview.html
• Microsoft Repository
– http://www.microsoft.com/Repository/techmat/RepGuide/default.htm
Architecture of Microsoft Repository
References
• Statistical Journal of the UN Economic Commission for Europe,
Vol. 10, No. 2, 1993
– Statistical Metainformation Systems - pragmatics, semantics, syntactics, Bo
Sundgren, Statistics Sweden
• Metadata Management in Statistical Information Processing, Karl
A. Froeschl, Springer, Wien, 1997