Transcript Slide 1

Preliminary Description
of the
Environmental Data Challenge
for DoD M&S
Briefing by:
Virginia T. Dobey
SAIC/SETA Support to DMSO
Environmental Representation
Domain Lead
(703) 824-3411 or (703) 963-8512
[email protected]
DMSO Task:
Environmental Representations
Provide consistent, comprehensive
environmental representations
that include the natural
environment, as well as
representations of anthropogenic
impacts, flora, and fauna, to DoD
M&S users before FY 2014 when
and where needed.
• Provide, before FY 2008, environmental data sets,
algorithms, models, tools, and documentation to
environmental resource repositories.
• Establish, before FY 2009, a capability to provide
authoritative and dynamic representations of the natural
environment.
• Establish and publish, before FY 2010, authoritative data
sources, data dictionaries, data structure, attribution
scheme, symbology, and metadata for each natural
environment domain; and provide a common
interchange mechanism for both static and dynamic
environmental representations.
• Provide, before FY 2012, tools to ensure that natural
environmental representations dynamically interact with
other representations.
[Chart: capability Levels I through V]
Explanation
Technologies provide the means to
represent environmental data
(terrain, ocean, air and space),
and promote the unambiguous,
loss-less and non-proprietary
interchange of environmental
data.
In M&S, a complete and accurate environmental representation must
include not only the environmental conditions but also their effects on
system C&P, as well as feedback of system activity on the environment.
This, in turn, requires environmental data that can be FUSED with other
data sources.
• Space, atmospheric, terrain, and ocean conditions
• Impact of platforms, weapons, sensors, and their actions on space, atmosphere, terrain, and ocean conditions
• Effects of space, atmosphere, terrain, and ocean conditions on platforms, weapons, and sensors
The Emerging GIG Data Environment
(Task, Post, Process and Use - TPPU)
Producer (tags and posts data):
• Describes content using metadata
• Posts metadata in catalogs and data in shared space
Consumer (can find and pull data based on metadata tags):
• Searches metadata catalogs to find data
• Analyzes metadata to determine context of data found
• Pulls selected data based on understanding of metadata
Developer:
• Posts to and uses metadata registries to structure data and document formats for reuse and interoperability (data standards posted in metadata registries)
Infrastructure (ubiquitous global network):
• Metadata catalogs (location of data posted in metadata catalogs)
• Metadata registries
• Shared data space (actual data posted to shared data spaces)
• Enterprise & community web sites
• Security services (e.g., PKI, SAML)
• Application services (e.g., Web)
GIG Policy: The TPPU Paradigm
(diagram obtained from: http://ges.dod.mil/about/tppu.htm)
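The TPPU flow above can be sketched in miniature. This is a toy illustration only; the catalog structure, tag names, and functions are invented for the example and do not represent any real GIG service interface.

```python
# Minimal sketch of the TPPU flow: a producer posts data to shared space and
# tags it in a metadata catalog; a consumer searches the catalog and pulls
# data. All names here are illustrative, not a real GIG interface.

catalog = []        # metadata catalog: list of tag dictionaries
shared_space = {}   # shared data space: data_id -> data

def post(producer, data_id, data, tags):
    """Producer describes content with metadata and posts both."""
    shared_space[data_id] = data
    catalog.append({"id": data_id, "producer": producer, **tags})

def find(**criteria):
    """Consumer searches the metadata catalog for matching records."""
    return [m for m in catalog
            if all(m.get(k) == v for k, v in criteria.items())]

def pull(metadata_record):
    """Consumer pulls selected data based on its metadata."""
    return shared_space[metadata_record["id"]]

post("METOC-1", "obs-042", {"wind_kts": 12},
     {"domain": "atmosphere", "type": "observation"})
hits = find(domain="atmosphere")
assert pull(hits[0]) == {"wind_kts": 12}
```

The consumer never needs to know the producer's storage layout; everything it pulls is mediated by the metadata tags.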
What are Warfighter Issues?
Shifting Paradigms
• The adoption of a Net-Centric Data Enterprise
– It's not just a producer/user world anymore… (now EVERYONE's a producer!)
– Consumers want access to data/information/knowledge immediately
– Consumers want input into how the data are manipulated/filtered
• Moving from a… Collector/Product focus: Task, Process, Exploit and Disseminate
• To an… Analyst/Data focus: Task, Post, Process and Use (share)
From:
• Reliance on "Factory"
• Resource-intensive data download
• One (producer) to many (consumers)
• Bandwidth utilization/availability not a consideration
To:
• Many-to-many topology
• Smart "data ordering" agents
• Sharing of information
• Immediate access to Through-the-Sensor data
• Bandwidth critical to warfighters
GIG: Increasing the
Interoperability Challenge
• Everyone is a potential producer
• Multiple legacy environmental data sources and
user systems exist
– Significant investment in existing production and user hardware
and software
– Data in multiple (often system-specific) formats need updating
• Few data resources are reliably compatible,
even those produced by the Government
– example: OAML — product-specific formats
• “Power to the Edge” concept empowers user to
identify other sources of required data
– No requirement for common data syntax/semantics
– Increases the challenge of data fusion
GIG: Assumptions in Assessing
Environmental Data Interoperability
• Traditional data producers will continue to provide data in
producer-specific and product-specific formats following
existing production guidelines, since those products and
formats meet the general needs of most customers (users).
Formats will continue to leverage producer standards such as
the Joint METOC Conceptual Data Model and the Feature and
Attribute Coding Catalog. Tailoring data to user requirements
will remain a user responsibility.
• Users will need a data mediation capability that can access not only these traditional data sources but also non-traditional and often unknown data sources, such as commercial products (sometimes having proprietary formats) and streaming data from in-situ sensors (an anticipated development using future technology), which can be identified and obtained over the GIG.
Barriers to Data Interoperability
• Data sources, models, and operational systems
developed independently of each other
• Simulations not traditionally designed to interface
with operational systems (and sometimes with
each other!)
• Tailored (both in format and in content) datasets
that are optimized for a specific system support
only specific uses
Result: syntactically and semantically different
forms of data representation are in use
Developing Interoperable Data
“A data model is an abstract, self-contained, logical definition of the objects, operators,
and so forth, that together constitute the abstract machine with which users interact.
The objects allow us to model the structure of data…An implementation of a given
data model is a physical realization on a real machine of the components of the
abstract machine that together constitute that model…the familiar distinction between
logical and physical…” [emphasis in the original]
C.J. Date1
“Logical Data Model: A model of data that represents the inherent structure of that data
and is independent of the individual applications of the data and also of the software
or hardware mechanisms which are employed in representing and using the data.”
DoD 8320.1-M2
“Normalization leads to an exact definition of entities and data attributes, with a clear
identification of homonyms (the same name used to represent different data) and
synonyms (different names used to represent the same data). It promotes a clearer
understanding and knowledge of what each data entity and data attribute means.”
C.Finkelstein3
1 Colleague of E.F. Codd, originator and developer of relational database theory
2 DoD authority on information engineering
3 “Originator and main architect of the Information Engineering methodology”
Normalization Challenges
• Users are familiar with non-normalized physical
data elements. Tendency is to call these “logical”
and stop there.
• In any large data model, normalization is difficult.
It is often ignored (benign neglect).
• Complete data models incorporate business rules
(how the entities relate to each other).
• May not be needed for an implementation-independent model used to develop a data
dictionary (of interoperable concepts), but…
Achieving Data Interoperability:
The Three-Schema Architecture
• External schema: user application views (User 1, User 2, …, User N)
• Conceptual schema: converts user-specific data requirements into conceptual "building blocks" for data integration; also facilitates ingest of other source data
• Internal schema: logical data model building blocks are the basis for application data structures
The normalized logical data model serves as the conceptual design "bridge" from the external schema to and from the internal schema.
The Three-Schema Architecture
Applied to Environmental Data
• External schema: user (production) applications (User 1 … User N), e.g., CBRN, weather effects, terrain trafficability, …
• Internal schema: producer product formats (Prod 1 … Prod N), e.g., METOC producer-specific formats, NGA product formats, JMCDM, FACC, …
• Conceptual schema: the normalized logical data model serves as the conceptual "bridge"; it allows for ingest of other source data and supports fusion of normalized data internal to the system
The implementation-independent "middle layer" can be placed at the producer interface, the user interface, or somewhere in between.
Creating a Reusable Implementation-Independent Middle Layer
Such an architectural layer must:
• Be independent of source products
• Be independent of optimized system implementation
• Provide for the FULL SPECIFICATION of all source product data as well as all system data requirements
• Be developed as an implementation-independent (LOGICAL) relational data model, as required by the DoDAF OV-7 product view
A Reusable Middle Layer for
Environmental Data
• Requires standardized terms in all environmental
domains – leverage existing International/DoD
standards
• Requires a concise, well-organized, non-redundant data structure
– Must extend from a normalized logical data model
• Requires highly granular, independent data elements: 'atomic'-level concepts
– To support the many formats required by users (precise rendering of translations to and from the hub)
A Complete Representation:
All Environmental Domains
A Concise Non-Redundant
Data Structure
• Must address format as well as content
– Format
• Must handle the large number of required data
representation formats while preserving
consistency of data (the “fair fight” across the
federation)
– Content
• Must be based on atomic data elements from a
normalized logical data model (support for data
fusion)
Challenge: The Many Formats of M&S Data
Formats flowing through the interchange hub include:
• 1-, 2-, and 3-D point observation data
• Controlled Image Base (raster)
• Geometry
• DTED (gridded)
• Vector topology
• Foundation Feature Data (vector)
• Nested, gridded data
• Tabular data, for example:

Surface Backscatter Strength as a Function of Angle of Incidence and EM Band
(angle of incidence in degrees)
EM Band      15    30    45    60    75    90
microwave   300   290   240   207   198   170
L-Band      160   230   180   167   158   130
S-Band      165   152    78    22     8   1.5
X-Band      179   122    45    11     6     1
V-Band      200    90    40     9     4   0.1
And More Formats:
Algorithmic/Model Support and Output Data
And Even More: Five-D Data Visualization
The Final Additions to the Set of
M&S Formats
• Compact Terrain Data Base (proprietary)
• DTED (product)
• E&S GDF (proprietary)
• E&S S1000 (proprietary)
• GeoTIFF
• Gridded raster
• MultiGen (proprietary)
• Shapefile (proprietary)
• Terrex DART, Terra Vista (proprietary)
• Vector Product Format (product)
“Atomic” Level Concepts
To facilitate precise rendering of translations to and
from the hub
Producers use their own coding systems, each of which captures specific desired information: some of it may be captured by others, and some may be unique. Almost always, each producer carries information not available from other sources. Extracting information "embedded" in definitions through explicit statement of atomic attributes assists in adding attributes without overwriting the object.
The Value of
Atomic-Level Attributes: An Example
Entity: Bridge over river
Entity: Suspension bridge
Entity: Bridge for two-way traffic
Decomposed:
Bridge + located over water body = river
Bridge + bridge type = suspension
Bridge + traffic carried = vehicular + number of traffic directions = 2
Results in:
Bridge + located over water body = river
+ bridge type = suspension
+ traffic carried = vehicular
+ number of traffic directions = 2
(each of these attributes can be changed/updated as new information is
acquired)
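The bridge decomposition above can be sketched with a small example, assuming a simple attribute-dictionary representation (the structures and function names are hypothetical, invented for illustration):

```python
# Sketch: entities as a base concept plus atomic attributes, so each attribute
# can be refined independently as new information arrives. Attribute names
# mirror the bridge example; the structure itself is illustrative only.

def make_entity(concept, **attributes):
    return {"concept": concept, **attributes}

# Three producer views of the same object, decomposed into atomic attributes:
a = make_entity("bridge", located_over_water_body="river")
b = make_entity("bridge", bridge_type="suspension")
c = make_entity("bridge", traffic_carried="vehicular",
                number_of_traffic_directions=2)

def fuse(*views):
    """Merge atomic attributes from multiple sources without overwriting the object."""
    fused = {}
    for view in views:
        fused.update(view)   # later reports refine individual attributes
    return fused

bridge = fuse(a, b, c)
assert bridge["bridge_type"] == "suspension"
assert bridge["number_of_traffic_directions"] == 2

# A new report can change one attribute without touching the rest:
bridge.update(number_of_traffic_directions=1)
```

Because each attribute is atomic, an update from a new source touches only that attribute; nothing in the compound entity has to be re-derived.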
“Complete and Accurate”—
Does That Mean Data Fusion?
• Is the COP affected by METOC conditions? If so, can those effects be
reflected in actual changes to the COP on the user system? This can be
handled internally to the system without requiring data fusion capability.
• Does the user need to derive useful or critical information from the
interaction of METOC/terrain data and information in the COP and
provide it to other systems? The answer to this question determines
whether data fusion is required by the user.
• Will the warfighter integrate environmental data into operational problems
or will he use them as map or other overlays? The answer to this
question determines whether data fusion is required by the user and
allowed by the producer.
• Does the user need to have the ability to update METOC conditions and
effects as reported by data from other (e.g., intel, foreign forces, etc.)
battlefield sources? The answer to this question determines whether
data fusion is required by the user.
Summary: The Challenge of Data Fusion
Business
• What is the total set of requirements?
• There are many processes and products involved (some of which, as in ArcInfo/ArcView terrain products, may be proprietary), but the exchange mechanism must be independent of these. While we may know all of the currently available sources, will there ever be new ones available to the warfighter?
• Different views of the environment
– Air, land, sea, space
Technical
• Lack of underlying environmental framework
– No integrated reference model available
– Existing data models are conceptual; future models are non-integrated and don't address current data repositories and data interchange requirements
• Representation (how the concept will be depicted on the user's system: a visual object? 2D or 3D? A data point? Background data for algorithm use?)
• Naming/semantics
• Spatial location and orientation (coordinate system and datum)
THE TRADITIONAL SOLUTION: Direct Mapping
Each data producer (Producer 1, 2, 3, …, n) is translated directly to each data consumer application (A, B, C, …, Z): one translator for every producer/consumer pair.
RESULT: A BIAS AGAINST TRANSLATION SOFTWARE
A GIG-Oriented Solution:
The Interoperable "Middle Layer"
Data producers (Producer 1, 2, 3, …, n) feed a COMMON INTERCHANGE HUB, which in turn serves the data consumer applications (A, B, C, …, Z).
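The payoff of the hub can be seen by counting translators: with n producers and m consumers, direct mapping needs n x m translators, while the hub needs only n + m (one converter into the hub per producer, one out per consumer). A small sketch of the idea, using invented unit-conversion "formats" as stand-ins for real data formats:

```python
# Sketch: a common interchange hub. Each producer supplies one converter INTO
# the hub form and each consumer one converter OUT of it: n + m converters
# instead of the n * m pairwise translators of direct mapping.
# The "formats" here (temperature units) are invented for illustration.

# Hub form: temperature in kelvin.
to_hub = {
    "producer_celsius":    lambda t: t + 273.15,
    "producer_fahrenheit": lambda t: (t - 32) * 5 / 9 + 273.15,
}
from_hub = {
    "consumer_kelvin":  lambda t: t,
    "consumer_celsius": lambda t: t - 273.15,
}

def translate(value, source, target):
    """Translate producer data to a consumer format via the hub."""
    return from_hub[target](to_hub[source](value))

assert translate(0, "producer_celsius", "consumer_kelvin") == 273.15
assert abs(translate(212, "producer_fahrenheit", "consumer_celsius") - 100) < 1e-9
```

Adding a new producer or consumer means writing one converter against the hub form, not one per existing counterpart.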
The Result of Improper Data Fusion
What works for one system…
creates unusual behaviors in another…
Why Not Let the Producers
Handle it All?
(Diagram: Project 2851 Standard Simulator Database Interchange Format (SIF) ? SIMNET Database Interchange Specification (SDIS))
ASD(C3I) PDM-85 directed DMA (NGA's predecessor) to STOP producing system-specific formats. Without some means of creating interoperable, reusable data, billions of dollars of DoD investment in simulation and other systems would have been lost.
SEDRIS: How it works
1. Identify the representation structure of the original data object (point, vector, raster, etc.: geometry, topology, grid, pixel, etc.); this is the data format.
2. Separate attribution of the object (what it is, characteristics of what it is) from its representation; this is the data content.
3. Determine georeferencing of the object (the location of each object in its original spatial reference frame: UTM, MGRS, WGS-84, any local inertial or celestial reference datum, etc.).
4. Overlay the representation on the SEDRIS Data Representation Model, convert attribution to EDCS codes, and decompose georeferencing using the Spatial Reference Model.
5. Reassemble objects from multiple sources using the SEDRIS Transmittal Format to integrate/fuse data (more than just the simple overlay used in C4I and M&S systems now).
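The five steps can be sketched as a data flow. This is schematic only: the real SEDRIS API, EDCS codes, and Transmittal Format are far richer than the invented placeholder structures below.

```python
# Schematic of the SEDRIS workflow: separate format, content, and location,
# then normalize each before reassembly. All structures here are invented
# placeholders, not the real SEDRIS API.

def decompose(source_object):
    """Steps 1-3: split an object into representation, attribution, georeference."""
    return {
        "representation": source_object["format"],      # point/vector/raster...
        "attribution":    source_object["attributes"],  # what the object is
        "georeference":   source_object["location"],    # original spatial frame
    }

def normalize(parts, code_lookup):
    """Step 4: map attribution to standard (EDCS-like) codes."""
    parts["attribution"] = {code_lookup[k]: v
                            for k, v in parts["attribution"].items()}
    return parts

def reassemble(*normalized_objects):
    """Step 5: objects from multiple sources, now comparable, in one transmittal."""
    return {"transmittal": list(normalized_objects)}

code_lookup = {"veg": "ECC_VEGETATION"}  # invented one-entry code table
src = {"format": "vector", "attributes": {"veg": "trees"}, "location": "UTM 18N"}
t = reassemble(normalize(decompose(src), code_lookup))
assert t["transmittal"][0]["attribution"] == {"ECC_VEGETATION": "trees"}
```

Once attribution is in a common coding and location in a common reference frame, objects from different producers can be fused rather than merely overlaid.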
So Why Keep SEDRIS?
• SEDRIS is user-oriented. It opens up and reconciles
data from multiple producers for multiple users.
• SEDRIS is like any other standard for interoperability:
– It "costs" resources to implement in any single system; it is not useful for a standalone system
– It saves significant resources when used in more than one system
• Assessment: “It is not in industry’s best interest to
use SEDRIS. It is absolutely essential that the
Government keep SEDRIS alive.”
BACKUP SLIDES
Formal Definitions of the
Normal Forms (1 of 2)
• 1st Normal Form (1NF)
– Def: A table (relation) is in 1NF if
– 1. There are no duplicated rows in the table.
– 2. Each cell is single-valued (i.e., there are no repeating groups or
arrays).
– 3. Entries in a column (attribute, field) are of the same kind.
• Note: The order of the rows is immaterial; the order of the columns is
immaterial.
• Note: The requirement that there be no duplicated rows in the table means
that the table has a key (although the key might be made up of more than
one column—even, possibly, of all the columns).
• 2nd Normal Form (2NF)
– Def: A table is in 2NF if it is in 1NF and if all non-key attributes are
dependent on all of the key.
• Note: Since a partial dependency occurs when a non-key attribute is
dependent on only a part of the (composite) key, the definition of 2NF is
sometimes phrased as, "A table is in 2NF if it is in 1NF and if it has no partial
dependencies."
• 3rd Normal Form (3NF)
– Def: A table is in 3NF if it is in 2NF and if it has no transitive
dependencies.
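A worked example of these definitions, using a hypothetical observation table (the table and column names are invented for illustration):

```python
# Worked 2NF example. Hypothetical table with composite key
# (station_id, obs_time). station_name depends only on station_id, a partial
# dependency, so the flat table is in 1NF but not 2NF.

flat = [  # 1NF, but station_name repeats with every observation
    {"station_id": "S1", "obs_time": "00Z", "station_name": "Norfolk", "temp_c": 18.0},
    {"station_id": "S1", "obs_time": "06Z", "station_name": "Norfolk", "temp_c": 21.5},
]

# Decompose into two tables to reach 2NF: every non-key attribute now
# depends on the whole key of its table.
stations = {row["station_id"]: row["station_name"] for row in flat}
observations = [
    {"station_id": r["station_id"], "obs_time": r["obs_time"], "temp_c": r["temp_c"]}
    for r in flat
]

assert stations == {"S1": "Norfolk"}
assert all("station_name" not in r for r in observations)
```

The decomposition removes the redundant station name, so renaming a station is a one-row change instead of an update to every observation.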
Formal Definitions of the
Normal Forms (2 of 2)
• Boyce-Codd Normal Form (BCNF)
– Def: A table is in BCNF if it is in 3NF and if every determinant is a
candidate key.
• 4th Normal Form (4NF)
– Def: A table is in 4NF if it is in BCNF and if it has no multi-valued
dependencies.
• 5th Normal Form (5NF)
– Def: A table is in 5NF, also called "Projection-Join Normal Form"
(PJNF), if it is in 4NF and if every join dependency in the table is a
consequence of the candidate keys of the table.
• Domain-Key Normal Form (DKNF)
– Def: A table is in DKNF if every constraint on the table is a logical
consequence of the definition of keys and domains.

Source: DATABASE-MANAGEMENT PRINCIPLES AND APPLICATIONS
Dr. Ronald E. Wyllys, The University of Texas at Austin, Austin, Texas, 78712-1276
http://www.gslis.utexas.edu/~l384k11w/normover.html