Transcript Document

GeoViQua: Advances in data quality disclosing

Ivette Serral

Center of Research in Ecology and Forestry Applications (CREAF) [email protected]

Foz d’Iguaçú, 21-23 November 2012

QUAlity aware VIsualisation for the Global Earth Observation system of systems

An FP7 project (2011-2014) devoted to showing quality information embedded in GEOSS data

10 partners, 7 countries

www.geoviqua.org

The problem

• GEOSS data is handled through the GEOSS Common Infrastructure (GCI)
• Is there quality information in the GCI?
  – There is some, in the form of ISO 19115 DQ elements and lineage
  – But not enough
  – The GCI does not follow a global model for quality
• The GCI is shown and searchable on the GEO Portal
• The GEO Portal search and results
  – are not ranked by quality
  – quality indicators are not easily comparable
  – spatially distributed uncertainty is not included


Community View

Data Quality?

• Many researchers refer to the ‘famous five’ as the common criteria for evaluating spatial data quality:
  – lineage; completeness; consistency; positional accuracy; and attribute accuracy.
• Broad scientific acceptance of the common spatial quality elements does not apply to all cases of evaluating “fitness-for-use”: user requirements can go far beyond the widely accepted ‘famous five’.
• We used semi-structured telephone and face-to-face interviews with a variety of geospatial data users and experts from a number of countries and application domains.

More information at: http://www.geoviqua.org/Docs/SubmittedDeliverables/D2_1_GeoViQua.pdf


What about users?

• Users are exceedingly interested in good quality metadata records
  – And information that can help to assess fitness-for-use of the data
• Users find metadata records typically incomplete, with essential data omitted
  – The process of dataset discovery and selection is more difficult
• Users are also interested in ‘soft’ knowledge about data quality
  – Data providers’ comments on the overall quality of a dataset, known data errors, potential data usage
  – Peers’ reviews and recommendations (they contact their peers to obtain suggestions)
  – Dataset provenance, citation and licensing information
    • Citation is incomplete (lack of valid producer contact details), and licensing is often missing
    • Citation: users rely on data from producers with a good reputation
    • Currently, some of these cannot be recorded in standard metadata
• Users need to easily and systematically compare metadata records
  – Side-by-side visualisation of all metadata elements would allow geospatial datasets to be compared more effectively, especially when datasets are very similar and differences are hard to distinguish


Quality model is much more than positional accuracy

• There are many quantifiable aspects that can be recorded:
  – Consistency, completeness, positional, thematic and temporal accuracy…
• There are many qualitative aspects that are needed:
  – Lineage (traceability), scientific papers, user feedback, data usage…


GeoViQua Data model treats statistical uncertainties

[XML example: value of the vertical DEM accuracy, in metres (m), encoded with the UncertML distribution http://www.uncertml.org/distributions/normal and the values 1.2 and 3.6]

• Explicit recognition that errors acceptably fit a Normal distribution with mean 1.2
  – An overall positive bias was observed
  – (A difficult feature to convey by traditional means)
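As a minimal sketch of what such an encoding lets a user compute, the snippet below treats the vertical DEM error as Normal with mean 1.2 m. Reading the second value on the slide (3.6) as the variance is an assumption, since the slide does not label it, and all variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the slide; reading 3.6 as the variance is an assumption.
MEAN_ERROR_M = 1.2
ASSUMED_VARIANCE_M2 = 3.6

error = NormalDist(mu=MEAN_ERROR_M, sigma=sqrt(ASSUMED_VARIANCE_M2))

# Probability of a positive error, i.e. how strong the positive bias is.
p_positive = 1.0 - error.cdf(0.0)

# Interval that contains 95% of the errors under this model.
low, high = error.inv_cdf(0.025), error.inv_cdf(0.975)

print(f"P(error > 0) = {p_positive:.2f}")
print(f"95% of errors lie between {low:.2f} m and {high:.2f} m")
```

A single accuracy figure could not express both the bias and the spread; the distribution carries both.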


Two models on data quality are needed

• Producer’s quality metadata
  – In the producers’ metadata records
  – Encoded in the classical ISO 19115/19139
  – Some extensions required
  – Stored in the current catalogues (GEOSS Clearinghouse, etc.)
• User’s quality metadata
  – In independent metadata repositories
  – Linked to producer’s metadata by id
  – Contains comments, “like it”, star rates, etc.
  – Future component of the GCI?
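The separation of the two models can be sketched as two independent stores joined only by an identifier. The record layout, identifiers and the function name below are hypothetical and only illustrate the idea; they are not the GeoViQua implementation.

```python
from collections import defaultdict

# Hypothetical producer records, keyed by a metadata identifier.
producer_catalogue = {
    "dataset-001": {"title": "Example DEM"},
    "dataset-002": {"title": "Example land cover map"},
}

# Hypothetical user feedback held in a separate repository.
# Each item points at a producer record through its identifier only.
user_feedback = [
    {"target_id": "dataset-001", "comment": "Good over flat terrain", "stars": 4},
    {"target_id": "dataset-001", "comment": "Noisy near the coast", "stars": 2},
    {"target_id": "dataset-002", "comment": "Legend is unclear", "stars": 3},
]

def join_by_id(catalogue, feedback):
    """Attach feedback items to producer records without modifying the catalogue."""
    grouped = defaultdict(list)
    for item in feedback:
        grouped[item["target_id"]].append(item)
    return {
        record_id: {"producer": record, "feedback": grouped.get(record_id, [])}
        for record_id, record in catalogue.items()
    }

combined = join_by_id(producer_catalogue, user_feedback)
for record_id, view in combined.items():
    print(record_id, view["producer"]["title"], len(view["feedback"]), "feedback items")
```

Because the join happens at read time, the producer catalogue stays untouched and the feedback repository can evolve on its own.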


Advances in quality models: GVQ producer quality model

http://schemas.geoviqua.org/GVQ/3.1.0

Advances in quality models: GVQ producer quality model

1. Publications. Based on ISO 19115 CI_Citation and extended with ISO 690 metadata elements. Added to a number of quality elements within the document. An existing DQ_ or MD_ element is extended to allow a ‘referenceDoc’ to be added.
2. Discovered issues. Added discovered issue class (e.g., a problem which the producer has identified during generation of a dataset) to the DQ_DataQuality element.
3. Reference datasets used for evaluation. Added to the ‘dataEvaluation’ section of the 19157 to allow recording the reference dataset used to assess the quality indicator.
4. Traceability. Added a new ‘metaquality’ type to allow the lineage of a data quality assessment to be recorded, along with its representativity and coverage. This is a requirement of the QA4EO principles.

More information: Lucy Bastin [[email protected]] & a poster in this session room


Advances in quality models: GVQ user quality model

[UML class diagram: Feedback model, http://schemas.geoviqua.org/GVQ/3.1.0]

Classes and attributes shown in the diagram:

GVQ_FeedbackTarget
  + parent :GVQ_FeedbackTarget
  - target :string
  «XSDelement» + natureOfTarget :MD_ScopeCode

GVQ_FeedbackFocusType / GVQ_FeedbackFocus («abstract»)
  + item :GVQ_FeedbackItem*

GVQ_FeedbackItem («abstract»)
  «id» - identifier :string

GVQ_ThematicFocus
  + title :string

GVQ_DatacentricFocus
  - layer :string
  + extent :EX_SpatialTemporalExtent
  - band :string

GVQ_TagFocus
  + tags :string [0..*]

GVQ_DomainFocus
  - applicationDomainURN :string

GVQ_UserComment
  - comment :String
  - mime-type :String = text/plain

GVQ_ExternalFeedback
  - resourceURL :String
  - mime :String

GVQ_GeoLabel

GVQ_UsageReport
  + usagePurpose :GVQ_ReportAspectCode [0..*]
  + Citation :CI_Citation [0..1]
  + usageDescription :string
  «XSDelement» + alternativeDatasets :MD_Identifier [0..-1]

GVQ_Rating
  + ratingValue :int

GVQ_MetadataOverride
  + alternativeDataQualityEstimate :DQ_DataQuality

GVQ_FeedbackGroup
  - timestamp :CI_Date
  - user :GVQ_UserInformation
  - roles :GVQ_UserRoleCodeEnum [1..*]

GVQ_UserInformation
  + user :CI_ResponsibleParty [0..1]
  + applicationDomain :string [0..*] {ordered}
  + expertiseLevel :int

«enumeration» GVQ_ReportAspectCode
  Useage = Useage
  Problem = Problem
  FitnessForPurpose = Fitness for Purpose
  Alternatives = Alternatives

«enumeration» GVQ_UserRoleCodeEnum
  CommercialDataProducer = Commercial Data...
  ResearchEndUser = Research End-User
  NonResearchEndUser = Non-research En...
  ScientificDataProducer = Scientific Data...

Association ends labelled in the diagram: +secondaryFoci, +items (multiplicities 0..1, 1, 0..* and 1..* appear on the associations).

Notes on the diagram:

• A reply points to some other feedback item, but they require IDs for implementation purposes anyway.
• The target reference identifies the "hard" discussion context. The most common case would be a data set or a sensor service. It unambiguously refers to a thing pre-existing in the domain of discourse: a user cannot freely create a feedback target.
• The feedback focus is intended to qualify a "narrow" discussion context similar to a discussion thread. The "narrow" context is always within one "hard" context. The user may create (some types of) feedback focuses. [The FB Focuses attributes are considered examples]
• Together, target and focus constitute the subject of a given feedback item.


Advances in quality models: GVQ user quality model


ISO 19115 only provides the MD_Usage to report how users apply the dataset in their activities. This is insufficient for the GEOSS needs. GeoViQua has elaborated this model from scratch.

• A user can submit a GVQ_FeedbackItem in the form of:
  – A user comment.
  – A rating mark.
  – A usage report supported by a citation of a report.
  – A link to external feedback (blog pages, Google docs document, etc).
  – A metadata override that amends a producer metadata value.
  – A quality label (GEO Label).
• These items are related to a dataset through an identifier.
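The kinds of feedback listed above could be held in a structure like the following. This is an illustrative sketch only: the Python class and field names are assumptions loosely inspired by the model, not the published GVQ schema, and only three of the feedback kinds are represented.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class FeedbackItem:
    """Illustrative stand-in for a feedback item; not the published schema."""
    identifier: str                     # id of the feedback item itself
    target: str                         # identifier of the dataset it refers to
    comment: Optional[str] = None       # a user comment
    rating: Optional[int] = None        # a rating mark
    external_url: Optional[str] = None  # a link to external feedback

@dataclass
class FeedbackStore:
    items: List[FeedbackItem] = field(default_factory=list)

    def submit(self, item: FeedbackItem) -> None:
        self.items.append(item)

    def for_dataset(self, dataset_id: str) -> List[FeedbackItem]:
        return [i for i in self.items if i.target == dataset_id]

    def mean_rating(self, dataset_id: str) -> Optional[float]:
        ratings = [i.rating for i in self.for_dataset(dataset_id) if i.rating is not None]
        return sum(ratings) / len(ratings) if ratings else None

store = FeedbackStore()
store.submit(FeedbackItem("fb-1", "dataset-001", comment="Useful for hydrology"))
store.submit(FeedbackItem("fb-2", "dataset-001", rating=4))
store.submit(FeedbackItem("fb-3", "dataset-001", rating=5))
print(store.mean_rating("dataset-001"))
```

Every item carries the dataset identifier, which is the only link back to the producer metadata.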

More information: Lucy Bastin [[email protected]] & a poster in this session room


Advances in quality models: GVQ user quality model

• The GeoViQua Quality Model is explained in the GEOSS Best Practice Twiki: http://wiki.ieee-earth.org/GEOSS_Tutorials
• It has been presented in the AIP5 session and it is a contribution to the GEOSS Standards and Interoperability Forum (SIF).

More information: Anna Riverola [[email protected]]


Advances in visualizing metadata quality information

GeoViQua has developed the Q-Rubric tool, an extension of NOAA’s earlier version.

• An XSLT tool that converts XML metadata files into an HTML scoring page.
• Analyses all ISO quality metadata information and rates it by presence/absence (attributing one point when metadata exists, but not penalizing if information is missing).
• Helps users to evaluate how many metadata elements related to data quality are provided.
• Adds two new information groups related to ISO quality: Quality and Usage.
• GEOSS representation style has been applied to the original Rubric tables.
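The presence/absence scoring can be sketched in a few lines. The tool itself is an XSLT stylesheet; the Python below is only a stand-in for the scoring idea, and the list of checked elements and the sample record are hypothetical (the element names come from the ISO 19139 gmd namespace).

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"  # ISO 19139 gmd namespace

# Hypothetical check list; the real rubric inspects many more elements.
CHECKS = {
    "Quality": ["DQ_DataQuality", "DQ_AbsoluteExternalPositionalAccuracy"],
    "Lineage": ["LI_Lineage"],
    "Usage": ["MD_Usage"],
}

# Minimal made-up metadata record used only to exercise the function.
SAMPLE = f"""
<gmd:MD_Metadata xmlns:gmd="{GMD}">
  <gmd:dataQualityInfo>
    <gmd:DQ_DataQuality>
      <gmd:lineage><gmd:LI_Lineage/></gmd:lineage>
    </gmd:DQ_DataQuality>
  </gmd:dataQualityInfo>
</gmd:MD_Metadata>
"""

def score(xml_text):
    """One point per element that is present; absent elements cost nothing."""
    root = ET.fromstring(xml_text)
    result = {}
    for group, names in CHECKS.items():
        found = sum(1 for name in names if root.find(f".//{{{GMD}}}{name}") is not None)
        result[group] = (found, len(names))
    return result

for group, (found, possible) in score(SAMPLE).items():
    print(f"{group}: {found}/{possible}")
```

Scores only ever increase with additional metadata, which matches the "no penalty for missing information" rule.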


Advances in visualizing metadata quality information

Download it: http://www.geoviqua.org/docs/isoRubricQHTML.xsl

• Some results from the GCI:
  – 97203 metadata records held in the Clearinghouse; 96867 analysed
  – 14.79% do not define the mandatory topic category
  – 80.63% do not have any quality element (of any class)
  – Quality: Positional accuracy is the most populated class, with 37.77% documented, followed by completeness (36.06%) and logical consistency (18.79%). Only 0.50% relates to thematic accuracy.
  – Lineage: 35.27% do not have any lineage sub-element defined.
  – Usage: 0.60% of elements documented.
• Conclusions:
  – Metadata providers do not comply with the ISO Core Mandatory. Many topic categories show just 75% completeness.
  – This impacts metadata search engines for data discovery requests.

More information: Alaitz Zabala [[email protected]]


Advances in visualizing quality information I

Integrating UncertWeb project proposals: Use NetCDF-U

The Network Common Data Form (NetCDF) is one of the primary methods of self-documenting data storage and access in the international geosciences research and education community and beyond.

NetCDF-U Conventions are used to formally qualify the uncertainty information in geospatial data encoded in the netCDF-3 format, by means of concepts from the UncertML best practice of the UncertWeb project.

NetCDF-U Conventions are designed to be fully compatible with the netCDF Climate and Forecast Conventions, the de-facto standard for a large amount of data in the Fluid Earth Science community.

It is now a discussion paper in OGC.


Advances in visualizing quality information I

• Many data involved in the GeoViQua scenarios are encoded in NetCDF.
  – An open source file format.
  – Gives strength and freedom to encode metadata.
• GeoViQua is developing tools for reading and writing NetCDF-U files and for import/export from/to other raster formats.

[Figure: NetCDF file opened with the NASA software Panoply]
[Figure: NetCDF file exported to IMG file and opened with the new tool]

More information: Victor Zaldo [[email protected]]


Advances in visualizing quality information II

Integration of Quality Information with OGC Web Map Service: WMS-Q

• WMS 1.3.0 currently does not support well the integration of quality information into WMS.
• The current WMS does not define how a data layer can be semantically associated with the corresponding uncertainty layers.
• The WMS-Q specification is proposed as far as possible within the bounds of the WMS 1.3.0 specification, requiring as few extensions as possible.
• To integrate the dataset-level quality information into the WMS, we propose to slightly expand the “Type” attribute of the “MetadataURL” element to have “unstructured” and “other-structured” options.
• We propose to add a “description” element for the “MetadataURL” element.
• Pixel-level uncertainty information can be encoded using NetCDF Uncertainty Conventions (NetCDF-U).
• Work tested in the OGC interoperability experiment OWS-9.
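To make the proposed MetadataURL extension concrete, the sketch below builds such an element with Python's standard library. The `Format` and `OnlineResource` children follow WMS 1.3.0; the child element name `Description`, the helper function and the example URL are assumptions, since the slide only states that a "description" element is proposed.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK)

def metadata_url(href, mime, type_value, description):
    """Build a MetadataURL element carrying a type value and a description."""
    element = ET.Element("MetadataURL", {"type": type_value})
    ET.SubElement(element, "Format").text = mime
    ET.SubElement(element, "OnlineResource", {f"{{{XLINK}}}href": href})
    # The element name "Description" is an assumption; the slide only says
    # that a "description" element is proposed for MetadataURL.
    ET.SubElement(element, "Description").text = description
    return element

element = metadata_url(
    "http://example.org/quality-report.pdf",  # placeholder URL
    "application/pdf",
    "unstructured",                           # one of the two proposed values
    "Validation report for this layer",
)
print(ET.tostring(element, encoding="unicode"))
```

A client that does not know the new type value can still follow the link, which keeps the extension within the spirit of WMS 1.3.0.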

More information: Jon Blower [[email protected]]


Advances in visualizing quality information III

Preliminary results from experiments with colour coding:

• Quality should be intercomparable, i.e. the saturation should be intuitively comparable even across hues/categories. Perceptual colour models make this possible.
• Hue represents category, and saturation represents the "Purity for the parcel enrichment" (in percent) or the certainty.

[Figure: maps dated 16.12.2006 and 22.03.07, annotated "Nearly uncertain in both campaigns" and "Gain in certainty"]
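The hue/saturation encoding can be sketched with the standard library. Note the assumptions: plain HSV is used here as a simple stand-in and is not a perceptual colour model, so it does not deliver the cross-hue comparability the slide asks for; the category names and hue values are hypothetical.

```python
import colorsys

# Hypothetical categories, each with a fixed hue in [0, 1).
CATEGORY_HUES = {"cereal": 0.15, "forest": 0.33, "water": 0.60}

def quality_colour(category, certainty_percent):
    """Return an (R, G, B) tuple of ints: hue = category, saturation = certainty."""
    hue = CATEGORY_HUES[category]
    saturation = max(0.0, min(1.0, certainty_percent / 100.0))
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
    return tuple(round(channel * 255) for channel in (r, g, b))

print(quality_colour("forest", 100))  # fully saturated colour
print(quality_colour("forest", 20))   # washed-out colour of the same hue
print(quality_colour("water", 0))     # no certainty: white, whatever the category
```

Replacing the HSV conversion with a perceptually uniform colour space would be the next step towards the intercomparability described above.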

More information: Simon Thum [[email protected]]


Advances in visualizing quality information IV

Creation of a “Carbon Atlas” portal

Combining the possibilities of web mapping with the comparison of models including uncertainty: combination of ncWMS (server) and OpenLayers (client).

1. Possibility to compare models with each other.

ncWMS: Web Map Service for geospatial data stored in CF-compliant NetCDF files (developed and maintained by the Reading e-Science Centre).

Advances in visualizing quality information IV

2. Creation of a comparison map (based on the IPCC’s visualization method): pixel colour = difference between models; patterns = % of agreement between models.

This requires adding to the ncWMS server the possibility of associating a pattern with a raster.
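A per-pixel version of such a comparison map could look like the sketch below. The definitions used here (difference as max minus min, agreement as the share of models whose sign matches the multi-model mean, and an 80% threshold for the overlay pattern) are assumptions for illustration; they are not taken from the Carbon Atlas portal or from the IPCC method itself.

```python
def compare_models(model_values):
    """Summarise one pixel from several model estimates.

    Returns (spread, agreement_percent): the spread is max minus min of the
    estimates, and agreement is the share of models whose sign matches the
    sign of the multi-model mean. Both definitions are assumptions.
    """
    mean = sum(model_values) / len(model_values)
    spread = max(model_values) - min(model_values)
    same_sign = sum(1 for v in model_values if (v >= 0) == (mean >= 0))
    return spread, 100.0 * same_sign / len(model_values)

def pattern_for(agreement_percent, threshold=80.0):
    """Pick an overlay pattern; the 80% threshold is illustrative."""
    return "none" if agreement_percent >= threshold else "hatched"

# Hypothetical estimates from four models for three pixels.
pixels = [
    [0.8, 1.1, 0.9, 1.0],
    [0.5, -0.2, 0.4, -0.1],
    [-1.0, -1.2, -0.9, 0.1],
]
for values in pixels:
    spread, agreement = compare_models(values)
    print(f"spread={spread:.2f} agreement={agreement:.0f}% pattern={pattern_for(agreement)}")
```

The colour layer would be driven by the spread and the pattern layer by the agreement, which is why the server needs to serve both for the same raster.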

More information: Pascal Evano [[email protected]]


Advances in applied scenarios II

Uncertainty assessment for continuous and categorical variables

• Continuous variables: uncertainty related to citizens’ meteo data in relation to the official Met Office ones.
  More information: Dan Cornford [[email protected]]
• Categorical variables: spatialized quality indicators coming from a satellite image classification. Global, local and pixel uncertainty level. Several statistical classification methods are used.
  More information: Eva Sevillano [[email protected]]

[Figure labels: Classification (Cat1, Cat2, Cat3); Probability of success (%); Fidelity]

Advances in including data quality in search


• Quality search integrated in the EuroGEOSS Discovery and Access Broker to be applied to the GEO Portal.


Advances in including data quality in search


• Retrieve quality information embedded in Metadata

More information: Lorenzo Bigagli [[email protected]]


Advances in labelling the quality: the GEO Label

• What is it?
  – The GEO Label is intended to “assist the user to assess the scientific relevance, quality, acceptance and societal needs of the components” (ST-09-02 Task Team, 2010).
• Purposes?
  – be a quality indicator for GEOSS geospatial data and datasets;
  – improve user recognition and trust in datasets that carry a GEO label;
  – assist in searching by providing users with visual clues of dataset quality and relevance; and
  – increase visibility of EO data.
• GEO label development:
  – The GeoViQua project is currently undertaking research to define and evaluate the concept of a GEO label.
  – The development is carried out in three phases:


Advances in labelling the quality: the GEO Label

• Phase I Study:
  – Overall, GEO label questionnaire results show that users and producers agree on the benefits of introducing a GEO label, with no distinct difference between user and producer views.
  – The majority of respondents support an all-in-one drill-down interrogation facility as the key GEO label function.
• Phase II Study:
  – The GEO labels will be a graphical representation generated individually for each dataset in the GEOSS (or other data portals and clearinghouses) based on the quality information that is available for that dataset.
  – Second online questionnaire-based survey to identify the designs that convey quality information to users in the most efficient and comprehensible way.
• Currently:
  – At this stage we are analysing the GEO label study II results to fully define and establish a GEO label that meets the needs of the geodata user community.
  – Phase III: we will create physical prototypes which will be used in a human subject study.

More information: Victoria Lush [[email protected]]


The future

• Many possibilities have been shown.
• Now the project enters a development phase where the concepts presented and the prototypes need to be developed.
• Move the GeoViQua Quality Model towards a broader adoption.
• Develop a user feedback system prototype.
• Test search and visualization developments in a GEO Portal replica (ESA contribution).
• Work with the Architecture GEO committees to move some of this contribution for adoption in the GCI.


GeoViQua: Advances in data quality disclosing

Thanks! Ivette Serral

Center of Research in Ecology and Forestry Applications (CREAF) [email protected]