
Toward Digital Government: The Case of Government Statistics

Gary Marchionini
University of North Carolina at Chapel Hill
www.ils.unc.edu/govstat
NSF Grants EIA 0131824 and EIA 0129978
Principal Investigators: Gary Marchionini, Stephanie Haas, Ben Shneiderman, Catherine Plaisant, and Carol Hert


Digital Government: Leveraging IT

• Government information dissemination
  – Websites
  – Other publications (no mass emailings yet)
• Transactions
  – Registrations
  – Census, regulatory filings
  – Taxes
• Policy making
  – E-voting
  – E-rules
• Our work focuses on statistical information and agencies, because many important decisions by policy makers and citizens depend on statistics

Preliminary Work 1996-2000

• Human needs
  – Interviews (agencies, public)
  – Transaction log analysis
  – Email content analysis
• System development and testing
  – Novel interfaces
  – Information architecture
  – Usability studies


Focus on Tables 1998-2000

• Table browser
  – Java applet
  – DTD for tables (influenced by Dublin Core (DC) and DDI)
  – XML protocol
  – Mapping metadata elements to interface control mechanisms
  – Piping data from large databases to the applet
  – User studies
• Metadata to aid understanding
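The step of mapping metadata elements to interface control mechanisms can be pictured as a small rule table. This is a minimal Python sketch, assuming hypothetical table metadata whose attribute names (temporal, geog, catQnty) borrow from the DDI-style markup shown later in this deck; the XML layout, the control names, and the 10-category cutoff are illustrative assumptions, not the project's table DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical table metadata; attribute names borrow from DDI-style markup,
# but this layout is an illustration, not the project's actual table DTD.
TABLE_XML = """
<table title="Median income, by age, 2001">
  <var name="age" temporal="no" geog="no" catQnty="4"/>
  <var name="year" temporal="yes" geog="no" catQnty="10"/>
  <var name="state" temporal="no" geog="yes" catQnty="51"/>
</table>
"""

def control_for(var, max_checkboxes=10):
    """Pick a (hypothetical) interface control from one variable's metadata."""
    if var.get("temporal") == "yes":
        return "range-slider"      # time variables: choose a span
    if var.get("geog") == "yes":
        return "map-selector"      # geographic variables: pick on a map
    # Few categories fit on screen as checkboxes; many need a searchable list.
    if int(var.get("catQnty", "0")) <= max_checkboxes:
        return "checkboxes"
    return "search-list"

def controls(xml_text):
    """Map each variable name in the table metadata to a control type."""
    root = ET.fromstring(xml_text.strip())
    return {v.get("name"): control_for(v) for v in root.iter("var")}

print(controls(TABLE_XML))
# {'age': 'checkboxes', 'year': 'range-slider', 'state': 'map-selector'}
```

Keeping the rules in one function means the interface can change as the metadata changes, without hand-building controls for each table.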


Statistical Knowledge Network 2003-2006

• Create SKN prototype with agency partners
• Integration
  – Horizontal integration across federal agencies (BLS, EIA, NCHS, Census, SSA, NASS)
  – Vertical integration from local and state levels
• Focus on non-specialists
  – Help is crucial
  – Metadata drives help
• User interfaces are the intermediaries that link people and data
• Find what you need, understand what you find


Data Flow

[Figure: SKN data flow]
• Agencies: some with data and integrated metadata, some with multiple metadata repositories, each with backend data and metadata
• Membrane and firewall around agency backend data and metadata
• Distributed Public Intermediary: variable/concept level, XML-based, incorporating ISO 11179 and DDI, providing Java-based statistical literacy tools
• Statistical Ontology and Domain Ontologies; Domain Experts
• User interfaces serving End User Communities
• End users: interact with data from an information/concept perspective, not an agency perspective


Statistical Knowledge Network Architecture

[Figure: SKN architecture]
• SKN Consortium Agencies
• SKN Registry
  – Objects: Reports metadata, Tables metadata, People metadata, Glossary, Annotations
  – Actions: Contribute, Find, Display, Annotate, Understand, Manipulate, Collaborate
  – Ontology, Rules & Constraints
• Private Work Spaces, each with its own Objects and Actions
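The registry-and-workspace arrangement in the figure can be read as a simple data model. The sketch below is a hypothetical Python rendering of three of the listed actions (Contribute, Find, Annotate); the class names, fields, and matching rules are assumptions for illustration, not the SKN implementation.

```python
from dataclasses import dataclass, field

@dataclass
class RegistryObject:
    kind: str                      # e.g. "report", "table", "person", "glossary"
    title: str
    keywords: set = field(default_factory=set)

@dataclass
class Registry:
    """Shared registry of metadata objects (hypothetical model)."""
    objects: list = field(default_factory=list)

    def contribute(self, obj):
        self.objects.append(obj)

    def find(self, term, kind=None):
        """Case-insensitive substring match on the title, or exact keyword match."""
        term = term.lower()
        return [o for o in self.objects
                if (kind is None or o.kind == kind)
                and (term in o.title.lower() or term in {k.lower() for k in o.keywords})]

@dataclass
class Workspace:
    """A private work space: saved objects plus notes only its owner sees."""
    owner: str
    saved: list = field(default_factory=list)
    notes: dict = field(default_factory=dict)

    def annotate(self, obj, note):
        if obj not in self.saved:
            self.saved.append(obj)
        self.notes.setdefault(obj.title, []).append(note)

registry = Registry()
registry.contribute(RegistryObject("table", "Median income, by age, 2001", {"income", "age"}))
registry.contribute(RegistryObject("glossary", "Median", {"statistic"}))
mine = Workspace("analyst")
for hit in registry.find("income"):
    mine.annotate(hit, "compare with 2000")
```

Keeping annotations in the workspace rather than on the shared objects mirrors the figure's split between the public registry and private work spaces.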


Interface Prototypes:

Find, Display, Understand; Leverage Metadata, Glossary, Ontology

• Relation Browser
• Multilayered help: treemaps, video help
• Animated Glossary
• Contextualizer
• PairTrees
• Spatial audio for maps
• Missing Data


Use Case Scenarios to Guide Design

• Based on discussions with agency partners
• 20 scenarios
• 4 detailed, with in-depth resources located
• Used to ground ongoing work


Relation Browser++ (RB++) displaying all EIA webpages


RB++ with Cursor Over Residential Sector

RB++ showing ‘hous’ typed in title field
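The screens above show the core Relation Browser behavior: facet values with counts, a preview of the counts when the cursor rests on a value, and a title-text filter such as ‘hous’. A minimal Python sketch of that filtering logic follows; the page records and the facet names (sector, fuel) are invented for illustration and do not come from the EIA site.

```python
from collections import Counter

FACETS = ("sector", "fuel")

# Invented page records standing in for a crawled agency site.
PAGES = [
    {"title": "Residential housing energy use", "sector": "Residential", "fuel": "Electricity"},
    {"title": "Household heating oil prices", "sector": "Residential", "fuel": "Petroleum"},
    {"title": "Commercial building survey", "sector": "Commercial", "fuel": "Natural Gas"},
    {"title": "Industrial coal consumption", "sector": "Industrial", "fuel": "Coal"},
]

def browse(pages, selections=None, title_text=""):
    """Filter pages by selected facet values and a title substring,
    then count each facet's values over the remaining pages."""
    selections = selections or {}
    needle = title_text.lower()
    matches = [p for p in pages
               if all(p[f] == v for f, v in selections.items())
               and needle in p["title"].lower()]
    counts = {f: Counter(p[f] for p in matches) for f in FACETS}
    return matches, counts

# Cursor over "Residential": preview the counts as if it were selected.
_, preview = browse(PAGES, {"sector": "Residential"})
print(dict(preview["fuel"]))   # {'Electricity': 1, 'Petroleum': 1}

# Typing 'hous' in the title field.
hits, _ = browse(PAGES, title_text="hous")
print([p["title"] for p in hits])
```

Recomputing every facet's counts after each change is what lets the display show, at a glance, how the remaining pages are distributed.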

Multi-layered interfaces

[Figure: 1 level vs. 3 levels of growing complexity: map+table; map+table +filters; map+table +filters +scatterplot]
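The figure describes cumulative layers: each level keeps everything below it and adds features. A small Python sketch of that rule follows, using the feature names from the figure; encoding levels as a list of feature lists is an assumption about one way to express it, not how the prototype was built.

```python
# Feature layers read off the figure; level n offers layers 1..n.
LAYERS = [
    ["map", "table"],
    ["filters"],
    ["scatterplot"],
]

def features(level):
    """Return the features available at a 1-based interface level."""
    if not 1 <= level <= len(LAYERS):
        raise ValueError(f"level must be between 1 and {len(LAYERS)}")
    return [f for layer in LAYERS[:level] for f in layer]

for lvl in range(1, len(LAYERS) + 1):
    print(lvl, "+".join(features(lvl)))
```

Because each level is a superset of the one below, a user who moves up never loses a tool they have already learned.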

Animated Demonstration Features


Script Guidelines

• Base the script on a live demonstration (never on a written description)
  – Focus on tasks (not tours of widgets or conceptual overviews)
  – Act out the interaction (with minimal description), then describe results in the context of the task
  – Start with a tour of the main screen components (orient and introduce vocabulary): 5-10 sec. max
  – Plan a linear sequence of very short autonomous chunks (15-60 sec.)
• Map the chunks to existing online documentation
• Show a text title at the beginning of each chunk
• Carefully synchronize voice and visuals (hard when working alone)
• Provide the duration and file size for each chunk
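Several of these guidelines are numeric (a 5-10 second orientation tour, 15-60 second chunks, a stated duration and file size per chunk), so they can be checked mechanically. The Python sketch below assumes a hypothetical Chunk record and treats the first chunk as the orientation tour; neither comes from the original tooling.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    title: str
    seconds: float
    size_kb: int

def check_script(chunks, orientation_max=10, chunk_min=15, chunk_max=60):
    """Return guideline violations; the first chunk is assumed to be the
    orientation tour of the main screen."""
    if not chunks:
        return ["script has no chunks"]
    problems = []
    if chunks[0].seconds > orientation_max:
        problems.append(f"orientation '{chunks[0].title}' exceeds {orientation_max}s")
    for c in chunks:
        if not c.title.strip():
            problems.append("a chunk has no title")
    for c in chunks[1:]:
        if not chunk_min <= c.seconds <= chunk_max:
            problems.append(f"'{c.title}' runs {c.seconds}s, outside {chunk_min}-{chunk_max}s")
    return problems

def catalog(chunks):
    """List each chunk with its duration and file size."""
    return [f"{c.title} ({c.seconds:.0f} s, {c.size_kb} KB)" for c in chunks]

script = [Chunk("Main screen tour", 8, 120),
          Chunk("Filter by state", 40, 600),
          Chunk("Sort the table", 75, 900)]
print(check_script(script))   # ["'Sort the table' runs 75s, outside 15-60s"]
print(catalog(script))
```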


Interactive Glossary Development


Tools

• Provide a foundation for content development
• Separate content development from presentation development
• Reduce overall development time
• Maximize reuse of existing elements
• Create multiple presentations from a single content development effort

Animation Template


Content Foundation Template (SIG)

• Question: initial motivation
• Answer: overview, definition
• Process: explanation, equation
• Result: statistic, answer
• Example Review: summary, interpretation


Animation Template

• Consistent display and interaction for all animations
• Presents animation and explanatory text simultaneously
• Navigate (forward and back) through animation segments
• Complete review of text at any time


Animation Template

• Three pieces: text, animations, template
• Text is tagged with content section tags in a separate text file
• Animation consists of segments in individual animation files
• Text and animation segments are coordinated by their placement in the template
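The three-piece design (a tagged text file, separate animation segment files, and a template that pairs them by position) can be sketched in a few lines of Python. The “[Section]” tag syntax, the segment file names, and the class below are hypothetical; only the section names follow the Content Foundation Template.

```python
import re

def parse_tagged_text(text):
    """Split text tagged with '[Section]' header lines into ordered sections."""
    sections, current = {}, None
    for line in text.splitlines():
        tag = re.fullmatch(r"\[(.+)\]", line.strip())
        if tag:
            current = tag.group(1)
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return [(name, " ".join(lines)) for name, lines in sections.items()]

class AnimationTemplate:
    """Pairs the i-th text section with the i-th animation segment."""

    def __init__(self, text, segment_files):
        self.sections = parse_tagged_text(text)
        if len(self.sections) != len(segment_files):
            raise ValueError("need exactly one animation segment per text section")
        self.segments = list(segment_files)
        self.pos = 0

    def current(self):
        name, body = self.sections[self.pos]
        return name, body, self.segments[self.pos]

    def forward(self):
        self.pos = min(self.pos + 1, len(self.segments) - 1)
        return self.current()

    def back(self):
        self.pos = max(self.pos - 1, 0)
        return self.current()

    def full_text(self):
        """All text at once, for review at any point in the animation."""
        return "\n".join(f"{name}: {body}" for name, body in self.sections)

TEXT = """[Question]
What is a median?
[Answer]
The middle value of an ordered list.
[Process]
Sort the values, then take the middle one.
"""
glossary_entry = AnimationTemplate(TEXT, ["question.anim", "answer.anim", "process.anim"])
print(glossary_entry.forward())
```

Because text and animation live in separate files and are joined only by position, either can be revised or reused without touching the other.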


SKN: Ontology, DTD / XML Schema, Interface Tools

[Figure: from ontology to interface tools]
• Ontology (modeling, semantic level): classes, relationships, constraint rules
• DTD / XML Schema (implementation, structural level): elements, attributes, datatypes
• Interface Tools: Statistical Interactive Glossary (SIG)


Ontology Applications

• Knowledge organization
• Content and terminology control
• Data integration
• Query support
• Automatic classification support
• Reasoning mechanism
• Others

[Figure: example ontology fragments]
Domain knowledge: related terms such as aged unit, benefit, age, estimate, salary, earning, poverty estimate, household, family, wage, poverty, income distribution
Operational knowledge: aged unit = married couples living together, with husband or wife aged 65 or older
Sources shown: SSA, Census Bureau, FIFARS
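The two kinds of knowledge on this slide suggest a minimal data structure: links between related concepts (domain knowledge) and per-source definitions (operational knowledge), which together support applications such as query support and terminology control. In the Python sketch below, the specific concept links are illustrative choices among the slide's terms, and attaching the “aged unit” definition to SSA is an assumption.

```python
from collections import defaultdict

class MiniOntology:
    """Related-concept links plus source-specific definitions (a toy model)."""

    def __init__(self):
        self.related = defaultdict(set)       # concept -> related concepts
        self.definitions = defaultdict(dict)  # concept -> {source: definition}

    def relate(self, a, b):
        self.related[a].add(b)
        self.related[b].add(a)

    def define(self, concept, source, text):
        self.definitions[concept][source] = text

    def expand_query(self, term):
        """Query support: the term plus its directly related concepts."""
        return {term} | self.related.get(term, set())

    def explain(self, concept):
        """Terminology control: every recorded definition, labeled by source."""
        return [f"{src}: {text}" for src, text in self.definitions.get(concept, {}).items()]

onto = MiniOntology()
# Illustrative links among terms from the slide.
onto.relate("aged unit", "age")
onto.relate("aged unit", "benefit")
onto.relate("income distribution", "wage")
onto.relate("income distribution", "salary")
# Source attribution is assumed for this example.
onto.define("aged unit", "SSA",
            "married couples living together, with husband or wife aged 65 or older")
print(sorted(onto.expand_query("aged unit")))   # ['age', 'aged unit', 'benefit']
```

Storing definitions per source, rather than one definition per term, leaves room for agencies that use the same term differently.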

Project DTD

• Investigate DDI and ISO 11179
• Leverage DDI and data cubes
• Mark up a set of objects
  – Tables
  – Reports/press releases
• Use the markup to build added-value search (find what you need) and help (understand what you find) support into interfaces


The Basic Structure

docDscr: description of the markup (what is being marked up, who marked it up, etc.)
entDscr_1: description of an entity within the marked-up document
varDscr_1: description of each variable within an entity, study group or document
varDscr_2: description of each variable within an entity, study group or document
entDscr_2: description of an entity within the marked-up document
stdygrpDscr: describes the “group” to which an entity or document belongs, such as a survey program
nCubeDscr: used when an entity is an aggregated table
fileDscr: describes physical file structures for nCubes
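To make the hierarchy concrete, the Python sketch below builds an empty instance with xml.etree.ElementTree. The element names come from the slide; the root element name, the ID and name attributes, and placing nCubeDscr and fileDscr inside an aggregated-table entity are assumptions, since the slide does not give the full nesting rules.

```python
import xml.etree.ElementTree as ET

def skeleton(entities, study_group="Hypothetical survey program"):
    """Build an empty marked-up document following the structure above.

    entities maps an entity ID to (variable names, is_aggregated_table).
    """
    root = ET.Element("markup")                      # hypothetical root element
    ET.SubElement(root, "docDscr")                   # describes the markup itself
    ET.SubElement(root, "stdygrpDscr").text = study_group
    for ent_id, (var_names, is_table) in entities.items():
        ent = ET.SubElement(root, "entDscr", ID=ent_id)
        for name in var_names:
            ET.SubElement(ent, "varDscr", name=name)
        if is_table:
            # Placement of these two elements is an assumption.
            ET.SubElement(ent, "nCubeDscr")
            ET.SubElement(ent, "fileDscr")
    return root

doc = skeleton({"ent1": (["age", "median_income"], True),
                "ent2": (["release_date"], False)})
print(ET.tostring(doc, encoding="unicode"))
```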


One Example of How the DTD Helps

The DTD can bring “expert knowledge” to less expert users and bring relevant information together by enabling search via variables as well as via subjects/keywords

Median income, by age, 2001

… additivity="" temporal="no" geog="no" geoVocab="" catQnty="4"> age persons
<labl source="producer" level="catgryGrp">Age</labl>
1 <labl source="producer" level="catgry">65-69</labl>
2 <labl source="producer" level="catgry">70-74</labl>
3 <labl source="producer" level="catgry">75-79</labl>
4 <labl source="producer" level="catgry">80 or older</labl>
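Because each category is marked up as a labeled value, a search can match on variables and their categories rather than on title keywords alone. The Python sketch below assumes DDI Codebook-style elements (var, catgry, catValu, labl); the second table and all element content beyond the age categories shown above are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Marked-up variables keyed by table title. The age categories follow the
# slide; the second table is invented for illustration.
TABLES = {
    "Median income, by age, 2001": """
      <var name="age" catQnty="4">
        <labl>Age</labl>
        <catgry><catValu>1</catValu><labl level="catgry">65-69</labl></catgry>
        <catgry><catValu>2</catValu><labl level="catgry">70-74</labl></catgry>
        <catgry><catValu>3</catValu><labl level="catgry">75-79</labl></catgry>
        <catgry><catValu>4</catValu><labl level="catgry">80 or older</labl></catgry>
      </var>""",
    "Hypothetical table: poverty rate, by sex": """
      <var name="sex" catQnty="2">
        <labl>Sex</labl>
        <catgry><catValu>1</catValu><labl level="catgry">Male</labl></catgry>
        <catgry><catValu>2</catValu><labl level="catgry">Female</labl></catgry>
      </var>""",
}

def find_tables(variable, category=None):
    """Titles of tables with the named variable (and, if given, that category label)."""
    hits = []
    for title, xml_text in TABLES.items():
        var = ET.fromstring(xml_text.strip())
        if var.get("name") != variable:
            continue
        labels = [c.findtext("labl") for c in var.iter("catgry")]
        if category is None or category in labels:
            hits.append(title)
    return hits

print(find_tables("age", "65-69"))   # ['Median income, by age, 2001']
```

A user asking about people aged 65-69 finds the table even though “65-69” never appears in its title.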

Discovering Metadata

• Hybrid machine learning approach
  – Crawl the website
  – Create term-document matrices
  – Use k-means clustering with a small K so the clusters fit on screen in RB++
  – Revise
• Use structure in the existing sites to train a classifier
• For a small number (n) of concepts, classify the site
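The clustering step can be sketched without any libraries: build a term-document matrix from crawled page text, then run k-means with a small K. This Python sketch uses plain term counts and Lloyd's algorithm with a fixed random seed; the project's actual weighting, preprocessing, and choice of K are not stated here, and the four example “pages” are invented.

```python
import random
from collections import Counter

def term_document_matrix(docs):
    """One term-count row per document over a shared, sorted vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    rows = []
    for d in docs:
        counts = Counter(d.lower().split())
        rows.append([counts[w] for w in vocab])
    return vocab, rows

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, iters=20, seed=0):
    """Lloyd's k-means; returns the cluster index assigned to each row."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign each row to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows]
        # Move each center to the mean of its rows (keep it if the cluster is empty).
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

pages = ["gasoline price gasoline", "gasoline price", "wage income", "income wage wage"]
_, matrix = term_document_matrix(pages)
print(kmeans(matrix, k=2))
```

The cluster numbers themselves carry no meaning; the “Revise” step is where people inspect each cluster and give it a usable name.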


Combining Machine Learning and Dynamic Interfaces

What should these topics be, and how do we know if we’ve found the right names for them?


Combining Machine Learning and Dynamic Interfaces

How do we assign thousands of documents to their respective topics?

Initial, Unstructured Approach

[Figure: documents grouped into distributions]

This approach yielded intuitively coherent portions of the data.

New Approach, Semi-Supervised

[Figure: documents assigned to topics]

This approach capitalizes on the agencies’ efforts and expertise, and so far seems to yield superior results. However, training data are very sparse, and some of the observed categories are highly correlated.

Our current work addresses these tuning issues.
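One way to use existing site structure when training data are sparse is self-training: start from pages whose site section supplies a label, classify the rest, and fold confident predictions back into the training set. The Python sketch below substitutes a simple nearest-centroid classifier with cosine similarity for whatever classifier the project used; the labels, page texts, and 0.3 threshold are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def vec(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(n * b[t] for t, n in a.items() if t in b)
    na = math.sqrt(sum(n * n for n in a.values()))
    nb = math.sqrt(sum(n * n for n in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroids(labeled):
    """Summed term counts per label (cosine ignores scale, so sums act as means)."""
    sums = defaultdict(Counter)
    for text, label in labeled:
        sums[label].update(vec(text))
    return sums

def self_train(labeled, unlabeled, threshold=0.3, rounds=3):
    """Nearest-centroid self-training; returns (labeled pairs, still-unlabeled texts)."""
    if not labeled:
        raise ValueError("need at least one labeled page")
    labeled, pending = list(labeled), list(unlabeled)
    for _ in range(rounds):
        cents = centroids(labeled)
        still = []
        for text in pending:
            label, score = max(((lab, cosine(vec(text), c)) for lab, c in cents.items()),
                               key=lambda pair: pair[1])
            if score >= threshold:
                labeled.append((text, label))   # confident: becomes training data
            else:
                still.append(text)
        if len(still) == len(pending):
            break                               # nothing new was labeled
        pending = still
    return labeled, pending

seed_pages = [("gasoline prices", "petroleum"), ("crude oil imports", "petroleum"),
              ("electricity sales", "electricity"), ("power plant generation", "electricity")]
new_pages = ["weekly gasoline prices", "electricity generation", "coal mine safety"]
result, leftover = self_train(seed_pages, new_pages)
print(result[4:], leftover)
```

Pages that match no category well (here, the coal page) stay unlabeled instead of being forced into a topic, which matters when categories are few and correlated.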

Vertical Integration: Agriculture
• Farmers & Producers: supply data to agencies
• Collection agents: State Statistical Office, State Cooperative Agency (Dept. of Agriculture, etc.)
• USDA / NASS
• Statistical Consumers: obtain data from agencies


Multiple Research Threads for the SKN

• Interfaces
• Metadata and Ontology
• Multi-leveled help
• Automatic slicing and dicing
• User needs and user testing
• Cross-agency cooperation
• See www.ils.unc.edu/govstat