Information management, workflow and discovery /check-in for project definitions Peter Fox Xinformatics 4400/6400 Week 10, April 9, 2013


Review of reading

• Information Integration
– Social issues in information discovery and sharing
– Information integration in geo-informatics
– http://cseweb.ucsd.edu/~goguen/projs/data.html
– http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1839387/
• Information Life Cycle
– MSDN Information Life Cycle
– Information Life Cycle definition and context
– http://www.computerworld.com/s/article/79885/The_new_buzzwords_Information_lifecycle_management
– http://www.databasejournal.com/sqletc/article.php/3340301/Database-Archiving-A-Critical-Component-of-Information-Lifecycle-Management.htm
– http://en.wikipedia.org/wiki/Information_Lifecycle_Management
– http://msdn.microsoft.com/en-us/library/bb288451.aspx
• Information Visualization
– http://mastersofmedia.hum.uva.nl/2011/04/18/the-simple-ways-of-information-visualization/comment-page-1/
– http://www.siggraph.org/education/materials/HyperVis/domik/folien.html
– http://www.visual-literacy.org/periodic_table/periodic_table.html
• Information model development and visualization
– http://www.acm.org/crossroads/xrds7-3/smeva.html
• Outside the current box
– Peter Fox and James Hendler, 2011, Changing the Equation on Scientific Data Visualization, Science, Vol. 331, no. 6018, pp. 705-708, DOI: 10.1126/science.1197654, online at http://www.sciencemag.org/content/331/6018/705.full or see: http://escience.rpi.edu/publications/visualization/fox_hendler_science2011.html

Logical Collections

• The primary goal of a management system is to abstract the physical collection into logical collections. The resulting view is a uniform, homogeneous collection.
• Note the analogy with logical models and information integration, so address these EARLY ON:
– Identifying naming conventions and organization
– Aligning cataloguing and naming to facilitate search, access, and use (who uses it?)
– Provision of **contextual** information

Physical Handling

• Map between physical and logical.
• Where and who does it come from?
– Is there a transfer into a physical form?
– Is it backed up, archived, cached? …
– What formats?
– Naming conventions: do they change?
• Note the analogy to physical models
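
The map between logical and physical described above can be sketched as a small catalogue. This is a minimal illustration, not a real management system: the class names, the `role` values and the example locations are all assumptions made up for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalCopy:
    location: str  # where the bytes live (hypothetical URI)
    fmt: str       # file format of this copy
    role: str      # e.g. "primary", "backup", "archive", "cache"


@dataclass
class LogicalItem:
    logical_name: str
    copies: list = field(default_factory=list)


class Catalogue:
    """Maps stable logical names onto one or more physical copies."""

    def __init__(self):
        self._items = {}

    def register(self, logical_name, location, fmt, role="primary"):
        # Create the logical item on first use, then attach the physical copy.
        item = self._items.setdefault(logical_name, LogicalItem(logical_name))
        item.copies.append(PhysicalCopy(location, fmt, role))

    def resolve(self, logical_name, role="primary"):
        # Return every physical location holding this item in the given role.
        item = self._items[logical_name]
        return [c.location for c in item.copies if c.role == role]


catalogue = Catalogue()
catalogue.register("bird-sounds/african-swallow",
                   "file:///data/disk1/rec_0042.wav", "wav")
catalogue.register("bird-sounds/african-swallow",
                   "s3://example-archive/rec_0042.flac", "flac", role="archive")

primary = catalogue.resolve("bird-sounds/african-swallow")
archived = catalogue.resolve("bird-sounds/african-swallow", role="archive")
```

Users only ever see the logical name; physical file names, formats and storage locations can change without breaking the logical collection.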

Interoperability Support


Security

• Access authorization and change verification. This is the basis of trusting your information.
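
A minimal sketch of both ideas, assuming a hypothetical access-control list and using a SHA-256 digest as one possible way to verify that content has not changed (the slide does not prescribe a mechanism):

```python
import hashlib
import hmac

# Hypothetical access-control list: user -> permitted actions.
ACL = {"curator": {"read", "write"}, "guest": {"read"}}


def is_authorized(user: str, action: str) -> bool:
    # Unknown users get no permissions at all.
    return action in ACL.get(user, set())


def fingerprint(content: bytes) -> str:
    # Digest recorded when the information enters the collection.
    return hashlib.sha256(content).hexdigest()


def is_unchanged(content: bytes, recorded_digest: str) -> bool:
    # Compare the current digest with the one recorded earlier.
    return hmac.compare_digest(fingerprint(content), recorded_digest)


original = b"station,temp_c\nA1,12.5\n"
recorded = fingerprint(original)
tampered = b"station,temp_c\nA1,21.5\n"

guest_can_write = is_authorized("guest", "write")
original_ok = is_unchanged(original, recorded)
tampered_ok = is_unchanged(tampered, recorded)
```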


Ownership

• Who is responsible for quality and meaning?

Metadata

• Recall metadata are data about data.

• Metainformation?


Persistence

• Deployment of mechanisms to counteract technology obsolescence.


Discovery

• Ability to identify useful relations and information inside the collection
• More on this later in this class

Dissemination

• Mechanisms to make interested parties aware of changes and additions to the collections.
• Do you rely on information retrieval? The Web?

Summary of Information Management

• Creation of logical collections
• Physical handling
• Interoperability support
• Security support
• Ownership
• Metadata collection, management and access
• Persistence
• Knowledge and information discovery
• Dissemination and publication

Note for your project writeup!

• Information management! Cover the 9 areas.


Information Workflow

• What is a workflow?

• Why would you use it?

• Key considerations for information, cf. data
• Some pointers to workflow systems

What is a workflow?

• General definition: “series of tasks performed to produce a final outcome” (taxes?)
• Information workflow: involves people, but potentially you want to
– Automate jobs that a person traditionally performed manually
– Process large volumes of information faster than one could do by hand
• NB the difference from data workflows: it reaches out to encompass the user (e.g. ‘unrecorded actions’)

Background: Business Workflows

• Example: planning a trip
• Need to perform a series of tasks: book a flight, reserve a hotel room, arrange for a rental car, etc.
• Each task may depend on the outcome of a previous task
– Days you reserve the hotel depend on days of the flight
– If the hotel has shuttle service, you may not need to rent a car
• Prior information, experience, preferences…
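
The trip example can be sketched as a tiny dependency-ordered workflow. This is only an illustration: the task functions, the dates and the shuttle flag are invented, and Python's standard `graphlib.TopologicalSorter` is just one way to order dependent tasks.

```python
from graphlib import TopologicalSorter


def book_flight(state):
    # Illustrative dates only.
    state["dates"] = ("2013-04-16", "2013-04-18")


def reserve_hotel(state):
    # Hotel nights depend on the flight dates chosen earlier.
    state["hotel_nights"] = state["dates"]
    state["hotel_has_shuttle"] = True


def arrange_car(state):
    # A car is only needed when the hotel offers no shuttle.
    state["car_rented"] = not state["hotel_has_shuttle"]


TASKS = {
    "book_flight": book_flight,
    "reserve_hotel": reserve_hotel,
    "arrange_car": arrange_car,
}

# Each task maps to the set of tasks whose outcome it depends on.
DEPENDS_ON = {
    "book_flight": set(),
    "reserve_hotel": {"book_flight"},
    "arrange_car": {"reserve_hotel"},
}


def run_workflow():
    state = {}
    order = list(TopologicalSorter(DEPENDS_ON).static_order())
    for name in order:
        TASKS[name](state)
    return order, state


order, state = run_workflow()
```

Declaring the dependencies separately from the task code is what lets a workflow system schedule, visualize and reuse the steps.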

Tripit.com?


What about information workflows?

• Perform a set of transformations/operations on information source(s)
• Examples
– Generating images from raw data
– Identifying areas of interest from a large information source (e.g. word cloud)
– Classifying a set of objects
– Querying a web service for more information on a set of objects
– Many others…
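
As a sketch of "a set of transformations on an information source", the steps below compute the term counts that a word cloud would be drawn from. The step names, the stopword list and the sample text are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list, not a standard one.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is"}


def tokenize(text):
    # Lower-case the text and keep alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def top_terms(tokens, n=2):
    # The most frequent terms would be drawn largest in a word cloud.
    return Counter(tokens).most_common(n)


def run_pipeline(source, steps):
    # Feed the output of each step into the next one.
    result = source
    for step in steps:
        result = step(result)
    return result


text = "Metadata is data about data. Discovery of data depends on metadata."
terms = run_pipeline(text, [tokenize, remove_stopwords, top_terms])
```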

More on Workflows

• Can process many information types:
– Archives
– Web pages
– Streaming/real time
– Images
– Semiotic systems
• Robust workflows depend on formal (conceptual and logical) models of the flow of information among components
• May be simple and linear or very complex

Challenges

• Questions:
– What are some challenges for users in implementing workflows?
– What are some challenges to executing these workflows?
– What are limitations of writing a program?
• Mastering a programming language
• Visualizing workflow
• Sharing/exchanging workflow
• Formatting issues
• Locating datasets, services, or functions

Workflow Management Systems


Benefits of Workflows

• Documentation of aspects of analysis
• Visual communication of analytical steps
• Ease of testing/debugging
• Reproducibility
• Reuse of part or all of workflow in a different project

Additional Benefits

• Integration of and between multiple computing environments
• ‘Automated’ access to distributed resources via other architectural components, e.g. web services and Grid technologies
• System functionality to assist with information integration of heterogeneous components and sources

Why not just use a script?

• Script does not specify low-level task scheduling and communication
• May be platform dependent
• Can’t be easily reused
• May not have sufficient documentation to be adapted for another purpose

Why can a GUI be useful?

• No need to learn a programming language
• Visual representation of what workflow does
• Allows you to monitor workflow execution
• Enables user interaction (though not necessarily collaboration)
• Facilitates sharing of workflows

Some workflow systems

• Kepler
• SCIRun
• Sciflo
• Triana
• Taverna
• Pegasus
• Some commercial tools:
– Windows Workflow Foundation
– Mac OS X Automator
• http://www.isi.edu/~gil/AAAI08TutorialSlides/5-Survey.pdf
• http://www.isi.edu/~gil/AAAI08TutorialSlides/
• See reading for this week

Discovery

• How does someone find your information?

• How would you provide discovery of
– collections
– files
– ‘bits’
• How would you find ->

Discovery

o Search (Federated Search)
o Helped by
  o Folksonomies (user contributed)
  o Intelligent Agents
  o Search Engines
  o Taxonomies
o Find photos of Kim
  o Boy or girl?

Use cases

• Find a sound recording of a swallow.

• Excuse me?


Use cases

• Find a sound recording of an African Swallow
• Find a sound recording of a bird that sounds like an African Swallow
• Media types: how can you discover them?

Use cases

• Find the movie that Jeanne Tripplehorn first starred in / that was her most successful / in which she was lead actress?
• Has anyone gene sequenced a mouse?
• Find images of primary productivity in the North Atlantic
• Discovery can often involve information integration (or is it *almost always*?)

Three level metadata solution for DATA: Data Discovery and Data Integration

• Level 1: Data registration at the Discovery Level, e.g. volcano location and activity
• Level 2: Data registration at the Inventory Level, e.g. list of datasets, times, products
• Level 3: Data registration at the Item Detail Level, e.g. access to individual quantities
• Earth Sciences Virtual Database: a data warehouse where the schema heterogeneity problem is solved; schema based integration
• Ontology based data integration using scientific workflows

A.K. Sinha, Virginia Tech, 2006

Three level metadata solution? Information Discovery and Information Integration

• Level 1: Registration at the Discovery Level, e.g. find the upper level entry point to a source
• Level 2: Registration at the Inventory Level, e.g. list of datasets, using the logical organization
• Level 3: Registration at the Item Detail Level, i.e. annotation, e.g. tagging
• Catalog/Index
• Schema based integration
• Integration using mapping management

A.K. Sinha, Virginia Tech, 2006

Information discovery

• What makes discovery work?

– Metadata
– Logical organization
– Attention to the fact that someone would want to discover it
– It turns out that file types are a key enabler or inhibitor to discovery
– Result ranking using a *tuned* algorithm
• What does not work?
– Result ranking algorithms that depend on unconventional information types (icon, index, symbol)

Federated search

• Federated search “is the simultaneous search of multiple online databases or web resources and is an emerging feature of automated, web-based library and information retrieval systems. It is also often referred to as a portal or a federated search engine.” (Wikipedia)
• Libraries have been doing this for a long time (Z39.50, ISO 23950)
• Key is consistent search metadata fields (keywords)
• E.g. Geospatial One Stop http://www.geodata.gov
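
A minimal sketch of federated search, assuming two in-memory sources that share a consistent `keywords` field. The source names and records are invented; a real system would query remote databases, for example over Z39.50, rather than Python lists.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sources; each record exposes the same "keywords" field.
SOURCES = {
    "library_a": [
        {"title": "Swallow song recordings", "keywords": ["swallow", "audio"]},
        {"title": "Volcano activity maps", "keywords": ["volcano", "map"]},
    ],
    "archive_b": [
        {"title": "African swallow field notes", "keywords": ["swallow", "africa"]},
    ],
}


def search_source(name, records, keyword):
    # Search one source using the shared metadata field.
    return [
        {"source": name, "title": r["title"]}
        for r in records
        if keyword in r["keywords"]
    ]


def federated_search(keyword):
    # Submit the same query to every source at once, then merge the results.
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(search_source, name, records, keyword)
            for name, records in SOURCES.items()
        ]
        merged = []
        for future in futures:
            merged.extend(future.result())
    return merged


hits = federated_search("swallow")
```

The merge step only works because every source answers the same question about the same field, which is why consistent search metadata is the key.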

Smart search

• Semantically aware search, e.g. http://noesis.itsc.uah.edu, http://eie.cos.gmu.edu (Water -> Semantic Search)
• Faceted search, e.g. mspace (http://mspace.fm), exhibit (MIT), S2S (RPI; http://aquarius.tw.rpi.edu/s2s)
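
Faceted search, as mentioned above, lets a user narrow a result set by picking values for metadata fields and shows how many items remain under each value. A minimal sketch follows, with invented records and facet names; it is not how mspace, Exhibit or S2S are implemented.

```python
from collections import Counter

# Hypothetical records with two facets: "media" and "region".
RECORDS = [
    {"title": "Swallow call", "media": "audio", "region": "Africa"},
    {"title": "Swallow in flight", "media": "image", "region": "Africa"},
    {"title": "Primary productivity map", "media": "image", "region": "North Atlantic"},
]


def filter_by_facets(records, **selected):
    # Keep records that match every selected facet value.
    return [
        r for r in records
        if all(r.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(records, facet):
    # Counts shown next to each facet value in a faceted interface.
    return Counter(r[facet] for r in records)


african = filter_by_facets(RECORDS, region="Africa")
media_counts = facet_counts(african, "media")
african_audio = filter_by_facets(RECORDS, region="Africa", media="audio")
```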

NOESIS


Faceted search

logd.tw.rpi.edu


Summary - discovery

• Useful to write a few discovery use cases to drive how your design is developed
• Evolution of your role in facilitating discovery and what/how others implement access to your information

Reading for this week

• Is retrospective

Check in for Project Assignment

• Analysis of existing information system content and architecture, critique, redesign and prototype redeployment
• Or a new use case, development, etc.

What is next

• April 16 – Information Audit
• April 23 –
• April 30 –
• May 6 – final project presentations