Information management, workflow and discovery /check-in for project definitions Peter Fox Xinformatics 4400/6400 Week 10, April 9, 2013
Review of reading
• Information Integration
– Social issues in information discovery and sharing
– Information integration in geo-informatics
– http://cseweb.ucsd.edu/~goguen/projs/data.html
– http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1839387/
• Information Life Cycle
– MSDN Information Life Cycle
– Information Life Cycle definition and context
– http://www.computerworld.com/s/article/79885/The_new_buzzwords_Information_lifecycle_management
– http://www.databasejournal.com/sqletc/article.php/3340301/Database-Archiving-A-Critical-Component-of-Information-Lifecycle-Management.htm
– http://en.wikipedia.org/wiki/Information_Lifecycle_Management
– http://msdn.microsoft.com/en-us/library/bb288451.aspx
• Information Visualization
– http://mastersofmedia.hum.uva.nl/2011/04/18/the-simple-ways-of-information-visualization/comment-page-1/
– http://www.siggraph.org/education/materials/HyperVis/domik/folien.html
– http://www.visual-literacy.org/periodic_table/periodic_table.html
• Information model development and visualization
– http://www.acm.org/crossroads/xrds7-3/smeva.html
• Outside the current box
– Peter Fox and James Hendler, 2011, Changing the Equation on Scientific Data Visualization, Science, Vol. 331 no. 6018 pp. 705-708, DOI: 10.1126/science.1197654, online at http://www.sciencemag.org/content/331/6018/705.full or see: http://escience.rpi.edu/publications/visualization/fox_hendler_science2011.html
Logical Collections
• The primary goal of an information management system is to abstract the physical collection into logical collections. The resulting view is a uniform, homogeneous collection.
• Note the analogy with logical models and information integration, so do this EARLY ON:
– Identifying naming conventions and organization
– Aligning cataloguing and naming to facilitate search, access and use (who uses?)
– Provision of **contextual** information
Physical Handling
• Map between physical and logical.
• Where does it come from, and from whom?
– Is there a transfer into a physical form?
– Is it backed up, archived, cached? …
– What formats?
– Naming conventions: do they change?
• Note the analogy to physical models
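The mapping between physical and logical can be sketched as a small catalog, assuming a hypothetical in-memory design: users work with stable logical names, and each name resolves to one or more physical copies (primary, backup, cache). `LogicalCatalog`, `register`, `resolve` and the sample paths are illustrative names, not any particular system's API.

```python
from collections import defaultdict


class LogicalCatalog:
    """Hypothetical catalog mapping logical names to physical copies."""

    def __init__(self) -> None:
        # logical name -> list of (role, physical location)
        self._replicas: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def register(self, logical_name: str, location: str, role: str = "primary") -> None:
        """Record that a physical copy of logical_name lives at location."""
        self._replicas[logical_name].append((role, location))

    def resolve(self, logical_name: str, prefer: str = "primary") -> str:
        """Return a physical location, preferring the requested role."""
        copies = self._replicas.get(logical_name)
        if not copies:
            raise KeyError(f"unknown logical name: {logical_name}")
        for role, location in copies:
            if role == prefer:
                return location
        # Fall back to the first registered copy if the preferred role is absent.
        return copies[0][1]

    def list_collection(self, prefix: str) -> list[str]:
        """Logical view: names under a collection prefix, independent of storage."""
        return sorted(name for name in self._replicas if name.startswith(prefix))


# Usage with made-up paths: one logical item, two physical copies.
catalog = LogicalCatalog()
catalog.register("ocean/chl/2012-06", "/mnt/disk3/chl_201206.nc")
catalog.register("ocean/chl/2012-06", "/backup/chl_201206.nc", role="backup")
```

If the physical naming convention changes, only the registered locations need updating; the logical names users search with stay the same.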
Interoperability Support
Security
• Access authorization and change verification. This is the basis of trusting your information.
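Both mechanisms can be sketched minimally, assuming a hypothetical access-control list keyed by user and a SHA-256 digest recorded when an item is ingested. `ACL`, `is_authorized`, `digest` and `verify_unchanged` are illustrative names; a real system would keep the ACL and digests in protected storage.

```python
import hashlib
import hmac

# Hypothetical access-control list: user -> set of permitted actions.
ACL: dict[str, set[str]] = {
    "curator": {"read", "write"},
    "student": {"read"},
}


def is_authorized(user: str, action: str) -> bool:
    """Access authorization: may this user perform this action?"""
    return action in ACL.get(user, set())


def digest(content: bytes) -> str:
    """Fingerprint the content with SHA-256."""
    return hashlib.sha256(content).hexdigest()


def verify_unchanged(content: bytes, recorded_digest: str) -> bool:
    """Change verification: does the content still match the digest taken at ingest?"""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(digest(content), recorded_digest)


# Usage: record a digest at ingest, check it before trusting the item later.
original = b"station,temp\nA,12.5\n"
recorded = digest(original)
```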
Ownership
• Who is responsible for quality and meaning?
Metadata
• Recall metadata are data about data.
• Metainformation?
Persistence
• Deployment of mechanisms to counteract technology obsolescence.
Discovery
• Ability to identify useful relations and information inside the collection
• More on this later in this class
Dissemination
• Mechanisms to make interested parties aware of changes and additions to the collections.
• Do you rely on information retrieval? The Web?
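One such mechanism is publish/subscribe: interested parties register once and are pushed each change, rather than having to rediscover it through retrieval or the Web. A minimal sketch, assuming subscribers supply a callback per collection; `ChangeNotifier`, `subscribe` and `publish` are hypothetical names.

```python
from collections import defaultdict
from typing import Callable

# A listener receives (collection name, description of the change).
Listener = Callable[[str, str], None]


class ChangeNotifier:
    """Hypothetical notifier that tells subscribers about changes to a collection."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Listener]] = defaultdict(list)

    def subscribe(self, collection: str, listener: Listener) -> None:
        self._subscribers[collection].append(listener)

    def publish(self, collection: str, change: str) -> int:
        """Push the change to every subscriber; return how many were notified."""
        listeners = self._subscribers.get(collection, [])
        for listener in listeners:
            listener(collection, change)
        return len(listeners)


# Usage: a subscriber that simply records what it was told.
inbox: list[str] = []
notifier = ChangeNotifier()
notifier.subscribe("ocean-colour", lambda coll, change: inbox.append(f"{coll}: {change}"))
```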
Summary of Information Management
• Creation of logical collections
• Physical handling
• Interoperability support
• Security support
• Ownership
• Metadata collection, management and access
• Persistence
• Knowledge and information discovery
• Dissemination and publication
Note for your project writeup!
• Information management! Cover the 9 areas.
Information Workflow
• What is a workflow?
• Why would you use it?
• Key considerations for information, cf. data
• Some pointers to workflow systems
What is a workflow?
• General definition: “series of tasks performed to produce a final outcome” (taxes?)
• Information workflow: involves people, but we potentially want to
– Automate jobs that a person traditionally performed manually
– Process large volumes of information faster than one could by hand
• NB the difference from data workflows: it reaches out to encompass the user (e.g. ‘unrecorded actions’)
Background: Business Workflows
• Example: planning a trip
• Need to perform a series of tasks: book a flight, reserve a hotel room, arrange for a rental car, etc.
• Each task may depend on the outcome of a previous task
– The days you reserve the hotel depend on the days of the flight
– If the hotel has a shuttle service, you may not need to rent a car
• Prior information, experience, preferences…
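The trip can be sketched as a dependency graph, using Python's standard `graphlib` module to derive an order in which each task runs only after the tasks it depends on. The tasks and the shuttle rule come from the example above; the function name and task identifiers are illustrative assumptions.

```python
from graphlib import TopologicalSorter


def plan_trip(hotel_has_shuttle: bool) -> list[str]:
    # Each task maps to the set of tasks whose outcome it depends on.
    dependencies: dict[str, set[str]] = {
        "book_flight": set(),
        "reserve_hotel": {"book_flight"},  # hotel dates depend on flight dates
    }
    # Conditional step: skip the rental car if the hotel runs a shuttle.
    if not hotel_has_shuttle:
        dependencies["rent_car"] = {"book_flight", "reserve_hotel"}
    # static_order() yields every task after all of its dependencies.
    return list(TopologicalSorter(dependencies).static_order())
```

For example, `plan_trip(False)` returns `["book_flight", "reserve_hotel", "rent_car"]`. The same structure, tasks plus explicit dependencies, is what workflow systems make visible and reusable.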
Tripit.com?
What about information workflows?
• Perform a set of transformations/operations on information source(s)
• Examples
– Generating images from raw data
– Identifying areas of interest from a large information source (e.g. word cloud)
– Classifying a set of objects
– Querying a web service for more information on a set of objects
– Many others…
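The word-cloud example can be sketched as a linear chain of transformations: tokenize, drop stop words, count, with each step's output feeding the next. The tiny stop-word list and all function names are assumptions for illustration; real word-cloud tools use much larger lists and other weightings.

```python
import re
from collections import Counter
from functools import reduce
from typing import Callable, Iterable

STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}  # illustrative only


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def drop_stop_words(tokens: Iterable[str]) -> list[str]:
    return [t for t in tokens if t not in STOP_WORDS]


def top_terms(tokens: Iterable[str], n: int = 3) -> list[tuple[str, int]]:
    return Counter(tokens).most_common(n)


def run_workflow(source, steps: list[Callable]):
    """Feed the output of each step into the next, like a simple linear workflow."""
    return reduce(lambda data, step: step(data), steps, source)


doc = "The ocean colour data show the ocean productivity of the North Atlantic and the ocean."
result = run_workflow(doc, [tokenize, drop_stop_words, top_terms])
```

Here `result[0]` is `("ocean", 3)`; swapping, adding or reusing steps changes the workflow without rewriting the others.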
More on Workflows
• Can process many information types:
– Archives
– Web pages
– Streaming/real time
– Images
– Semiotic systems
• Robust workflows depend on formal (conceptual and logical) models of the flow of information among components
• May be simple and linear or very complex
Challenges
• Questions:
– What are some challenges for users in implementing workflows?
– What are some challenges to executing these workflows?
– What are the limitations of writing a program?
• Mastering a programming language
• Visualizing workflow
• Sharing/exchanging workflow
• Formatting issues
• Locating datasets, services, or functions
Workflow Management Systems
Benefits of Workflows
• Documentation of aspects of analysis
• Visual communication of analytical steps
• Ease of testing/debugging
• Reproducibility
• Reuse of part or all of a workflow in a different project
Additional Benefits
• Integration of and between multiple computing environments
• ‘Automated’ access to distributed resources via other architectural components, e.g. web services and Grid technologies
• System functionality to assist with information integration of heterogeneous components and sources
Why not just use a script?
• A script does not specify low-level task scheduling and communication
• May be platform-dependent
• Can’t be easily reused
• May not have sufficient documentation to be adapted for another purpose
Why can a GUI be useful?
• No need to learn a programming language
• Visual representation of what the workflow does
• Allows you to monitor workflow execution
• Enables user interaction (though not necessarily collaboration)
• Facilitates sharing of workflows
Some workflow systems
• Kepler
• SCIRun
• Sciflo
• Triana
• Taverna
• Pegasus
• Some commercial tools:
– Windows Workflow Foundation
– Mac OS X Automator
• http://www.isi.edu/~gil/AAAI08TutorialSlides/5-Survey.pdf
• http://www.isi.edu/~gil/AAAI08TutorialSlides/
• See reading for this week
Discovery
• How does someone find your information?
• How would you provide discovery of
– collections
– files
– ‘bits’
• How would you find ->
Discovery
• Search (Federated Search)
• Helped by
– Folksonomies (user contributed)
– Intelligent Agents
– Search Engines
– Taxonomies
• Find photos of Kim
– Boy or girl?
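A taxonomy can help a search over user-contributed tags (a folksonomy) by expanding a query term with its narrower terms before matching. The items, tags and the two-level taxonomy below are hypothetical, and the sketch assumes the taxonomy has no cycles.

```python
# Hypothetical folksonomy: item -> tags contributed by users.
ITEM_TAGS: dict[str, set[str]] = {
    "rec-001.wav": {"swallow", "dawn chorus"},
    "rec-002.wav": {"robin"},
    "photo-17.jpg": {"kim", "birthday"},
}

# Hypothetical taxonomy: broader term -> narrower terms (assumed acyclic).
NARROWER: dict[str, set[str]] = {
    "bird": {"swallow", "robin"},
}


def expand(term: str) -> set[str]:
    """Return the term plus all narrower terms, following the taxonomy recursively."""
    terms = {term}
    for child in NARROWER.get(term, set()):
        terms |= expand(child)
    return terms


def search(term: str) -> list[str]:
    """Find items whose user tags match the term or any narrower term."""
    wanted = expand(term.lower())
    return sorted(item for item, tags in ITEM_TAGS.items() if tags & wanted)
```

A query for `"bird"` finds both recordings even though no user tagged either one "bird". A tag such as `kim` still says nothing about which Kim is meant, which is the ambiguity behind "Boy or girl?".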
Use cases
• Find a sound recording of a swallow.
• Excuse me?
Use cases
• Find a sound recording of an African Swallow
• Find a sound recording of a bird that sounds like an African Swallow
• Media types: how can you discover them?
Use cases
• Find the movie that Jeanne Tripplehorn first starred in / that was her most successful / in which she was the lead actress?
• Has anyone gene-sequenced a mouse?
• Find images of primary productivity in the North Atlantic
• Discovery can often involve information integration (or is it *almost always*?)
Three-level ‘metadata’ DATA solution for data discovery and data integration
• Level 1: Data Registration at the Discovery Level, e.g. volcano location and activity
• Level 2: Data Registration at the Inventory Level, e.g. list of datasets, times, products
• Level 3: Data Registration at the Item Detail Level, e.g. access to individual quantities
• Earth Sciences Virtual Database: a data warehouse where the schema heterogeneity problem is solved; schema-based integration
• Ontology-based data integration using scientific workflows (A.K. Sinha, Virginia Tech, 2006)
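The three registration levels can be sketched as nested records, assuming a hypothetical registry in which a source (discovery level), its datasets (inventory level) and their individual quantities (item detail level) are registered separately, so a query can stop at whichever level answers it. All class names, function names and sample values (including the made-up SO2 figure) are illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:  # Level 3: item detail, e.g. an individual quantity
    name: str
    value: float
    units: str


@dataclass
class Dataset:  # Level 2: inventory, e.g. dataset, time, product
    title: str
    time: str
    items: list[Item] = field(default_factory=list)


@dataclass
class Source:  # Level 1: discovery, e.g. a volcano and its activity
    name: str
    keywords: set[str]
    datasets: list[Dataset] = field(default_factory=list)


def discover(sources: list[Source], keyword: str) -> list[str]:
    """Level 1 query: which sources are relevant at all?"""
    return [s.name for s in sources if keyword in s.keywords]


def inventory(source: Source) -> list[tuple[str, str]]:
    """Level 2 query: which datasets does a source hold, and for when?"""
    return [(d.title, d.time) for d in source.datasets]


def item_detail(dataset: Dataset, name: str) -> Optional[Item]:
    """Level 3 query: access an individual quantity, or None if absent."""
    return next((i for i in dataset.items if i.name == name), None)


# Usage with made-up values.
src = Source("volcano-A", {"volcano", "activity"},
             [Dataset("SO2 flux", "2012", [Item("so2_flux", 42.0, "t/d")])])
```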
Three-level ‘metadata’ solution? (information discovery and information integration)
• Level 1: Registration at the Discovery Level, e.g. find the upper-level entry point to a source
• Level 2: Registration at the Inventory Level, e.g. list of datasets, using the logical organization
• Level 3: Registration at the Item Detail Level, i.e. annotation, e.g. tagging
• Catalog/index
• Schema-based integration
• Integration using mapping management (A.K. Sinha, Virginia Tech, 2006)
Information discovery
• What makes discovery work?
– Metadata
– Logical organization
– Attention to the fact that someone would want to discover it
– File types, which turn out to be a key enabler or inhibitor of discovery
– Result ranking using a *tuned* algorithm
• What does not work?
– Result ranking algorithms that depend on unconventional information types (icon, index, symbol)
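The slides do not name a ranking algorithm, so as a stand-in, here is a minimal, untuned TF-IDF scorer: a document scores higher when it uses the query terms often, with rare terms weighted more than common ones. Like the failing case above, it sees only words, so an icon or symbol contributes nothing. The function names and sample documents are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Score each document by the summed TF-IDF of the query terms; highest first."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if counts[term]:
                tf = counts[term] / len(tokens)
                idf = math.log(n_docs / df[term])  # rarer terms weigh more
                score += tf * idf
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {
    "a": "primary productivity in the north atlantic",
    "b": "sea surface temperature in the north pacific",
    "c": "mouse genome sequencing",
}
result = rank("north atlantic productivity", docs)
```

"Tuning" in practice would mean adjusting such weights, or adding fields like title and keywords, against the discovery use cases.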
Federated search
• “[Federated search] is the simultaneous search of multiple online databases or web resources and is an emerging feature of automated, web-based library and information retrieval systems. It is also often referred to as a portal or a federated search engine.” (Wikipedia)
• Libraries have been doing this for a long time (Z39.50, ISO 23950)
• The key is consistent search metadata fields (keywords)
• E.g. Geospatial One Stop http://www.geodata.gov
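A minimal federated-search sketch, assuming every source shares the same search metadata field (here just `keywords`). The sources are in-memory stand-ins for remote catalogs; real protocol access (e.g. Z39.50 or HTTP) is not shown. Queries run concurrently, and results are merged and de-duplicated by identifier. `SOURCES`, `search_source` and `federated_search` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sources whose records share the same search metadata field.
SOURCES: dict[str, list[dict]] = {
    "library-A": [
        {"id": "doi:1", "title": "Chlorophyll maps", "keywords": {"ocean", "chlorophyll"}},
    ],
    "library-B": [
        {"id": "doi:1", "title": "Chlorophyll maps", "keywords": {"ocean", "chlorophyll"}},
        {"id": "doi:2", "title": "Volcano SO2", "keywords": {"volcano"}},
    ],
}


def search_source(name: str, keyword: str) -> list[dict]:
    """Search one source; in a real system this would be a network call."""
    return [rec for rec in SOURCES[name] if keyword in rec["keywords"]]


def federated_search(keyword: str) -> list[dict]:
    """Query all sources simultaneously, then merge the results by identifier."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_source, name, keyword) for name in SOURCES]
        merged: dict[str, dict] = {}
        for future in futures:
            for rec in future.result():
                merged.setdefault(rec["id"], rec)  # keep the first copy of each id
    return list(merged.values())
```

The merge step only works because both sources use the same identifier and keyword fields, which is the point about consistent search metadata.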
Smart search
• Semantically aware search, e.g. http://noesis.itsc.uah.edu, http://eie.cos.gmu.edu (Water -> Semantic Search)
• Faceted search, e.g. mspace (http://mspace.fm), exhibit (MIT), S2S (RPI; http://aquarius.tw.rpi.edu/s2s)
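The core of faceted search can be sketched minimally: each record carries a few facet values, the user narrows the set by picking values, and the system shows counts for the remaining choices. The records and facet names are hypothetical; this sketch is not taken from mspace, exhibit or S2S.

```python
from collections import Counter

# Hypothetical records, each with a few facet values.
RECORDS = [
    {"title": "Chlorophyll, N. Atlantic", "region": "North Atlantic", "type": "image", "year": 2012},
    {"title": "SST, N. Atlantic", "region": "North Atlantic", "type": "dataset", "year": 2011},
    {"title": "Chlorophyll, Pacific", "region": "Pacific", "type": "image", "year": 2012},
]


def apply_facets(records: list[dict], selected: dict[str, object]) -> list[dict]:
    """Keep only records that match every selected facet value."""
    return [r for r in records if all(r.get(f) == v for f, v in selected.items())]


def facet_counts(records: list[dict], facet: str) -> dict:
    """Count records per value of a facet, to display next to each choice."""
    return dict(Counter(r[facet] for r in records))
```

After selecting `region = "North Atlantic"`, the `type` facet shows one image and one dataset, so the user sees what refinements remain before choosing.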
NOESIS
Faceted search
logd.tw.rpi.edu
Summary - discovery
• Useful to write a few discovery use cases to drive how your design is developed
• Evolution of your role in facilitating discovery, and what/how others implement to access your information
Reading for this week
• Is retrospective
Check in for Project Assignment
• Analysis of existing information system content and architecture, critique, redesign and prototype redeployment
• Or a new use case, development, etc.
What is next
• April 16 – Information Audit
• April 23 –
• April 30 –
• May 6 – final project presentations