Information Audit and dealing with Unstructured Information Peter Fox Xinformatics 4400/6400 Week 11, April 16, 2013

Information Audit and dealing
with Unstructured Information
Peter Fox
Xinformatics 4400/6400
Week 11, April 16, 2013
1
Reading
• Information Discovery
– Information discovery graph (IDG)
– Projects using information discovery
– Information discovery and Library Sciences
– Information Discovery and retrieval tools
– Social Search
• Metadata
– http://en.wikipedia.org/wiki/Metadata
– http://www.niso.org/publications/press/UnderstandingMetadata.pdf
– http://dublincore.org/
2
Contents
• Information Audit
• Unstructured Information
3
Businessdictionary.com
• Analysis and evaluation of a
firm's information system
(whether manual or
computerized) to detect and
rectify blockages,
duplication, and leakage of
information.
4
Objective?
• The objectives of this audit
are to improve accuracy,
relevance, security, and
timeliness of the recorded
information.
5
What is an information audit?
• An information audit is a process that
effectively determines the current
information environment within an
organization by identifying and mapping:
– What information is currently available?
– Where does the information live?
6
Results/ format (e.g.)
• The results of an information audit are
twofold: there is a detailed report which
includes:
– What information do staff acquire? Where
from? At what cost? How is it used?
– What information do staff create? What
happens to it? Where does it go?
7
Results/ format (e.g.)
– What information is stored and why? What
purpose will it serve?
– What information is passed on or
delivered? To whom? For what purpose? In
what form?
8
Results/ format (e.g.)
– Is there a gap, or a match,
between that which is available
and that which is needed?
– What are the skills and
responsibilities of the people
who carry out these tasks?
– What equipment and tools do
they have available (hardware,
software, filing cabinets, web
sites, etc)?
9
Results/ format (e.g.)
– Are there any control documents, such as policy
statements, guidelines, service level agreements,
procedures, manuals?
– Is any of the information (produced, acquired, processed,
re-delivered, or stored) superfluous to needs?
– Are any of the information-handling activities nonproductive?
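The audit questions above (what is stored, where, who uses it, what is superfluous, where the gaps are) can be sketched as checks over a simple inventory. This is only an illustrative sketch: the `InfoItem` record, its fields, and the sample rows are hypothetical and not part of any audit standard.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InfoItem:
    """One row of a hypothetical audit inventory."""
    name: str        # what the information is
    location: str    # where it lives
    owner: str       # who acquires or creates it
    used_by: tuple   # who uses it (empty tuple means nobody)


def find_duplicates(items):
    """Names that are stored in more than one location."""
    locations = defaultdict(set)
    for item in items:
        locations[item.name].add(item.location)
    return {name: sorted(locs) for name, locs in locations.items() if len(locs) > 1}


def find_superfluous(items):
    """Items that nobody reports using."""
    return sorted({item.name for item in items if not item.used_by})


def find_gaps(items, needed):
    """Information that staff need but the inventory does not contain."""
    available = {item.name for item in items}
    return sorted(set(needed) - available)


# Made-up sample data, for illustration only.
inventory = [
    InfoItem("customer list", "shared drive", "sales", ("sales", "support")),
    InfoItem("customer list", "CRM", "sales", ("sales",)),
    InfoItem("2009 travel log", "filing cabinet", "admin", ()),
]

print(find_duplicates(inventory))
print(find_superfluous(inventory))
print(find_gaps(inventory, needed=["customer list", "supplier contracts"]))
```

In practice the inventory would come from interviews and surveys rather than a hard-coded list; the point is only that each audit question maps to a small, repeatable check.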
10
Results/ format (e.g.)
• There is also a detailed flow chart:
– A visual map that shows the areas, processes,
functions and activities through which information
passes, clarifying gaps or fault-lines that need to
be plugged or bottlenecks and overflows that
need to be unblocked
• Sound familiar?
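One way to make the flow chart concrete is to treat it as a directed graph and count incoming and outgoing flows per node. The sketch below is an assumption-laden illustration: the node names, the `summarize_flows` helper, and the threshold of three incoming flows are all hypothetical choices, not an established audit method.

```python
from collections import defaultdict

# Hypothetical information flows: (from, to) pairs between areas or processes.
flows = [
    ("field staff", "data entry"),
    ("partners", "data entry"),
    ("web forms", "data entry"),
    ("data entry", "database"),
    ("database", "monthly report"),
    ("lab notebook", "filing cabinet"),
]


def summarize_flows(flows, bottleneck_threshold=3):
    """Return (bottlenecks, dead_ends) for a list of directed flows.

    A bottleneck here is a node with many incoming flows and at most one
    outgoing flow; a dead end receives information but passes none on.
    """
    incoming = defaultdict(int)
    outgoing = defaultdict(int)
    nodes = set()
    for source, target in flows:
        outgoing[source] += 1
        incoming[target] += 1
        nodes.update((source, target))
    bottlenecks = sorted(
        n for n in nodes
        if incoming[n] >= bottleneck_threshold and outgoing[n] <= 1
    )
    dead_ends = sorted(n for n in nodes if incoming[n] > 0 and outgoing[n] == 0)
    return bottlenecks, dead_ends


bottlenecks, dead_ends = summarize_flows(flows)
print("Possible bottlenecks:", bottlenecks)
print("Dead ends to review:", dead_ends)
```

A dead end is not necessarily a problem (a final report is a legitimate end point), so the output is a list of places to look at, not a verdict.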
11
How to use?
• An information audit can be used as a
baseline for making major improvements to
the business process of an organization.
• It is extremely helpful in identifying,
buying, and implementing enterprise
systems
– finance systems, portfolio management systems,
document management systems, learning and
knowledge management systems, etc.
12
Remember the use case doc?
Resource
• Data
– Data: (dataset name)
– Type: Remote, In situ, etc.
– Characteristics: e.g. no cloud cover
– Description: Short description of the dataset, possibly including rationale of the usage characteristics
– Owner: USGS, ESA, etc.
– Source System: Name of the participating system which supports discovery and access
• Model
– Model: (model name)
– Owner: Organization that offers the model
– Description: Short description of the model
– Consumes: List of data consumed
– Frequency: How often the model runs
– Source System: Name of the participating system which offers access to the model
Developed for NASA TIWG
Event/application
• Event
– Event: (Event name)
– Owner: Organization that offers the event
– Description: Short description of the event
– Relevant subscription: List of subscriptions (and owners)
– Source System: Name of the participating system which offers this event
• Application/DSS
– Application/DSS: (Application name)
– Owner: Organization that offers the Application
– Description: Short description of the application
– Source System: Name of the participating system which offers this event
Developed for NASA TIWG
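The resource tables from the use case document can double as a small audit inventory if each table row is kept as a typed record. The sketch below mirrors the Data and Model column names from the template; the class names, the `missing_inputs` helper, and every sample value are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataResource:
    """One row of the Data table in the use case template."""
    name: str
    type: str              # e.g. remote, in situ
    characteristics: str   # e.g. no cloud cover
    description: str
    owner: str
    source_system: str


@dataclass
class ModelResource:
    """One row of the Model table in the use case template."""
    name: str
    owner: str
    description: str
    consumes: list = field(default_factory=list)  # names of datasets consumed
    frequency: str = ""
    source_system: str = ""


def missing_inputs(model, datasets):
    """Datasets a model consumes that are not listed in the Data table."""
    known = {d.name for d in datasets}
    return [name for name in model.consumes if name not in known]


# Made-up rows, for illustration only.
datasets = [
    DataResource("surface imagery", "remote", "no cloud cover",
                 "Imagery used to map land cover", "USGS", "example archive"),
]
model = ModelResource("land cover model", "example lab",
                      "Classifies land cover from imagery",
                      consumes=["surface imagery", "soil samples"],
                      frequency="monthly", source_system="example server")

print(missing_inputs(model, datasets))
```

Cross-checking the Consumes column against the Data table is one cheap way to find a gap between what is available and what is needed.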
Remember
• It never hurts to know what you have
• Build it into the routine and do not leave it as
an afterthought (yep, just like documenting
your code!)
15
Sources and uses of
unstructured information
(audio, video, graphics, social media
messages, etc.): that which falls
outside the purview of traditional
databases
17
Data<->Information<->Knowledge
• Where is the structure?
[Figure: continuum from Data through Information to Knowledge, labeled with Experience, Creation, Gathering, Presentation, Organization, Integration, Conversation, and Context]
18
Informatics
• Oh, wait – people structure information!
• Cognitive processes
– Semiotics
– Mental representation
– Intuition
– Expertise
• But not in the same way computers can!
19
So what happens?
• What if a structured representation of
fundamentally unstructured information is
useless?
– Why would it be?
• What role does visual representation play in
structuring information? Hint:
21
More than 10 years ago…
• Unstructured Information Management Architecture
(UIMA) from IBM
– “Unstructured information management (UIM) applications are software
systems that analyze unstructured information (text, audio, video,
images, and so on) to discover, organize, and deliver relevant
knowledge to the user. In analyzing unstructured information, UIM
applications make use of a variety of analysis technologies, including
statistical and rule-based Natural Language Processing (NLP),
Information Retrieval (IR), machine learning, and ontologies.
– IBM's Unstructured Information Management Architecture (UIMA) is an
architectural and software framework that supports creation, discovery,
composition, and deployment of a broad range of analysis capabilities
and the linking of them to structured information services, such as
databases or search engines.
– The UIMA framework provides a run-time environment in which
developers can plug in and run their UIMA component implementations,
along with other independently-developed components, and with which
they can build and deploy UIM applications.”
22
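The plug-in idea in the quote can be illustrated with a toy pipeline in which independent analysis components each add annotations to the same piece of unstructured text. This is not the UIMA API (UIMA itself is a Java framework); the `Annotation` record, `regex_annotator`, `run_pipeline`, and the sample message are all hypothetical names invented for this sketch, and simple regular expressions stand in for real NLP components.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A labeled span of the original text."""
    kind: str
    start: int
    end: int
    text: str


def regex_annotator(kind, pattern):
    """Build a pluggable component that annotates every match of pattern."""
    compiled = re.compile(pattern)

    def annotate(text):
        return [Annotation(kind, m.start(), m.end(), m.group())
                for m in compiled.finditer(text)]

    return annotate


def run_pipeline(text, annotators):
    """Run each component over the text and merge results by position."""
    annotations = []
    for annotate in annotators:
        annotations.extend(annotate(text))
    return sorted(annotations, key=lambda a: (a.start, a.kind))


# Components are independent, so they can be added or removed freely.
pipeline = [
    regex_annotator("hashtag", r"#\w+"),
    regex_annotator("mention", r"@\w+"),
    regex_annotator("url", r"https?://\S+"),
]

message = "Thanks @rpi for the #informatics talk, slides at http://example.org/slides"
for annotation in run_pipeline(message, pipeline):
    print(annotation.kind, annotation.text)
```

The resulting annotations are structured records (kind, offsets, text) that could be loaded into a database or search index, which is the link to structured information services that the quote describes.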
From way back…
23
Data<->Information<->Knowledge
• Future?
[Figure: continuum from Data through Information to Knowledge, labeled with Experience, Creation, Gathering, Presentation, Organization, Integration, Conversation, and Context]
25
Reading for this week
• http://en.wikipedia.org/wiki/Information_audit
• http://www.librijournal.org/pdf/2003-1pp2338.pdf
• UIMA http://www.ibm.com/developerworks/data/downloads/uima/
• SPAR http://tw.rpi.edu/web/inside/ideas/SPAREvaluation
26
What is next
• Today – project group meetings/ check in
• April 23 – TBD
• April 30 – written part of group project due
• May 7 – final project presentations (BE ON
TIME, i.e. 5-10 mins BEFORE 9AM)
– Be prepared to be asked (and answer) questions
27