An Expertise Finder Application Built on Enterprise Search

Robert Joachim, [email protected]
MITRE Corporation, McLean VA
Gilbane San Francisco 2008
June 19, 2008
MITRE Corporation
Fortune Magazine “100 best companies
to work for” (2002-2008)
Computerworld “100 best places to
work in IT” (2005-2008)
Approved for Public Release
A National Resource Working in the Public Interest
© 2008 The MITRE Corporation. All rights reserved.
Overview

About MITRE, our Intranet, our enterprise search
architecture

MITRE Expertise Finder implementation
– Expertise finding models (APQC)
– MITRE expertise finding history
– This product

Interface details

How it's built / How it works
– System validation / Usage metrics
– Nearer and longer term enhancements
– Conclusion / Recommendations

Background
– Sources / Resources

Other recent ‘real world’ expertise finder implementations
– Commercial (COTS) software for expertise finding
– Community finding prototype example
– Use of social bookmarks at MITRE
About MITRE

About MITRE
– Not-for-profit; operates 3 Federally Funded Research and
Development Centers (FFRDCs), for DoD, FAA, and IRS
– Application of expertise in systems engineering, information
technology, operational concepts, and enterprise
modernization
– 6000 employees located at Bedford, MA; McLean, VA; and other
domestic and international sites; 65% of staff have Master's or
Ph.D. degrees

Our role
– Problem solving / rapid response for our sponsors
– ‘Reachback’ into the corporation for knowledge is key

Long standing history of information sharing practices
– Embedded and reinforced in our corporate culture
MITRE Intranet & Enterprise Search

MITRE Intranet is called the ‘MII’ – MITRE Information
Intranet
– Early adoption of web technology & web search
– Our intranet consists of multiple content repositories on
various platforms, including


Oracle Portal & Oracle application servers

Intranet content server & multiple distributed content servers

Microsoft SharePoint for team site management and collaboration

Listserv lists for collaborative communication
Google Enterprise is MITRE’s intranet search engine
– The expertise finder application described here is based on
Google enterprise search
MITRE Google architecture

GSA 5005 (2.2 M URLs total)

Content repositories
– MITRE Intranet server & SharePoint document libraries (URLs: 400K)
– Web-enabled file system + distributed MITRE Webservers (40) (URLs: 1.4 M)
– MITRE List messages (URLs: 450K)
– Database crawls
– XML feeds

Application interfaces: MITRE intranet search & ‘focused search’ interfaces
– Intranet
– Expertise Finder
– Email List Search
– Social Bookmarks
– Technical Exchange Meeting Search
MITRE Expertise Finding History

Current system is based, in part, on earlier MITRE research
and prototype work
– MITRE staff: Maybury, House, D’Amore

First developments of this system, based on Google
enterprise search results, were also prototypes
– Then, released as a pilot project to collect user feedback
– Subsequently productized
– Then enhanced over multiple releases


Additional functional search and display features

Additional content resources for expertise identification
Architectural focus – created using
– Service-oriented componentized architecture

Loosely coupled building-block pieces that can be swapped in/out
if necessary (“What if we were to replace our enterprise search
system, our staff directory system?”, etc.)
– Extensible

Can be extended to other content repositories or could be used for
alternate ‘finding’ applications
Expertise finding systems -- characterization
APQC (formerly American Productivity & Quality Center)
characterization of expertise finding systems:

APQC ‘Model 1’ -- Linking knowledge seekers with
knowledge providers
– No a priori designation of ‘experts’
– This is our approach

APQC ‘Model 2’ – assigned discipline managers are
responsible for knowing levels of expertise in their area

APQC ‘Model 3’ – designated ‘validated’ experts
MITRE Expertise Finder
What it looks like
Main view

Helps answer the question: “Who at MITRE knows about topic X”

Results are based on Google relevancy ranking, in conjunction with
author/owner attribution & document counts

Organizational view
Expertise Finder – Results details
Main view shows
– Source options
– Email contact options
– Display options
– Content ‘evidence’ (with title, links to object & repository,
‘keywords in context’, object date)
– Person, with job title and link to phonebook
Expertise Finder – Results details
Organizational view and content display by organization

Bubble size and position indicate contributors and contributions by
corporate center or division

Clicking on a single bubble displays people and content from that
organization
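
The organizational view can be thought of as a roll-up of attributed results by organization. The following is only a rough sketch of that kind of aggregation; the sample hits, identifiers, and organization names are made up, and the slides do not say how bubble size or position is actually computed.

```python
from collections import defaultdict

# Made-up attributed hits: (staff identifier, organization)
HITS = [
    ("jdoe", "Center A"),
    ("jdoe", "Center A"),
    ("asmith", "Center A"),
    ("blee", "Center B"),
]


def summarize_by_org(hits):
    """Count distinct contributors and total contributions per organization."""
    people = defaultdict(set)
    contributions = defaultdict(int)
    for staff_id, org in hits:
        people[org].add(staff_id)
        contributions[org] += 1
    return {
        org: {"contributors": len(people[org]), "contributions": contributions[org]}
        for org in contributions
    }


summary = summarize_by_org(HITS)
for org, stats in summary.items():
    print(org, stats["contributors"], stats["contributions"])
```

Either count (or both) could drive bubble size; which one the real interface uses is not stated.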
How it works

Components
– User query (http query)
– MII Google (Google Search Appliance): MII Google query / results, returned as XML
– Expertise Finder (Java-based application on Oracle Application Server)
– LDAP (staff / organization lookup)

Processing steps
1. Based on keyword query, retrieves ranked results set from MII Google
search (XML output)
2. Identifies author / owner attributes (staff attribution)
3. Performs LDAP lookup for full staff name and organization
4. Returns results set, ranked by contributions, with hits ‘evidence’ and
keyword context: the Expertise Finder results (staff and organizations)

Key MITRE Web-based repositories contributing to Expertise Finder
(resource: author / owner attribution)
– Web-enabled file system (Employeeshare transfer folders, ‘about-me’
resumes): Standard User Identifier (SUI) in folder path
– CommunityShare (MS SharePoint): MS Office Property Author Field
– MITRE blogs: SUI (email name)
– MITRE Sourceforge: HTML Author meta tag
– MITRE List Messages: SUI (email name)
– MITRE Technical Exchange Meetings: Meeting Point of contact
– MITRE Institute Courses: Course instructor (future)
– Onomi social bookmarks: Bookmark contributor (future)
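
The four processing steps can be sketched roughly as follows. This is a minimal illustration in Python, not the production Java application: the sample XML, URLs, staff identifiers, and the `DIRECTORY` dictionary standing in for the LDAP lookup are all made up, and the tie-break on best search rank is an assumption (the slides say only that results are ranked by contributions).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up search output. The element names (R = result, U = URL, T = title,
# MT = meta tag) follow the general style of search appliance XML results.
SAMPLE_XML = """
<GSP>
  <RES>
    <R N="1"><U>http://intranet.example/share/jdoe/ontology-notes.doc</U>
      <T>Ontology notes</T><MT N="author" V="jdoe"/></R>
    <R N="2"><U>http://intranet.example/lists/semweb/msg01.html</U>
      <T>Re: ontologies</T><MT N="author" V="asmith"/></R>
    <R N="3"><U>http://intranet.example/share/jdoe/owl-tutorial.ppt</U>
      <T>OWL tutorial</T><MT N="author" V="jdoe"/></R>
    <R N="4"><U>http://intranet.example/misc/page.html</U>
      <T>Unattributed page</T></R>
  </RES>
</GSP>
"""

# Hypothetical stand-in for the LDAP staff / organization lookup (step 3).
DIRECTORY = {
    "jdoe": {"name": "Jane Doe", "org": "Division A"},
    "asmith": {"name": "Alan Smith", "org": "Division B"},
}


def parse_results(xml_text):
    """Steps 1-2: read the ranked results and pull out the author attribute."""
    hits = []
    for r in ET.fromstring(xml_text.strip()).iter("R"):
        author = None
        for mt in r.findall("MT"):
            if mt.get("N", "").lower() == "author":
                author = mt.get("V")
        hits.append({"rank": int(r.get("N")), "url": r.findtext("U"),
                     "title": r.findtext("T"), "author": author})
    return hits


def rank_staff(hits, directory):
    """Steps 3-4: resolve staff, then rank people by number of contributions."""
    evidence = defaultdict(list)
    for hit in hits:
        if hit["author"] in directory:  # unattributed results are skipped
            evidence[hit["author"]].append(hit)
    # Most contributions first; ties broken by best search rank (assumption).
    ranked = sorted(evidence.items(),
                    key=lambda kv: (-len(kv[1]), min(h["rank"] for h in kv[1])))
    return [{"sui": sui, **directory[sui], "evidence": docs}
            for sui, docs in ranked]


results = rank_staff(parse_results(SAMPLE_XML), DIRECTORY)
for person in results:
    print(person["name"], person["org"], len(person["evidence"]))
```

Keeping the search call, the attribution step, and the directory lookup as separate functions mirrors the loosely coupled design the talk describes, where any one piece could be swapped out.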
Resource and staff attribution details
(From step 2, previous slide)

Resource | Author / owner attribution | Person metadata quality
Web-enabled file system (Employeeshare transfer folders, ‘about-me’ resumes) | Standard User Identifier (SUI = email name) in folder path | Excellent
CommunityShare (MS SharePoint) | MS Office Property Author Field | Varies
MITRE blogs | Standard user identifier (email name) | Excellent
MITRE Sourceforge (Software projects) | HTML Author meta tag | Excellent
MITRE List Messages | Standard user identifier (email name) | Excellent
MITRE Technical Exchange Meetings | Meeting Point of contact | Excellent
MITRE Institute Courses (future resource) | Course instructor | Excellent
Social bookmarks (Onomi) (future resource) | Bookmark contributor | Excellent
HTML pages from distributed webservers | HTML Author meta tag | Varies
MS Office documents from distributed webservers | MS Office Property Author Field | Varies
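
Two of the attribution methods in the table, an identifier embedded in a folder path and an HTML author meta tag, might be extracted along these lines. This is a sketch under stated assumptions: the `/transfer/<identifier>/` path layout and the example URL are hypothetical, and the real repositories may be laid out differently.

```python
import re
from html.parser import HTMLParser

# Hypothetical path layout: the identifier is the folder directly under "transfer".
SUI_IN_PATH = re.compile(r"/transfer/([a-z0-9_]+)/", re.IGNORECASE)


def sui_from_path(url):
    """Return the staff identifier embedded in a folder path, or None."""
    match = SUI_IN_PATH.search(url)
    return match.group(1).lower() if match else None


class AuthorMetaParser(HTMLParser):
    """Collects the content of <meta name="author" content="...">."""

    def __init__(self):
        super().__init__()
        self.author = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "author":
            self.author = (attributes.get("content") or "").strip() or None


def author_from_html(html_text):
    """Return the author meta tag value from an HTML page, or None."""
    parser = AuthorMetaParser()
    parser.feed(html_text)
    return parser.author


print(sui_from_path("http://files.example/transfer/jdoe/report.doc"))
print(author_from_html('<html><head><meta name="Author" content="jdoe"></head></html>'))
```

A missing or empty author value returns `None`, which is one way the ‘Varies’ metadata quality in the table would show up in practice: such documents simply cannot be attributed.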
This system architecture: Advantages / Disadvantages

Advantages
– Uses full-text indexed content (concepts are ‘fluid’ – especially
for new technologies, products, projects)
– Uses the same online content contributors share in the course of
their day-to-day work
– No requirement for users to maintain a registry of expertise
– Incentivizes staff to share content online in open repositories
– Incentivizes staff to use correct metadata, especially authorship
metadata

Disadvantages
– Results are only as good as the quality of authorship metadata
used to associate information objects with staff
– Results are dependent on the underlying search system
relevancy ranking

We mitigate this by forcing results from specific repositories, sending
multiple parallel queries to multiple content repositories
– Users could ‘game’ the system (by arbitrarily putting large
numbers of documents online)
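
Sending parallel queries so that each repository contributes results, regardless of how the overall relevancy ranking treats it, could look something like the sketch below. The repository names, the fake index, and `search_repository` are hypothetical stand-ins; a real version would issue one search request per repository.

```python
from concurrent.futures import ThreadPoolExecutor

# Made-up per-repository results standing in for real search calls.
FAKE_INDEX = {
    "documents": ["doc-a", "doc-b", "doc-c", "doc-d"],
    "lists": ["msg-a", "msg-b"],
    "sharepoint": [],
}


def search_repository(repository, query, limit=3):
    """Stand-in for one search request restricted to a single repository."""
    return FAKE_INDEX[repository][:limit]


def federated_search(query, repositories, limit=3):
    """Query every repository in parallel and keep each one's top results."""
    with ThreadPoolExecutor(max_workers=len(repositories)) as pool:
        futures = {
            repo: pool.submit(search_repository, repo, query, limit)
            for repo in repositories
        }
        return {repo: future.result() for repo, future in futures.items()}


merged = federated_search("ontologies", ["documents", "lists", "sharepoint"])
print(merged)
```

Because each repository gets its own quota, a large repository cannot crowd out a small one such as list messages.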
Query characteristics -- observations

Best performing queries
– Specific (‘term specificity’): Products, programs, projects,
standards – query terms that are ‘good discriminators’
– Query examples:


Standards/Compliance: IEEE-1061, fisma

Products: Cognos, AppWorx

Projects: Next-generation airspace

Topics: ontologies, second life, biometrics
Worst performing queries
– Extremely general terms, whether single or multiple words
– Query examples:


‘Engineering’ – in a corporation where a majority of staff function
in some engineering capacity and are performing engineering-related tasks

‘Software’ – in a corporation where a significant portion of our
work focuses on some aspect of software engineering
– But – consider – these very general queries may not perform
well in general full-text retrieval anyway
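
One common way to put a number on ‘term specificity’ is inverse document frequency (IDF): terms that appear in nearly every document score near zero, while rare terms score high. The toy corpus below is made up, and nothing in the slides says the Expertise Finder computes IDF itself; this only illustrates why ‘engineering’ discriminates poorly while ‘fisma’ discriminates well.

```python
import math

# Made-up corpus: each document is represented as a set of terms.
DOCS = [
    {"engineering", "software", "fisma"},
    {"engineering", "software"},
    {"engineering", "biometrics"},
    {"engineering", "software", "cognos"},
]


def idf(term, documents):
    """Inverse document frequency: log(N / number of documents containing term)."""
    df = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / df) if df else None


for term in ("engineering", "software", "fisma"):
    print(term, idf(term, DOCS))
```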
Results evaluation/validation methods

Validation methods
– Informal: Send in a query based on a topic where the user knows
a set of experts/specialists, see how many come back in results

Many staff take this on themselves, as a check to see if they are
included in results
– Informal: when a user sends an email to contacts identified by the
system, we follow up with an informal email probe to that user

“Was this system helpful in identifying knowledgeable staff?”

“Did you get an answer?”
– Metrics: Continued usage by staff

Query metrics have held steady over time, paralleling general search
query metrics
Usage metrics and query analysis

Usage metrics tracking by
– Basic usage

From Google query logs -- Expertise Finder interface queries are
coded with a specific parameter for identification in query logs

From internal WebTrends web analytic reports (user visits, page
views)
– Query analysis


We can identify specific queries sent to the system by analyzing
query log data
Monthly usage metrics
– General MITRE Google queries: 60K - 75K queries per month
– Expertise Finder queries: 2K - 4K queries per month
– Average usage ratio of general search queries to Expertise Finder
queries: ~25 : 1
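
Separating Expertise Finder queries from general queries in a log can be sketched as below. The log lines are made up, and the parameter name and values (`client=expertise`, `client=intranet`) are purely illustrative; the slides say only that Expertise Finder queries are coded with a specific parameter.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Made-up query log entries; the "client" parameter marks the interface used.
LOG_LINES = [
    "/search?q=biometrics&client=expertise",
    "/search?q=travel+policy&client=intranet",
    "/search?q=fisma&client=expertise",
    "/search?q=timesheet&client=intranet",
    "/search?q=cafeteria+menu&client=intranet",
]


def count_by_interface(lines, param="client"):
    """Count queries per interface, keyed by the identifying parameter value."""
    counts = Counter()
    for line in lines:
        params = parse_qs(urlparse(line).query)
        counts[params.get(param, ["unknown"])[0]] += 1
    return counts


counts = count_by_interface(LOG_LINES)
ratio = counts["intranet"] / counts["expertise"]
print(counts, ratio)
```

Run over a month of real logs, the same two counts would give the general-to-Expertise-Finder usage ratio reported on the slide.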
Expertise Finder enhancements

New enhancement in development: ‘Presence awareness’
identification using a Microsoft Office ActiveX Web service
– Shows staff availability online, free/busy status, access to Office
Communicator chat, other tools

System enhancements under consideration
– Limit by date/date range (to find staff based on their most
recent contributions)
– Limit by more detailed level of contributing repositories

Beyond just ‘Documents and Webpages’ and ‘Lists’

E.g., SharePoint, Technical Exchange Meetings, Social Bookmarks
– Permit user to limit and/or sort by staff classification/role, e.g.,

‘AC’ technical; ‘PRO’ professional-level support; ‘PSS’
administrative support
Longer term: where we may be going

Exploration of social networking for expertise finding
– Use of staff profiles based on social networking models (similar
to MySpace, Facebook, LinkedIn)
– Use social network connections as an additional dimension in
expertise finding

Hybrid approach – base expertise finding on
– Text from document content

As we are doing – there will continue to be value in identifying
expertise from content objects
and
– Staff profiles, which may be

User-generated, auto-generated, or a combination
– Let the user decide, per query, how to focus the results based
on these resources
Conclusion/recommendations: expertise
finding implementation
If you are considering expertise finding implementation

Consider which APQC model fits your organization’s
environment and requirements

If implementing a software application
– Evaluate metadata quality for staff attribution
– Decisions

Staff identification based on registry vs. content, or hybrid

Build vs. buy

Build in conjunction with enterprise search
– Use of service-oriented architecture

For swap-in/swap out of code base, directory resources, and
content resources

For future feature enhancements
Background info
Sources/Resources

Google Scholar results – MITRE expertise finding research
– MITRE authors: M. Maybury, D. House, R. D’Amore
– http://scholar.google.com/scholar?q=Maybury,+House,+D%E2%80%99Amore

Ackerman, Mark and McDonald, David “Just Talk to Me: A
Field Study of Expertise Location” Proceedings of the 1998
ACM conference on computer supported cooperative work,
Seattle, November 14-18, 1998
– http://portal.acm.org/citation.cfm?id=289506

Hughes, Gareth and Crowder, Richard “Experiences in
designing highly adaptable expertise finder systems”
Proceedings of the DETC 03, Chicago, September 2-6, 2003
– http://eprints.ecs.soton.ac.uk/8206/

Expertise Locator Systems: Finding the Answers. APQC
Publications: 2003
– http://www.apqc.org/portal/apqc/ksn?paf_gear_id=contentgearhome&paf_dm=full&pageselect=detail&docid=123338
Sources/Resources

Maybury, Mark Expert Finding Systems, MITRE
Corporation: MITRE Technical Report, MTR 06B00040,
September 2006
– http://www.mitre.org/work/tech_papers/tech_papers_06/06_1115/06_1115.pdf

Maybury, Mark “Discovering Distributed Expertise” AAAI
Fall Symposium Series, Regarding the “Intelligence” in
Distributed Intelligent Systems, November 9, 2007
– http://www.mitre.org/work/tech_papers/tech_papers_07/07_0730/07_0730.pdf

Damianos, Laurie, et al. Onomi: Social Bookmarking on a
Corporate Intranet, MITRE Corporation, May 2006
– http://www.mitre.org/work/tech_papers/tech_papers_06/06_0352/06_0352.pdf

Author contact: [email protected]
Sources/Resources: Expertise finding recent
real-world implementations

Presented or cited at Enterprise Search Summit, New York,
May 20-21, 2008
– “Mining additional value from enterprise search”, Trent
Parkhill, Haley & Aldrich

Based on search product: Coveo
– “Search connections in context”, Oz Benamram, Morrison &
Foerster

Based on search product: Recommind
– Google, Inc. internal expertise finder

Based on search product: Google enterprise

Montague Institute, January 17, 2008
– “Enterprise mashups for expertise location”, Qin Zhu, HP Labs

Based on search product: Inktomi
Commercial software products for
enterprise expertise finding -- examples

Enterprise search with expertise finding components
– Autonomy IDOL
– FAST (Partnering with AskMe)
– Endeca
– Recommind
– Microsoft SharePoint Enterprise search (with MOSS07
“Knowledge Network”)

Dedicated/specialized systems
– TACIT ActiveNet
– AskMe
– Triviumsoft SEE-K
Community Finding (Prototype)

Based on MITRE Expertise Finder code

Identifies MITRE online communities (Email lists & SharePoint
communities)

Uses search results of
– List messages
– Documents associated with a community
– Community descriptions
“Onomi” Social Bookmarks – Tagging/Sharing of Web Resources by MITRE staff

Onomi (rhymes with ‘Taxonomy’) – based on open-source tool Scuttle
– Lets MITRE staff bookmark content resources and tag content of
interest with topical terms
– Helps me “find this again”
– Builds communities of interest
– And – contributes to expertise finding: Users tagging/contributing
are ‘experts’ in the content/topics bookmarked and tagged