Transcript iWork: Analytics in Human Resource management
Invited Talk at the Forum for Information Retrieval Evaluation (FIRE 2012), Indian Statistical Institute, Kolkata, India on 19-Dec-2012
iWork: Analytics for Human Resources Management
Girish Keshav Palshikar
Tata Consultancy Services Limited 54B Hadapsar Industrial Estate, Pune 411013, India.
Human Resources Management
HR management is a crucial function within any organization
– More so in the services, IT and BPO industries
HR is a cost center; everyone interacts with HR!
The HR function is characterized by a large number of
– Business processes
– IT systems that automate these business processes
– Huge databases of employee activities
– Many employee contact initiatives
– Close interactions with L&D
– Associated metrics and KPIs
– Monitoring and regulatory compliance
Phases in HR Management
Talent Acquisition
– Requirement gathering, recruitment planning
– Campus recruitment, EP interviews and recruitments
Talent Management and Utilization
– Allocation (e.g., to project), team formations, monitoring
– Roles and tasks, utilization tracking (billing, timesheets)
– Transfer, deputation, travel
– Training, knowledge management
– Performance appraisal, promotion, salary
– Communications
– Administration (leave, medicals, …)
Talent Retention
– Feedback, complaints, grievances, …
– Resignation handling, retention, knowledge transfer, succession
Business Goals of HR Management
HR is responsible for maintaining a high-quality workforce:
– Well-aligned and competitive for the business of the organization
– Effective in performing the business tasks and services, delivering required value and meeting client expectations
– Well-trained in the required and emerging skills
– Highly responsive to emerging business requirements
– Stable (low attrition, low impact of attrition; successful succession and knowledge transfer)
– Cost-effective (low salary and overhead costs)
– Able to evolve into leadership
– Agile and mobile (quickly form effective and distributed teams)
– Motivated, high on initiative and ownership, highly proactive
– Happy with their environment, work, roles, salaries, career paths
– Follows professional ethics, codes of conduct etc.
– Well-integrated and diverse; highly communicative
What Makes HR Management Challenging?
The human factors!
Large and varied backgrounds of the workforce
Globalization, diversified and distributed workforce
New business demands (services, products, …)
New business models
New customers across the globe
Mergers and acquisitions across the globe
Changing skills requirements
Innovation (disruptive / incremental, technical / domain)
Risks (ethical violations, data privacy, client confidentiality)
iWork: Analytics for HR
[Diagram: Data Repositories and Text Repositories + Analytics Techniques + HR Domain Knowledge → Novel, Actionable Insights, Patterns, Knowledge]
Vision
• Make effective use of historical databases and document repositories for solving HR domain-specific problems
• Use analytics-driven decision making to meet HR business goals
• Combine data and text mining to build innovative HR domain-specific solutions for significant enterprise
• Deliver analytics-derived insights to the right users at the right point in the HR business processes
Technology Areas
Data Mining, Machine Learning, Pattern Recognition, Statistical Analysis, Natural Language Processing, Computational Linguistics, Text Mining, Optimization
iWork: Opportunities
Reduce effects of attrition: understanding of root causes, accurate prediction, targeted retention strategies, backup/replacement plans, …
Improve project team formation: optimal mix of experience and expertise for all projects, optimal cost team, maximize match with associates’ interests
Improve associate satisfaction: better understanding of drivers, identify concrete action items, cost-effective improvement plan, what-if analysis
Talent acquisition: reduce delays and costs, maximize match with requirements, school evaluation, identify patterns of long term stay, …
Team profiles: in terms of backgrounds, skills, roles, domains, …
Effective RFI responses/proposals: locate experts, experience, tools
Available HR datasets: resumes (internal, external), timesheets, project details, in-house tool/document repositories, allocations, trainings, surveys, …
iWork: Overview
QUEST: employee/customer survey analytics
iRetain: attrition analytics; retention analytics
Resume Center: extract structured information from resumes; match job requirements; find experts; team skill profiles; …
ExBOS: optimal project team formation
iTAG: analytics for talent acquisition
Analytics for improving effectiveness of training programs
…
iWork Vision: Mine HR data to Drive Improvements to Workforce Management
[Diagram: HR datasets from systems including Complaints Management System, PULSE, PEEP, mPOWER, ExOp, SPEED, RMG Tracker, iCALMS and Visa Tracker, spanning Talent Acquisition, Talent Utilization and Management, and Talent Retention]
iWork: Strategy and Offerings
Build a set of integrated offerings that identify specific improvement opportunities for Workforce Management
Transcend silos in HR systems and data
Integrate the offerings with existing HR systems to deliver the actionable recommendations to the users when they need them
Talent Acquisition
• TA analytics: improve cost, quality, timeliness of TA
• ILP analytics
Talent Retention
• As-is state dashboard for attrition
• Discovering high-attrition groups
• Predictive models for attrition
• Identifying root causes of attrition
• Plan for reducing attrition impact
• Optimal retention plan
Talent Management
• Quest: employee survey analytics
• EXBOS: optimal team formation
• Visa analytics
• Resume Center: competency extraction; find right people for positions; find experts; find misinformation; enrich RFP response (past projects, tools); talent pool profiling; find customer intelligence
• Bench analytics
ITIS workforce management: team sizing + shift planning; optimal team skill profile; service level rationalization; expert finding; training plans; DC transformation planning
Automation of survey responses tagging
Nielsen
Practical Applications Need a Lot More than IR
Business Solution
Various databases and text repositories
Document retrievals; ranking
Fine-grained retrieval
Goal-directed retrieval
Post-processing
Information extraction
Cross-linking and information fusion (e.g., with FB, LinkedIn)
Classification
Visualization; Summarization
Analytics (problem-specific)
Learning to rank
RESUME CENTER: Effective use of resumes in all HR functions
Using Resumes in HR Functions
Resumes: a valuable source of information for people’s work
– ~250,000 TCS employees’ resumes
– ~2 million candidates’ (applicants’) resumes
Business goals
– Use information extraction: extract personal, job history, project details, education, training, awards etc. from given resumes
  o Validate information in resumes
  o Create gazettes (colleges, degrees, certifications, tools, companies)
– Update employee experience profile and skills/competencies
– Identify top K best matches for a given job requirement / position
  o Learning to rank (poster in FIRE 2012)
  o improve team formation; shorten recruitment cycle
– Perform mining of data extracted from resumes to derive novel, actionable insights about the available talent pool
  o reduce bench; reduce attrition; improve utilization
  o identify training opportunities; help in career planning
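The "top K best matches" goal can be illustrated with a minimal sketch. The talk points to learning to rank for this step; the code below swaps in a much simpler technique, Jaccard overlap between skill sets, only to show the shape of the task. The function names, the resume identifiers and the skill sets are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections of skills (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_k_matches(required_skills, resumes, k):
    """Return the ids of the k resumes whose skills best overlap the requirement.

    `resumes` maps a resume id to the skills extracted from that resume.
    This is a stand-in for a learned ranking function, not the Resume Center method.
    """
    ranked = sorted(
        resumes.items(),
        key=lambda item: jaccard(required_skills, item[1]),
        reverse=True,
    )
    return [resume_id for resume_id, _ in ranked[:k]]


# Hypothetical extracted skills for three resumes.
resumes = {
    "r1": {"python", "sql"},
    "r2": {"python", "nlp", "sql"},
    "r3": {"cobol"},
}
best_two = top_k_matches({"python", "nlp"}, resumes, k=2)
```

A learned ranker would replace `jaccard` with a model trained on past placement decisions; the surrounding top-K selection would stay the same.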
Resume Center: Information Extraction
On-the-fly extraction using an IR engine
Resume Center: PowerMiner
Given a set of resumes, provide facilities to help in filing RFI/RFP responses, form project teams etc.
locate relevant projects for a given project description
locate relevant tools for a given project description
identify expert persons for a given technical area
assign domain(s) to each resume (e.g., insurance, railways, banking, telecom etc.)
identify "unusually high quality" resumes in terms of a set of pre-defined quality criteria
– Special tools, niche skills, extra qualifications (e.g., domain related), top-quality academic performance, awards, publications
Resume Center: Team Profiler
Given a resume repository, help HR executives in building an “understanding” of their teams:
– What are the strengths and weaknesses of my team in terms of technical skills, domain knowledge, roles etc.?
– What should I do to improve the quality of my teams?
Create a summary profile of a team, in terms of technology skills, domains, experience etc.
Group the given resumes into clusters (from different perspectives), with specific interpretation for each cluster
– Similar to customer segmentation?
Document repository visualization and exploratory facilities
R. Srivastava, G.K. Palshikar, RINX: Information Extraction, Search and Insights from Resumes, Proc. TCS Technical Architects' Conf. (TACTiCS 2011), Thiruvananthapuram, India, Apr. 2011.
S. Pawar, R. Srivastava, G.K. Palshikar, Automatic Gazette Creation for Named Entity Recognition and Application to Resume Processing, Proc. ACM COMPUTE 2012 Conference, Pune, India, 24-Jan-2012.
G.K. Palshikar, R. Srivastava, S. Pawar, Delivering Value from Resume Repositories, TCS White Paper published on www.tcs.com, Feb. 2012. (c) Tata Consultancy Services Limited.
QUEST: Survey response analytics
QUEST: Overview
Advanced analytics tool to mine survey response data and derive novel, actionable insights for improving workforce management
Surveys are a direct and effective mechanism to gauge concerns and issues that affect satisfaction of employees or customers
Motivation: TCS conducts an annual in-house employee survey
– 250,000 employees, ~100 questions (structured, free-form)
– 250,000 textual responses to each of ~20 questions
Challenges: volumes; dependencies; mixed structured/text responses
Business goals: improve satisfaction levels among employees
Benefits: deeper insights, objective results, reduced time/efforts
Impact: satisfaction levels affect project quality and client satisfaction
Status: currently deployed in-house
Vision
– QUEST should be an integral part of all HR contact and feedback programs throughout TCS (ISU, geographies, clients etc.)
– Deploy for customer / product satisfaction surveys
QUEST: Approach
Dashboards and standard reports
Drill-down exploratory analysis
Visualization
Summarize responses to specific questions/categories
Identify specific issues, concerns and suggestions
Characterize low-satisfaction groups (discover common characteristics of employees with high/low satisfaction)
Identify factors (root causes) that affect satisfaction
Design optimal plans to improve satisfaction levels
Use survey results in team planning and other workforce management tasks
G.K. Palshikar, S. Deshpande, S. Bhat, QUEST: Discovering Insights from Survey Responses, Proc. 8th Australasian Data Mining Conf. (AusDM09), Dec. 1-4, 2009, Melbourne, Australia, P.J. Kennedy, K.-L. Ong, P. Christen (Eds.), CRPIT, vol. 101, published by Australian Computer Society, pp. 83-92, 2009.
QUEST: Results
QUEST: Results…
Things you don’t like about TCS
QUEST: Results…
PULSE 2008-09 Responses for TCS Mumbai
Groups having unusually low ASI
– EXPERIENCE_RANGE = ‘4-7’ (60.4; global avg. = 73.8)
Root causes for low ASI
– Canteen, Transportation, RMG
Interesting subset discovery: finding bumps in a large-dimensional distribution
M. Natu, G.K. Palshikar, Interesting Subset Discovery and its Application on Service Processes, Proc. Workshop on Data Mining for Services (DMS 2010) held as part of the Int. Conference on Data Mining (ICDM 2010), Australia, 2010, pp. 1061-1068.
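The idea of a group with "unusually low ASI" can be illustrated with a minimal sketch. It compares each group's mean score against the global average over a single attribute and flags groups that fall below it by a chosen margin. This is a deliberately simplified stand-in, not the interesting subset discovery algorithm of the cited paper; the function name, the `margin` parameter and the toy records are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def low_score_groups(records, attribute, score_key, margin):
    """Return (global average, {group value: group mean}) for groups whose mean
    score is more than `margin` below the global average."""
    global_avg = mean(r[score_key] for r in records)
    groups = defaultdict(list)
    for r in records:
        groups[r[attribute]].append(r[score_key])
    low = {
        value: mean(scores)
        for value, scores in groups.items()
        if mean(scores) < global_avg - margin
    }
    return global_avg, low


# Made-up survey records; the values are illustrative only.
records = [
    {"EXPERIENCE_RANGE": "0-3", "ASI": 80},
    {"EXPERIENCE_RANGE": "0-3", "ASI": 78},
    {"EXPERIENCE_RANGE": "4-7", "ASI": 60},
    {"EXPERIENCE_RANGE": "4-7", "ASI": 62},
    {"EXPERIENCE_RANGE": "8+", "ASI": 76},
    {"EXPERIENCE_RANGE": "8+", "ASI": 78},
]
global_avg, low = low_score_groups(records, "EXPERIENCE_RANGE", "ASI", margin=5)
```

A real subset search would look at combinations of several attributes and test whether a deviation is statistically meaningful, which this one-attribute threshold does not attempt.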
QUEST: Results…
Actionable suggestions made by associates
– "TCS can have tie ups with best Schools in the near by locations for their employee kids"
– "… the moment you step out there is only garbage and randomly parked autos around"
– "TCS can engage with lease agreement … with TATA Housing itself and provide economical accommodation."
– "I don`t have any leg space...n my knees are hurting badly"
S. Deshpande, G.K. Palshikar, G. Athiappan, An Unsupervised Approach to Sentence Classification, Proc. Int. Conf. on Management of Data (COMAD 2010), Nagpur, 2010, Allied Publishers Pvt. Ltd., pp. 88-99.
Sentence Classification
Sentence class labels are usually domain-dependent
Unsupervised classification of sentences: specific / general
Sentence Classification…
A SPECIFIC sentence is more "on the ground"
A GENERAL sentence is more "in the air"
Example:
– My table is cramped and hurts my knees.
– The work environment needs improvement.
– Travel vouchers should be cleared within 2 working days.
– Accounts department is very inefficient.
Sentence Classification…
Compute a specificity score for each sentence:
– Unsupervised (knowledge-based), without the need for any labeled training examples.
– Define a set of features and compute their values for each sentence.
– The features are lexical / semantic.
– The features are context-free: their values are computed exclusively using the words in the sentence and do not depend on any other (e.g., previous) sentences.
– Then combine the feature values for a particular sentence into its specificity score.
Rank the sentences in terms of their specificity score.
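The combine-and-rank step can be sketched as follows. The slides do not state how the feature values are combined, so the unweighted mean used here is an assumption, as are the function names and the feature values in the example. Features are assumed to be already scaled to [0, 1], with higher meaning more specific.

```python
def specificity_score(features):
    """Combine scaled feature values into one score (assumed rule: unweighted mean)."""
    return sum(features.values()) / len(features)


def rank_by_specificity(sentence_features):
    """Return the sentences ordered from most specific to least specific."""
    return sorted(
        sentence_features,
        key=lambda sentence: specificity_score(sentence_features[sentence]),
        reverse=True,
    )


# Hypothetical, already-scaled feature values for two sentences.
example = {
    "The work environment needs improvement.": {"ASD": 0.4, "CNE": 0.0, "LEN": 0.3},
    "Travel vouchers should be cleared within 2 working days.": {"ASD": 0.6, "CNE": 0.5, "LEN": 0.7},
}
ranking = rank_by_specificity(example)
```

Because no labeled examples are needed, the only tunable parts are the feature set and the combination rule; a weighted sum would be a natural alternative to the plain mean.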
Sentence Classification…
Sentence features
– Average semantic depth (ASD)
– Average semantic height (ASH)
– Total occurrence count (TOC)
– Count of Named Entities (CNE)
– Count of Proper Nouns (CPN)
– Sentence Length (LEN)
Sentence Classification…
Semantic depth (SD)
SD_T(w) of a word w is the distance (number of edges) from the root of ontology T to word w in T
– We use T = WordNet ISA ontology
– More semantic depth ⇒ more specific word
Sentence Classification…
Semantic depth of a word changes with its POS tag and with its sense:
– SD(bank) = 7 for the financial institution sense
– SD(bank) = 10 for the flight maneuver sense
Solution:
– Apply word sense disambiguation (WSD) during pre-processing; or
– Take the average of the semantic depths of the word for the top k of its senses
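A minimal sketch of semantic depth and of the sense-averaging fallback. To stay self-contained it uses a tiny hand-made ISA hierarchy in place of WordNet; the hierarchy, the node names and the function names are hypothetical. The two depths for "bank" are the values quoted above.

```python
from statistics import mean

# Hypothetical toy ISA hierarchy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "object": "entity",
    "food": "object",
    "fruit": "food",
    "apple": "fruit",
}


def semantic_depth(node, parent=PARENT):
    """Number of edges from the root of the hierarchy down to `node`."""
    depth = 0
    while parent[node] is not None:
        node = parent[node]
        depth += 1
    return depth


def sense_averaged_depth(sense_depths, k):
    """Average the depths of the top k senses (listed most frequent first)."""
    return mean(sense_depths[:k])


apple_depth = semantic_depth("apple")            # apple -> fruit -> food -> object -> entity
bank_depth = sense_averaged_depth([7, 10], k=2)  # financial institution, flight maneuver
```

With WSD available, the averaging step is skipped and the depth of the single disambiguated sense is used directly.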
Average semantic depth
S.ASD for a sentence S = <w_1 w_2 . . . w_n> containing n content-carrying words = the average of the semantic depths of the individual words
– My table hurts the knees.
  (8 + 2 + 6)/3 = 5.3
– The work environment needs improvement.
  (6 + 6 + 1 + 7)/4 = 5
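The two worked examples can be reproduced with a short sketch. The per-word depths are the ones shown on the slide; the function name is an assumption.

```python
def average_semantic_depth(depths):
    """Average of the semantic depths of a sentence's content-carrying words."""
    if not depths:
        raise ValueError("sentence has no content-carrying words")
    return sum(depths) / len(depths)


# Depths taken from the slide's examples.
asd_table = average_semantic_depth([8, 2, 6])          # My table hurts the knees.
asd_environment = average_semantic_depth([6, 6, 1, 7])  # The work environment needs improvement.
```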
Semantic height (SH)
SH_T(w) of a word w is the length of the longest path in T from word w to a leaf node
– We use T = WordNet hyponym ontology
– Lower semantic height ⇒ more specific word
Average semantic height
S.ASH for a sentence S = <w_1 w_2 . . . w_n> containing n content-carrying words (non stop-words) = the average of the semantic heights of the individual words
Semantic height of a word changes with its POS tag and with its sense; Solution: use WSD or take the average of the semantic heights of the word for the top k of its senses
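A minimal sketch of semantic height and its sentence-level average, again on a tiny hand-made hyponym hierarchy in place of WordNet. The hierarchy, the node names and the function names are hypothetical.

```python
# Hypothetical toy hyponym hierarchy: node -> list of more specific children.
CHILDREN = {
    "food": ["fruit", "bread"],
    "fruit": ["apple", "pear"],
    "apple": ["granny_smith"],
    "pear": [],
    "bread": [],
    "granny_smith": [],
}


def semantic_height(node, children=CHILDREN):
    """Length (in edges) of the longest path from `node` down to a leaf."""
    kids = children.get(node, [])
    if not kids:
        return 0
    return 1 + max(semantic_height(kid, children) for kid in kids)


def average_semantic_height(words, children=CHILDREN):
    """Average semantic height over a sentence's content-carrying words."""
    return sum(semantic_height(w, children) for w in words) / len(words)


food_height = semantic_height("food")  # food -> fruit -> apple -> granny_smith
pear_height = semantic_height("pear")  # a leaf, so the most specific kind of node
```

Height runs opposite to depth: a leaf has height 0, so lower values indicate more specific words.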
Intuition: more specific sentences tend to include words which occur rarely in some reference corpus
– apple (2), fruit (14), food (34)
The more rare words a sentence contains, the more specific it is likely to be.
OC(w) = occurrence count of word w in WordNet;
– if w has multiple senses, then OC(w) = average of the occurrence counts for the top k senses of w
Total occurrence count S.TOC for a sentence S = <w_1 w_2 ... w_n> containing n content words is the sum of the lowest m occurrence counts of the individual words, where m is a fixed value (e.g., m = 3).
OC of a word changes with its POS tag and with its sense; Solution: use WSD or take the average of the OC of the word for the top k of its senses
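A minimal sketch of the TOC feature as defined above: the sum of the m lowest occurrence counts among a sentence's content words. The function name is an assumption, and the fourth count (50) in the example is made up; the other three are the counts quoted for apple, fruit and food.

```python
import heapq


def total_occurrence_count(occurrence_counts, m=3):
    """Sum of the m smallest occurrence counts; sums all of them if fewer than m."""
    return sum(heapq.nsmallest(m, occurrence_counts))


# apple (2), fruit (14), food (34), plus one hypothetical frequent word (50).
toc = total_occurrence_count([34, 2, 50, 14], m=3)
```

Using only the m rarest words keeps a few very common words from drowning out the rare ones that carry the specificity signal.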
Named entities (NE) are commonly occurring groups of words which indicate specific semantic content
– Person name (e.g., Bill Gates)
– Organization name (e.g., Microsoft Inc.)
– Location (e.g., New York)
– Date, time, amount, email addresses etc.
Since each NE refers to a particular object, an NE is a good indicator that the sentence contains specific information.
Another feature S.CNE for a sentence S is the count of NE occurring in S
Proper Nouns (PN) are commonly occurring groups of words which indicate specific semantic content
– Abbreviation (IBM or kg), domain terms (oxidoreductases), words like (Apple iPhone), numbers etc.
Since each PN may refer to a particular object, a PN is a good indicator that the sentence contains specific information.
Another feature S.CPN for a sentence S is the count of PN occurring in S
Sentence length, denoted S.LEN, is a weak indicator of specificity, in the sense that more specific sentences tend to be somewhat longer than more general sentences.
Length refers to the number of content-carrying words (not stopwords) in the sentence, including numbers, proper nouns, adjectives and adverbs
Features have contradictory polarity.
– We want higher values ⇒ more specificity.
– Not true for features ASH and TOC
– Lower values ⇒ higher specificity for these
Scales of values for various features are not the same, because of which some features may unduly influence the overall combined score.
– E.g., ASD is usually ≤ 10, whereas TOC is a larger integer.
Uniform scaling: map x ∈ [a, b] to y ∈ [c, d]
Scaling + reversal of polarity
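Uniform scaling with optional polarity reversal can be sketched as a linear map. The slide names the operation but not the formula, so the linear form, the function name and the `reverse` flag are assumptions.

```python
def scale(x, a, b, c=0.0, d=1.0, reverse=False):
    """Linearly map x in [a, b] to y in [c, d].

    With reverse=True the polarity flips: a maps to d and b maps to c,
    which suits features such as ASH and TOC where lower means more specific.
    """
    if a == b:
        raise ValueError("source interval [a, b] must have non-zero width")
    y = c + (x - a) * (d - c) / (b - a)
    return c + d - y if reverse else y


asd_scaled = scale(5, 0, 10)                # mid-range ASD maps to the middle of [0, 1]
toc_scaled = scale(0, 0, 10, reverse=True)  # lowest TOC maps to the top of [0, 1]
```

After this step every feature lies on the same scale with the same polarity, so no single feature dominates the combined score simply because of its units.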
Sentences 6, 7 and 9 rank as the top 3 in terms of specificity score
Some specific sentences identified by our algorithm from 110,000 responses in an employee satisfaction survey
Some specific sentences identified by our algorithm from 220 sentences from 32 reviews of a hiking backpack product by Kelty.
Conclusions
Domain-driven IR = IR + text-mining of retrieved documents
Enterprise document repositories offer good scope for Domain-driven IR to deliver solutions and insights relevant for real-life business problems and decisions