
Where Do Developers Log?
An Empirical Study on Logging Practices in Industry
Qiang Fu1, Jieming Zhu2, Wenlu Hu3, Jian-Guang Lou1, Rui Ding1,
Qingwei Lin1, Dongmei Zhang1, Tao Xie4
1 Microsoft Research, Beijing, China
2 The Chinese University of Hong Kong, HK, China
3 Carnegie Mellon University, PA, USA
4 University of Illinois at Urbana-Champaign, IL, USA
ICSE 2014 SEIP
7/17/2015
Outline
• Background
• Research questions
• Study methodology
• Results
• Improving logging practice
• Conclusion
Background
• Logs are crucial for system management
– Various application scenarios
– Only available data source for online services
• Logging quality has big impact
• Logging quality is determined by two factors
– What to log
– Where to log
Where to log
• Logging too little has negative effects
– Potentially missing necessary runtime information for
postmortem analysis
• Logging too much also has negative effects
– Introducing both runtime overhead and storage cost
– Masking the important information by useless logs
– Incurring cost of code development and maintenance
• How do developers make informed decisions on
where to log?
Characterizing logging location
• Logging statement (Li)
• Logged code snippet (Ci) – a block of source code
whose behavior Li intends to log
if (string.IsNullOrEmpty(tokenReference))
    ULS.SendTraceTag(ULSTraceLevel.Unexpected,
        "Missing token reference value.");
• Where to log -> characteristics of logged code
snippets
Research questions
• Q1 – What categories of code snippets are logged?
• Q2 – What factors are considered for logging?
• Q3 – Is it possible to automatically determine
where to log?
Study methodology
• Source code analysis
Software systems | Description    | LOC   | # of logging statements | % logging statements
System-A*        | Online service | 2.5M  | 23.5K                   | 0.94%
System-B*        | Online service | 10.4M | 95.3K                   | 0.92%
* Selected due to high popularity and long development history.
• Developer survey**
– 21 questions in 4 categories
– Single/Multiple-choice questions and open-ended questions
– 54 responses with an average of 5.3 years of MS working experience
**Available at project website.
RQ1: Categories of logged snippets
• Randomly sampling 100 logged code snippets
• Conducting manual categorization
• Summarized categories
Category              | Subcategory                | Samples
Unexpected situations | Assertion-check logging    | 19/100
                      | Return-value-check logging | 14/100
                      | Exception logging          | 27/100
Execution points      | Logic-branch logging       | 16/100
                      | Observing-point logging    | 24/100
RQ1: Sample code
/* Example 1: Assertion-check logging */
ULS.AssertTag(site != null, "site cannot be null");

/* Example 2: Return-value-check logging */
if (String.IsNullOrEmpty(tokenReference))
    ULS.SendTraceTag(ULSTraceLevel.Unexpected, "Missing token reference value.");

/* Example 3: Exception logging */
try {
    RemoveOfflineAddressBooks();
}
catch (AccountUnauthorizedException e) {
    Logger.LogMessage("Removing failed with exception: {0}", e);
}

/* Example 4: Logic-branch logging */
if (instanceName.IsSqlExpressInstalled) {
    Tracer.TraceLogInfo("Detect sql express instance. No need to install.");
}
else {
    Tracer.TraceLogInfo("No sql express instance. Do fresh install.");
    res = SqlCleanInstall();
}

/* Example 5: Observing-point logging */
Tracer.TraceLogInfo("Creating the tab order for form {0}", base.Name);
RQ1: Survey results
• Asking survey participants to
– Tick the most common categories
– List any additional categories
• Summarized categories
Category              | Subcategory                | Samples | #Votes | % of votes
Unexpected situations | Assertion-check logging    | 19/100  | 27/54  | 50%
                      | Return-value-check logging | 14/100  | 34/54  | 63%
                      | Exception logging          | 27/100  | 43/54  | 80%
Execution points      | Logic-branch logging       | 16/100  | 36/54  | 67%
                      | Observing-point logging    | 24/100  | 44/54  | 81%
RQ1: Distribution of categories
• About half of the logged snippets are due to unexpected situations
• The other half are due to recording normal execution information
at critical execution points
Category                        | System-A     | System-B
Assertion-check                 | 5,476 (23%)  | 20,186 (21%)
Return-value-check              | 2,716 (12%)  | 8,959 (9%)
Exception                       | 4,333 (18%)  | 8,399 (9%)
Subtotal: Unexpected situations | 12,525 (53%) | 37,544 (39%)
Logic-branch                    | 3,807 (16%)  | 16,658 (18%)
Observing-point                 | 7,170 (31%)  | 41,138 (43%)
Subtotal: Execution points      | 10,977 (47%) | 57,796 (61%)
Total                           | 23,502       | 95,340
RQ2: Factors for making logging decisions
• Focused code snippets (FCS) as subject
– Either catch blocks or return-value-check blocks
– Often observed in unexpected situations
– NOT every FCS is checked
• Studying both logged & unlogged FCS
– Overall logging statistics
– In-depth analysis on logged code snippets
– Manual investigation on why NOT logging
RQ2: Overall logging statistics
A small portion of focused code snippets are logged.
                   | Statistics          | System-A    | System-B
Catch block        | Exception types     | 225         | 1,657
                   | Catch blocks        | 7,582       | 21,656
                   | Logged catch blocks | 3,222 (42%) | 6,410 (30%)
Return value check | Function types      | 21,813      | 155,444
                   | Function call-sites | 131,390     | 723,691
                   | Checked call-sites  | 34,464      | 104,167
                   | Logged call-sites   | 2,716 (8%)  | 8,959 (9%)
RQ2: Logged catch blocks
• Pivoting on exception types
– Different exception types having different semantic meanings
– Calculating logging ratio for each exception type
• Findings
– Many exception types have high correlations with developers’ logging
decisions in their catch blocks
– Most catch blocks correspond to exception types that are not highly
correlated with developers’ logging decisions
[Figure: The distribution of exception types over logging ratio (System-A vs. System-B)]
[Figure: The distribution of catch blocks over logging ratio (System-A vs. System-B)]
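The per-type logging ratio behind both plots is simply (logged catch blocks of that exception type) / (all catch blocks of that type). A minimal stdlib-only sketch; the exception-type names in the example are illustrative, not data from the study:

```python
from collections import defaultdict

def logging_ratios(catch_blocks):
    """Compute, per exception type, the fraction of its catch blocks
    that contain a logging statement.

    catch_blocks: iterable of (exception_type, is_logged) pairs,
    one per catch block found in the code base.
    """
    logged = defaultdict(int)
    total = defaultdict(int)
    for exc_type, is_logged in catch_blocks:
        total[exc_type] += 1
        logged[exc_type] += bool(is_logged)
    return {t: logged[t] / total[t] for t in total}

# Illustrative input, not data from the study:
blocks = [("IOException", True), ("IOException", True),
          ("IOException", False), ("OperationCanceledException", False)]
ratios = logging_ratios(blocks)  # IOException -> 2/3, OperationCanceledException -> 0.0
```

Types with ratios near 0 or 1 are the ones highly correlated with developers’ logging decisions; types near the middle are ambiguous.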
RQ2: Reasons why NOT logging
• Analysis process
– Identifying reasons from 70 randomly selected unlogged
catch blocks
– Asking survey participants to tick the most common reasons
• Summarized reasons
Reasons of NOT logging                              | Samples | % of samples
Logging decisions are made by subsequent operations | 29/70   | 41%
Exceptions are not critical                         | 32/70   | 46%
Exceptions are recoverable                          | 9/70    | 13%
RQ2: Examples of NOT logging
/* Example 6: An exception used to determine logic branch */
void AccountConfig(MONOAccount user, string propertyName) {
    ...
    bool userHasRights = true;
    try {
        user.DeleteAccountProperty(propertyName);
    }
    catch (UnauthorizedAccessException) {
        userHasRights = false;
    }
    if (userHasRights) {
        ...
    }
}

/* Example 7: An exception re-thrown */
try {
    Type t = Type.GetTypeFromID(guid);
    object instance = Activator.CreateInstance(t);
}
catch (Exception e) {
    throw new TestFailedException("Fail to create Com interface.\t: "
        + Tester.GetExceptionDetails(e));
}

/* Example 8: An exception recovered by retrying */
void DWAppOverride(...) {
    ...
    Uri UriNew = null;
    try {
        UriNew = new Uri(wApp);
    }
    catch (UriObjectFormatException) {
        // Assume http is the scheme and the URL param is the machine name
        if (UriNew == null) {
            UriNew = new Uri("http://" + wApp);
        }
    }
}
RQ2: Survey results
• Asking survey participants to
– Tick the most common reasons of NOT logging
– List additional reasons
• Summarized reasons
Reasons of NOT logging                              | Samples | % of samples | #Votes | % of votes
Logging decisions are made by subsequent operations | 29/70   | 41%          | 34/54  | 63%
Exceptions are not critical                         | 32/70   | 46%          | 7/54   | 13%
Exceptions are recoverable                          | 9/70    | 13%          | 17/54  | 31%
RQ3: Automatic logging
Modeling automatic logging as a prediction problem
• Extracting features of a code snippet
• Labeling a code snippet according to whether it is logged
• Learning prediction model from historical code snippets
• Predicting whether to log for a new code snippet
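The four steps above can be sketched with a deliberately simple stand-in learner. This sketch uses a perceptron over a feature dictionary; the feature names and training data are hypothetical, and the paper's actual classifier and feature set are richer:

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Learn a linear model from labeled historical code snippets.

    samples: list of (features, label) where features maps feature
    name -> value and label is True iff the snippet was logged.
    """
    w = {}
    bias = 0.0
    for _ in range(epochs):
        for feats, label in samples:
            score = bias + sum(w.get(f, 0.0) * v for f, v in feats.items())
            if (score > 0) != label:
                # Misclassified: nudge weights toward the correct label.
                sign = 1.0 if label else -1.0
                for f, v in feats.items():
                    w[f] = w.get(f, 0.0) + lr * sign * v
                bias += lr * sign
    return w, bias

def should_log(model, feats):
    """Predict whether a new code snippet should carry a logging statement."""
    w, bias = model
    return bias + sum(w.get(f, 0.0) * v for f, v in feats.items()) > 0

# Illustrative training history (feature names are hypothetical):
history = [({"exc:IOException": 1.0}, True),
           ({"exc:OperationCanceledException": 1.0}, False)]
model = train_perceptron(history)
```

Any off-the-shelf binary classifier fits the same train/predict interface; the perceptron is used here only because it is compact and dependency-free.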
Evaluation
• 10-fold cross-validation
• Experiments with different feature sets
– Using only exception type’s logging ratio as features
– Enriching features with contextual keywords
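The fold assignment and metrics can be written out in a few lines of stdlib code; this sketch shows only the definitions (round-robin 10-fold split, precision/recall/F-score), not the paper's experimental harness:

```python
def kfold_indices(n, k=10):
    """Yield (train, test) index lists for k-fold cross-validation,
    assigning sample indices to folds round-robin."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

def prf(y_true, y_pred):
    """Precision, recall, and F-score for binary labels."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score
```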
Logging decision factors            | Metrics   | System-A: Catch block | System-A: Return-value check | System-B: Catch block | System-B: Return-value check
Type (Exception type/Function type) | Precision | 0.700                 | 0.792                        | 0.614                 | 0.860
                                    | Recall    | 0.785                 | 0.724                        | 0.812                 | 0.766
                                    | F-Score   | 0.740                 | 0.757                        | 0.699                 | 0.810
Type & Contextual information       | Precision | 0.902                 | 0.870                        | 0.811                 | 0.882
                                    | Recall    | 0.899                 | 0.899                        | 0.808                 | 0.904
                                    | F-Score   | 0.901                 | 0.884                        | 0.809                 | 0.893
Improving logging practices
• Automatic logging tool for developers
• On-demand logging in production
• End-to-end tracing
• Log filtering
• Log categorization
• Log analysis and visualization
Conclusion
• Focusing on studying where to log
• 6 findings on
– Categories of logged code snippets
– Factors considered for logging decisions
– Feasibility of automatic logging
• Potential directions for improving logging practices
Q&A
Backup
Other logging factors
RQ3: Feature extraction
• Important factors identified by the study
– Exception type
– Contextual information
• Contextual feature extraction
– Extracting related function & class names
• Functions contained in a code snippet
• Container function name
• Container class name
– Segmenting names into keywords
• E.g., MONOAccount.DeleteAccountProperty ->
• “mono”, “account”, “delete”, “property”, “account”
– Representing a code snippet as bag of words
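The segmentation step above can be sketched as a small camel-case splitter; a minimal sketch in which the regex keeps acronym runs like "MONO" together before lower-casing:

```python
import re
from collections import Counter

def segment(name):
    """Split a qualified identifier such as
    'MONOAccount.DeleteAccountProperty' into lower-case keywords,
    honoring camel-case and acronym boundaries."""
    words = []
    for part in name.split('.'):
        # 'MONOAccount' -> 'MONO', 'Account';
        # 'DeleteAccountProperty' -> 'Delete', 'Account', 'Property'.
        words += re.findall(r'[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+', part)
    return [w.lower() for w in words]

keywords = segment("MONOAccount.DeleteAccountProperty")
# -> ['mono', 'account', 'delete', 'account', 'property']
bag = Counter(keywords)  # bag-of-words representation of the snippet
```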