Development of Exercises for Basic Surgical Skills Assessment

Niyant Patel, James Robbins, Mario Villalba Jr., Daryl Reid, and Charles Shanley
Department of Surgery, William Beaumont Hospital, Royal Oak, Michigan

Changes in Operative Experience

- The 80-hour workweek
- Resident autonomy
- Specialized centers
- Minimally invasive surgery

Uniformly Used Methods of Assessment

- Operative logs
- Faculty evaluations
- In-training examination scores

Goals and Objectives

- To develop low-fidelity exercises for basic, open surgical skills
- To demonstrate construct validity
- To establish interrater reliability
- To show internal consistency of the test

Definitions

- Construct validity: extent to which a test discriminates between various levels of expertise
- Interrater reliability: extent of agreement between two or more independent raters
- Internal consistency: correlation of parts of a test with each other

Model Development

- Low fidelity
- Reproducible
- Portable
- Focused on components of basic skills

Model Development

- The five exercises included in this study had face validity*
- All exercises were time-limited to:
  - Promote efficiency
  - Accentuate differences

* Face validity: resemblance to real-life situations

Exercises 1 & 2 Needle Driving

- 30 targets
- 4 x 2 inch label

Exercise 1 Needle Driving

- The needle was placed directly through the target and out the sides

Exercise 2 Needle Driving (blind)

- The needle was placed through the sides and out the target

Exercises 1 & 2 Needle Driving

 Metrics recorded  Accuracy of each target  Time (limit 300s) Score = (Red x 7.46)+ (White x 1.95) + Blue - Miss + ((300-time) x (total completed/30))

Accuracy Scoring

[Figure: accuracy scoring zones on the target label, with panels highlighting the Red and White zones]

Exercise 3 Needle Transferring

- 30 needles of 3 different sizes
- Pick up each needle with forceps, transfer it to the needle driver, and place it into a sponge
- Metrics recorded:
  - Number transferred
  - Number dropped
  - Time (limit 150 s)

Score = (transferred × 2) - dropped + ((150 - time) × (needles attempted / 30))
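The transfer score follows the same pattern as needle driving: a task term plus unused time scaled by the fraction attempted. A minimal sketch, assuming the scorer records the number of needles attempted separately; the helper name is hypothetical.

```python
TIME_LIMIT_S = 150  # time limit for Exercise 3
N_NEEDLES = 30      # needles provided (3 sizes)


def needle_transfer_score(transferred: int, dropped: int,
                          time_s: float, attempted: int) -> float:
    """Score for Exercise 3 (hypothetical helper)."""
    if not 0 <= time_s <= TIME_LIMIT_S:
        raise ValueError("time_s must lie within the 150 s limit")
    if not 0 <= attempted <= N_NEEDLES:
        raise ValueError("attempted must be between 0 and 30")
    # Two points per needle placed in the sponge, minus one per drop.
    placement = transferred * 2 - dropped
    # Unused time, scaled by the fraction of needles attempted.
    time_bonus = (TIME_LIMIT_S - time_s) * (attempted / N_NEEDLES)
    return placement + time_bonus


print(needle_transfer_score(30, 0, 120, 30))  # 90.0
```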

Exercise 4 Fine Forceps Use

- Threading of beads onto monofilament with forceps
- Metrics recorded:
  - Number threaded
  - Time (limit 150 s)

Score = beads threaded

Exercise 5 Knot Tying

- 4 knots
- Any type or technique
- Metrics recorded:
  - Secure knots in the appropriate place
  - Time (limit 150 s)

Score = (knots × 10) + ((150 - time) × (total completed / 4))
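A minimal sketch of the knot-tying score, assuming "knots" in the formula means secure knots in the appropriate place and that these cannot exceed the number of knots completed; both readings are assumptions, and the helper name is hypothetical.

```python
TIME_LIMIT_S = 150  # time limit for Exercise 5
N_KNOTS = 4         # knots required


def knot_tying_score(secure_knots: int, time_s: float, total_completed: int) -> float:
    """Score for Exercise 5 (hypothetical helper)."""
    if not 0 <= time_s <= TIME_LIMIT_S:
        raise ValueError("time_s must lie within the 150 s limit")
    # Assumption: secure knots are a subset of the knots completed.
    if not 0 <= secure_knots <= total_completed <= N_KNOTS:
        raise ValueError("expected 0 <= secure_knots <= total_completed <= 4")
    # Ten points per secure knot in the appropriate place.
    knot_points = secure_knots * 10
    # Unused time, scaled by the fraction of the 4 knots completed.
    time_bonus = (TIME_LIMIT_S - time_s) * (total_completed / N_KNOTS)
    return knot_points + time_bonus


print(knot_tying_score(4, 90, 4))  # 100.0
```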

Testing and Scoring

- Forty volunteers: general surgery residents and attending surgeons
- All participants were scored by an evaluator and independently scored themselves
- Scores were normalized to the highest score for each exercise: normalized score = score / high score × 100
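Normalization puts every exercise on a common 0 to 100 scale relative to the best performer. A minimal sketch, assuming raw scores for one exercise are keyed by a participant ID (the IDs and function name are hypothetical):

```python
def normalize_scores(raw_scores: dict[str, float]) -> dict[str, float]:
    """Rescale one exercise's raw scores so the top performer scores 100."""
    high = max(raw_scores.values())
    if high <= 0:
        # Dividing by a zero or negative top score would not give a 0-100 scale.
        raise ValueError("the highest score must be positive")
    return {pid: score / high * 100 for pid, score in raw_scores.items()}


print(normalize_scores({"A": 50.0, "B": 100.0, "C": 75.0}))
# {'A': 50.0, 'B': 100.0, 'C': 75.0}
```

Each exercise is normalized separately, so a participant's normalized scores can be compared across exercises even though the raw scales differ. A raw score below zero (possible in the needle-driving exercises when misses dominate) stays negative after normalization.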

Construct Validity

Discrimination between 2 levels of expertise: novice and proficient

Evaluator scoring

Exercise                      Novice (n=24)   Proficient (n=16)   p-value
1 - Needle driving            35 (14)         59 (15)             <0.01
2 - Needle driving (blind)    44 (22)         62 (20)             0.01
3 - Needle transferring       42 (8)          67 (16)             <0.01
4 - Fine forceps use          50 (22)         59 (14)             0.14
5 - Knot tying                45 (26)         87 (9)              <0.01

Values are means (standard deviation). Analysis by Mann-Whitney U test.
Novice: junior residents (postgraduate year 1-3). Proficient: senior residents and attendings (postgraduate year 4 and above).
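The novice and proficient groups were compared with the Mann-Whitney U test. The sketch below computes U from tie-averaged ranks and a two-sided p-value from the normal approximation with a tie correction; it is an illustration under those assumptions, not the study's statistical software, and an exact test or continuity correction (as offered by `scipy.stats.mannwhitneyu`) can give slightly different p-values for groups of this size. Function names are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u(novice: list[float], proficient: list[float]) -> tuple[float, float]:
    """Return (U for the novice group, two-sided p from the normal approximation)."""
    n1, n2 = len(novice), len(proficient)
    pooled = list(novice) + list(proficient)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Variance of U under the null hypothesis, corrected for ties.
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return u1, 1.0  # every value tied: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u1, p


u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u, round(p, 2))  # 0.0 0.05
```

A rank-based test was a reasonable fit here because normalized scores need not be normally distributed and the groups are small (24 and 16).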

Interrater Reliability

Extent of agreement between self-scoring and scoring by evaluators

Exercise                      Self-scoring   Evaluator scoring   Difference   p-value
1 - Needle driving            51 (18)        45 (19)             6.8 (12)     <0.01
2 - Needle driving (blind)    47 (24)        51 (22)             -3.9 (13)    0.07
3 - Needle transferring       52 (17)        52 (17)             0            1
4 - Fine forceps use          54 (19)        54 (19)             0            1
5 - Knot tying                62 (30)        61 (29)             0.6 (4)      0.32

Values are means (standard deviation). Analysis by paired t-test.
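Self-scores and evaluator scores were compared with a paired t-test. The sketch below computes the mean difference (self minus evaluator, which matches the sign of the Difference column), its standard deviation, the t statistic, and the degrees of freedom; the p-value needs the t distribution, which the Python standard library lacks, so `scipy.stats.ttest_rel` is one option when SciPy is available. The function name and the handling of all-identical differences are assumptions.

```python
import math
from statistics import mean, stdev


def paired_t(self_scores: list[float],
             evaluator_scores: list[float]) -> tuple[float, float, float, int]:
    """Return (mean difference, SD of differences, t statistic, degrees of freedom)."""
    if len(self_scores) != len(evaluator_scores) or len(self_scores) < 2:
        raise ValueError("need two equal-length lists with at least two participants")
    # Per-participant difference: self-score minus evaluator score.
    diffs = [s - e for s, e in zip(self_scores, evaluator_scores)]
    n = len(diffs)
    d_bar, sd = mean(diffs), stdev(diffs)
    if sd == 0:
        # Identical differences: t is 0 if they are all zero, otherwise unbounded.
        t = 0.0 if d_bar == 0 else math.copysign(math.inf, d_bar)
    else:
        t = d_bar / (sd / math.sqrt(n))
    return d_bar, sd, t, n - 1


print(paired_t([5, 6, 7, 8], [4, 5, 6, 6]))  # (1.25, 0.5, 5.0, 3)
```

When every participant's self-score equals the evaluator score, as reported for Exercises 3 and 4, the difference is zero and the sketch returns t = 0.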

Internal Consistency

Correlation of parts of the test with each other

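[Figure: chart of the alpha coefficient for self-scoring and evaluator scoring, shown overall and with exclusion of the fine forceps use exercise; reference lines mark a "highly reliable" value and an "adequate" value. Data labels in the original chart: 0.75, 0.78, 0.83, 0.85.]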
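Internal consistency is reported as an alpha coefficient (Cronbach's alpha), α = k / (k - 1) × (1 - Σ item variances / variance of totals), with the k exercises as items. A minimal sketch, assuming each row holds one participant's normalized scores across the exercises (the slides do not say whether raw or normalized scores were used); the function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha; scores[p][i] is participant p's score on exercise i."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two exercises and two participants")
    # Sum of the per-exercise variances (columns).
    item_variance = sum(variance(column) for column in zip(*scores))
    # Variance of each participant's total across exercises (rows).
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - item_variance / total_variance)


print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))  # 0.667
```

Alpha rises when exercises rank participants similarly, because their covariances inflate the variance of the totals relative to the summed item variances.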

Limitations

- The lack of a significant difference in scores for the fine forceps use exercise may be the result of a type II error
- Despite our focus on specific components, the exercises likely test multiple skills
- Only five exercises were formally evaluated

Summary

- Develop low-fidelity exercises for the assessment of basic, open surgical skills
- Discriminate between two levels of expertise, establishing construct validity
- Agreement between raters, demonstrating interrater reliability and the ability to self-evaluate
- Correlation between the five exercises, demonstrating internal consistency, which improved with exclusion of the forceps use exercise

Future Directions

- Establishment of other forms of validity and reliability
- Development of other exercises to make a comprehensive set
- Demonstration of improvement with practice
- Use of sophisticated technology

Conclusion

- These data provide evidence of validity, reliability, and internal consistency for a series of low-fidelity exercises with self-evaluation metrics

Thank you for your time

Current Methods of Assessment

- Operative logs [1, 2]
- Faculty evaluations [2]
- In-training examination scores [3]

References
1. Cuschieri, A., et al. What do master surgeons think of surgical competence and revalidation? Am J Surg, 2001. 182(2): p. 110-6.
2. Reznick, R.K. Teaching and testing technical skills. Am J Surg, 1993. 165(3): p. 358-61.
3. Scott, D.J., et al. Evaluating surgical competency with the American Board of Surgery In-Training Examination, skill testing, and intraoperative assessment. Surgery, 2000. 128(4): p. 613-22.

Definitions

- Face validity: resemblance to real-life situations
- Content validity: extent to which the test measures the domain it is intended to measure
- Concurrent validity: correlation of results with the gold standard for that domain

Definitions

- Predictive validity: ability to predict future performance
- Test-retest reliability: consistency of trainee performance on different occasions

Construct Validity

Discrimination between 2 levels of expertise: novice and proficient

Self-scoring

Exercise                      Novice (n=24)   Proficient (n=16)   p-value
1 - Needle driving            46 ± 17         60 ± 17             0.02
2 - Needle driving (blind)    35 ± 15         65 ± 23             <0.01
3 - Needle transferring       42 ± 8          67 ± 16             <0.01
4 - Fine forceps use          50 ± 22         59 ± 14             0.14
5 - Knot tying                45 ± 26         88 ± 9              <0.01

Evaluator scoring

Exercise                      Novice (n=24)   Proficient (n=16)   p-value
1 - Needle driving            39 ± 16         66 ± 17             <0.01
2 - Needle driving (blind)    45 ± 22         63 ± 20             0.01
3 - Needle transferring       42 ± 8          67 ± 16             <0.01
4 - Fine forceps use          50 ± 22         59 ± 14             0.14
5 - Knot tying                45 ± 26         87 ± 9              <0.01

Values are means ± standard deviation. Analysis by Mann-Whitney U test.

Internal Consistency

[Figure: Cronbach's alpha coefficient (y-axis) overall and with each exercise (1 to 5) removed (x-axis), for self-scoring and evaluator scoring; reference lines mark the highly reliable and adequate values.]
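The chart's "exercise removed" values correspond to recomputing alpha with one exercise left out at a time, often called "alpha if item deleted". A minimal sketch, assuming rows of per-participant exercise scores; the function names and the example data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha; scores[p][i] is participant p's score on exercise i."""
    k = len(scores[0])
    item_variance = sum(variance(column) for column in zip(*scores))
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - item_variance / total_variance)


def alpha_if_removed(scores: list[list[float]]) -> list[float]:
    """Alpha with each exercise dropped in turn (index i -> exercise i + 1)."""
    k = len(scores[0])
    if k < 3:
        raise ValueError("need at least three exercises so two remain after removal")
    return [cronbach_alpha([row[:i] + row[i + 1:] for row in scores]) for i in range(k)]


# Hypothetical data: exercise 3 ranks participants differently from 1 and 2.
data = [[1, 1, 3], [2, 2, 1], [3, 3, 2]]
print([round(a, 2) for a in alpha_if_removed(data)])  # [-2.0, -2.0, 1.0]
```

An exercise whose removal raises alpha, as the forceps exercise did in this study, is the one least aligned with the rest of the set; alpha can even turn negative when the remaining items disagree.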