A few words about this paper…
I have to say this is a historically significant paper. Although it has not yet been cited, this suppressed paper formed the basis of what I have pursued for the last 10 years in the spin-off company, Qpercom, and of what Qpercom has worked to achieve with client partners worldwide since 2008. According to the Irish Times, we are “dragging exam assessment out of the dark ages” (Oct. 2016).
I trained as a PT clinician and later moved into medical education. As clinical researchers, we put a lot of effort into developing the Smallest Detectable Difference (SDD), the smallest difference that can be detected using a ‘ruler’. Maximal mouth opening, measured with a metal ruler, is one of the outcome variables in patients with maxillofacial pain. With the newly acquired evidence that, in patients with temporomandibular joint disc displacement, mouth opening has to differ by at least 12 mm before and after an intervention for that intervention to count as successful, I changed jobs and moved into medical/dental education. I was immediately challenged to look into comparable measurements used in oral hygiene training. Probing depth measurements were used as an example to demonstrate the use of generalisability and decision studies in educational decision-making.
Fourteen years after this publication, we are comparing the OSCE quality assurance outcomes of 10 European universities. Have a read and use the evidence; I hope it will help students and staff who measure any kind of assessment outcome. And this historically significant paper needs citations!
Thomas JB Kropmans and Katarina Jerkovic-Cosic
Generalisability coefficients are widely used as a measure of reliability in educational sciences. However, these coefficients do not indicate how much individual and/or educational variance is needed for borderline competence. Probing depth measurements were used as an example to demonstrate the use of generalisability and decision studies in educational decision-making.
All six probing depths of seven elements of the first quadrant were assessed in a random, crossed four-way design with the facets ‘elements’ (7), ‘probing depth’ (6), ‘observers’ (7-8) and ‘repetitions’, for both students (8) and faculty staff (7). Borderline competence was assessed in four decision study designs.
In the decision studies, the generalisability coefficient ranged from 0.31 for the first (single) study design to 0.88 for the fourth (repeated) study design. The standard error of measurement ranged from 1.73 to 0.64 mm, with a corresponding 95% confidence interval ranging from 3 to 1 mm.
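The link between the reported standard errors and the confidence intervals can be checked with a short calculation. The sketch below assumes that the interval half-width is the standard error of measurement multiplied by the normal-distribution value 1.96; the paper may have used a different multiplier, and the function name `ci_half_width` is only illustrative.

```python
# Assumption: 95% half-width = z * SEM, with z = 1.96 (normal approximation).
# The paper does not state its multiplier; this is an illustrative choice.
Z_95 = 1.96


def ci_half_width(sem_mm: float, z: float = Z_95) -> float:
    """Return the confidence half-width (mm) for a given SEM (mm)."""
    return z * sem_mm


# SEM values reported in the text for the first and fourth study designs.
for label, sem in [("first design", 1.73), ("fourth design", 0.64)]:
    half_width = ci_half_width(sem)
    print(f"{label}: SEM = {sem:.2f} mm, 95% half-width = {half_width:.2f} mm")
```

Under this assumption the half-widths come out at roughly 3.4 mm and 1.25 mm, which is in line with the rounded 3 mm and 1 mm quoted above.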
Repeated measurement is not the most efficient way to improve educational decision-making. What is new in this approach is that we used the 95% confidence interval of the standard error of measurement as a measure of ‘borderline competence’, expressed in the unit of the measurement instrument. Performing the probing depth assessment correctly is of clinical and educational importance in deciding whether further treatment is necessary and whether students pass or fail their exams.
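One way to read the ‘borderline competence’ idea is as a band around a reference value: a student reading that differs from the reference by less than the 95% interval cannot be told apart from measurement error. The sketch below is an interpretation, not the paper's procedure. The reference depth, the student readings and the function name `classify_reading` are hypothetical, and the 1.96 multiplier is an assumption; only the SEM of 0.64 mm comes from the text.

```python
def classify_reading(student_mm: float, reference_mm: float,
                     sem_mm: float, z: float = 1.96) -> str:
    """Label a reading relative to a reference using a z * SEM band (mm)."""
    band = z * sem_mm                       # assumed 95% half-width
    diff = abs(student_mm - reference_mm)   # absolute deviation in mm
    if diff <= band:
        return "within measurement error"
    return "beyond measurement error"


# Hypothetical example: reference probing depth of 4.0 mm, SEM of 0.64 mm
# (the value reported for the fourth study design).
for reading in (5.0, 6.0):
    print(reading, "mm ->", classify_reading(reading, 4.0, 0.64))
```

Because the band is expressed in millimetres, the same unit as the probe, it can be compared directly with the readings themselves.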