The Standard Error of Measurement (SEM) indicates the amount of error around an observed score. The observed score, the score we retrieve, store and analyse from an OSCE, is in fact the sum of the true score and the error around that true score. If we want a reliable pass or fail decision for an assessment such as an OSCE station, we need to incorporate the SEM in that decision.
The observed score is the true ability (true score) of the student plus the random error around that true score. The size of this error depends on the reliability, or internal consistency, of the score sheets used in OSCEs. Within our system, Qpercom calculates Cronbach’s alpha as a reliability score indicating how consistently scores are being measured, and the intraclass correlation coefficient, which indicates how reliable scores are between the different stations (Silva et al., 2017). These classical psychometric measures can be used to calculate the SEM. An observed score plus or minus one SEM gives a band that contains the ‘true score’ for that station with roughly 68% certainty. In principle, one should use the 95% confidence interval, which is the observed score plus or minus 1.96 * SEM (Zimmerman & Williams, 1966).
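As a minimal sketch of the calculation described above, the Python below uses the classical test theory formula SEM = SD * sqrt(1 - reliability), where SD is the standard deviation of the observed scores and the reliability coefficient would be, for example, Cronbach’s alpha. The function names, the example scores and the alpha value of 0.80 are assumptions made for illustration; they are not Qpercom’s implementation or real exam data.

```python
import math
import statistics


def standard_error_of_measurement(scores, reliability):
    """Classical test theory SEM: SD * sqrt(1 - reliability).

    Assumption: the sample standard deviation of the observed scores is
    used as SD, and reliability is a coefficient between 0 and 1
    (for example Cronbach's alpha).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    sd = statistics.stdev(scores)
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed, sem, z=1.96):
    """Return (lower, upper) bounds: observed score plus or minus z * SEM.

    z=1.0 gives the roughly 68% band, z=1.96 the 95% confidence interval.
    """
    margin = z * sem
    return observed - margin, observed + margin


# Hypothetical station scores (percentages) and a hypothetical alpha.
example_scores = [52.0, 61.0, 58.0, 70.0, 66.0, 49.0, 75.0, 63.0]
example_alpha = 0.80

sem = standard_error_of_measurement(example_scores, example_alpha)
lower, upper = score_band(60.0, sem)
print(f"SEM = {sem:.2f}; 95% band for an observed score of 60: "
      f"{lower:.2f} to {upper:.2f}")
```

A higher reliability coefficient shrinks the SEM, so the band around each observed score narrows as the score sheet becomes more consistent.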
Borderline regression analysis (BRA) is an absolute, examinee-centered standard-setting method widely used to set the pass mark for OSCE exams (Yousuf, Violato, & Zuberi, 2015). Candidates are awarded a “global score” for a station in a circuit, based on the examiner’s professional judgment of their ability.
The Borderline Regression Method is illustrated above, with the item score on the Y-axis and the Global Rating on the X-axis (0 = Fail, 1 = Borderline, 2 = Pass, 3 = Good and 4 = Excellent).
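The regression itself can be sketched in a few lines of Python. The example below fits an ordinary least-squares line of item score on global rating and takes the predicted item score at the Borderline rating (1 on the five-point scale above) as the station cut score, which is how the method is commonly described. The function name and the sample data are hypothetical, and this is an illustration under those assumptions rather than the procedure any particular system uses.

```python
def borderline_regression_cut_score(global_ratings, item_scores, borderline=1.0):
    """Fit item_score = intercept + slope * global_rating by least squares
    and return (cut_score, slope, intercept), where cut_score is the
    predicted item score at the borderline global rating.
    """
    n = len(global_ratings)
    if n != len(item_scores) or n < 2:
        raise ValueError("need two or more paired ratings and scores")

    mean_x = sum(global_ratings) / n
    mean_y = sum(item_scores) / n

    # Sum of squares of the ratings and cross-products with the scores.
    sxx = sum((x - mean_x) ** 2 for x in global_ratings)
    if sxx == 0:
        raise ValueError("all global ratings are identical; cannot fit a line")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(global_ratings, item_scores))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    cut_score = intercept + slope * borderline
    return cut_score, slope, intercept


# Hypothetical station data: one global rating (0-4) and one item score
# (percentage) per candidate.
ratings = [0, 1, 1, 2, 2, 2, 3, 3, 4, 4]
scores = [38.0, 47.0, 52.0, 58.0, 61.0, 63.0, 70.0, 74.0, 82.0, 88.0]

cut, slope, intercept = borderline_regression_cut_score(ratings, scores)
print(f"cut score at Borderline (1): {cut:.1f}")
```

Because every candidate's pair of scores contributes to the fitted line, the cut score does not depend only on the candidates who were actually rated Borderline.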
What are Global Rating Scores?
In clinical skills or observational assessment, Global Rating Scores (GRS) are used alongside item score sheets. A GRS reflects the professional opinion of the examiner once they have completed the item score list, which records what they have observed or marked during the scenario. In most cases, a 5- or 6-point Likert scale is used. The 5-point scale ranges from 0 = Fail; 1 = Borderline; 2 = Pass; 3 = Good to 4 = Excellent. Sometimes, if examiners can’t choose between Borderline and Fail or between Borderline and Pass, an extra option is added to the GRS. Examiners can then choose between 0 = Fail; 1 = Borderline fail; 2 = Borderline pass; 3 = Pass; 4 = Good and 5 = Excellent.