When your survey includes questions that require participants to make judgments, you need to make sure the results aren’t swayed by individual biases. This is where inter-rater reliability comes into play.
For Example…
Say you’re watching an ice-skating show where you need to rate a contestant’s ability on a scale of 1-10. Impressed, you rate the performance a full 10/10, but your two friends, who watched the same performance, give it a 6/10 and a 2/10. With such large differences in assessment, the inter-rater reliability of your 1-10 scale is rather low. However, if you specify what a high rating entails, such as a certain number of triple lutzes, then your inter-rater reliability is likely to increase.
This is a good thing to look out for when you run a pilot test of your survey, so you know where you need to add more clarifying information about how to make judgment calls. Common measures of inter-rater reliability include Cohen’s kappa and the intraclass correlation coefficient.
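If you want to put a number on agreement, Cohen’s kappa compares how often two raters actually agree with how often they would agree by chance alone. Here’s a minimal Python sketch of that calculation, using made-up ratings rather than data from any real survey:

```python
# A minimal sketch of Cohen's kappa for two raters, assuming both raters
# assign one of a fixed set of categories to the same list of items.
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)

    # Observed agreement: fraction of items where the two raters match.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected chance agreement, from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in set(rater_a) | set(rater_b)
    )

    return (p_observed - p_expected) / (1 - p_expected)

# Two hypothetical raters scoring the same five performances as "high" or "low".
print(cohen_kappa(["high", "high", "low", "low", "high"],
                  ["high", "low",  "low", "low", "high"]))  # ~0.62
```

A kappa near 1 means the raters agree far more than chance would predict, while a kappa near 0 means their agreement is no better than guessing, which is a sign your rating instructions need clarifying.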