Calculate interrater reliability and agreement statistics for categorical coding, content analysis, and observational studies. Supports Cohen's Kappa (2 raters), Fleiss' Kappa (3+ raters), and percent agreement. Essential for qualitative research, content analysis, and ensuring coding consistency.
Ready to measure interrater agreement? Try the example dataset to see Cohen's Kappa in action with a sentiment analysis example, or upload your own coding data to assess reliability between your raters.
Select the columns containing each rater's codes. Each column represents one rater's assessments. Rows represent the subjects/items being coded.
Interrater reliability (also called inter-observer or inter-coder reliability) measures the degree of agreement between two or more independent raters who code, classify, or rate the same phenomenon. It's essential for establishing the objectivity and consistency of qualitative coding schemes, content analysis, and observational research.
Used for measuring agreement between two raters. Adjusts for chance agreement, providing a more conservative estimate than simple percent agreement.
κ = (po - pe) / (1 - pe), where po = observed agreement and pe = expected agreement by chance
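As an illustrative sketch of this formula in Python (the helper function and the 2x2 table of counts below are hypothetical examples, not output from this calculator):
import numpy as np

def cohens_kappa(table):
    """Cohen's kappa from a square contingency table of counts (rater 1 = rows, rater 2 = columns)."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    p_o = np.trace(table) / n                                    # observed agreement
    p_e = (table.sum(axis=0) * table.sum(axis=1)).sum() / n**2   # agreement expected by chance
    return (p_o - p_e) / (1 - p_e)

# Hypothetical 2x2 table: both raters used categories A and B
table = [[20, 5],
         [10, 15]]
print(cohens_kappa(table))  # p_o = 0.70, p_e = 0.50, kappa = 0.40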
Extends Cohen's Kappa to three or more raters. Calculates the average pairwise agreement across all rater combinations.
κ = (P̄ - P̄e) / (1 - P̄e), where P̄ = mean observed agreement across items and P̄e = mean expected agreement by chance
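A minimal sketch of Fleiss' Kappa in Python using the statsmodels package (not otherwise used on this page); the three raters' codes are hypothetical:
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical data: 10 items (rows) coded by 3 raters (columns) into categories 1-3
ratings = np.array([
    [1, 1, 1], [2, 2, 2], [3, 3, 3], [1, 1, 2], [2, 3, 2],
    [3, 3, 3], [1, 1, 1], [2, 2, 2], [3, 2, 3], [1, 1, 1],
])

# Convert raw codes to an items x categories count table, then compute Fleiss' Kappa
table, _ = aggregate_raters(ratings)
print(fleiss_kappa(table, method="fleiss"))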
The simplest measure: the proportion of items on which raters agree. However, it doesn't account for chance agreement and may be misleadingly high when one category is very frequent.
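To see this pitfall concretely, here is a short sketch with hypothetical, deliberately imbalanced codes: the raters agree on 90% of items, yet Cohen's Kappa is near zero because almost all of that agreement is expected by chance:
from sklearn.metrics import cohen_kappa_score

# Hypothetical codes where category 0 dominates; the raters agree on 90 of 100 items
rater_a = [0] * 90 + [1] * 5 + [0] * 5
rater_b = [0] * 90 + [0] * 5 + [1] * 5

percent_agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(f"Percent agreement: {percent_agreement:.0%}")                # 90%
print(f"Cohen's Kappa: {cohen_kappa_score(rater_a, rater_b):.2f}")  # about -0.05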
Note: Different fields may use different thresholds. Some researchers use κ ≥ 0.70 as acceptable, while others require κ ≥ 0.80 for high-stakes applications.
Qualitative Content Analysis Study
Two researchers independently coded 100 social media posts into three categories: Positive, Neutral, or Negative sentiment.
Interpretation: While the raters agreed on 85% of posts, Cohen's Kappa of 0.76 indicates "substantial agreement" after accounting for chance. This is acceptable for publication and suggests the coding scheme is reliable.
Calculate Cohen's Kappa using the irr package in R:
# Install and load irr package
# install.packages("irr")
library(irr)
# Example data: Two raters coding 10 items into 3 categories
rater1 <- c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1)
rater2 <- c(1, 2, 3, 1, 3, 3, 1, 2, 2, 1)
# Combine into a data frame
ratings <- data.frame(rater1, rater2)
# Calculate Cohen's Kappa
kappa_result <- kappa2(ratings)
print(kappa_result)
# Calculate Percent Agreement
agree_result <- agree(ratings)
print(agree_result)
# For 3+ raters (Fleiss' Kappa)
rater3 <- c(1, 2, 3, 2, 2, 3, 1, 2, 3, 1)
ratings_3 <- data.frame(rater1, rater2, rater3)
fleiss_result <- kappam.fleiss(ratings_3)
print(fleiss_result)
Calculate Cohen's Kappa using scikit-learn:
from sklearn.metrics import cohen_kappa_score
import numpy as np
# Example data: Two raters coding 10 items
rater1 = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1]
rater2 = [1, 2, 3, 1, 3, 3, 1, 2, 2, 1]
# Calculate Cohen's Kappa
kappa = cohen_kappa_score(rater1, rater2)
print(f"Cohen's Kappa: {kappa:.4f}")
# Calculate Percent Agreement
agreements = sum(r1 == r2 for r1, r2 in zip(rater1, rater2))
percent_agreement = (agreements / len(rater1)) * 100
print(f"Percent Agreement: {percent_agreement:.2f}%")
# Interpret kappa using common benchmark thresholds
if kappa < 0:
    interpretation = "Poor"
elif kappa < 0.20:
    interpretation = "Slight"
elif kappa < 0.40:
    interpretation = "Fair"
elif kappa < 0.60:
    interpretation = "Moderate"
elif kappa < 0.80:
    interpretation = "Substantial"
else:
    interpretation = "Almost Perfect"
print(f"Interpretation: {interpretation} agreement")Calculate Cohen's Kappa in SPSS:
* Calculate Cohen's Kappa for two raters.
CROSSTABS
/TABLES=rater1 BY rater2
/FORMAT=AVALUE TABLES
/STATISTICS=KAPPA
/CELLS=COUNT.
* For weighted kappa (ordinal categories).
CROSSTABS
/TABLES=rater1 BY rater2
/FORMAT=AVALUE TABLES
/STATISTICS=KAPPA(1)
/CELLS=COUNT.
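For ordinal categories, a weighted kappa (as in the SPSS syntax above) can also be computed with scikit-learn's weights argument; the 1-5 ratings below are hypothetical:
from sklearn.metrics import cohen_kappa_score

# Hypothetical ordinal ratings on a 1-5 scale from two raters
rater1 = [1, 2, 3, 4, 5, 3, 2, 4, 5, 1]
rater2 = [1, 3, 3, 4, 4, 3, 2, 5, 5, 2]

# Linear weights penalize near-misses less than distant disagreements;
# quadratic weights penalize distant disagreements even more heavily
print(cohen_kappa_score(rater1, rater2, weights="linear"))
print(cohen_kappa_score(rater1, rater2, weights="quadratic"))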