ROC Curves in R

A receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied.1

There is plenty of available information on how to plot ROC curves in R:

A 2019 RViews post compares different methods side-by-side:

https://rviews.rstudio.com/2019/03/01/some-r-packages-for-roc-curves/

The example that follows provides a documented method I have used to plot ROC curves, both with the pROC package alone … and also using data from the pROC ROC AUC object and ggplot2.

First, some code for to prepare the data (the Titanic dataset in this case) for modeling:

library(dplyr)

expand_counts <- function(.data, freq_col) {
  
  quo_freq <- dplyr::enquo(freq_col)
  
  freqs <- dplyr::pull(.data, !!quo_freq)
  
  ind <- rep(seq_len(nrow(.data)), freqs)
  
  # Drop count column
  .data <- dplyr::select(.data, - !!quo_freq)
  
  # Get the rows from x
  .data[ind, ]
  
}

titanic <-
  as.data.frame(Titanic, stringsAsFactors = FALSE) %>%
  expand_counts(Freq) %>%
  mutate(Survived = ifelse(Survived == "Yes", 1, 0))

as.data.frame(Titanic)
##    Class    Sex   Age Survived Freq
## 1    1st   Male Child       No    0
## 2    2nd   Male Child       No    0
## 3    3rd   Male Child       No   35
## 4   Crew   Male Child       No    0
## 5    1st Female Child       No    0
## 6    2nd Female Child       No    0
## 7    3rd Female Child       No   17
## 8   Crew Female Child       No    0
## 9    1st   Male Adult       No  118
## 10   2nd   Male Adult       No  154
## 11   3rd   Male Adult       No  387
## 12  Crew   Male Adult       No  670
## 13   1st Female Adult       No    4
## 14   2nd Female Adult       No   13
## 15   3rd Female Adult       No   89
## 16  Crew Female Adult       No    3
## 17   1st   Male Child      Yes    5
## 18   2nd   Male Child      Yes   11
## 19   3rd   Male Child      Yes   13
## 20  Crew   Male Child      Yes    0
## 21   1st Female Child      Yes    1
## 22   2nd Female Child      Yes   13
## 23   3rd Female Child      Yes   14
## 24  Crew Female Child      Yes    0
## 25   1st   Male Adult      Yes   57
## 26   2nd   Male Adult      Yes   14
## 27   3rd   Male Adult      Yes   75
## 28  Crew   Male Adult      Yes  192
## 29   1st Female Adult      Yes  140
## 30   2nd Female Adult      Yes   80
## 31   3rd Female Adult      Yes   76
## 32  Crew Female Adult      Yes   20
sample_n(titanic, 10)
##    Class    Sex   Age Survived
## 1    3rd   Male Adult        0
## 2   Crew   Male Adult        0
## 3   Crew   Male Adult        0
## 4   Crew   Male Adult        0
## 5    2nd Female Child        1
## 6   Crew   Male Adult        0
## 7    1st   Male Adult        0
## 8   Crew   Male Adult        0
## 9    1st Female Adult        1
## 10   3rd   Male Adult        1

The model will predict survival (yes/no) from the Titanic. Predictors will include class, sex, and age. We’ll look at a model of with passenger class as the only predictor versus a model that includes class, sex, and age.

Survived ~ Class

library(pROC)

fit1 <- glm(Survived ~ Class, data = titanic, family = binomial)

prob <- predict(fit1,type=c("response"))

fit1$prob <- prob

roc1 <- roc(Survived ~ prob, data = titanic, plot = FALSE)

roc1
## 
## Call:
## roc.formula(formula = Survived ~ prob, data = titanic, plot = FALSE)
## 
## Data: prob in 1490 controls (Survived 0) < 711 cases (Survived 1).
## Area under the curve: 0.6417

Survived ~ Class + Sex + Age

fit2 <- glm(Survived ~ Class + Sex + Age, data = titanic, family = binomial)

prob <- predict(fit2,type=c("response"))

fit2$prob <- prob

roc2 <- roc(Survived ~ prob, data = titanic, plot = FALSE)

roc2
## 
## Call:
## roc.formula(formula = Survived ~ prob, data = titanic, plot = FALSE)
## 
## Data: prob in 1490 controls (Survived 0) < 711 cases (Survived 1).
## Area under the curve: 0.7597
plot(roc1, lty = "solid")
lines(roc2, lty = "dotted")

library(ggplot2)

df1 <- 
  data_frame(
    sensitivity = roc1$sensitivities,
    specificity = roc1$specificities,
    thresholds = roc1$thresholds,
    model = "Survived ~ Class"
  )

df2 <- 
  data_frame(
    sensitivity = roc2$sensitivities,
    specificity = roc2$specificities,
    thresholds = roc2$thresholds,
    model = "Survived ~ Class + Sex + Age"
  )

rbind(df1,df2) %>%
  ggplot(aes(1-specificity, sensitivity)) +
  geom_line(aes(group = model, lty = model)) +
  theme(legend.position = "bottom", 
        legend.title = element_blank())

Related