Calculating Proportion with N

One of the more typical workflows I need to perform is calculating counts within groups. The dplyr group_by() %>% summarise() pipeline (or just count()) can easily get aggregated tallies. But occasionally it’s helpful to get count within group along with the proportion overall.

Here’s one way to do that:

library(dplyr)

# use the starwars dataset
# one row per character
# can be grouped by character species
starwars
## # A tibble: 87 x 13
##    name  height  mass hair_color skin_color eye_color birth_year gender
##    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
##  1 Luke…    172    77 blond      fair       blue            19   male  
##  2 C-3PO    167    75 <NA>       gold       yellow         112   <NA>  
##  3 R2-D2     96    32 <NA>       white, bl… red             33   <NA>  
##  4 Dart…    202   136 none       white      yellow          41.9 male  
##  5 Leia…    150    49 brown      light      brown           19   female
##  6 Owen…    178   120 brown, gr… light      blue            52   male  
##  7 Beru…    165    75 brown      light      blue            47   female
##  8 R5-D4     97    32 <NA>       white, red red             NA   <NA>  
##  9 Bigg…    183    84 black      light      brown           24   male  
## 10 Obi-…    182    77 auburn, w… fair       blue-gray       57   male  
## # … with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
## #   films <list>, vehicles <list>, starships <list>
starwars %>%
  # get the count in category with count()
  count(species, name = "species_n", sort = TRUE) %>%
  # add a column that gives you total of all ...
  # calculate proportion ... 
  # format n (%)
  # keep in mind you won't be able to sort after formatting as character
  mutate(n = sum(species_n),
         prop = round(species_n/n * 100, digits = 0),
         prop_with_n = paste0(species_n, " (", prop, "%)")) %>%
  # return only columns of interest
  select(species, prop_with_n)
## # A tibble: 38 x 2
##    species  prop_with_n
##    <chr>    <chr>      
##  1 Human    35 (40%)   
##  2 Droid    5 (6%)     
##  3 <NA>     5 (6%)     
##  4 Gungan   3 (3%)     
##  5 Kaminoan 2 (2%)     
##  6 Mirialan 2 (2%)     
##  7 Twi'lek  2 (2%)     
##  8 Wookiee  2 (2%)     
##  9 Zabrak   2 (2%)     
## 10 Aleena   1 (1%)     
## # … with 28 more rows

Related