Table with relative frequencies and totals’ line

table
dplyr
kable
Authors

Antonio Fidalgo

Miguel Salema

Published

February 13, 2024

About

A table that shows the frequencies and/or relative frequencies of each group. Then, add a line for the total at the bottom of the table.

Species N %
Adelie 152 44.19
Gentoo 124 36.05
Chinstrap 68 19.77
Total 344 100.01
Table 1: An example of the resulting table (format may vary).

Idea

  1. Use {dplyr} to create a table with summary statistics.
  2. Create a data frame with one row, containing the totals.
  3. Bind the extra row to the table of summary statistics.
  4. Print out the table with {kableExtra}.

Packages

Install the following packages [install.packages()] if they are not present in your machine.

library(dplyr)
library(palmerpenguins)
library(kableExtra)
0
For this exercise, we use the penguins data set from the {palmerpenguins} package.

Code

Summary statistics

The key function here is summarise(), in conjunction with group_by(), to first count [n()] how many observations each group has.
Outside the grouping, we create [mutate()] the variable for the relative frequencies. If we did it inside the grouping, then the relative frequency would be 1 for each group.
The ordering [arrange()] in descending order [desc()] is optional.

df <- penguins |>   
  group_by(species) |>
  summarise(n = n())|>
  ungroup() |>
  mutate(rfreq = (n/ sum(n)*100) |> round(2)) |>
  arrange(desc(rfreq))
1
n() is the helper function that counts the number of rows.
2
Put ( ) around the whole expression that you want to pipe into round().
3
Comment out if you don’t want to order by frequency.
df
# A tibble: 3 × 3
  species       n rfreq
  <fct>     <int> <dbl>
1 Adelie      152  44.2
2 Gentoo      124  36.0
3 Chinstrap    68  19.8

Row with totals

We take the object created above and use summarise() again, this time to get the total [sum()] of each variable. It is good practice to make the sum robust to the presence of NAs [na.rm = TRUE].

total_line <- df |>
  summarise(n = sum(n, na.rm = TRUE),
            rfreq = sum(rfreq, na.rm = TRUE),
            species = "Total")
1
You must use the same name for the variable as in the original data set above. The order of the variables does not matter.
total_line
# A tibble: 1 × 3
      n rfreq species
  <int> <dbl> <chr>  
1   344  100. Total  

In one command, I can ask to calculate [summarise()] the sum [sum()] of each variable [across()] that satisfies [where()] is.numeric(). Notice how the formula for each variable [.x] is introduced [~].

total_line <- df |>
  summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE)), 
            species = "Total") 

I think the word “Total” is superfluous. I would make that value a blank character.

total_line <- df |>
  summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE)), 
            species = "") 

Bind original rows with total rows

The bind_rows() function from {dplyr} will take care of matching the columns by their name.

df <- df |>
  bind_rows(total_line) 
df
# A tibble: 4 × 3
  species       n rfreq
  <chr>     <int> <dbl>
1 Adelie      152  44.2
2 Gentoo      124  36.0
3 Chinstrap    68  19.8
4 Total       344 100. 

\(\LaTeX\)

For printing a table in a pdf document, via \(\LaTeX\), there are a few adjustments needed.
The above must get two changes [booktabs, extra_latex_after].

df |>
  kable(booktabs = TRUE,
        escape = FALSE,  
        col.names = c("Species", 
                      "$N$",
                      "\\%")) |>
  kable_styling(full_width = FALSE) |>  
  column_spec(1, width = "10em") |>
  row_spec(nrow(df) - 1, 
           extra_latex_after = "\\hline")
1
This argument set to TRUE improves the aesthetic quality of tables thanks to a thought-through format.
2
The $ $ environment introduces math symbols. I find it appropriate, here.
3
The % character introduces comments in \(\LaTeX\), a catastrophe in the middle of the code of a table. We must escape it twice [\\].
4
I’m surprised that \(\LaTeX\) recognizes em units. We could go for cm instead.
5
In \(\LaTeX\) format, this is how the package adds \(\LaTeX\) code after a given row.
View pdf version

Unable to display PDF file. Download instead.

Let me suggest three small aesthetic changes:

  • remove the word “Total”, as explained above,
  • let the line cover only selected columns, by changing the extra_latex_after value [\\cline{2-3}]
  • bold the total lines, which is a specification of the last row [row_spec, bold = TRUE].
df |>
  kable(booktabs = TRUE,
        escape = FALSE,  
        col.names = c("Species", 
                      "$N$",
                      "\\%")) |>
  kable_styling(full_width = FALSE) |>  
  column_spec(1, width = "10em") |>
  row_spec(nrow(df) - 1, 
           extra_latex_after = "\\cline{2-3}") |>
  row_spec(nrow(df),
           bold = TRUE)
View pdf version

Unable to display PDF file. Download instead.