A. Fidalgo - Table with relative frequencies and a totals line

About

Build a table that shows the frequencies and/or relative frequencies of each group, then add a line with the totals at the bottom of the table.

Species	N	%
Adelie	152	44.19
Gentoo	124	36.05
Chinstrap	68	19.77
Total	344	100.01

Table 1: An example of the resulting table (format may vary).

Idea

Use {dplyr} to create a table with summary statistics.
Create a data frame with one row, containing the totals.
Bind the extra row to the table of summary statistics.
Print out the table with {kableExtra}.

Packages

Install the following packages [install.packages()] if they are not already present on your machine.

library(dplyr)
library(palmerpenguins)
library(kableExtra)

0: For this exercise, we use the penguins data set from the {palmerpenguins} package.
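
The conditional installation mentioned above can be sketched as follows. The helper name missing_packages() is an assumption made for illustration; it is not part of the original code and relies on base R only.

```r
# Hypothetical helper (not from the original post): returns the packages
# from `pkgs` that cannot be loaded on this machine.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("dplyr", "palmerpenguins", "kableExtra"))

# Install only what is missing, and only when working interactively.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

The interactive() guard is a design choice of this sketch: it avoids triggering an installation while a document is being rendered.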

Code

Summary statistics

The key function here is summarise(), in conjunction with group_by(), to first count [n()] how many observations each group has.
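Outside the grouping, we create [mutate()] the variable for the relative frequencies. If we did it inside the grouping, sum(n) would be computed within each group, and the relative frequency would be 100 for every group. Strictly speaking, summarise() already drops the single grouping level here, so ungroup() is a harmless safeguard that makes the intent explicit.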
The ordering [arrange()] in descending order [desc()] is optional.

df <- penguins |>
  group_by(species) |>
  summarise(n = n()) |>
  ungroup() |>
  mutate(rfreq = (n / sum(n) * 100) |> round(2)) |>
  arrange(desc(rfreq))

1: n() is the helper function that counts the number of rows.
2: Put ( ) around the whole expression that you want to pipe into round().
3: Comment out if you don’t want to order by frequency.

df

# A tibble: 3 × 3
  species       n rfreq
  <fct>     <int> <dbl>
1 Adelie      152  44.2
2 Gentoo      124  36.0
3 Chinstrap    68  19.8
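
To see why the relative frequencies are computed outside the grouping, here is a small sketch on made-up data. The toy tibble and the object names are assumptions for illustration; count() is the {dplyr} shorthand for group_by() followed by summarise(n = n()).

```r
library(dplyr)

# Toy data (made up for illustration): three "a" rows and one "b" row.
toy <- tibble(species = c("a", "a", "a", "b"))
counts <- toy |> count(species)

# Relative frequency computed while the table is still grouped:
# sum(n) is evaluated within each group, so every share is 100.
inside <- counts |>
  group_by(species) |>
  mutate(rfreq = n / sum(n) * 100) |>
  ungroup()

# Relative frequency computed on the ungrouped table:
# sum(n) is the grand total, so the shares add up to 100.
outside <- counts |>
  mutate(rfreq = n / sum(n) * 100)

inside$rfreq   # 100 100
outside$rfreq  #  75  25
```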

Row with totals

We take the object created above and use summarise() again, this time to get the total [sum()] of each variable. It is good practice to make the sum robust to the presence of NAs [na.rm = TRUE].

total_line <- df |>
  summarise(n = sum(n, na.rm = TRUE),
            rfreq = sum(rfreq, na.rm = TRUE),
            species = "Total")

1: You must use the same names for the variables as in the table created above. The order of the variables does not matter.

total_line

# A tibble: 1 × 3
      n rfreq species
  <int> <dbl> <chr>  
1   344  100. Total
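
Note that the total of the percentages is the sum of values that were already rounded, which is why Table 1 shows 100.01 rather than 100 (the tibble printout shows only three significant digits). The sketch below reproduces this with the counts from the table. The alternative at the end is a suggestion, not part of the original approach.

```r
# Counts taken from the table above.
n <- c(Adelie = 152, Gentoo = 124, Chinstrap = 68)

# Each percentage is rounded first, then the rounded values are summed.
rfreq <- round(n / sum(n) * 100, 2)
sum(rfreq)   # 100.01

# Alternative (an assumption, not the original approach): sum the
# unrounded shares and round only once, at the end.
total_rfreq <- round(sum(n / sum(n) * 100), 2)
total_rfreq  # 100
```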

Advanced R

In one command, I can calculate [summarise()] the sum [sum()] of every variable [across()] that satisfies [where()] is.numeric(). Notice how the formula [~] refers to each variable as .x.

total_line <- df |>
  summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE)), 
            species = "Total")
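
As a small check of what across() does, here is the same pattern on a made-up table with a missing value. It is written with an anonymous function, \(x), instead of the formula shorthand; both forms should give the same result here. The toy data and object names are assumptions for illustration.

```r
library(dplyr)

# Toy table (made up for illustration) with one missing count.
toy <- tibble(
  group = c("a", "b", "c"),
  n     = c(3L, NA, 1L),
  rfreq = c(75, NA, 25)
)

# Sum every numeric column, ignoring NAs, then label the row.
# \(x) is the anonymous-function syntax available from R 4.1.
toy_total <- toy |>
  summarise(across(where(is.numeric), \(x) sum(x, na.rm = TRUE)),
            group = "Total")

toy_total
```

The character column group is skipped by where(is.numeric) and is then recreated with the label, so the result has the columns n, rfreq and group, in that order.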

Aesthetics

I think the word “Total” is superfluous. I would replace it with an empty string. (The outputs further down keep the label “Total”.)

total_line <- df |>
  summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE)), 
            species = "")

Bind original rows with total rows

The bind_rows() function from {dplyr} will take care of matching the columns by their name.

df <- df |>
  bind_rows(total_line)

df

# A tibble: 4 × 3
  species       n rfreq
  <chr>     <int> <dbl>
1 Adelie      152  44.2
2 Gentoo      124  36.0
3 Chinstrap    68  19.8
4 Total       344 100.
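
Two details of bind_rows() are visible in the output above: columns are matched by name, not by position, and the factor species becomes a character column once the label "Total" is appended. A minimal sketch on made-up data (the object names are assumptions for illustration):

```r
library(dplyr)

# Toy counts with a factor column, as in the penguins data.
counts <- tibble(species = factor(c("a", "b")), n = c(3L, 1L))

# Totals row with the columns in a different order and a character label.
total <- tibble(n = 4L, species = "Total")

out <- bind_rows(counts, total)

names(out)        # "species" "n": the order of the first table is kept
class(out$species) # "character": factor and character are combined
```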

Print out a pretty table

We use the function kable() from {kableExtra}.
Here, we change the column names [col.names =] and style the table [full_width = FALSE]. Importantly, we add a line below the penultimate row, which separates the totals from the groups.

df |>
  kable(table.attr = 'data-quarto-disable-processing="true"',
        escape = FALSE,
        col.names = c("Species",
                      "N",
                      "%")) |>
  kable_styling(full_width = FALSE) |>
  column_spec(1, width = "10em") |>
  row_spec(nrow(df) - 1,
           extra_css = "border-bottom: 1px solid")

1: This attribute tells Quarto not to post-process the HTML table, so that the {kableExtra} styling is kept.
2: With escape = FALSE, special characters are passed through unchanged, so we control them ourselves.
3: Give a vector of names for the variables, in the correct order.
4: full_width is a self-explanatory argument that can be used with other functions. kable_styling() styles the table with a variety of arguments.
5: Increase the width of column 1 [column_spec()].
6: Since df is already in the environment, nrow(df) - 1 gives the index of its penultimate row.
7: In HTML format, this is how the package adds a line, using CSS [extra_css].

Species	N	%
Adelie	152	44.19
Gentoo	124	36.05
Chinstrap	68	19.77
Total	344	100.01
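
A possible variation, not used in the original table, is to set the totals row in bold with the bold argument of row_spec(). The sketch below rebuilds a small stand-in for df from the values shown above and forces the HTML format so that it also runs outside a document.

```r
library(kableExtra)

# Small stand-in for the df built above (values copied from the output).
tab <- data.frame(
  species = c("Adelie", "Gentoo", "Chinstrap", "Total"),
  n       = c(152L, 124L, 68L, 344L),
  rfreq   = c(44.19, 36.05, 19.77, 100.01)
)

html_table <- tab |>
  kable(format = "html",                      # force HTML output
        col.names = c("Species", "N", "%")) |>
  kable_styling(full_width = FALSE) |>
  row_spec(nrow(tab), bold = TRUE)            # last row holds the totals

html_table
```

This can be combined with the border under the penultimate row, since row_spec() can be called several times in the same pipeline.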

$\LaTeX$

To print the table in a PDF document, via $\LaTeX$, a few adjustments are needed.
The code above needs two new arguments [booktabs, extra_latex_after], and the column names are adapted to $\LaTeX$.

df |>
  kable(booktabs = TRUE,
        escape = FALSE,
        col.names = c("Species",
                      "$N$",
                      "\\%")) |>
  kable_styling(full_width = FALSE) |>
  column_spec(1, width = "10em") |>
  row_spec(nrow(df) - 1,
           extra_latex_after = "\\hline")

1: Setting this argument to TRUE formats the table with the booktabs $\LaTeX$ package, which improves its aesthetic quality thanks to a thought-through format.
2: The $ $ environment introduces math symbols. I find it appropriate, here.
3: The % character introduces comments in $\LaTeX$, a catastrophe in the middle of the code of a table. $\LaTeX$ needs \% to print the character, and the backslash must itself be escaped inside an R string, hence \\%.
4: I’m surprised that $\LaTeX$ recognizes em units. We could go for cm instead.
5: In $\LaTeX$ format, this is how the package adds $\LaTeX$ code after a given row.
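
If the same document is rendered to both HTML and PDF, the column names could be chosen according to the output format. This is a sketch only: col_labels() is a hypothetical helper, not part of the original code, and it assumes that knitr::is_latex_output() is an acceptable way to detect a $\LaTeX$ target.

```r
# Hypothetical helper (not from the original post): column labels for
# the table, depending on the output format.
col_labels <- function(latex = knitr::is_latex_output()) {
  if (latex) {
    c("Species", "$N$", "\\%")  # "\\%" in R is the two characters \%
  } else {
    c("Species", "N", "%")
  }
}

nchar("\\%")              # 2: one backslash plus the percent sign
cat(col_labels(TRUE)[3])  # prints \%
```

The result can be passed to the col.names argument of kable() in either version of the table.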
