The function summaryTable() produces a table of
descriptive statistics for continuous, categorical and dichotomous
variables. It is based on the function
gtsummary::tbl_summary(), with several enhancements and
simplifications, such as:
- Simplified syntax for easier and more intuitive use.
- Display of missing values for categorical variables: the percentage of missing values can optionally be shown next to the count.
- A column with the number of non-missing observations can be added for each group.
Setup and data
To demonstrate the various functionalities of the function we will
use the dataset survival::colon.
library(survival)
library(dplyr) # provides %>%, group_by(), slice(), mutate(), ...
data(cancer, package = "survival")
colon1 <- colon %>%
  group_by(id) %>%
  slice(1) %>% # Select the first row within each id group
  ungroup()
The dataset colon contains 1858 records from 929 patients (two
records per patient, one for recurrence and one for death) from
one of the first successful trials of adjuvant chemotherapy for colon
cancer.
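The record structure can be checked directly. The following is a minimal sketch using only base R and the survival package; the object names n_records and n_patients are introduced here for illustration.

```r
library(survival)

# colon stores two records per patient (etype 1 = recurrence, etype 2 = death)
n_records  <- nrow(colon)
n_patients <- length(unique(colon$id))

n_records            # number of rows
n_patients           # number of distinct patients
table(colon$etype)   # records per event type
```

Keeping one row per id, as done above, therefore yields one row per patient.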
For simplicity, we focus here on recurrence only, two treatment groups, and four variables:
- the treatment group (rx),
- the sex (Male),
- the age (age) and
- the extent of local spread (extent).
We also set about 10% of the values of the variable
extent to missing.
set.seed(123)
colon2 <- colon1 %>%
  select(rx, sex, age, extent) %>%
  filter(rx != "Lev") %>%
  mutate(rx = if_else(rx == "Obs", "Control", rx),
         extent = if_else(row_number() %in% sample(row_number(), size = round(0.1 * n())), NA, extent)) %>%
  rename(Male = sex) %>%
  mutate(extent = as.factor(extent))
head(colon2)
#> # A tibble: 6 × 4
#>   rx       Male   age extent
#>   <chr>   <dbl> <dbl> <fct>
#> 1 Lev+5FU     1    43 3
#> 2 Lev+5FU     1    63 3
#> 3 Control     0    71 2
#> 4 Lev+5FU     0    66 3
#> 5 Control     1    69 3
#> 6 Lev+5FU     0    57 3
Simple table
By default, the function produces a table with all variables present in the dataset.
summaryTable(data = colon2)

| Characteristic | N | N = 619¹ |
|---|---|---|
| rx | 619 | |
| Control | | 315 (51%) |
| Lev+5FU | | 304 (49%) |
| Male | 619 | |
| 0 | | 312 (50%) |
| 1 | | 307 (50%) |
| age | 619 | 61.0 (18.0, 85.0) |
| extent | 557 | |
| 1 | | 17 (3%) |
| 2 | | 68 (11%) |
| 3 | | 446 (72%) |
| 4 | | 26 (4%) |
| Missing | | 62 (10%) |

¹ n (%); Median (Min, Max)
To include only specific variables, list them in the argument
vars. The argument group stratifies the summary statistics
by the given variable.
summaryTable(data = colon2,
             vars = c("Male", "age", "extent"),
             group = "rx")

| Characteristic | N¹ | Control | N¹ | Lev+5FU |
|---|---|---|---|---|
| Male | 315 | | 304 | |
| 0 | | 149 (47%) | | 163 (54%) |
| 1 | | 166 (53%) | | 141 (46%) |
| age | 315 | 60.0 (18.0, 85.0) | 304 | 62.0 (26.0, 81.0) |
| extent | 285 | | 272 | |
| 1 | | 8 (3%) | | 9 (3%) |
| 2 | | 38 (12%) | | 30 (10%) |
| 3 | | 222 (70%) | | 224 (74%) |
| 4 | | 17 (5%) | | 9 (3%) |
| Missing | | 30 (10%) | | 32 (11%) |

¹ N without missing values
² n (%); Median (Min, Max)
Displayed name of variables
The displayed name of each variable is
- the label, if one exists in the dataset, or
- the variable name, if no label is present in the dataset (which is the case in our example).
To customize the displayed name, use the argument
labels. Please note that the labels need to be
entered as a list, as shown below:
summaryTable(data = colon2,
             group = "rx",
             labels = list(age = "Age", extent = "Extent"))

| Characteristic | N¹ | Control | N¹ | Lev+5FU |
|---|---|---|---|---|
| Male | 315 | | 304 | |
| 0 | | 149 (47%) | | 163 (54%) |
| 1 | | 166 (53%) | | 141 (46%) |
| Age | 315 | 60.0 (18.0, 85.0) | 304 | 62.0 (26.0, 81.0) |
| Extent | 285 | | 272 | |
| 1 | | 8 (3%) | | 9 (3%) |
| 2 | | 38 (12%) | | 30 (10%) |
| 3 | | 222 (70%) | | 224 (74%) |
| 4 | | 17 (5%) | | 9 (3%) |
| Missing | | 30 (10%) | | 32 (11%) |

¹ N without missing values
² n (%); Median (Min, Max)
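For the other case, a label stored in the dataset itself, gtsummary reads the "label" attribute of a column. Assuming summaryTable() follows the same convention (this is not confirmed here), a label could be attached as in the sketch below. The data frame df is toy data introduced for illustration only.

```r
# Toy data, for illustration only
df <- data.frame(age = c(43, 63, 71), extent = c(3, 3, 2))

# Attach a label to a column via the "label" attribute
attr(df$age, "label") <- "Age (years)"

attr(df$age, "label")     # the label that would be displayed
attr(df$extent, "label")  # NULL: no label, so the variable name would be shown
```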
Adding number of observations
By default, the number of non-missing observations
is added in a separate column. This can be
disabled by setting the argument add_n to
FALSE.
summaryTable(data = colon2,
             group = "rx",
             labels = list(rx = "Arm", age = "Age", extent = "Extent"),
             add_n = FALSE)

| Characteristic | Control | Lev+5FU |
|---|---|---|
| Male | | |
| 0 | 149 (47%) | 163 (54%) |
| 1 | 166 (53%) | 141 (46%) |
| Age | 60.0 (18.0, 85.0) | 62.0 (26.0, 81.0) |
| Extent | | |
| 1 | 8 (3%) | 9 (3%) |
| 2 | 38 (12%) | 30 (10%) |
| 3 | 222 (70%) | 224 (74%) |
| 4 | 17 (5%) | 9 (3%) |
| Missing | 30 (10%) | 32 (11%) |

¹ n (%); Median (Min, Max)
Overall column
An “overall” column can be added by setting the argument
overall to TRUE.
summaryTable(data = colon2,
             group = "rx",
             overall = TRUE,
             labels = list(age = "Age", extent = "Extent"))

| Characteristic | N¹ | Control | N¹ | Lev+5FU | N¹ | Overall |
|---|---|---|---|---|---|---|
| Male | 315 | | 304 | | 619 | |
| 0 | | 149 (47%) | | 163 (54%) | | 312 (50%) |
| 1 | | 166 (53%) | | 141 (46%) | | 307 (50%) |
| Age | 315 | 60.0 (18.0, 85.0) | 304 | 62.0 (26.0, 81.0) | 619 | 61.0 (18.0, 85.0) |
| Extent | 285 | | 272 | | 619 | |
| 1 | | 8 (3%) | | 9 (3%) | | 17 (3%) |
| 2 | | 38 (12%) | | 30 (10%) | | 68 (11%) |
| 3 | | 222 (70%) | | 224 (74%) | | 446 (72%) |
| 4 | | 17 (5%) | | 9 (3%) | | 26 (4%) |
| Missing | | 30 (10%) | | 32 (11%) | | 62 (10%) |

¹ N without missing values
² n (%); Median (Min, Max)
Variable types
The function gtsummary::tbl_summary() treats numeric
variables with fewer than 10 unique values as categorical by default.
This is not the case in the function summaryTable().
By default, all numeric variables are treated as continuous,
unless their only unique values are 0 and 1. In that case, they are
treated as dichotomous. This can be changed by setting the argument
continuous_as to "categorical".
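The default rule described above can be sketched as follows. The helper classify_numeric() is hypothetical and written only to illustrate the rule; it is not the package's actual implementation.

```r
# Hypothetical helper illustrating the default rule for numeric variables:
# exactly the two values 0 and 1 -> dichotomous, anything else -> continuous
classify_numeric <- function(x) {
  vals <- sort(unique(x[!is.na(x)]))
  if (length(vals) == 2 && all(vals == c(0, 1))) "dichotomous" else "continuous"
}

classify_numeric(c(0, 1, 1, NA, 0))  # "dichotomous"
classify_numeric(c(1, 2, 1, 2))      # "continuous": two values, but not 0 and 1
classify_numeric(c(1, 2, 3, 4))      # "continuous", despite fewer than 10 unique values
```

Under this rule, a numeric variable such as extent (values 1 to 4) would be summarised as continuous, which is presumably why it was converted to a factor in the setup code.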
For dichotomous variables, all levels are displayed by default. To
show only one row, use the argument
dichotomous_as = "dichotomous". The level to display is
specified using the argument
value = list(variable ~ "level to show").
summaryTable(data = colon2,
             group = "rx",
             vars = "Male",
             labels = list(age = "Age"),
             dichotomous_as = "dichotomous",
             value = list(Male ~ "1"),
             missing = FALSE)

| Characteristic | N¹ | Control | N¹ | Lev+5FU |
|---|---|---|---|---|
| Male | 315 | 166 (53%) | 304 | 141 (46%) |

¹ N without missing values
² n (%)
By default, the function displays the median and range for continuous
variables. A number of other options are available, using the argument
stat_cont.
Statistic type
The statistics to be displayed can be chosen using the arguments
stat_cont (options: "median_IQR",
"median_range" (default), "mean_sd",
"mean_se" and "geomMean_sd") and
stat_cat (options: "n_percent" (default),
"n" and "n_N").
summaryTable(data = colon2, group = "rx",
             stat_cont = "median_IQR",
             stat_cat = "n_N",
             labels = list(age = "Age", sex = "Sex", extent = "Extent"))

| Characteristic | N¹ | Control | N¹ | Lev+5FU |
|---|---|---|---|---|
| Male | 315 | | 304 | |
| 0 | | 149/315 | | 163/304 |
| 1 | | 166/315 | | 141/304 |
| Age | 315 | 60.0 (53.0, 68.0) | 304 | 62.0 (52.0, 70.0) |
| Extent | 285 | | 272 | |
| 1 | | 8/315 | | 9/304 |
| 2 | | 38/315 | | 30/304 |
| 3 | | 222/315 | | 224/304 |
| 4 | | 17/315 | | 9/304 |
| Missing | | 30/315 | | 32/304 |

¹ N without missing values
² n/N; Median (Q1, Q3)
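For reference, the quantities behind the stat_cont options can be computed with base R. The sketch below does this for age, rebuilding the one-row-per-patient data from survival::colon. The names base and age_stats are illustrative, and the geometric mean is assumed to be exp(mean(log(x))). Quartiles may differ slightly from the table, depending on the quantile definition used.

```r
library(survival)

# First record per patient, without the "Lev" arm (as in the setup above)
base <- colon[!duplicated(colon$id) & colon$rx != "Lev", ]
base$rx <- droplevels(base$rx)

# One column per arm ("Obs" is the arm relabelled "Control" in the vignette)
age_stats <- sapply(split(base$age, base$rx), function(a) {
  c(n        = length(a),
    median   = median(a),
    q1       = unname(quantile(a, 0.25)),
    q3       = unname(quantile(a, 0.75)),
    min      = min(a),
    max      = max(a),
    mean     = mean(a),
    sd       = sd(a),
    se       = sd(a) / sqrt(length(a)),
    geomMean = exp(mean(log(a))))
})

round(age_stats, 1)
```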
Tests
By default, no p-values or confidence intervals (CIs) are displayed. p-values
can be added by setting test to TRUE, and CIs by
setting ci to TRUE.
The default test for continuous variables is
wilcox.test, and for categorical variables
fisher.test. These can be changed with test_cont and
test_cat, respectively.
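As a point of comparison, the two default tests can be run directly in base R. This is a sketch on data rebuilt from survival::colon, without the missing values injected above; the names base, p_age and p_sex are illustrative, and it is not claimed that summaryTable() calls the tests in exactly this way.

```r
library(survival)

# First record per patient, without the "Lev" arm
base <- colon[!duplicated(colon$id) & colon$rx != "Lev", ]
base$rx <- droplevels(base$rx)

# Continuous variable: Wilcoxon rank sum test of age between the two arms
p_age <- wilcox.test(age ~ rx, data = base)$p.value

# Categorical variable: Fisher's exact test of sex by arm
p_sex <- fisher.test(table(base$sex, base$rx))$p.value

c(age = p_age, sex = p_sex)
```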
The default CI type for continuous variables is
wilcox.test, and for categorical variables
wilson. These can be changed with ci_cont and
ci_cat, respectively.
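The sketch below reproduces one of the categorical intervals by hand. It assumes that the wilson option follows gtsummary, where it denotes the Wilson interval with continuity correction computed by prop.test(correct = TRUE). The counts (8 of 315 patients in the Control arm with extent level 1) are taken from the table in this section. For continuous variables, wilcox.test(conf.int = TRUE) returns an interval around the pseudomedian; the names ci_wilson, ctrl_age and ci_age are illustrative.

```r
library(survival)

# Categorical: Wilson interval with continuity correction for 8 out of 315
ci_wilson <- prop.test(x = 8, n = 315, correct = TRUE)$conf.int
round(100 * as.numeric(ci_wilson), 1)  # 1.2 5.1, i.e. [1.2%, 5.1%]

# Continuous: interval around the pseudomedian of age in the Control ("Obs") arm
ctrl_age <- colon$age[!duplicated(colon$id) & colon$rx == "Obs"]
ci_age <- wilcox.test(ctrl_age, conf.int = TRUE)$conf.int
as.numeric(ci_age)
```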
summaryTable(data = colon2,
             group = "rx",
             vars = c("age", "extent"),
             stat_cont = "mean_sd",
             test = TRUE,
             ci = TRUE,
             labels = list(age = "Age", extent = "Extent")
)
#> The number rows in the tables to be merged do not match, which may result in
#> rows appearing out of order.
#> ℹ See `tbl_merge()` (`?gtsummary::tbl_merge()`) help file for details. Use
#> `quiet=TRUE` to silence message.

| Characteristic | N¹ | Control | 95% CI | N¹ | Lev+5FU | 95% CI | p-value³ |
|---|---|---|---|---|---|---|---|
| Age | 315 | 59.5 (12.0) | [59, 62] | 304 | 59.7 (12.3) | [59, 62] | 0.60 |
| Extent | 285 | | | 272 | | | 0.37 |
| 1 | | 8 (3%) | [1.2%, 5.1%] | | 9 (3%) | [1.5%, 5.7%] | |
| 2 | | 38 (12%) | [8.8%, 16%] | | 30 (10%) | [6.9%, 14%] | |
| 3 | | 222 (70%) | [65%, 75%] | | 224 (74%) | [68%, 78%] | |
| 4 | | 17 (5%) | [3.3%, 8.7%] | | 9 (3%) | [1.5%, 5.7%] | |
| Missing | | 30 (10%) | [6.6%, 13%] | | 32 (11%) | [7.4%, 15%] | |

¹ N without missing values
² Mean (SD); n (%)
³ Wilcoxon rank sum test; Fisher's exact test
Abbreviation: CI = Confidence Interval
Missing values
By default, missing values are shown as a separate category. This
can be disabled by setting missing to
FALSE.
For missing = TRUE, the percentage is automatically
added next to the number of missing values. This can be disabled by setting the
argument missing_percent to FALSE.
summaryTable(data = colon2,
             group = "rx",
             vars = "extent",
             test = TRUE,
             ci = TRUE,
             missing_percent = FALSE,
             labels = list(extent = "Extent")
)

| Characteristic | N¹ | Control | 95% CI | N¹ | Lev+5FU | 95% CI | p-value³ |
|---|---|---|---|---|---|---|---|
| Extent | 285 | | | 272 | | | 0.37 |
| 1 | | 8 (3%) | [1.3%, 5.7%] | | 9 (3%) | [1.6%, 6.4%] | |
| 2 | | 38 (13%) | [9.7%, 18%] | | 30 (11%) | [7.7%, 16%] | |
| 3 | | 222 (78%) | [73%, 82%] | | 224 (82%) | [77%, 87%] | |
| 4 | | 17 (6%) | [3.6%, 9.6%] | | 9 (3%) | [1.6%, 6.4%] | |
| Missing | | 30 | | | 32 | | |

¹ N without missing values
² n (%)
³ Fisher's exact test
Abbreviation: CI = Confidence Interval
summaryTable(data = colon2,
             group = "rx",
             vars = "extent",
             test = TRUE,
             ci = TRUE,
             missing_percent = TRUE,
             labels = list(extent = "Extent")
)
#> The number rows in the tables to be merged do not match, which may result in
#> rows appearing out of order.
#> ℹ See `tbl_merge()` (`?gtsummary::tbl_merge()`) help file for details. Use
#> `quiet=TRUE` to silence message.

| Characteristic | N¹ | Control | 95% CI | N¹ | Lev+5FU | 95% CI | p-value³ |
|---|---|---|---|---|---|---|---|
| Extent | 285 | | | 272 | | | 0.37 |
| 1 | | 8 (3%) | [1.2%, 5.1%] | | 9 (3%) | [1.5%, 5.7%] | |
| 2 | | 38 (12%) | [8.8%, 16%] | | 30 (10%) | [6.9%, 14%] | |
| 3 | | 222 (70%) | [65%, 75%] | | 224 (74%) | [68%, 78%] | |
| 4 | | 17 (5%) | [3.3%, 8.7%] | | 9 (3%) | [1.5%, 5.7%] | |
| Missing | | 30 (10%) | [6.6%, 13%] | | 32 (11%) | [7.4%, 15%] | |

¹ N without missing values
² n (%)
³ Fisher's exact test
Abbreviation: CI = Confidence Interval
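The percentages in the two tables above differ only in their denominator: with the missing percentage shown, missing values are counted in the denominator (315 in the Control arm); without it, percentages refer to the non-missing observations only (285). A short worked check using the Control counts from the tables, with illustrative object names:

```r
# Control arm counts for extent, taken from the tables above
counts   <- c("1" = 8, "2" = 38, "3" = 222, "4" = 17, Missing = 30)
observed <- counts[names(counts) != "Missing"]

# Missing values included in the denominator (315)
pct_with_missing <- round(100 * counts / sum(counts))
pct_with_missing      # 3 12 70 5 10

# Missing values excluded from the denominator (285)
pct_without_missing <- round(100 * observed / sum(observed))
pct_without_missing   # 3 13 78 6
```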
The tables with and without missing values can also be put next to
each other by setting missing_percent to "both".
summaryTable(data = colon2,
             group = "rx",
             vars = "extent",
             missing_percent = "both",
             test = TRUE,
             labels = list(extent = "Extent")
)
#> The number rows in the tables to be merged do not match, which may result in
#> rows appearing out of order.
#> ℹ See `tbl_merge()` (`?gtsummary::tbl_merge()`) help file for details. Use
#> `quiet=TRUE` to silence message.
#> The number rows in the tables to be merged do not match, which may result in
#> rows appearing out of order.
#> ℹ See `tbl_merge()` (`?gtsummary::tbl_merge()`) help file for details. Use
#> `quiet=TRUE` to silence message.

| Characteristic | With missing: Control | With missing: Lev+5FU | Without missing: Control | Without missing: Lev+5FU | p-value² |
|---|---|---|---|---|---|
| Extent | | | | | 0.37 |
| 1 | 8 (3%) | 9 (3%) | 8 (3%) | 9 (3%) | |
| 2 | 38 (12%) | 30 (10%) | 38 (13%) | 30 (11%) | |
| 3 | 222 (70%) | 224 (74%) | 222 (78%) | 224 (82%) | |
| 4 | 17 (5%) | 9 (3%) | 17 (6%) | 9 (3%) | |
| Missing | 30 (10%) | 32 (11%) | | | |

¹ n (%)
² Fisher's exact test