pivot_wider()
"widens" data, increasing the number of columns and
decreasing the number of rows. The inverse transformation is
pivot_longer()
.
Learn more in vignette("pivot", "tidyr")
.
Note that pivot_wider()
is not and cannot be lazy because we need to look
at the data to figure out what the new column names will be.
Usage
# S3 method for tbl_lazy
pivot_wider(
data,
...,
id_cols = NULL,
id_expand = FALSE,
names_from = name,
names_prefix = "",
names_sep = "_",
names_glue = NULL,
names_sort = FALSE,
names_vary = "fastest",
names_expand = FALSE,
names_repair = "check_unique",
values_from = value,
values_fill = NULL,
values_fn = ~max(.x, na.rm = TRUE),
unused_fn = NULL
)
Arguments
- data
A lazy data frame backed by a database query.
- ...
Unused; included for compatibility with generic.
- id_cols
A set of columns that uniquely identifies each observation.
- id_expand
Unused; included for compatibility with the generic.
- names_from, values_from
A pair of arguments describing which column (or columns) to get the name of the output column (
names_from
), and which column (or columns) to get the cell values from (values_from
).If
values_from
contains multiple values, the value will be added to the front of the output column.- names_prefix
String added to the start of every variable name.
- names_sep
If
names_from
orvalues_from
contains multiple variables, this will be used to join their values together into a single string to use as a column name.- names_glue
Instead of
names_sep
andnames_prefix
, you can supply a glue specification that uses thenames_from
columns (and special.value
) to create custom column names.- names_sort
Should the column names be sorted? If
FALSE
, the default, column names are ordered by first appearance.- names_vary
When
names_from
identifies a column (or columns) with multiple unique values, and multiplevalues_from
columns are provided, in what order should the resulting column names be combined?"fastest"
variesnames_from
values fastest, resulting in a column naming scheme of the form:value1_name1, value1_name2, value2_name1, value2_name2
. This is the default."slowest"
variesnames_from
values slowest, resulting in a column naming scheme of the form:value1_name1, value2_name1, value1_name2, value2_name2
.
- names_expand
Should the values in the
names_from
columns be expanded byexpand()
before pivoting? This results in more columns, the output will contain column names corresponding to a complete expansion of all possible values innames_from
. Additionally, the column names will be sorted, identical to whatnames_sort
would produce.- names_repair
What happens if the output has invalid column names?
- values_fill
Optionally, a (scalar) value that specifies what each
value
should be filled in with when missing.- values_fn
A function, the default is
max()
, applied to thevalue
in each cell in the output. In contrast to local data frames it must not beNULL
.- unused_fn
Optionally, a function applied to summarize the values from the unused columns (i.e. columns not identified by
id_cols
,names_from
, orvalues_from
).The default drops all unused columns from the result.
This can be a named list if you want to apply different aggregations to different unused columns.
id_cols
must be supplied forunused_fn
to be useful, since otherwise all unspecified columns will be consideredid_cols
.This is similar to grouping by the
id_cols
then summarizing the unused columns usingunused_fn
.
Details
The big difference to pivot_wider()
for local data frames is that
values_fn
must not be NULL
. By default it is max()
which yields
the same results as for local data frames if the combination of id_cols
and value
column uniquely identify an observation.
Mind that you also do not get a warning if an observation is not uniquely
identified.
The translation to SQL code basically works as follows:
Get unique keys in
names_from
column.For each key value generate an expression of the form:
value_fn(CASE WHEN (`names from column` == `key value`) THEN (`value column`) END AS `output column` )
Group data by id columns.
Summarise the grouped data with the expressions from step 2.
Examples
memdb_frame(
id = 1,
key = c("x", "y"),
value = 1:2
) %>%
tidyr::pivot_wider(
id_cols = id,
names_from = key,
values_from = value
)
#> # Source: SQL [1 x 3]
#> # Database: sqlite 3.40.1 [:memory:]
#> id x y
#> <dbl> <int> <int>
#> 1 1 1 2