These are methods for the dplyr generics dplyr::slice_min(), dplyr::slice_max(), and
dplyr::slice_sample(). They are translated to SQL using dplyr::filter() and
window functions (ROWNUMBER, MIN_RANK, or CUME_DIST depending on
arguments). slice(), slice_head(), and slice_tail() are not supported
since database tables have no intrinsic order.
If data is grouped, the operation will be performed on each group so that
(e.g.) slice_min(db, x, n = 3) will select the three rows with the smallest
value of x in each group.
Usage
# S3 method for class 'tbl_lazy'
slice_min(
.data,
order_by,
...,
n,
prop,
by = NULL,
with_ties = TRUE,
na_rm = TRUE
)
# S3 method for class 'tbl_lazy'
slice_max(
.data,
order_by,
...,
n,
by = NULL,
prop,
with_ties = TRUE,
na_rm = TRUE
)
# S3 method for class 'tbl_lazy'
slice_sample(.data, ..., n, prop, by = NULL, weight_by = NULL, replace = FALSE)Arguments
- .data
A lazy data frame backed by a database query.
- order_by
Variable or function of variables to order by.
- ...
Not used.
- n, prop
Provide either
n, the number of rows, orprop, the proportion of rows to select. If neither are supplied,n = 1will be used.If
nis greater than the number of rows in the group (orprop> 1), the result will be silently truncated to the group size. If the proportion of a group size is not an integer, it is rounded down.- by
-
<
tidy-select> Optionally, a selection of columns to group by for just this operation, functioning as an alternative togroup_by(). For details and examples, see ?dplyr_by. - with_ties
Should ties be kept together? The default,
TRUE, may return more rows than you request. Use FALSE to ignore ties, and return the first n rows.- na_rm
Should missing values in
order_bybe removed from the result? IfFALSE,NAvalues are sorted to the end (like inarrange()), so they will only be included if there are insufficient non-missing values to reachn/prop.- weight_by, replace
Not supported for database backends.
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(x = 1:3, y = c(1, 1, 2))
db |> slice_min(x) |> show_query()
#> <SQL>
#> SELECT `x`, `y`
#> FROM (
#> SELECT
#> `dbplyr_tmp_JGCySHwmYE`.*,
#> CASE
#> WHEN (NOT((`x` IS NULL))) THEN RANK() OVER (PARTITION BY (CASE WHEN ((`x` IS NULL)) THEN 1 ELSE 0 END) ORDER BY `x`)
#> END AS `col01`
#> FROM `dbplyr_tmp_JGCySHwmYE`
#> ) AS `q01`
#> WHERE (`col01` <= 1)
db |> slice_max(x) |> show_query()
#> <SQL>
#> SELECT `x`, `y`
#> FROM (
#> SELECT
#> `dbplyr_tmp_JGCySHwmYE`.*,
#> CASE
#> WHEN (NOT((`x` IS NULL))) THEN RANK() OVER (PARTITION BY (CASE WHEN ((`x` IS NULL)) THEN 1 ELSE 0 END) ORDER BY `x` DESC)
#> END AS `col01`
#> FROM `dbplyr_tmp_JGCySHwmYE`
#> ) AS `q01`
#> WHERE (`col01` <= 1)
db |> slice_sample() |> show_query()
#> <SQL>
#> SELECT `x`, `y`
#> FROM (
#> SELECT
#> `dbplyr_tmp_JGCySHwmYE`.*,
#> CASE
#> WHEN (NOT((((0.5 + RANDOM() / 18446744073709551616.0)) IS NULL))) THEN ROW_NUMBER() OVER (PARTITION BY (CASE
#> WHEN ((((0.5 + RANDOM() / 18446744073709551616.0)) IS NULL)) THEN 1
#> ELSE 0
#> END) ORDER BY (0.5 + RANDOM() / 18446744073709551616.0))
#> END AS `col01`
#> FROM `dbplyr_tmp_JGCySHwmYE`
#> ) AS `q01`
#> WHERE (`col01` <= 1)
db |> group_by(y) |> slice_min(x) |> show_query()
#> <SQL>
#> SELECT `x`, `y`
#> FROM (
#> SELECT
#> `dbplyr_tmp_JGCySHwmYE`.*,
#> CASE
#> WHEN (NOT((`x` IS NULL))) THEN RANK() OVER (PARTITION BY `y`, (CASE WHEN ((`x` IS NULL)) THEN 1 ELSE 0 END) ORDER BY `x`)
#> END AS `col01`
#> FROM `dbplyr_tmp_JGCySHwmYE`
#> ) AS `q01`
#> WHERE (`col01` <= 1)
# By default, ties are includes so you may get more rows
# than you expect
db |> slice_min(y, n = 1)
#> # A query: ?? x 2
#> # Database: sqlite 3.51.1 [:memory:]
#> x y
#> <int> <dbl>
#> 1 1 1
#> 2 2 1
db |> slice_min(y, n = 1, with_ties = FALSE)
#> # A query: ?? x 2
#> # Database: sqlite 3.51.1 [:memory:]
#> x y
#> <int> <dbl>
#> 1 1 1
# Non-integer group sizes are rounded down
db |> slice_min(x, prop = 0.5)
#> # A query: ?? x 2
#> # Database: sqlite 3.51.1 [:memory:]
#> x y
#> <int> <dbl>
#> 1 1 1
