See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology. Key differences for this backend
are better translation of statistical aggregate functions (e.g. var(),
median()) and use of temporary views instead of temporary tables when
copying data.
Use simulate_spark_sql() with lazy_frame() to see simulated SQL without
connecting to a live database.
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, d = 2, c = "z", con = simulate_spark_sql())
lf %>% summarise(x = median(d, na.rm = TRUE))
#> <SQL>
#> SELECT MEDIAN(`d`) AS `x`
#> FROM `df`
lf %>% summarise(x = var(c, na.rm = TRUE), .by = d)
#> <SQL>
#> SELECT `d`, VARIANCE(`c`) AS `x`
#> FROM `df`
#> GROUP BY `d`
lf %>% mutate(x = first(c))
#> <SQL>
#> SELECT `df`.*, FIRST_VALUE(`c`) OVER () AS `x`
#> FROM `df`
lf %>% mutate(x = first(c), .by = d)
#> <SQL>
#> SELECT `df`.*, FIRST_VALUE(`c`) OVER (PARTITION BY `d`) AS `x`
#> FROM `df`