This backend supports Databricks Spark SQL, typically accessed via the
Databricks ODBC or JDBC connector. Use dialect_spark_sql() with
lazy_frame() to see simulated SQL without connecting to a live database.
Key differences for this backend are better translation of statistical
aggregate functions (e.g. var(), median()) and use of temporary views
instead of temporary tables when copying data.
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, d = 2, c = "z", con = dialect_spark_sql())
lf |> summarise(x = median(d, na.rm = TRUE))
#> <SQL>
#> SELECT MEDIAN("d") AS "x"
#> FROM "df"
lf |> summarise(x = var(c, na.rm = TRUE), .by = d)
#> <SQL>
#> SELECT "d", VARIANCE("c") AS "x"
#> FROM "df"
#> GROUP BY "d"
lf |> mutate(x = first(c))
#> <SQL>
#> SELECT *, FIRST_VALUE("c") OVER () AS "x"
#> FROM "df"
lf |> mutate(x = first(c), .by = d)
#> <SQL>
#> SELECT *, FIRST_VALUE("c") OVER (PARTITION BY "d") AS "x"
#> FROM "df"
