CRAN release: 2023-02-24
Added an informative error for unsupported join arguments
The rank functions (
cume_dist()) now support multiple variables by wrapping them in
rank(tibble(x, y))(@mgirlich, #1118).
by = character()to perform a cross join is now soft-deprecated in favor of
across()now uses the original value when a column is overriden to match the behaviour of dplyr. For example
mutate(df, across(c(x, y), ~ .x / x))now produces
SELECT `x` / `x` AS `x`, `y` / `x` AS `y` FROM `df`
SELECT `x`, `y` / `x` AS `y` FROM ( SELECT `x` / `x` AS `x`, `y` FROM `df` )
CRAN release: 2023-01-16
In preparation for dplyr 1.1.0:
.byargument is supported (@mgirlich, #1051).
across()is deprecated because the evaluation timing of
...is ambiguous. Now instead of (e.g.)
across(a:b, mean, na.rm = TRUE)use
across(a:b, \(x) mean(x, na.rm = TRUE)
pick()is translated (@mgirlich, #1044).
case_match()is translated (@mgirlich, #1020).
case_when()now supports the
.defaultargument (@mgirlich, #1017).
dbplyr now produces fewer subqueries resulting in shorter, more readable, and, in some cases, faster SQL. The following combination of verbs now avoids a subquery if possible:
Joins only use the table aliases (“LHS” and “RHS”) if necessary (@mgirlich).
Keyword highlighting can now be customised via the option
dbplyr_highlight. Turn it off via
options(dbplyr_highlight = FALSE)or pass a custom ansi style, e.g.
options(dbplyr_highlight = cli::combine_ansi_styles("bold", "cyan"))(@mgirlich, #974).
blob()s are correctly translated to
CRAN release: 2022-06-27
CRAN release: 2022-06-05
SQL formatting has been considerably improved with new wrapping and indenting.
show_query()creates more readable queries by printing the keywords in blue (@mgirlich, #644). When possible dbplyr now uses
SELECT *instead of explicitly selecting every column (@mgirlich).
When possible, dbplyr now uses
SELECT *instead of explicitly selecting every column (@mgirlich).
Improved translations for specific backends:
as.Date()for Oracle (@mgirlich, #661).
case_when()with a final clause of the form
TRUE ~ ...uses
ELSE ...for SQLite (@mgirlich, #754).
isoyear()for Postgres (@mgirlich, #675).
explain()for ROracle (@mgirlich).
fill()for SQL Server (#651, @mgirlich) and RPostgreSQL (@mgirlich).
quantile()for SQL Server (@mgirlich, #620).
str_flatten()for Redshift (@hdplsa, #804)
slice_sample()for MySQL/MariaDB and SQL Server (@mgirlich, #617).
union()for Hive (@mgirlich, #663).
Calls of the form
lubridate::foo()are now evaluated in the database, rather than locally (#197).
The partial evaluation code is now more aligned with
dtplyr. This makes it easier to transfer bug fixes and new features from one package to the other. In this process the second argument of
partial_eval()was changed to a lazy frame instead of a character vector of variables (@mgirlich, #766). Partially evaluated expressions with infix operations are now correctly translated. For example
translate_sql(!!expr(2 - 1) * x)now works (@mgirlich, #634).
pillar::tbl_format_header()method for lazy tables: Printing a lazy table where all rows are displayed also shows the exact number of rows in the header. The threshold is controlled by
getOption("pillar.print_min"), with a default of 10 (#796, @krlmlr).
The 1st edition extension mechanism is formally deprecated (#507).
Joins disambiguates columns that only differ in case (@mgirlich, #702). New arguments
y_asallow you to control the table alias used in SQL query (@mgirlich, #637). Joins with
na_matches = "na"now work for DuckDB (@mgirlich, #704).
na.rm = FALSEonly warns once every 8 hours across all functions (#899).
sql_random()is now exported.
CRAN release: 2021-04-06
New support for Snowflake (@edgararuiz)
CRAN release: 2021-02-03
@mgirlich is now a dbplyr author in recognition of his significant and sustained contributions.
RPostgreSQL backend warns if
temporary = TRUEsince temporary tables are not supported by
CRAN release: 2020-11-03
Documentation has been radically improved with new topics for each major verb and each backend giving more details about the SQL translation.
Join functions gains a
na_matchesargument that allows you to control whether or not
NULL) values match other
NAvalues. The default is
"never", which is the usual behaviour in databases. You can set
na_matches = "na"to match R’s usual join behaviour (#180). Additional arguments error (instead of being silently swallowed) (#382).
Joins now only use aliases where needed to disambiguate columns; this should make generated queries more readable.
Subqueries no longer include an
ORDER BYclause. This is not part of the SQL spec, and has very limited support across databases. Now such queries generate a warning suggesting that you move your
arrange()call later in the pipeline (#276). (There’s one exception:
ORDER BYis still generated if
LIMITis present; this tends to affect the returns rows but not necessarily their order).
Subquery names are now scoped within the query. This makes query text deterministic which helps some query optimisers/cachers (#336).
sql_optimise()now can partially optimise a pipeline; due to an unfortunate bug it previously gave up too easily.
Experimental new SAP HANA backend (#233). Requires the latest version of odbc.
[[can now also translate numeric indices (#520).
%/%now generates a clear error message; previously it was translated to
/which is not correct (#108).
sub_str()translation is more consistent in edge cases (@ianmcook).
MySQL uses standard SQL for index creation.
MS SQL translation does better a distinguishing between bit and boolean (#377, #318).
ifelseonce again generate
IIF, creating simpler expressions.
CAST()for version 11+ (2012+) (@DavidPatShuiFong, #380).
odbc no longer translates
count(); this was an accidental inclusion.
PostgreSQL: new translations for lubridate period functions
floor_date()(@bkkkk, #333) and stringr functions
New RedShift translations when used with
If you are the author of a dbplyr backend, please see
vignette("backend-2") for details.
dbplyr_edition()generic allows you to opt-in to the 2nd edition of the dbplyr API.
DBI::dbWriteTable()instead of nine generics that formerly each did a small part:
db_data_types(). You can now delete the methods for these generics.
db_query_rows()is no longer used; it appears that it hasn’t been used for some time, so if you have a method, you can delete it.
A number of
db_*generics have been replaced with new SQL generation generics:
This makes them easier to test and is an important part of the process of moving all database generics in dbplyr (#284).
A number of other generics have been renamed to facilitate the move from dplyr to dbplyr:
db_temporary_table()generic makes it easier to work with databases that require temporary tables to be specially named.
sql_expr_matches()generic allows databases to use more efficient alternatives when determine if two values “match” (i.e. like equality but a pair of
NULLs will also match). For more details, see https://modern-sql.com/feature/is-distinct-from
All old lazy eval shims have been removed. These have been deprecated for some time.
Date-time escaping methods for Athena and Presto have moved to the packages where they belong.
Attempting to embed a Shiny reactive in a query now gives a helpful error (#439).
window_order()overrides ordering, rather than appending to it.
CRAN release: 2020-05-27
blobvectors using new
sql_escape_raw()generic. It enables using blob variables in dplyr verbs, for example to filter nvarchar values by UTF-16 blobs (see https://github.com/r-dbi/DBI/issues/215#issuecomment-356376133). (@okhoma, #433)
str_detect()translation uses same argument names as stringr, and gains a
CRAN release: 2020-04-19
dbplyr now uses RPostgres (instead of RPostgreSQL) and RMariaDB (instead of RMySQL) for its internal tests and data functions (#427).
The Date and POSIXt methods for
escape()now use exported
sql_escape_datetime()generics to allow backend specific formatting of date and datetime literals. These are used to provide methods for Athena and Presto backends (@OssiLehtinen, #384, #391).
SQL server: temporary datasets are now session-local, not global (#401).
CRAN release: 2019-06-17
Fix bug when partially evaluating unquoting quosure containing a single symbol (#317)
Fixes for rlang and dpylr compatibility.
CRAN release: 2019-06-05
Minor improvements to SQL generation
CRAN release: 2019-04-23
Error: `con` must not be NULL: If you see this error, it probably means that you have forgotten to pass
condown to a dbplyr function. Previously, dbplyr defaulted to using
simulate_dbi()which introduced subtle escaping bugs. (It’s also possible I have forgotten to pass it somewhere that the dbplyr tests don’t pick up, so if you can’t figure it out, please let me know).
[) functions are no longer evaluated locally. This makes the translation more consistent and enables useful new idioms for modern databases (#200).
MySQL/MariaDB (https://mariadb.com/kb/en/library/window-functions/) and SQLite (https://www.sqlite.org/windowfunctions.html) translations gain support for window functions, available in Maria DB 10.2, MySQL 8.0, and SQLite 3.25 (#191).
Overall, dplyr generates many fewer subqueries:
New translations for some lubridate functions:
yday()(@colearendt, @derekmorr). Also added new translation for
New translations for stringr functions:
str_to_title()(@colearendt). Non-translated stringr functions throw a clear error.
New translations for bitwise operations:
bitwShiftR(). Unlike the base R functions, the translations do not coerce arguments to integers (@davidchall, #235).
New translation for
CASE WHEN y THEN x END. This enables
sum(a[b == 0])to work as you expect from R (#202).
yneeds to be a logical expression; if not you will likely get a type error from your database.
New translations for
x.y, enabling you to index into nested fields in databases that provide them (#158).
.envpronouns of tidy evaluation are correctly translated (#132).
New translation for
quantile(). Works for all ANSI compliant databases (SQL Server, Postgres, MariaDB, Teradata) and has custom translations for Hive. Thanks to @edavidaja for researching the SQL variants! (#169)
SQL simulation makes it possible to see what dbplyr will translate SQL to, without having an active database connection, and is used for testing and generating reprexes.
SQL simulation has been overhauled. It now works reliably, is better documented, and always uses ANSI escaping (i.e.
`for field names and
tbl_lazy()now actually puts a
$srcfield. This shouldn’t affect any downstream code unless you were previously working around this weird difference between
tbl_sqlclasses. It also includes the
srcclass in its class, and when printed, shows the generated SQL (#111).
explain()translation now generates
EXPLAIN QUERY PLANwhich generates a higher-level, more human friendly explanation.
Improved translation for
CAST(x as BIT)(#250).
Aggregation functions only warn once per session about the use of
na.rm = TRUE(#216).
build_sql()no longer accept
con = NULLas a shortcut for
con = simulate_dbi(). This made it too easy to forget to pass
conalong, introducing extremely subtle escaping bugs.
conargument for the same reason.
escape_ansi()always uses ANSI SQL 92 standard escaping (for use in examples and documentation).
mutate(df, x = NULL)drops
xfrom the output, just like when working with local data frames (#194).
partial_eval()processes inlined functions (including rlang lambda functions). This makes dbplyr work with more forms of scoped verbs like
df %>% summarise_all(~ mean(.)),
df %>% summarise_all(list(mean))(#134).
padargument for the rare operator that doesn’t need to be surrounded by spaces.
CRAN release: 2019-01-09
- Now supports for dplyr 0.8.0 (#190) and R 3.1.0
Calls of the form
dplyr::foo()are now evaluated in the database, rather than locally (#197).
tblobjects now include a class generated from the class of the underlying connection object. This makes it possible for dplyr backends to implement different behaviour at the dplyr level, when needed. (#2293)
CAST(x AS BIGINT)(#3305)
str_detect()now uses correct parameter order (#3397)
CRAN release: 2018-01-03
New translations for
dbplyr now supplies appropriate translations for the RMariaDB and RPostgres packages (#3154). We generally recommend using these packages in favour of the older RMySQL and RPostgreSQL packages as they are fully DBI compliant and tested with DBItest.
copy_to()can now “copy” tbl_sql in the same src, providing another way to cache a query into a temporary table (#3064). You can also
copy_totbl_sqls from another source, and
copy_to()will automatically collect then copy.
Initial support for stringr functions:
str_trim(). Regular expression support varies from database to database, but most simple regular expressions should be ok.
win_aggregate()for generating SQL and windowed SQL functions for aggregates. These take one argument,
x, and warn if
win_recycled()is equivalent to
win_aggregate()and has been soft-deprecated.
db_write_tablenow needs to return the table name
SQL generation for joins no longer gets stuck in a endless loop if you request an empty suffix (#3220).
paste0()support in MySQL, PostgreSQL (#3168), and RSQLite (#3176). MySQL and PostgreSQL gain support for
str_flatten()which behaves like
paste(x, collapse = "-")(but for technical reasons can’t be implemented as a straightforward translation of
same_src.tbl_sql()now performs correct comparison instead of always returning
TRUE. This means that
copy = TRUEonce again allows you to perform cross-database joins (#3002).
test_src()makes it easier to access a single test source.
n()translation in windowed context.
CRAN release: 2017-06-27
x %in% c(1)now generates the same SQL as
x %in% 1(#2898).
Added SQL translations for Oracle (@edgararuiz).
x %in% c(1)now generates the same SQL as
x %in% 1(#2898).
head(tbl, 0)is now supported (#2863).
Variables created in a join are now disambiguated against other variables in the same table, not just variables in the other table (#2823).
db_analyze_table()for MS SQL, Oracle, Hive and Impala (@edgararuiz)
You can now use the magrittr pipe within expressions, e.g.
mutate(mtcars, cyl %>% as.character()).
If a translation was supplied for a summarise function, but not for the equivalent windowed variant, the expression would be translated to
NULLwith a warning. Now
sql_variant()checks that all aggregate functions have matching window functions so that correct translations or clean errors will be generated (#2887)
CRAN release: 2017-06-09
dplyr has gained a basic SQL optimiser, which collapses certain nested SELECT queries into a single query (#1979). This will improve query execution performance for databases with less sophisticated query optimisers, and fixes certain problems with ordering and limits in subqueries (#1979). A big thanks goes to @hhoeflin for figuring out this optimisation.
in_schema()function makes it easy to refer to tables in schema:
query()is no longer exported. It hasn’t been useful for a while so this shouldn’t break any code.
SQL joins have been improved:
SQL joins always use the
ON ...syntax, avoiding
USING ...even for natural joins. Improved handling of tables with columns of the same name (#1997, @javierluraschi). They now generate SQL more similar to what you’d write by hand, eliminating a layer or two of subqueries (#2333)
[API] They now follow the same rules for including duplicated key variables that the data frame methods do, namely that key variables are only kept from
x, and never from
sql_join()generic now gains a
varsargument which lists the variables taken from the left and right sides of the join. If you have a custom
sql_join()method, you’ll need to update how your code generates joins, following the template in
full_join()now return results consistent with local data frame sources when there are records in the right table with no match in the left table.
right_join()returns values of
bycolumns from the right table.
full_join()returns coalesced values of
bycolumns from the left and right tables (#2578, @ianmcook)
By default, the component SELECT are surrounded with parentheses, except on SQLite. The SQLite backend will now throw an error if you attempt a set operation on a query that contains a LIMIT, as that is not supported in SQLite (#2270).
All set operations match column names across inputs, filling in non-matching variables with NULL (#2556).
lazy_tbl()have been exported. These help you test generated SQL with out an active database connection.
as.sql()safely coerces an input to SQL.
ident_q()makes it possible to specifier identifiers that do not need to be quoted.
Translation of inline scalars:
Logical values are now translated differently depending on the backend. The default is to use “true” and “false” which is the SQL-99 standard, but not widely support. SQLite translates to “0” and “1” (#2052).
-Infare correctly escaped
Better test for whether or not a double is similar to an integer and hence needs a trailing 0.0 added (#2004).
Quoting defaults to
:::are handled correctly (#2321)
x %in% 1is now correctly translated to
x IN (1)(#511).
ident()now returns an object with class
c("ident", "character"). It no longer contains “sql” to indicate that this is not already escaped.
log(x, b)is now correctly translated to the SQL
log(b, x)(#2288). SQLite does not support the 2-argument log function so it is translated to
log(x) / log(b).
nth(x, i)is now correctly translated to
win_over()escapes expression using current database rules.
db_collect()allow backends to override the entire database process behind
db_sql_render()allow additional control over the SQL rendering process.
All generics whose behaviour can vary from database to database now provide a DBIConnection method. That means that you can easily scan the NAMESPACE to see the extension points.
src_desc()has been replaced by
db_desc()and now dispatches on the connection, eliminating the last method that required dispatch on the class of the src.
win_current_order()are now exported. This should make it easier to provide customised SQL for window functions (#2051, #2126).
SQL translation for Microsoft SQL Server (@edgararuiz)
SQL translation for Apache Hive (@edgararuiz)
SQL translation for Apache Impala (@edgararuiz)
collect()will automatically LIMIT the result to the
n, the number of rows requested. This will provide the query planner with more information that it may be able to use to improve execution time (#2083).
copy_to()no longer checks that the table doesn’t exist before creation, instead preferring to fall back on the database for error messages. This should reduce both false positives and false negative (#1470)
copy_to()now returns it’s output invisibly (since you’re often just calling for the side-effect).
print.tbl_sql()displays ordering (#2287) and prints table name, if known.
print(df, n = Inf)and
head(df, n = Inf)now work with remote tables (#2580).
Formatting now works by overriding the
tbl_sum()generic instead of
print(). This means that the output is more consistent with tibble, and that
format()is now supported also for SQL sources (tidyverse/dbplyr#14).
[API] The signature of
op_basehas changed to
op_base(x, vars, class)
Because it no longer needs the environment
translate_sql()_now works with a list of dots, rather than a
partial_eval()now takes a character vector of variable names rather than a tbl.
This leads to a simplification of the
opdata structure: dots is now a list of expressions rather than a
op_vars()now returns a list of quoted expressions. This enables escaping to happen at the correct time (i.e. when the connection is known).