@mgirlich is now a dbplyr author in recognition of his significant and sustained contributions.
RPostgreSQL backend warns if
temporary = TRUE since temporary tables are not supported by
Documentation has been radically improved with new topics for each major verb and each backend giving more details about the SQL translation.
Join functions gains a
na_matches argument that allows you to control whether or not
NULL) values match other
NA values. The default is
"never", which is the usual behaviour in databases. You can set
na_matches = "na" to match R’s usual join behaviour (#180). Additional arguments error (instead of being silently swallowed) (#382).
Joins now only use aliases where needed to disambiguate columns; this should make generated queries more readable.
Subqueries no longer include an
ORDER BY clause. This is not part of the SQL spec, and has very limited support across databases. Now such queries generate a warning suggesting that you move your
arrange() call later in the pipeline (#276). (There’s one exception:
ORDER BY is still generated if
LIMIT is present; this tends to affect the returns rows but not necessarily their order).
Subquery names are now scoped within the query. This makes query text deterministic which helps some query optimisers/cachers (#336).
sql_optimise() now can partially optimise a pipeline; due to an unfortunate bug it previously gave up too easily.
Experimental new SAP HANA backend (#233). Requires the latest version of odbc.
sub_str() translation is more consistent in edge cases (@ianmcook).
MySQL uses standard SQL for index creation.
MS SQL translation does better a distinguishing between bit and boolean (#377, #318).
ifelse once again generate
IIF, creating simpler expressions.
as.*() function uses
TRY_CAST() instead of
CAST() for version 11+ (2012+) (@DavidPatShuiFong, #380).
odbc no longer translates
count(); this was an accidental inclusion.
PostgreSQL: new translations for lubridate period functions
floor_date() (@bkkkk, #333) and stringr functions
New RedShift translations when used with
If you are the author of a dbplyr backend, please see
vignette("backend-2") for details.
dbplyr_edition() generic allows you to opt-in to the 2nd edition of the dbplyr API.
db_write_table() now calls
DBI::dbWriteTable() instead of nine generics that formerly each did a small part:
db_data_types(). You can now delete the methods for these generics.
db_query_rows() is no longer used; it appears that it hasn’t been used for some time, so if you have a method, you can delete it.
A number of
db_* generics have been replaced with new SQL generation generics:
This makes them easier to test and is an important part of the process of moving all database generics in dbplyr (#284).
A number of other generics have been renamed to facilitate the move from dplyr to dbplyr:
db_temporary_table() generic makes it easier to work with databases that require temporary tables to be specially named.
sql_expr_matches() generic allows databases to use more efficient alternatives when determine if two values “match” (i.e. like equality but a pair of
NULLs will also match). For more details, see https://modern-sql.com/feature/is-distinct-from
All old lazy eval shims have been removed. These have been deprecated for some time.
Date-time escaping methods for Athena and Presto have moved to the packages where they belong.
Attempting to embed a Shiny reactive in a query now gives a helpful error (#439).
window_order() overrides ordering, rather than appending to it.
escape() now supports
blob vectors using new
sql_escape_raw() generic. It enables using blob variables in dplyr verbs, for example to filter nvarchar values by UTF-16 blobs (see https://github.com/r-dbi/DBI/issues/215#issuecomment-356376133). (@okhoma, #433)
str_detect() translation uses same argument names as stringr, and gains a
negate argument (#444).
dbplyr now uses RPostgres (instead of RPostgreSQL) and RMariaDB (instead of RMySQL) for its internal tests and data functions (#427).
The Date and POSIXt methods for
escape() now use exported
sql_escape_datetime() generics to allow backend specific formatting of date and datetime literals. These are used to provide methods for Athena and Presto backends (@OssiLehtinen, #384, #391).
SQL server: temporary datasets are now session-local, not global (#401).
Fix bug when partially evaluating unquoting quosure containing a single symbol (#317)
Fixes for rlang and dpylr compatibility.
Minor improvements to SQL generation
Error: `con` must not be NULL: If you see this error, it probably means that you have forgotten to pass
con down to a dbplyr function. Previously, dbplyr defaulted to using
simulate_dbi() which introduced subtle escaping bugs. (It’s also possible I have forgotten to pass it somewhere that the dbplyr tests don’t pick up, so if you can’t figure it out, please let me know).
MySQL/MariaDB (https://mariadb.com/kb/en/library/window-functions/) and SQLite (https://www.sqlite.org/windowfunctions.html) translations gain support for window functions, available in Maria DB 10.2, MySQL 8.0, and SQLite 3.25 (#191).
Overall, dplyr generates many fewer subqueries:
New translations for some lubridate functions:
yday() (@colearendt, @derekmorr). Also added new translation for
New translations for stringr functions:
str_to_title() (@colearendt). Non-translated stringr functions throw a clear error.
New translations for bitwise operations:
bitwShiftR(). Unlike the base R functions, the translations do not coerce arguments to integers (@davidchall, #235).
New translation for
CASE WHEN y THEN x END. This enables
sum(a[b == 0]) to work as you expect from R (#202).
y needs to be a logical expression; if not you will likely get a type error from your database.
New translations for
x.y, enabling you to index into nested fields in databases that provide them (#158).
.env pronouns of tidy evaluation are correctly translated (#132).
New translation for
quantile(). Works for all ANSI compliant databases (SQL Server, Postgres, MariaDB, Teradata) and has custom translations for Hive. Thanks to @edavidaja for researching the SQL variants! (#169)
SQL simulation makes it possible to see what dbplyr will translate SQL to, without having an active database connection, and is used for testing and generating reprexes.
SQL simulation has been overhauled. It now works reliably, is better documented, and always uses ANSI escaping (i.e.
` for field names and
' for strings).
tbl_lazy() now actually puts a
dbplyr::src in the
$src field. This shouldn’t affect any downstream code unless you were previously working around this weird difference between
tbl_sql classes. It also includes the
src class in its class, and when printed, shows the generated SQL (#111).
explain()translation now generates
EXPLAIN QUERY PLANwhich generates a higher-level, more human friendly explanation.
Aggregation functions only warn once per session about the use of
na.rm = TRUE (#216).
build_sql() no longer accept
con = NULL as a shortcut for
con = simulate_dbi(). This made it too easy to forget to pass
con along, introducing extremely subtle escaping bugs.
win_over() gains a
con argument for the same reason.
escape_ansi() always uses ANSI SQL 92 standard escaping (for use in examples and documentation).
partial_eval() processes inlined functions (including rlang lambda functions). This makes dbplyr work with more forms of scoped verbs like
df %>% summarise_all(~ mean(.)),
df %>% summarise_all(list(mean)) (#134).
sql_infix() gains a
pad argument for the rare operator that doesn’t need to be surrounded by spaces.
Calls of the form
dplyr::foo() are now evaluated in the database, rather than locally (#197).
tbl objects now include a class generated from the class of the underlying connection object. This makes it possible for dplyr backends to implement different behaviour at the dplyr level, when needed. (#2293)
as.integer64(x) translation to
CAST(x AS BIGINT) (#3305)
str_detect() now uses correct parameter order (#3397)
New translations for
dbplyr now supplies appropriate translations for the RMariaDB and RPostgres packages (#3154). We generally recommend using these packages in favour of the older RMySQL and RPostgreSQL packages as they are fully DBI compliant and tested with DBItest.
copy_to() can now “copy” tbl_sql in the same src, providing another way to cache a query into a temporary table (#3064). You can also
copy_to tbl_sqls from another source, and
copy_to() will automatically collect then copy.
Initial support for stringr functions:
str_trim(). Regular expression support varies from database to database, but most simple regular expressions should be ok.
win_aggregate() for generating SQL and windowed SQL functions for aggregates. These take one argument,
x, and warn if
na.rm is not
win_recycled() is equivalent to
win_aggregate() and has been soft-deprecated.
db_write_table now needs to return the table name
SQL generation for joins no longer gets stuck in a endless loop if you request an empty suffix (#3220).
paste0() support in MySQL, PostgreSQL (#3168), and RSQLite (#3176). MySQL and PostgreSQL gain support for
str_flatten() which behaves like
paste(x, collapse = "-") (but for technical reasons can’t be implemented as a straightforward translation of
same_src.tbl_sql() now performs correct comparison instead of always returning
TRUE. This means that
copy = TRUE once again allows you to perform cross-database joins (#3002).
test_src() makes it easier to access a single test source.
n() translation in windowed context.
na_if translation (@cwarden)
x %in% c(1) now generates the same SQL as
x %in% 1 (#2898).
Added SQL translations for Oracle (@edgararuiz).
x %in% c(1) now generates the same SQL as
x %in% 1 (#2898).
Variables created in a join are now disambiguated against other variables in the same table, not just variables in the other table (#2823).
db_analyze_table() for MS SQL, Oracle, Hive and Impala (@edgararuiz)
You can now use the magrittr pipe within expressions, e.g.
mutate(mtcars, cyl %>% as.character()).
If a translation was supplied for a summarise function, but not for the equivalent windowed variant, the expression would be translated to
NULL with a warning. Now
sql_variant() checks that all aggregate functions have matching window functions so that correct translations or clean errors will be generated (#2887)
dplyr has gained a basic SQL optimiser, which collapses certain nested SELECT queries into a single query (#1979). This will improve query execution performance for databases with less sophisticated query optimisers, and fixes certain problems with ordering and limits in subqueries (#1979). A big thanks goes to @hhoeflin for figuring out this optimisation.
query()is no longer exported. It hasn’t been useful for a while so this shouldn’t break any code.
SQL joins have been improved:
SQL joins always use the
ON ... syntax, avoiding
USING ... even for natural joins. Improved handling of tables with columns of the same name (#1997, @javierluraschi). They now generate SQL more similar to what you’d write by hand, eliminating a layer or two of subqueries (#2333)
[API] They now follow the same rules for including duplicated key variables that the data frame methods do, namely that key variables are only kept from
x, and never from
sql_join() generic now gains a
vars argument which lists the variables taken from the left and right sides of the join. If you have a custom
sql_join() method, you’ll need to update how your code generates joins, following the template in
full_join() now return results consistent with local data frame sources when there are records in the right table with no match in the left table.
right_join() returns values of
by columns from the right table.
full_join() returns coalesced values of
by columns from the left and right tables (#2578, @ianmcook)
By default, the component SELECT are surrounded with parentheses, except on SQLite. The SQLite backend will now throw an error if you attempt a set operation on a query that contains a LIMIT, as that is not supported in SQLite (#2270).
All set operations match column names across inputs, filling in non-matching variables with NULL (#2556).
lazy_tbl() have been exported. These help you test generated SQL with out an active database connection.
as.sql() safely coerces an input to SQL.
ident_q() makes it possible to specifier identifiers that do not need to be quoted.
Translation of inline scalars:
Logical values are now translated differently depending on the backend. The default is to use “true” and “false” which is the SQL-99 standard, but not widely support. SQLite translates to “0” and “1” (#2052).
-Inf are correctly escaped
Better test for whether or not a double is similar to an integer and hence needs a trailing 0.0 added (#2004).
Quoting defaults to
x %in% 1 is now correctly translated to
x IN (1) (#511).
nth(x, i) is now correctly translated to
win_over() escapes expression using current database rules.
db_collect() allow backends to override the entire database process behind
db_sql_render() allow additional control over the SQL rendering process.
All generics whose behaviour can vary from database to database now provide a DBIConnection method. That means that you can easily scan the NAMESPACE to see the extension points.
src_desc() has been replaced by
db_desc() and now dispatches on the connection, eliminating the last method that required dispatch on the class of the src.
win_current_order() are now exported. This should make it easier to provide customised SQL for window functions (#2051, #2126).
SQL translation for Microsoft SQL Server (@edgararuiz)
SQL translation for Apache Hive (@edgararuiz)
SQL translation for Apache Impala (@edgararuiz)
collect() will automatically LIMIT the result to the
n, the number of rows requested. This will provide the query planner with more information that it may be able to use to improve execution time (#2083).
copy_to() no longer checks that the table doesn’t exist before creation, instead preferring to fall back on the database for error messages. This should reduce both false positives and false negative (#1470)
copy_to() now returns it’s output invisibly (since you’re often just calling for the side-effect).
print.tbl_sql() displays ordering (#2287) and prints table name, if known.
Formatting now works by overriding the
tbl_sum() generic instead of
print(). This means that the output is more consistent with tibble, and that
format() is now supported also for SQL sources (tidyverse/dbplyr#14).
[API] The signature of
op_base has changed to
op_base(x, vars, class)
Because it no longer needs the environment
translate_sql()_ now works with a list of dots, rather than a
partial_eval() now takes a character vector of variable names rather than a tbl.
This leads to a simplification of the
op data structure: dots is now a list of expressions rather than a
op_vars() now returns a list of quoted expressions. This enables escaping to happen at the correct time (i.e. when the connection is known).