Fix bug when partially evaluating unquoting quosure containing a single symbol (#317)
Fixes for rlang and dpylr compatibility.
Minor improvements to SQL generation
Error: `con` must not be NULL: If you see this error, it probably means that you have forgotten to pass
con down to a dbplyr function. Previously, dbplyr defaulted to using
simulate_dbi() which introduced subtle escaping bugs. (It’s also possible I have forgotten to pass it somewhere that the dbplyr tests don’t pick up, so if you can’t figure it out, please let me know).
[) functions are no longer evaluated locally. This makes the translation more consistent and enables useful new idioms for modern databases (#200).
MySQL/MariaDB (https://mariadb.com/kb/en/library/window-functions/) and SQLite (https://www.sqlite.org/windowfunctions.html) translations gain support for window functions, available in Maria DB 10.2, MySQL 8.0, and SQLite 3.25 (#191).
Overall, dplyr generates many fewer subqueries:
New translations for some lubridate functions:
yday() (@colearendt, @derekmorr). Also added new translation for
New translations for stringr functions:
str_to_title() (@colearendt). Non-translated stringr functions throw a clear error.
New translations for bitwise operations:
bitwShiftR(). Unlike the base R functions, the translations do not coerce arguments to integers (@davidchall, #235).
New translation for
CASE WHEN y THEN x END. This enables
sum(a[b == 0]) to work as you expect from R (#202).
y needs to be a logical expression; if not you will likely get a type error from your database.
New translations for
x.y, enabling you to index into nested fields in databases that provide them (#158).
.env pronouns of tidy evaluation are correctly translated (#132).
New translation for
quantile(). Works for all ANSI compliant databases (SQL Server, Postgres, MariaDB, Teradata) and has custom translations for Hive. Thanks to @edavidaja for researching the SQL variants! (#169)
SQL simulation makes it possible to see what dbplyr will translate SQL to, without having an active database connection, and is used for testing and generating reprexes.
SQL simulation has been overhauled. It now works reliably, is better documented, and always uses ANSI escaping (i.e.
` for field names and
' for strings).
tbl_lazy() now actually puts a
dbplyr::src in the
$src field. This shouldn’t affect any downstream code unless you were previously working around this weird difference between
tbl_sql classes. It also includes the
src class in its class, and when printed, shows the generated SQL (#111).
explain()translation now generates
EXPLAIN QUERY PLANwhich generates a higher-level, more human friendly explanation.
Aggregation functions only warn once per session about the use of
na.rm = TRUE (#216).
build_sql() no longer accept
con = NULL as a shortcut for
con = simulate_dbi(). This made it too easy to forget to pass
con along, introducing extremely subtle escaping bugs.
win_over() gains a
con argument for the same reason.
escape_ansi() always uses ANSI SQL 92 standard escaping (for use in examples and documentation).
partial_eval() processes inlined functions (including rlang lambda functions). This makes dbplyr work with more forms of scoped verbs like
df %>% summarise_all(~ mean(.)),
df %>% summarise_all(list(mean)) (#134).
sql_infix() gains a
pad argument for the rare operator that doesn’t need to be surrounded by spaces.
Calls of the form
dplyr::foo() are now evaluated in the database, rather than locally (#197).
tbl objects now include a class generated from the class of the underlying connection object. This makes it possible for dplyr backends to implement different behaviour at the dplyr level, when needed. (#2293)
as.integer64(x) translation to
CAST(x AS BIGINT) (#3305)
str_detect() now uses correct parameter order (#3397)
pull.tbl_sql() now extracts correctly from grouped tables (#3562).
New translations for
dbplyr now supplies appropriate translations for the RMariaDB and RPostgres packages (#3154). We generally recommend using these packages in favour of the older RMySQL and RPostgreSQL packages as they are fully DBI compliant and tested with DBItest.
copy_to() can now “copy” tbl_sql in the same src, providing another way to cache a query into a temporary table (#3064). You can also
copy_to tbl_sqls from another source, and
copy_to() will automatically collect then copy.
Initial support for stringr functions:
str_trim(). Regular expression support varies from database to database, but most simple regular expressions should be ok.
win_aggregate() for generating SQL and windowed SQL functions for aggregates. These take one argument,
x, and warn if
na.rm is not
win_recycled() is equivalent to
win_aggregate() and has been soft-deprecated.
db_write_table now needs to return the table name
SQL generation for joins no longer gets stuck in a endless loop if you request an empty suffix (#3220).
paste0() support in MySQL, PostgreSQL (#3168), and RSQLite (#3176). MySQL and PostgreSQL gain support for
str_flatten() which behaves like
paste(x, collapse = "-") (but for technical reasons can’t be implemented as a straightforward translation of
same_src.tbl_sql() now performs correct comparison instead of always returning
TRUE. This means that
copy = TRUE once again allows you to perform cross-database joins (#3002).
test_src() makes it easier to access a single test source.
n() translation in windowed context.
na_if translation (@cwarden)
x %in% c(1) now generates the same SQL as
x %in% 1 (#2898).
Added SQL translations for Oracle (@edgararuiz).
x %in% c(1) now generates the same SQL as
x %in% 1 (#2898).
Variables created in a join are now disambiguated against other variables in the same table, not just variables in the other table (#2823).
db_analyze_table() for MS SQL, Oracle, Hive and Impala (@edgararuiz)
You can now use the magrittr pipe within expressions, e.g.
mutate(mtcars, cyl %>% as.character()).
If a translation was supplied for a summarise function, but not for the equivalent windowed variant, the expression would be translated to
NULL with a warning. Now
sql_variant() checks that all aggregate functions have matching window functions so that correct translations or clean errors will be generated (#2887)
dplyr has gained a basic SQL optimiser, which collapses certain nested SELECT queries into a single query (#1979). This will improve query execution performance for databases with less sophisticated query optimisers, and fixes certain problems with ordering and limits in subqueries (#1979). A big thanks goes to @hhoeflin for figuring out this optimisation.
query()is no longer exported. It hasn’t been useful for a while so this shouldn’t break any code.
SQL joins have been improved:
SQL joins always use the
ON ... syntax, avoiding
USING ... even for natural joins. Improved handling of tables with columns of the same name (#1997, @javierluraschi). They now generate SQL more similar to what you’d write by hand, eliminating a layer or two of subqueries (#2333)
[API] They now follow the same rules for including duplicated key variables that the data frame methods do, namely that key variables are only kept from
x, and never from
sql_join() generic now gains a
vars argument which lists the variables taken from the left and right sides of the join. If you have a custom
sql_join() method, you’ll need to update how your code generates joins, following the template in
full_join() now return results consistent with local data frame sources when there are records in the right table with no match in the left table.
right_join() returns values of
by columns from the right table.
full_join() returns coalesced values of
by columns from the left and right tables (#2578, @ianmcook)
By default, the component SELECT are surrounded with parentheses, except on SQLite. The SQLite backend will now throw an error if you attempt a set operation on a query that contains a LIMIT, as that is not supported in SQLite (#2270).
All set operations match column names across inputs, filling in non-matching variables with NULL (#2556).
lazy_tbl() have been exported. These help you test generated SQL with out an active database connection.
as.sql() safely coerces an input to SQL.
ident_q() makes it possible to specifier identifiers that do not need to be quoted.
Translation of inline scalars:
Logical values are now translated differently depending on the backend. The default is to use “true” and “false” which is the SQL-99 standard, but not widely support. SQLite translates to “0” and “1” (#2052).
-Inf are correctly escaped
Better test for whether or not a double is similar to an integer and hence needs a trailing 0.0 added (#2004).
Quoting defaults to
::: are handled correctly (#2321)
x %in% 1 is now correctly translated to
x IN (1) (#511).
nth(x, i) is now correctly translated to
win_over() escapes expression using current database rules.
db_collect() allow backends to override the entire database process behind
db_sql_render() allow additional control over the SQL rendering process.
All generics whose behaviour can vary from database to database now provide a DBIConnection method. That means that you can easily scan the NAMESPACE to see the extension points.
src_desc() has been replaced by
db_desc() and now dispatches on the connection, eliminating the last method that required dispatch on the class of the src.
win_current_order() are now exported. This should make it easier to provide customised SQL for window functions (#2051, #2126).
SQL translation for Microsoft SQL Server (@edgararuiz)
SQL translation for Apache Hive (@edgararuiz)
SQL translation for Apache Impala (@edgararuiz)
collect() will automatically LIMIT the result to the
n, the number of rows requested. This will provide the query planner with more information that it may be able to use to improve execution time (#2083).
copy_to() no longer checks that the table doesn’t exist before creation, instead preferring to fall back on the database for error messages. This should reduce both false positives and false negative (#1470)
copy_to() now returns it’s output invisibly (since you’re often just calling for the side-effect).
print.tbl_sql() displays ordering (#2287) and prints table name, if known.
Formatting now works by overriding the
tbl_sum() generic instead of
print(). This means that the output is more consistent with tibble, and that
format() is now supported also for SQL sources (tidyverse/dbplyr#14).
[API] The signature of
op_base has changed to
op_base(x, vars, class)
Because it no longer needs the environment
translate_sql()_ now works with a list of dots, rather than a
partial_eval() now takes a character vector of variable names rather than a tbl.
This leads to a simplification of the
op data structure: dots is now a list of expressions rather than a
op_vars() now returns a list of quoted expressions. This enables escaping to happen at the correct time (i.e. when the connection is known).