Changelog
Source:NEWS.md
dbplyr (development version)
Tightened argument checks for SQL translations. These changes should result in more informative errors in cases where code already failed, possibly silently; if you see errors with code that used to run correctly, please report them to the package authors (@simonpcouch, #1554, #1555).
clock::add_years()
translates to correct SQL on Spark (@ablack3, #1510).Translations for
as.double()
andas.character()
with Teradata previously raised errors and are now correct (@rplsmn, #1545).Translations of
difftime()
for Postgres, SQL server, Redshift, and Snowflake previously returned the wrong sign and are now correct (@edward-burn, #1532).across(everything())
doesn’t select grouping columns created via.by
insummarise()
(@mgirlich, #1493).New translations of clock function
date_count_between()
for SQL server, Redshift, Snowflake, Postgres, and Spark (@edward-burn, #1495).Spark SQL backend now supports persisting tables with
compute(x, name = I("x.y.z"), temporary = FALSE)
(@zacdav-db, #1502).
dbplyr 2.5.0
CRAN release: 2024-03-19
Improved tools for qualified table names
-
Specification of table names with schema/catalogs has been overhauled to make it simpler. This includes the following features and fixes:
The simplest way to refer to a qualified table is now to wrap it in
I()
, e.g.I("schema_name.table_name")
.Use of
sql()
andident_q()
insidein_catalog()
andin_schema()
is once again supported (#1388).It’s ok to use
ident_q()
once again (#1413) and you should no longer see unsuppressable warnings about usingin_schema()
(#1408).The names of the arguments to
Id()
no longer matter, only their order (#1416). Additionally, thanks to changes to the DBI package, you no longer need to name each argument.If you accidentally pass a named vector to any of the database identifer functions, those names will be automatically stripped (#1404).
tbl_sql(check_from)
is now deprecated.
dbplyr now exports some tools to work with the internal
table_path
class which is useful for certain backends that need to work with this data structure (#1300).
Improved SQL
New translations for clock functions
add_years()
,add_days()
,date_build()
,get_year()
,get_month()
,get_day()
, andbase::difftime()
on SQL server, Redshift, Snowflake, and Postgres.select()
will keep computed columns used toarrange()
subqueries that are eliminated by a subsequent select (@ejneer, #1437).semi_join()
will no longer inline away an aggregate filter (i.e.HAVING
clause) that was followed by aselect()
(@ejneer, #1474)-
Improved function translations:
Functions qualified with the base namespace are now also translated, e.g.
base::paste0(x, "_1")
is now translated (@mgirlich, #1022).-1 + x
now generates a translation instead erroring (#1420).x$name
never attempts to evaluatename
(#1368).You can once again use
NULL
on the LHS of an infix operator in order to generate SQL with unusual syntax (#1345).Namespaced calls now error if the function doesn’t exist, or a translation is not available (#1426).
lead()
translation coercesn
to an integer.
Databricks: now supports creating non-temporary tables too (#1418).
-
Oracle:
db_explain()
now works (@thomashulst, #1353).head()
is once again translated toFETCH FIRST
. This does require Oracle 12c or newer, but it actually works, compared to the approach usingROWNUM
from #1292 (#1436).Added support for
str_replace()
andstr_replace_all()
viaREGEXP_REPLACE()
(@thomashulst, #1402).
-
Snowflake (@nathanhaigh, #1406)
Added support for
str_starts()
andstr_ends()
viaREGEXP_INSTR()
Refactored
str_detect()
to useREGEXP_INSTR()
so now supports regular expressions.Refactored
grepl()
to useREGEXP_INSTR()
so now supports case-insensitive matching throughgrepl(..., ignore.case = TRUE)
-
SQL server:
MySQL:
as.integer()
gets correct translation (@krlmlr, #1375).
Minor improvements and bug fixes
Deprecation status of functions deprecated in previous versions (at least 2 years old) have been advanced. In particular,
src_sql()
is now defunct, as is the use ofpartial_eval()
with characterdata
.Database errors now show the generated SQL, which hopefully will make it faster to track down problems (#1401).
When dbplyr creates an index on a table in a schema (e.g.
schema.table
), it now only includes the table name in the index name, not the schema name.The class of remote sources now includes all S4 class names, not just the first (#918).
compute()
passes additional arguments all the way tosql_query_save()
-methods (@rsund).db_sql_render()
correctly passes on...
when re-calling withsql_options
set (#1394).reframe()
now gives an informative error that it isn’t supported (#1148).rows_patch(in_place = FALSE)
now works when more than one column should be patched (@gorcha, #1443).New
simulate_mariadb()
(@krlmlr, #1375).sql_translator()
now checks for duplicated definitions (@krlmlr, #1374).
dbplyr 2.4.0
CRAN release: 2023-10-26
Breaking changes
Using
compute(temporary = FALSE)
without providing a name is now deprecated (@mgirlich, #1154).ntile()
’s first argument has been renamed fromorder_by
tox
to match the interface ofdplyr::ntile()
(@mgirlich, #1242).simulate_vars()
andsimulate_vars_is_typed()
were removed as they weren’t used and tidyselect now offerstidyselect_data_proxy()
andtidyselect_data_has_predicates()
(@mgirllich, #1199).sql_not_supported()
now expects a function name without parentheses.-
sql_query_append()
,sql_query_insert()
,sql_query_update()
,sql_query_upsert()
, andsql_query_delete()
changed their arguments to make them more consistent to the othersql_query_*()
functions:-
x_name
was renamed totable
. -
y
was renamed tofrom
and must now be a table identifier or SQL instead of a lazy table. -
sql_query_append()
andsql_query_insert()
have gained the argumentcols
.
-
remote_name()
now returns a string with the name of the table. To get the qualified identifier use the newly addedremote_table()
(@mgirlich, #1280).tbl_lazy()
losessrc
argument after it has been deprecated for years (@mgirlich, #1208).translate_sql()
now requires thecon
argument (@mgirlich, #1311). Thevars
argument has been removed after it threw an error for the last 7 years (@mgirlich).
Improved SQL
Preliminary databricks Spark SQL backend (#1377).
-
Joins
*_join()
afterfull_join()
works again (@mgirlich, #1178).*_join()
now allows specifying the relationship argument. It must beNULL
or"many-to-many"
(@bairdj, #1305).Queries now qualify
*
with the table alias for better compatibility (@mgirlich, #1003).full_join()
can now handle column names that only differ in case (@ejneer, #1255).The
na_matches
argument ofsemi_join()
andanti_join()
works again (@mgirlich, #1211).A
semi/anti_join()
on fitleredy
is inlined when possible (@mgirlich, #884).Joins now work again for Pool and Oracle connections (@mgirlich, #1177, #1181).
A sequence of
union()
resp.union_all()
now produces a flat query instead of subqueries (@mgirlich, #1269).-
Added translations for:
if_any()
andif_all()
translations are now wrapped in parentheses. This makes sure it can be combined via&
with other conditions (@mgirlich, #1153).nth()
,first()
, andlast()
now support thena_rm
argument (@mgirlich, #1193).
Minor improvements and bug fixes
across()
now supports namespaced functions, e.g.across(x, dplyr::dense_rank)
(@mgirlich, #1231).db_copy_to(overwrite = TRUE)
now actually works.db_copy_to()
’s...
are now passed todb_write_table()
(@mgirlich, #1237).Added
db_supports_table_alias_with_as()
to customise whether a backend supports specifying a table alias withAS
or not (@mgirlich).db_write_table()
anddb_save_query()
gain theoverwrite
argument.dbplyr_pivot_wider_spec()
is now exported. Unlikepivot_wider()
this can be lazy. Note that this will be removed soon afterpivot_wider_spec()
becomes a generic (@mgirlich).filter()
ing with window functions now generates columns calledcol01
rather thanq01
(@mgirlich, #1258).pivot_wider()
now matches tidyrNA
column handling (@ejneer #1238).select()
can once again be used afterarrange(desc(x))
(@ejneer, #1240).-
show_query()
andremote_query()
gain the argumentsql_options
that allows to control how the SQL is generated. It can be created viasql_options()
which has the following arguments:-
cte
: use common table expressions? -
use_star
: useSELECT *
or explicitly select every column? -
qualify_all_columns
: qualify all columns in a join or only the ambiguous ones? (@mgirlich, #1146).
Consequently the
cte
argument ofshow_query()
andremote_query()
has been deprecated (@mgirlich, #1146). -
slice_min/max()
can now order by multiple variables like dplyr, e.g. useslice_min(lf, tibble(x, y))
(@mgirlich, #1167).slice_*()
now supports the data masking pronouns.env
and.data
(@mgirlich, #1294).sql_join_suffix()
gains the argumentsuffix
so that methods can check whether the suffix is valid for the backend (@mgirlich).sql_random()
is now deprecated. It was used to powerslice_sample()
which is now done via the translation forrunif()
(@mgirlich, #1200).tbl()
now informs when the user probably forgot to wrap the table identifier within_schema()
orsql()
(@mgirlich, #1287).
Backend specific improvements
- Access
- DuckDB
- now supports the
returning
argument ofrows_*()
.
- now supports the
- MySQL/MariaDB:
-
rows_update()
androws_patch()
now give an informative error when the unsupportedreturning
argument is used (@mgirlich, #1279). -
rows_upsert()
now gives an informative error that it isn’t supported (@mgirlich, #1279). -
rows_*()
use the column types ofx
when auto copyingy
(@mgirlich, #1327). -
copy_inline()
now works (@mgirlich, #1188). - Fix translation of
as.numeric()
,as.POSIXct()
,as_datetime()
, andas.integer64()
(@avsdev-cw, #1189).
-
- MS SQL:
-
row_number()
now works when no order is specified (@ejneer, @fh-mthomson, #1332)
-
- Oracle:
- Fix translation of
rows_upsert()
(@mgirlich, @TBlackmore, #1286) -
head(n)
is now translated toWHERE ROWNUM <= n
to also support old versions <= 11.2 (@JeremyPasco, #1292).
- Fix translation of
- Postgres
- SQLite
- Subqueries now also get an alias. This makes it consistent with other backends and simplifies the implementation.
- SQL Server
- Snowflake:
-
na.rm = TRUE
is now respected inpmin()
andpmax()
instead of being silently ignored (@fh-mthomson, #1329) -
row_number()
now works when no order is specified (@fh-mthomson, #1332)
-
- Teradata
-
distinct()
+head()
now work (@mgirlich, #685). -
as.Date(x)
is now translate toCAST(x AS DATE)
again unlessx
is a string (@mgirlich, #1285). -
row_number()
no longer defaults to partitioning by groups (now aligned with other databases when no order is specified:ROW_NUMBER()
defaults toORDER BY (SELECT NULL)
) (@fh-mthomson, #1331)
-
dbplyr 2.3.1
CRAN release: 2023-02-24
Breaking changes
-
window_order()
now only accepts bare symbols or symbols wrapped indesc()
. This breaking change is necessary to allowselect()
to drop and rename variables used inwindow_order()
(@mgirlich, #1103).
Improved error messages
quantile()
andmedian()
now error for SQL Server when used insummarise()
and for PostgreSQL when used inmutate()
as they can’t be properly translated (@mgirlich, #1110).Added an informative error for unsupported join arguments
unmatched
andmultiple
(@mgirlich).Using predicates, e.g.
where(is.integer)
, inacross()
now produces an error as they never worked anyway (@mgirlich, #1169).Catch unsupported argument
pivot_wider(id_expand = TRUE)
andpivot_longer(cols_vary)
(@mgirlich, #1109).
Bug fixes in SQL generation
Fixed an issue when using a window function after a
summarise()
andselect()
(@mgirlich, #1104).Fixed an issue when there where at least 3 joins and renamed variables (@mgirlich, #1101).
mutate()
andselect()
afterdistinct()
now again produce a subquery to generate the correct translation (@mgirlich, #1119, #1141).Fixed an issue when using
filter()
on a summarised variable (@mgirlich, #1128).mutate()
+filter()
now again produces a new query if themutate()
uses a window function or SQL (@mgirlich, #1135).across()
andpick()
can be used (again) indistinct()
(@mgirlich, #1125).The
rows_*()
function work again for tables in a schema in PostgreSQL (@mgirlich, #1133).
Minor improvements and bug fixes
sql()
now evaluates its arguments locally also when used inacross()
(@mgirlich, #1039).The rank functions (
row_number()
,min_rank()
,rank()
,dense_rank()
,percent_rank()
, andcume_dist()
) now support multiple variables by wrapping them intibble()
, e.g.rank(tibble(x, y))
(@mgirlich, #1118).Added support for
join_by()
added in dplyr 1.1.0 (@mgirlich, #1074).Using
by = character()
to perform a cross join is now soft-deprecated in favor ofcross_join()
.full_join()
andright_join()
are now translated directly toFULL JOIN
andRIGHT JOIN
for SQLite as native support was finally added (@mgirlich, #1150).case_match()
now works with strings on the left hand side (@mgirlich, #1143).The rank functions (
row_number()
,min_rank()
,rank()
,dense_rank()
,percent_rank()
, andcume_dist()
) now work again for variables wrapped indesc()
, e.g.row_number(desc(x))
(@mgirlich, #1118).Moved argument
auto_index
after...
in*_join()
(@mgirlich, #1115).-
across()
now uses the original value when a column is overridden to match the behaviour of dplyr. For examplemutate(df, across(c(x, y), ~ .x / x))
now producesSELECT `x` / `x` AS `x`, `y` / `x` AS `y` FROM `df`
instead of
SELECT `x`, `y` / `x` AS `y` FROM ( SELECT `x` / `x` AS `x`, `y` FROM `df` )
Restricted length of table aliases to avoid truncation on certain backends (e.g., Postgres) (@fh-mthomson, #1096)
dbplyr 2.3.0
CRAN release: 2023-01-16
New features
stringr::str_like()
(new in 1.5.0) is translated to the closestLIKE
equivalent (@rjpat, #509)-
In preparation for dplyr 1.1.0:
- The
.by
argument is supported (@mgirlich, #1051). - Passing
...
toacross()
is deprecated because the evaluation timing of...
is ambiguous. Now instead of (e.g.)across(a:b, mean, na.rm = TRUE)
useacross(a:b, \(x) mean(x, na.rm = TRUE)
-
pick()
is translated (@mgirlich, #1044). -
case_match()
is translated (@mgirlich, #1020). -
case_when()
now supports the.default
argument (@mgirlich, #1017).
- The
Variables that aren’t found in either the data or in the environment now produce an error (@mgirlich, #907).
SQL optimisation
-
dbplyr now produces fewer subqueries resulting in shorter, more readable, and, in some cases, faster SQL. The following combination of verbs now avoids a subquery if possible:
dbplyr now uses
SELECT *
after a join instead of explicitly selecting every column, where possible (@mgirlich, #898).Joins only use the table aliases (“LHS” and “RHS”) if necessary (@mgirlich).
When using common table expressions, the results of joins and set operations are now reused (@mgirlich, #978).
Improved error messages
Many errors have been improved and now show the function where the error happened instead of a helper function (@mgirlich, #907).
Errors produced by the database, e.g. in
collect()
orrows_*()
, now show the verb where the error happened (@mgirlich).window_order()
now produces a better error message when applied to a data frame (@mgirlich, #947).Using a named
across()
now gives a clear error message (@mgirlich, #761).
Minor improvements and bug fixes
Keyword highlighting can now be customised via the option
dbplyr_highlight
. Turn it off viaoptions(dbplyr_highlight = FALSE)
or pass a custom ansi style, e.g.options(dbplyr_highlight = cli::combine_ansi_styles("bold", "cyan"))
(@mgirlich, #974).The rank functions (
row_number()
,min_rank()
,rank()
,dense_rank()
,percent_rank()
, andcume_dist()
) now give missing values the rank NA to match the behaviour of dplyr (@mgirlich, #991).NA
s inblob()
s are correctly translated toNULL
(#983).copy_inline()
gains atypes
argument to specify the SQL column types (@mgirlich, #963).cur_column()
is now supported (@mgirlich, #951).distinct()
returns columns ordered the way you request, not the same as the input data (@mgirlich).fill()
can now fill “downup” and “updown” (@mgirlich, #1057), and now order by non-numeric columns also in the up direction (@mgirlich, #1057).filter()
now works when using a window function and an external vector (#1048).group_by()
+ renamed columns works once again (@mgirlich, #928).last()
is correctly translated when no window frame is specified (@mgirlich, #1063).setOldClass()
uses a namespace, fixing an installation issue (@mgirlich, #927).sql()
is now translated differently. The...
are now evaluated locally instead of being translated withtranslate_sql()
(@mgirlich, #952).
Backend specific improvements
-
HANA:
- Correctly translates
as.character()
(#1027). -
copy_inline()
now works for Hana (#950)
- Correctly translates
-
MySQL:
-
str_flatten()
usescollapse = ""
by default (@fh-afrachioni, #993)
-
-
Oracle:
-
slice_sample()
now works for Oracle (@mgirlich, #986). -
copy_inline()
now works for Oracle (#972)
-
-
PostgreSQL:
- Generates correct literals for Dates (#727).
-
str_flatten()
usescollapse = ""
by default (@fh-afrachioni, #993) -
rows_*()
use the column types ofx
when auto copying (@mgirlich, #909).
-
Redshift:
-
round()
now respects thedigits
argument (@owenjonesuob, #1033). - No longer tries to use named windows anymore (@owenjonesuob, #1035).
-
copy_inline()
now works for Redshift (#949, thanks to @ejneer for an initial implementation). -
str_flatten()
usescollapse = ""
by default (@fh-afrachioni, #993)
-
Snowflake:
numeric functions:
all()
,any()
,log10()
,round()
,cor()
,cov()
andsd()
.date functions:
day()
,mday()
,wday()
,yday()
,week()
,isoweek()
,month()
,quarter()
,isoyear()
,seconds()
,minutes()
,hours()
,days()
,weeks()
,months()
,years()
andfloor_date()
.string functions:
grepl()
,paste()
,paste0()
,str_c()
,str_locate()
,str_detect()
,str_replace()
,str_replace_all()
,str_remove()
,str_remove_all()
,str_trim()
,str_squish()
andstr_flatten()
(@fh-afrachioni, #860).str_flatten()
usescollapse = ""
by default (@fh-afrachioni, #993)-
SQLite:
-
quantile()
gives a better error saying that it is not supported (@mgirlich, #1000).
-
-
SQL server:
-
Teradata:
- Querying works again. Unfortunately, the fix requires every column to once again by explicitly selected (@mgirlich, #966).
- New translations for
as.Date()
,week()
,quarter()
,paste()
,startsWith()
,row_number()
,weighted.mean()
,lead()
,lag()
, andcumsum()
(@overmar, #913).
dbplyr 2.2.1
CRAN release: 2022-06-27
Querying Oracle databases works again. Unfortunately, the fix requires every column to be explicitly selected again (@mgirlich, #908).
semi_join()
andanti_join()
work again for Spark (@mgirlich, #915).str_c()
is now translated to||
in Oracle (@mgirlich, #921).sd()
,var()
,cor()
andcov()
now give clear error messages on databases that don’t support them.
dbplyr 2.2.0
CRAN release: 2022-06-05
New features
SQL formatting has been considerably improved with new wrapping and indenting.
show_query()
creates more readable queries by printing the keywords in blue (@mgirlich, #644). When possible dbplyr now usesSELECT *
instead of explicitly selecting every column (@mgirlich).Added support for
rows_insert()
,rows_append()
,rows_update()
,rows_patch()
,rows_upsert()
, androws_delete()
(@mgirlich, #736).Added
copy_inline()
as acopy_to()
equivalent that does not need write access (@mgirlich, #628).remote_query()
,show_query()
,compute()
andcollect()
have an experimentalcte
argument. IfTRUE
the SQL query will use common table expressions instead of nested queries (@mgirlich, #638).New
in_catalog()
, which works likein_schema()
, but allows creation of table identifiers consisting of three components: catalog, schema, name (#806, @krlmlr).
Improvements to SQL generation
When possible, dbplyr now uses
SELECT *
instead of explicitly selecting every column (@mgirlich).-
Improved translations for specific backends:
-
as.Date()
for Oracle (@mgirlich, #661). -
case_when()
with a final clause of the formTRUE ~ ...
usesELSE ...
for SQLite (@mgirlich, #754). -
day()
,week()
,isoweek()
, andisoyear()
for Postgres (@mgirlich, #675). -
explain()
for ROracle (@mgirlich). -
fill()
for SQL Server (#651, @mgirlich) and RPostgreSQL (@mgirlich). -
quantile()
for SQL Server (@mgirlich, #620). -
str_flatten()
for Redshift (@hdplsa, #804) -
slice_sample()
for MySQL/MariaDB and SQL Server (@mgirlich, #617). -
union()
for Hive (@mgirlich, #663).
-
The backend function
dbplyr_fill0()
(used for databases that lackIGNORE NULLS
support) now respects database specific translations (@rsund, #753).Calls of the form
stringr::foo()
orlubridate::foo()
are now evaluated in the database, rather than locally (#197).Unary plus (e.g.
db %>% filter(x == +1)
) now works (@mgirlich, #674).is.na()
,ifelse()
,if_else()
,case_when()
, andif()
generate slightly more compact SQL (@mgirlich, #738).if_else()
now supports themissing
argument (@mgirlich, #641).quantile()
no longer errors when using thena.rm
argument (@mgirlich, #600).remote_name()
now returns a name in more cases where it makes sense (@mgirlich, #850).The partial evaluation code is now more aligned with
dtplyr
. This makes it easier to transfer bug fixes and new features from one package to the other. In this process the second argument ofpartial_eval()
was changed to a lazy frame instead of a character vector of variables (@mgirlich, #766). Partially evaluated expressions with infix operations are now correctly translated. For exampletranslate_sql(!!expr(2 - 1) * x)
now works (@mgirlich, #634).
Minor improvements and bug fixes
New
pillar::tbl_format_header()
method for lazy tables: Printing a lazy table where all rows are displayed also shows the exact number of rows in the header. The threshold is controlled bygetOption("pillar.print_min")
, with a default of 10 (#796, @krlmlr).The 1st edition extension mechanism is formally deprecated (#507).
across()
,if_any()
andif_all()
now defaults to.cols = everything()
(@mgirlich, #760). If.fns
is not providedif_any()
andif_all()
work like a parallel version ofany()
/any()
(@mgirlich, #734).across()
,if_any()
, andif_all()
can now translate evaluated lists and functions (@mgirlich, #796), and accept the name of a list of functions (@mgirlich, #817).Multiple
across()
calls inmutate()
andtransmute()
can now access freshly created variables (@mgirlich, #802).add_count()
now doesn’t change the groups of the input (@mgirlich, #614).compute()
can now handle whenname
is named by unnaming it first (@mgirlich, #623), and now works whentemporary = TRUE
for Oracle (@mgirlich, #621).distinct()
now supports.keep_all = TRUE
(@mgirlich, #756).explain()
passes...
to methods (@mgirlich, #783), and works for Redshift (@mgirlich, #740).filter()
throws an error if you supply a named argument (@mgirlich, #764).Joins disambiguates columns that only differ in case (@mgirlich, #702). New arguments
x_as
andy_as
allow you to control the table alias used in SQL query (@mgirlich, #637). Joins withna_matches = "na"
now work for DuckDB (@mgirlich, #704).mutate()
andtransmute()
use named windows if a window definition is used at least twice and the backend supports named windows (@mgirlich, #624).mutate()
now supports the arguments.keep
,.before
, and.after
(@mgirlich, #802).na.rm = FALSE
only warns once every 8 hours across all functions (#899).nesting()
now supports the.name_repair
argument (@mgirlich, #654).pivot_longer()
can now pivot a column namedname
(@mgirlich, #692), can repair names (@mgirlich, #694), and can work with multiplenames_from
columns (@mgirlich, #693).pivot_wider(values_fn = )
andpivot_longer(values_transform = )
can now be formulas (@mgirlich, #745).pivot_wider()
now supports the argumentsnames_vary
,names_expand
, andunused_fn
(@mgirlich, #774).remote_name()
now returns a name in more cases where it makes sense (@mgirlich, #850).sql_random()
is now exported.ungroup()
removes variables in...
from grouping (@mgirlich, #689).transmute()
now keeps grouping variables (@mgirlich, #802).
dbplyr 2.1.1
CRAN release: 2021-04-06
New support for Snowflake (@edgararuiz)
compute()
,sql_table_index()
, andsql_query_wrap()
now work with schemas (@mgirlich, #595).group_by()
now ungroups when the dots argument is empty and.add
isFALSE
(@mgirlich, #615).sql_escape_date()
andsql_escape_datetime
gain methods for MS Access (@erikvona, #608).
dbplyr 2.1.0
CRAN release: 2021-02-03
New features
Thanks to @mgirlich, dbplyr gains support for key verbs from tidyr:
pivot_longer()
(#532),pivot_wider()
(#543),expand()
(#538),complete()
(#538),replace_na()
(#538),fill()
(#566).@mgirlich is now a dbplyr author in recognition of his significant and sustained contributions.
across()
implementation has been rewritten to support more inputs: it now translates formulas (#525), works with SQL functions that don’t have R translations (#534), and work withNULL
(#554)summarise()
now supports argument.groups
(@mgirlich, #584).
SQL translation
All backends:
str_sub()
,substr()
andsubstring()
get better translations (#577). Most importantly, the results of using negative locations should match the underlying R implementations more closely.-
MS SQL:
as.integer()
andas.integer64()
translations cast first toNUMERIC
to avoid CASTing weirdness (@DavidPatShuiFong, #496).Assumes a boolean context inside of
[
(#546)str_sub()
withend = -1
now works (#577).
Redshift:
lag()
andlead()
lose thedefault
parameter since it’s not supported (@hdplsa, #548).SQLite: custom translation of
full_join()
andright_join()
(@mgirlich, #536).
Minor improvements and bug fixes
RPostgreSQL backend warns if
temporary = TRUE
since temporary tables are not supported byRPostgreSQL::dbWriteTable()
(#574).count()
method provides closer match to dplyr semantics (#347).distinct()
now respects grouping (@mgirlich, #535).db_connection_describe()
no longer uses partial matching (@mgirlich, #564).pull()
no longerselect()
s the result when there’s already only one variable (#562).select()
no longer relocates grouping variables to the front (@mgirlich, #568). and informs when adding missing grouping variables (@mgirlich, #559).
dbplyr 2.0.0
CRAN release: 2020-11-03
dplyr 1.0.0 compatibility
across()
is now translated into individual SQL statements (#480).rename()
andselect()
support dplyr 1.0.0 tidyselect syntax (apart from predicate functions which can’t easily work on computed queries) (#502).relocate()
makes it easy to move columns (#494) andrename_with()
makes it easy to rename columns programmatically (#502).slice_min()
,slice_max()
, andslice_order()
are now supported.slice_head()
andslice_tail()
throw clear error messages (#394)
SQL generation
Documentation has been radically improved with new topics for each major verb and each backend giving more details about the SQL translation.
intersect()
,union()
andsetdiff()
gain anall
argument to add theALL
argument (#414).Join functions gains a
na_matches
argument that allows you to control whether or notNA
(NULL
) values match otherNA
values. The default is"never"
, which is the usual behaviour in databases. You can setna_matches = "na"
to match R’s usual join behaviour (#180). Additional arguments error (instead of being silently swallowed) (#382).Joins now only use aliases where needed to disambiguate columns; this should make generated queries more readable.
Subqueries no longer include an
ORDER BY
clause. This is not part of the SQL spec, and has very limited support across databases. Now such queries generate a warning suggesting that you move yourarrange()
call later in the pipeline (#276). (There’s one exception:ORDER BY
is still generated ifLIMIT
is present; this tends to affect the returns rows but not necessarily their order).Subquery names are now scoped within the query. This makes query text deterministic which helps some query optimisers/cachers (#336).
sql_optimise()
now can partially optimise a pipeline; due to an unfortunate bug it previously gave up too easily.in_schema()
quotes each input individually (#287) (usesql()
to opt out of quoting, if needed). AndDBI::Id()
should work anywhere thatin_schema()
does.
SQL translation
Experimental new SAP HANA backend (#233). Requires the latest version of odbc.
-
All backends:
You can now use
::
in translations, so that (e.g.)dbplyr::n()
is translated tocount(*)
(#207).[[
can now also translate numeric indices (#520).%/%
now generates a clear error message; previously it was translated to/
which is not correct (#108).sub_str()
translation is more consistent in edge cases (@ianmcook).All
median()
(@lorenzwalthert, #483),pmin()
,pmax()
(#479),sd()
andvar()
functions have anna.rm
argument that warns once when notTRUE
. This makes them consistent withmean()
andsum()
.substring()
is now translated the same way assubstr()
(#378).
blob vectors can now be used with
!!
and!!!
operators, for example infilter()
(@okhoma, #433)MySQL uses standard SQL for index creation.
MS SQL translation does better a distinguishing between bit and boolean (#377, #318).
if
andifelse
once again generateIIF
, creating simpler expressions.as.*()
function usesTRY_CAST()
instead ofCAST()
for version 11+ (2012+) (@DavidPatShuiFong, #380).odbc no longer translates
count()
; this was an accidental inclusion.Oracle translation now depends on Oracle 12c, and uses a “row-limiting” clause for
head()
. It gains translations fortoday()
andnow()
, and improvedas.Date()
translation (@rlh1994, #267).PostgreSQL: new translations for lubridate period functions
years()
,months()
,days()
, andfloor_date()
(@bkkkk, #333) and stringr functionsstr_squish()
,str_remove()
, andstr_remove_all()
(@shosaco).-
New RedShift translations when used with
RPostgres::Redshift()
.str_replace()
errors since there’s no Redshift translation, andstr_replace_all()
usesREGEXP_REPLACE()
(#446).as.numeric()
andas.double()
cast toFLOAT
(#408).
SQLite gains translations for lubridate functions
today()
,now()
,year()
,month()
,day()
,hour()
,minute()
,second()
,yday()
(#262), and correct translation formedian()
(#357).
Extensibility
If you are the author of a dbplyr backend, please see vignette("backend-2")
for details.
New
dbplyr_edition()
generic allows you to opt-in to the 2nd edition of the dbplyr API.-
db_write_table()
now callsDBI::dbWriteTable()
instead of nine generics that formerly each did a small part:db_create_indexes()
,db_begin()
,db_rollback()
,db_commit()
,db_list_tables()
,drop_drop_table()
,db_has_table()
,db_create_table()
, anddb_data_types()
. You can now delete the methods for these generics.db_query_rows()
is no longer used; it appears that it hasn’t been used for some time, so if you have a method, you can delete it. DBI::dbQuoteIdentifier()
is now used instead ofsql_escape_ident()
andDBI::dbQuoteString()
instead ofsql_escape_string()
.-
A number of
db_*
generics have been replaced with new SQL generation generics:-
dplyr::db_analyze()
->dbplyr::sql_table_analyze()
-
dplyr::db_create_index()
->dbplyr::sql_table_index()
-
dplyr::db_explain()
->dbplyr::sql_queriy_explain()
-
dplyr::db_query_fields()
->dbplyr::sql_query_fields()
-
dplyr::db_save_query()
->dbplyr::sql_query_save()
This makes them easier to test and is an important part of the process of moving all database generics in dbplyr (#284).
-
-
A number of other generics have been renamed to facilitate the move from dplyr to dbplyr:
-
dplyr::sql_select()
->dbplyr::sql_query_select()
-
dplyr::sql_join()
->dbplyr::sql_query_join()
-
dplyr::sql_semi_join()
->dbplyr::sql_query_semi_join()
-
dplyr::sql_set_op()
->dbplyr::sql_query_set_op()
-
dplyr::sql_subquery()
->dbplyr::sql_query_wrap()
-
dplyr::db_desc()
->dbplyr::db_connection_describe()
-
New
db_temporary_table()
generic makes it easier to work with databases that require temporary tables to be specially named.New
sql_expr_matches()
generic allows databases to use more efficient alternatives when determine if two values “match” (i.e. like equality but a pair ofNULL
s will also match). For more details, see https://modern-sql.com/feature/is-distinct-fromNew
sql_join_suffix()
allows backends to control the default suffixes used (#254).
Minor improvements and bug fixes
All old lazy eval shims have been removed. These have been deprecated for some time.
Date-time escaping methods for Athena and Presto have moved to the packages where they belong.
Attempting to embed a Shiny reactive in a query now gives a helpful error (#439).
copy_lahman()
andcopy_nycflights13()
(and hencenycflights13_sqlite()
) and friends now return DBI connections rather than the now deprecatedsrc_dbi()
(#440).copy_to()
can nowoverwrite
when table is specified with schema (#489), and gains anin_transaction
argument used to optionally suppress the transaction wrapper (#368).distinct()
no longer duplicates column if grouped (#354).transmute()
now correctly tracks variables it needs when creating subqueries (#313).mutate()
grouping variables no longer generates a downstream error (#396)mutate()
correctly generates subqueries when you re-use the same variable three or more times (#412).window_order()
overrides ordering, rather than appending to it.
dbplyr 1.4.4
CRAN release: 2020-05-27
Internally
DBI::dbExecute()
now usesimmediate = TRUE
; this improves support for session-scoped temporary tables in MS SQL (@krlmlr, #438).Subqueries with
ORDER BY
useTOP 9223372036854775807
instead ofTOP 100 PERCENT
on SQL Server for compatibility with Azure Data Warehouse (#337, @alexkyllo).escape()
now supportsblob
vectors using newsql_escape_raw()
generic. It enables using blob variables in dplyr verbs, for example to filter nvarchar values by UTF-16 blobs (see https://github.com/r-dbi/DBI/issues/215#issuecomment-356376133). (@okhoma, #433)Added
setOldClass()
calls for"ident"
and"ident_q"
classes for compatibility with dplyr 1.0.0 (#448, @krlmlr).Postgres
str_detect()
translation uses same argument names as stringr, and gains anegate
argument (#444).semi_join()
andanti_join()
now correctly support thesql_on
argument (#443, @krlmlr).
dbplyr 1.4.3
CRAN release: 2020-04-19
dbplyr now uses RPostgres (instead of RPostgreSQL) and RMariaDB (instead of RMySQL) for its internal tests and data functions (#427).
The Date and POSIXt methods for
escape()
now use exportedsql_escape_date()
andsql_escape_datetime()
generics to allow backend specific formatting of date and datetime literals. These are used to provide methods for Athena and Presto backends (@OssiLehtinen, #384, #391).first()
,last()
,nth()
,lead()
andlag()
now respect thewindow_frame()
(@krlmlr, #366).SQL server: new translations for
str_flatten()
(@PauloJhonny, #405).SQL server: temporary datasets are now session-local, not global (#401).
Postgres: correct
str_detect()
,str_replace()
andstr_replace_all()
translation (@shosaco, #362).
dbplyr 1.4.2
CRAN release: 2019-06-17
Fix bug when partially evaluating unquoting quosure containing a single symbol (#317)
Fixes for rlang and dpylr compatibility.
dbplyr 1.4.1
CRAN release: 2019-06-05
Minor improvements to SQL generation
x %in% y
strips names ofy
(#269).Enhancements for scoped verbs (
mutate_all()
,summarise_if()
,filter_at()
etc) (#296, #306).MS SQL use
TOP 100 PERCENT
as stop-gap to allow subqueries withORDER BY
(#277).Window functions now translated correctly for Hive (#293, @cderv).
dbplyr 1.4.0
CRAN release: 2019-04-23
Breaking changes
Error: `con` must not be NULL
: If you see this error, it probably means that you have forgotten to passcon
down to a dbplyr function. Previously, dbplyr defaulted to usingsimulate_dbi()
which introduced subtle escaping bugs. (It’s also possible I have forgotten to pass it somewhere that the dbplyr tests don’t pick up, so if you can’t figure it out, please let me know).Subsetting (
[[
,$
, and[
) functions are no longer evaluated locally. This makes the translation more consistent and enables useful new idioms for modern databases (#200).
New features
MySQL/MariaDB (https://mariadb.com/kb/en/library/window-functions/) and SQLite (https://www.sqlite.org/windowfunctions.html) translations gain support for window functions, available in Maria DB 10.2, MySQL 8.0, and SQLite 3.25 (#191).
-
Overall, dplyr generates many fewer subqueries:
Joins and semi-joins no longer add an unneeded subquery (#236). This is facilitated by the new
bare_identifier_ok
argument tosql_render()
; the previous argument was calledroot
and confused me.Many sequences of
select()
,rename()
,mutate()
, andtransmute()
can be collapsed into a single query, instead of always generating a subquery (#213).
New
vignette("sql")
describes some advantages of dbplyr over SQL (#205) and gives some advice about writing literal SQL inside of dplyr, when you need to (#196).New
vignette("reprex")
gives some hints on creating reprexes that work anywhere (#117). This is supported by a newtbl_memdb()
that matches the existingtbl_lazy()
.All
..._join()
functions gain ansql_on
argument that allows specifying arbitrary join predicates in SQL code (#146, @krlmlr).
SQL translations
New translations for some lubridate functions:
today()
,now()
,year()
,month()
,day()
,hour()
,minute()
,second()
,quarter()
,yday()
(@colearendt, @derekmorr). Also added new translation foras.POSIXct()
.New translations for stringr functions:
str_c()
,str_sub()
,str_length()
,str_to_upper()
,str_to_lower()
, andstr_to_title()
(@colearendt). Non-translated stringr functions throw a clear error.New translations for bitwise operations:
bitwNot()
,bitwAnd()
,bitwOr()
,bitwXor()
,bitwShiftL()
, andbitwShiftR()
. Unlike the base R functions, the translations do not coerce arguments to integers (@davidchall, #235).New translation for
x[y]
toCASE WHEN y THEN x END
. This enablessum(a[b == 0])
to work as you expect from R (#202).y
needs to be a logical expression; if not you will likely get a type error from your database.New translations for
x$y
andx[["y"]]
tox.y
, enabling you to index into nested fields in databases that provide them (#158).The
.data
and.env
pronouns of tidy evaluation are correctly translated (#132).New translation for
median()
andquantile()
. Works for all ANSI compliant databases (SQL Server, Postgres, MariaDB, Teradata) and has custom translations for Hive. Thanks to @edavidaja for researching the SQL variants! (#169)na_if()
is correct translated toNULLIF()
(rather thanNULL_IF
) (#211).n_distinct()
translation throws an error when given more than one argument. (#101, #133).New default translations for
paste()
,paste0()
, and the hyperbolic functions (these previously were only available for ODBC databases).Corrected translations of
pmin()
andpmax()
toLEAST()
andGREATEST()
for ANSI compliant databases (#118), toMIN()
andMAX()
for SQLite, and to an error for SQL server.New translation for
switch()
to the simple form ofCASE WHEN
(#192).
SQL simulation
SQL simulation makes it possible to see what dbplyr will translate SQL to, without having an active database connection, and is used for testing and generating reprexes.
SQL simulation has been overhauled. It now works reliably, is better documented, and always uses ANSI escaping (i.e.
`
for field names and'
for strings).tbl_lazy()
now actually puts adbplyr::src
in the$src
field. This shouldn’t affect any downstream code unless you were previously working around this weird difference betweentbl_lazy
andtbl_sql
classes. It also includes thesrc
class in its class, and when printed, shows the generated SQL (#111).
Database specific improvements
-
MySQL/MariaDB
Translations also applied to connections via the odbc package (@colearendt, #238)
Basic support for regular expressions via
str_detect()
andstr_replace_all()
(@colearendt, #168).Improved translation for
as.logical(x)
toIF(x, TRUE, FALSE)
.
-
Oracle
-
Postgres
- Basic support for regular expressions via
str_detect()
andstr_replace_all()
(@colearendt, #168).
- Basic support for regular expressions via
-
SQLite
-
explain()
translation now generatesEXPLAIN QUERY PLAN
which generates a higher-level, more human friendly explanation.
-
-
SQL server
Improved translation for
as.logical(x)
toCAST(x as BIT)
(#250).copy_to()
method applies temporary table name transformation earlier so that you can now overwrite temporary tables (#258).db_write_table()
method uses correct argument name for passing along field types (#251).
Minor improvements and bug fixes
Aggregation functions only warn once per session about the use of
na.rm = TRUE
(#216).table names generated by
random_table_name()
have the prefix “dbplyr_”, which makes it easier to find them programmatically (@mattle24, #111)Functions that are only available in a windowed (
mutate()
) query now throw an error when called in a aggregate (summarise()
) query (#129)arrange()
understands the.by_group
argument, making it possible sort by groups if desired. The default isFALSE
(#115)distinct()
now handles computed variables likedistinct(df, y = x + y)
(#154).escape()
,sql_expr()
andbuild_sql()
no longer acceptcon = NULL
as a shortcut forcon = simulate_dbi()
. This made it too easy to forget to passcon
along, introducing extremely subtle escaping bugs.win_over()
gains acon
argument for the same reason.New
escape_ansi()
always uses ANSI SQL 92 standard escaping (for use in examples and documentation).mutate(df, x = NULL)
dropsx
from the output, just like when working with local data frames (#194).partial_eval()
processes inlined functions (including rlang lambda functions). This makes dbplyr work with more forms of scoped verbs likedf %>% summarise_all(~ mean(.))
,df %>% summarise_all(list(mean))
(#134).sql_aggregate()
now takes an optional argumentf_r
for passing tocheck_na_rm()
. This allows the warning to show the R function name rather than the SQL function name (@sverchkov, #153).sql_infix()
gains apad
argument for the rare operator that doesn’t need to be surrounded by spaces.sql_prefix()
no longer turns SQL functions into uppercase, allowing for correct translation of case-sensitive SQL functions (#181, @mtoto).summarise()
gives a clear error message if you refer to a variable created in that samesummarise()
(#114).New
sql_call2()
which is torlang::call2()
assql_expr()
is torlang::expr()
.show_query()
andexplain()
usecat()
rather than message.union()
,union_all()
,setdiff()
andintersect()
do a better job of matching columns across backends (#183).
dbplyr 1.3.0
CRAN release: 2019-01-09
- Now supports for dplyr 0.8.0 (#190) and R 3.1.0
API changes
Calls of the form
dplyr::foo()
are now evaluated in the database, rather than locally (#197).vars
argument totbl_sql()
has been formally deprecated; it hasn’t actually done anything for a while (#3254).src
andtbl
objects now include a class generated from the class of the underlying connection object. This makes it possible for dplyr backends to implement different behaviour at the dplyr level, when needed. (#2293)
SQL translation
x %in% y
is now translated toFALSE
ify
is empty (@mgirlich, #160).New
as.integer64(x)
translation toCAST(x AS BIGINT)
(#3305)case_when
now translates with a ELSE clause if a formula of the formTRUE~<RHS>
is provided . (@cderv, #112)str_detect()
now uses correct parameter order (#3397)-
MS SQL
-
Oracle
- Custom
db_drop_table()
now only drops tables if they exist (#3306) - Custom
setdiff()
translation (#3493) - Custom
db_explain()
translation (#3471)
- Custom
-
SQLite
- Correct translation for
as.numeric()
/as.double()
(@chris-park, #171).
- Correct translation for
-
Redshift
Minor improvements and bug fixes
copy_to()
will only remove existing table whenoverwrite = TRUE
and the table already exists, eliminating a confusing “NOTICE” from PostgreSQL (#3197).partial_eval()
handles unevaluated formulas (#184).pull.tbl_sql()
now extracts correctly from grouped tables (#3562).sql_render.op()
now correctly forwards thecon
argument (@kevinykuo, #73).
dbplyr 1.2.0
CRAN release: 2018-01-03
New top-level translations
-
New translations for
- MS Access (#2946) (@DavisVaughan)
- Oracle, via odbc or ROracle (#2928, #2732, @edgararuiz)
- Teradata.
- Redshift.
dbplyr now supplies appropriate translations for the RMariaDB and RPostgres packages (#3154). We generally recommend using these packages in favour of the older RMySQL and RPostgreSQL packages as they are fully DBI compliant and tested with DBItest.
New features
copy_to()
can now “copy” tbl_sql in the same src, providing another way to cache a query into a temporary table (#3064). You can alsocopy_to
tbl_sqls from another source, andcopy_to()
will automatically collect then copy.Initial support for stringr functions:
str_length()
,str_to_upper()
,str_to_lower()
,str_replace_all()
,str_detect()
,str_trim()
. Regular expression support varies from database to database, but most simple regular expressions should be ok.
Tools for developers
db_compute()
gains ananalyze
argument to matchdb_copy_to()
.New
remote_name()
,remote_con()
,remote_src()
,remote_query()
andremote_query_plan()
provide a standard API for get metadata about a remote tbl (#3130, #2923, #2824).New
sql_expr()
is a more convenient building block for low-level SQL translation (#3169).New
sql_aggregate()
andwin_aggregate()
for generating SQL and windowed SQL functions for aggregates. These take one argument,x
, and warn ifna.rm
is notTRUE
(#3155).win_recycled()
is equivalent towin_aggregate()
and has been soft-deprecated.db_write_table
now needs to return the table name
Minor improvements and bug fixes
Multiple
head()
calls in a row now collapse to a single call. This avoids a printing problem with MS SQL (#3084).escape()
now works with integer64 values from the bit64 package (#3230)if
,ifelse()
, andif_else()
now correctly scope the false condition so that it only applies to non-NULL conditions (#3157)ident()
andident_q()
handle 0-length inputs better, and should be easier to use with S3 (#3212)in_schema()
should now work in more places, particularly incopy_to()
(#3013, @baileych)SQL generation for joins no longer gets stuck in a endless loop if you request an empty suffix (#3220).
mutate()
has better logic for splitting a single mutate into multiple subqueries (#3095).Improved
paste()
andpaste0()
support in MySQL, PostgreSQL (#3168), and RSQLite (#3176). MySQL and PostgreSQL gain support forstr_flatten()
which behaves likepaste(x, collapse = "-")
(but for technical reasons can’t be implemented as a straightforward translation ofpaste()
).same_src.tbl_sql()
now performs correct comparison instead of always returningTRUE
. This means thatcopy = TRUE
once again allows you to perform cross-database joins (#3002).select()
queries no longer alias column names unnecessarily (#2968, @DavisVaughan).select()
andrename()
are now powered by tidyselect, fixing a few renaming bugs (#3132, #2943, #2860).summarise()
once again performs partial evaluation before database submission (#3148).test_src()
makes it easier to access a single test source.
Database specific improvements
-
MS SQL
Better support for temporary tables (@Hong-Revo)
Different translations for filter/mutate contexts for:
NULL
evaluation (is.na()
,is.null()
), logical operators (!
,&
,&&
,|
,||
), and comparison operators (==
,!=
,<
,>
,>=
,<=
)
MySQL:
copy_to()
(viadb_write_table()
) correctly translates logical variables to integers (#3151).odbc: improved
n()
translation in windowed context.SQLite: improved
na_if
translation (@cwarden)Oracle: changed VARVHAR to VARCHAR2 datatype (@washcycle, #66)
dbplyr 1.1.0
CRAN release: 2017-06-27
New features
full_join()
over non-overlapping columnsby = character()
translated toCROSS JOIN
(#2924).case_when()
now translates to SQL “CASE WHEN” (#2894)x %in% c(1)
now generates the same SQL asx %in% 1
(#2898).New
window_order()
andwindow_frame()
give you finer control over the window functions that dplyr creates (#2874, #2593).Added SQL translations for Oracle (@edgararuiz).
Minor improvements and bug fixes
x %in% c(1)
now generates the same SQL asx %in% 1
(#2898).head(tbl, 0)
is now supported (#2863).select()
ing zero columns gives a more information error message (#2863).Variables created in a join are now disambiguated against other variables in the same table, not just variables in the other table (#2823).
Added custom
db_analyze_table()
for MS SQL, Oracle, Hive and Impala (@edgararuiz)Added support for
sd()
for aggregate and window functions (#2887) (@edgararuiz)You can now use the magrittr pipe within expressions, e.g.
mutate(mtcars, cyl %>% as.character())
.If a translation was supplied for a summarise function, but not for the equivalent windowed variant, the expression would be translated to
NULL
with a warning. Nowsql_variant()
checks that all aggregate functions have matching window functions so that correct translations or clean errors will be generated (#2887)
dbplyr 1.0.0
CRAN release: 2017-06-09
New features
-
tbl()
andcopy_to()
now work directly with DBI connections (#2423, #2576), so there is no longer a need to generate a dplyr src. dplyr has gained a basic SQL optimiser, which collapses certain nested SELECT queries into a single query (#1979). This will improve query execution performance for databases with less sophisticated query optimisers, and fixes certain problems with ordering and limits in subqueries (#1979). A big thanks goes to @hhoeflin for figuring out this optimisation.
compute()
andcollapse()
now preserve the “ordering” of rows. This only affects the computation of window functions, as the rest of SQL does not care about row order (#2281).copy_to()
gains anoverwrite
argument which allows you to overwrite an existing table. Use with care! (#2296)New
in_schema()
function makes it easy to refer to tables in schema:in_schema("my_schema_name", "my_table_name")
.
Deprecated and defunct
-
query()
is no longer exported. It hasn’t been useful for a while so this shouldn’t break any code.
Verb-level SQL generation
Partial evaluation occurs immediately when you execute a verb (like
filter()
ormutate()
) rather than happening when the query is executed (#2370).mutate.tbl_sql()
will now generate as many subqueries as necessary so that you can refer to variables that you just created (like in mutate with regular dataframes) (#2481, #2483).-
SQL joins have been improved:
SQL joins always use the
ON ...
syntax, avoidingUSING ...
even for natural joins. Improved handling of tables with columns of the same name (#1997, @javierluraschi). They now generate SQL more similar to what you’d write by hand, eliminating a layer or two of subqueries (#2333)[API] They now follow the same rules for including duplicated key variables that the data frame methods do, namely that key variables are only kept from
x
, and never fromy
(#2410)[API] The
sql_join()
generic now gains avars
argument which lists the variables taken from the left and right sides of the join. If you have a customsql_join()
method, you’ll need to update how your code generates joins, following the template insql_join.generic()
.full_join()
throws a clear error when you attempt to use it with a MySQL backend (#2045)right_join()
andfull_join()
now return results consistent with local data frame sources when there are records in the right table with no match in the left table.right_join()
returns values ofby
columns from the right table.full_join()
returns coalesced values ofby
columns from the left and right tables (#2578, @ianmcook)
group_by()
can now perform an inline mutate for database backends (#2422).-
The SQL generation set operations (
intersect()
,setdiff()
,union()
, andunion_all()
) have been considerably improved.By default, the component SELECT are surrounded with parentheses, except on SQLite. The SQLite backend will now throw an error if you attempt a set operation on a query that contains a LIMIT, as that is not supported in SQLite (#2270).
All set operations match column names across inputs, filling in non-matching variables with NULL (#2556).
rename()
andgroup_by()
now combine correctly (#1962)tbl_lazy()
andlazy_tbl()
have been exported. These help you test generated SQL with out an active database connection.
Vector-level SQL generation
New
as.sql()
safely coerces an input to SQL.More translators for
as.character()
,as.integer()
andas.double()
(#2775).New
ident_q()
makes it possible to specifier identifiers that do not need to be quoted.-
Translation of inline scalars:
Logical values are now translated differently depending on the backend. The default is to use “true” and “false” which is the SQL-99 standard, but not widely support. SQLite translates to “0” and “1” (#2052).
Inf
and-Inf
are correctly escapedBetter test for whether or not a double is similar to an integer and hence needs a trailing 0.0 added (#2004).
Quoting defaults to
DBI::dbEscapeString()
andDBI::dbQuoteIdentifier()
respectively.
::
and:::
are handled correctly (#2321)x %in% 1
is now correctly translated tox IN (1)
(#511).ifelse()
andif_else()
use correct argument names in SQL translation (#2225).ident()
now returns an object with classc("ident", "character")
. It no longer contains “sql” to indicate that this is not already escaped.is.na()
andis.null()
gain extra parens in SQL translation to preserve correct precedence (#2302).[API]
log(x, b)
is now correctly translated to the SQLlog(b, x)
(#2288). SQLite does not support the 2-argument log function so it is translated tolog(x) / log(b)
.nth(x, i)
is now correctly translated tonth_value(x, i)
.n_distinct()
now accepts multiple variables (#2148).[API]
substr()
is now translated to SQL, correcting for the difference in the third argument. In R, it’s the position of the last character, in SQL it’s the length of the string (#2536).win_over()
escapes expression using current database rules.
Backends
copy_to()
now usesdb_write_table()
instead ofdb_create_table()
anddb_insert_into()
.db_write_table.DBIConnection()
usesdbWriteTable()
.New
db_copy_to()
,db_compute()
anddb_collect()
allow backends to override the entire database process behindcopy_to()
,compute()
andcollect()
.db_sql_render()
allow additional control over the SQL rendering process.All generics whose behaviour can vary from database to database now provide a DBIConnection method. That means that you can easily scan the NAMESPACE to see the extension points.
sql_escape_logical()
allows you to control the translation of literal logicals (#2614).src_desc()
has been replaced bydb_desc()
and now dispatches on the connection, eliminating the last method that required dispatch on the class of the src.win_over()
,win_rank()
,win_recycled()
,win_cumulative()
,win_current_group()
andwin_current_order()
are now exported. This should make it easier to provide customised SQL for window functions (#2051, #2126).SQL translation for Microsoft SQL Server (@edgararuiz)
SQL translation for Apache Hive (@edgararuiz)
SQL translation for Apache Impala (@edgararuiz)
Minor bug fixes and improvements
collect()
once again defaults to return all rows in the data (#1968). This makes it behave the same asas.data.frame()
andas_tibble()
.collect()
only regroups by variables present in the data (#2156)collect()
will automatically LIMIT the result to then
, the number of rows requested. This will provide the query planner with more information that it may be able to use to improve execution time (#2083).common_by()
gets a better error message for unexpected inputs (#2091)copy_to()
no longer checks that the table doesn’t exist before creation, instead preferring to fall back on the database for error messages. This should reduce both false positives and false negative (#1470)copy_to()
now succeeds for MySQL if a character column containsNA
(#1975, #2256, #2263, #2381, @demorenoc, @eduardgrebe).copy_to()
now returns its output invisibly (since you’re often just calling for the side-effect).distinct()
reports improved variable information for SQL backends. This means that it is more likely to work in the middle of a pipeline (#2359).Ungrouped
do()
on database backends now collects all data locally first (#2392).Call
dbFetch()
instead of the deprecatedfetch()
(#2134). UseDBI::dbExecute()
for non-query SQL commands (#1912)explain()
andshow_query()
now invisibly return the first argument, making them easier to use inside a pipeline.print.tbl_sql()
displays ordering (#2287) and prints table name, if known.print(df, n = Inf)
andhead(df, n = Inf)
now work with remote tables (#2580).db_desc()
andsql_translate_env()
get defaults for DBIConnection.Formatting now works by overriding the
tbl_sum()
generic instead ofprint()
. This means that the output is more consistent with tibble, and thatformat()
is now supported also for SQL sources (tidyverse/dbplyr#14).
Lazy ops
[API] The signature of
op_base
has changed toop_base(x, vars, class)
-
[API]
translate_sql()
andpartial_eval()
have been refined:translate_sql()
no longer takes a vars argument; instead callpartial_eval()
yourself.Because it no longer needs the environment
translate_sql()_
now works with a list of dots, rather than alazy_dots
.partial_eval()
now takes a character vector of variable names rather than a tbl.This leads to a simplification of the
op
data structure: dots is now a list of expressions rather than alazy_dots
.
[API]
op_vars()
now returns a list of quoted expressions. This enables escaping to happen at the correct time (i.e. when the connection is known).