Using R to Program in SQL

Last week I was chatting with a coworker (over Skype because #2020quaratine & I never leave my apartment anymore) and we both agreed that the hardest task to recieve is “hey can you load this data into SQL?”. It doesn’t matter how many times we do it or how good we are at programming, this task is tough. It should be straightforward but even the most simple of data ends up with at least one error. As a result, after a while I realized there had to be an easier way addressing periodic tasks that needed data loaded into SQL.

For example, a project I recently worked on needed us to update a table in a SQL database on a weekly basis with the data from the previous week. So, every Monday, I had to load in the new data into SQL and then union it with the complete table containing all of our data. Instead of just loading the data into SQL (which if I’m being honest would have taken at least 15 minutes) and the unioning in it SQL, I built a Python script that loaded in the data via Python (which takes less than a couple seconds) and then updated my SQL table in the Python script. You can also do this in R and for tasks like this where you don’t need the table you’re loading in to be saved in your database, it’s a really useful trick.

Whether you’re doing it in Python, R, or another language, the general gist is the same. Within your script, you open up a connection to your SQL database and then write your SQL code within the R/Python script.

Here is how I set up the connection to a database. For the purpose of this post, I’m going to be connecting to Microsoft SQL Server and my SQL express server account. In the following query, we set up the connection to the SQL database that we want to use using the odbc and DBI libraries:

library(dplyr)
library(dbplyr)
library(DBI)
## Warning: package 'DBI' was built under R version 3.6.2
library(odbc)
con <- dbConnect(odbc::odbc(), 
                Driver = "SQL Server",
                Server = "LAALINEHAN\\SQLEXPRESS",
                Database = "alinehan",
                Trusted_connection = "True")

To query your data in SQL within your R script, you can use the dbGetQuery() function. Here’s a simple example of selecting some data from my SQL database:

dbGetQuery(conn = con, statement = '
select top 10 * 
from dbo.retail_sales
'
)
##    InvoiceNo StockCode                         Description Quantity
## 1     536365    85123A  WHITE HANGING HEART T-LIGHT HOLDER        6
## 2     536365     71053                 WHITE METAL LANTERN        6
## 3     536365    84406B      CREAM CUPID HEARTS COAT HANGER        8
## 4     536365    84029G KNITTED UNION FLAG HOT WATER BOTTLE        6
## 5     536365    84029E      RED WOOLLY HOTTIE WHITE HEART.        6
## 6     536365     22752        SET 7 BABUSHKA NESTING BOXES        2
## 7     536365     21730   GLASS STAR FROSTED T-LIGHT HOLDER        6
## 8     536366     22633              HAND WARMER UNION JACK        6
## 9     536366     22632           HAND WARMER RED POLKA DOT        6
## 10    536367     84879       ASSORTED COLOUR BIRD ORNAMENT       32
##                    InvoiceDate UnitPrice CustomerID        Country
## 1  2010-12-01 08:26:00.0000000      2.55      17850 United Kingdom
## 2  2010-12-01 08:26:00.0000000      3.39      17850 United Kingdom
## 3  2010-12-01 08:26:00.0000000      2.75      17850 United Kingdom
## 4  2010-12-01 08:26:00.0000000      3.39      17850 United Kingdom
## 5  2010-12-01 08:26:00.0000000      3.39      17850 United Kingdom
## 6  2010-12-01 08:26:00.0000000      7.65      17850 United Kingdom
## 7  2010-12-01 08:26:00.0000000      4.25      17850 United Kingdom
## 8  2010-12-01 08:28:00.0000000      1.85      17850 United Kingdom
## 9  2010-12-01 08:28:00.0000000      1.85      17850 United Kingdom
## 10 2010-12-01 08:34:00.0000000      1.69      13047 United Kingdom

If you want to use data from your R environment with your SQL connection you can use the paste function to build your query alongside your R objects and variables:

num <- 536365

dbGetQuery(conn = con, statement = paste0('
select * 
from dbo.retail_sales
where InvoiceNo = \'', num , '\'') 
)
##   InvoiceNo StockCode                         Description Quantity
## 1    536365    85123A  WHITE HANGING HEART T-LIGHT HOLDER        6
## 2    536365     71053                 WHITE METAL LANTERN        6
## 3    536365    84406B      CREAM CUPID HEARTS COAT HANGER        8
## 4    536365    84029G KNITTED UNION FLAG HOT WATER BOTTLE        6
## 5    536365    84029E      RED WOOLLY HOTTIE WHITE HEART.        6
## 6    536365     22752        SET 7 BABUSHKA NESTING BOXES        2
## 7    536365     21730   GLASS STAR FROSTED T-LIGHT HOLDER        6
##                   InvoiceDate UnitPrice CustomerID        Country
## 1 2010-12-01 08:26:00.0000000      2.55      17850 United Kingdom
## 2 2010-12-01 08:26:00.0000000      3.39      17850 United Kingdom
## 3 2010-12-01 08:26:00.0000000      2.75      17850 United Kingdom
## 4 2010-12-01 08:26:00.0000000      3.39      17850 United Kingdom
## 5 2010-12-01 08:26:00.0000000      3.39      17850 United Kingdom
## 6 2010-12-01 08:26:00.0000000      7.65      17850 United Kingdom
## 7 2010-12-01 08:26:00.0000000      4.25      17850 United Kingdom

The query above can easly be adapted for an update statement to solve my problem above of continuously updating a SQL table.

Next
Previous