Question: How to use SQL queries with a Revolution R xdf file

Problem

Is it possible to run SQL queries against an .xdf file that has been read into RevoR, executing the SQL inside RevoR?

Solution

It is not possible to run SQL directly against an .xdf file, because an .xdf is a binary file that stores data; it is not a database. What lets RevoR work with very large data is that it reads the .xdf in "chunks". This lets it use disk in addition to memory, so it can manipulate data sets too large to fit in RAM.
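
To illustrate chunked processing, here is a minimal sketch using RevoScaleR's rxDataStep, which reads an .xdf a block (chunk) at a time. A row filter and a column subset, roughly what a SQL WHERE clause and SELECT list would do, can be applied this way without holding the whole file in memory. The file names "foo.xdf" and "foo_delayed.xdf" and the columns ArrDelay and DayOfWeek are hypothetical; substitute your own.

```r
# Sketch only: the file names and column names below are hypothetical.
filterXdfInChunks <- function(inPath, outPath) {
  RevoScaleR::rxDataStep(
    inData       = inPath,                      # source .xdf, read chunk by chunk
    outFile      = outPath,                     # result is written to a new .xdf on disk
    varsToKeep   = c("ArrDelay", "DayOfWeek"),  # comparable to a SQL column list
    rowSelection = ArrDelay > 15,               # comparable to a SQL WHERE clause
    overwrite    = TRUE                         # replace outPath if it already exists
  )
}

# Only run when RevoScaleR is installed and the input file exists
if (requireNamespace("RevoScaleR", quietly = TRUE) && file.exists("foo.xdf")) {
  filterXdfInChunks("foo.xdf", "foo_delayed.xdf")
}
```

Because the output goes to another .xdf rather than to a data frame, memory use stays bounded by the chunk size, not by the size of the file.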

Options

Because the .xdf file is created by RevoR, you can apply SQL to the data while it is being brought into RevoR, by passing an SQL query to RxOdbcData. For example:

# connectionString holds your ODBC connection string; foo_table is a placeholder table name
foo <- RxOdbcData(sqlQuery = "SELECT * FROM foo_table",
                  connectionString = connectionString)

Of course you will need to have a valid ODBC connection. The RevoScaleR ODBC Import guide has information on this.
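
A minimal sketch of carrying this through to an .xdf file: the query (including any WHERE clause) runs in the database, and rxImport writes only the returned rows to disk. The table foo_table, its columns id and amount, the helper name importQueryToXdf, and the DSN in connString are all hypothetical placeholders.

```r
# Sketch only: table, columns, DSN, and helper name are hypothetical.
importQueryToXdf <- function(connString, outPath) {
  # The SQL is executed by the database, not by RevoR
  src <- RevoScaleR::RxOdbcData(
    sqlQuery         = "SELECT id, amount FROM foo_table WHERE amount > 100",
    connectionString = connString
  )
  # Write the query result to an .xdf file; returns an RxXdfData data source
  RevoScaleR::rxImport(inData = src, outFile = outPath, overwrite = TRUE)
}

# Placeholder ODBC connection string; replace with a valid one
connString <- "DSN=MyDsn;UID=user;PWD=password"

# Usage, once the DSN is configured:
# fooXdf <- importQueryToXdf(connString, "foo.xdf")
```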

Another option is to use rxDataStep to convert the data in the .xdf into a data frame. A data frame must be held entirely in memory, so this may not be an option if you have extremely large .xdf files. You can then use sqldf, an open-source package that lets you run SQL SELECT statements on data frames.
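
A minimal sketch of this second option, assuming the sqldf package is installed. The file "foo.xdf", its columns category and amount, and the helper name xdfToDataFrame are hypothetical. When RevoScaleR or the file is unavailable, a small stand-in data frame is used so the sqldf step can still be tried.

```r
library(sqldf)  # open-source CRAN package: SQL SELECTs on R data frames

# Hypothetical helper: load an .xdf into memory as a data frame
xdfToDataFrame <- function(xdfPath) {
  # With outFile left NULL, rxDataStep returns an in-memory data frame.
  # Its size is capped by maxRowsByCols (rows x columns, default 3e6).
  RevoScaleR::rxDataStep(inData = xdfPath, maxRowsByCols = 3e6)
}

if (requireNamespace("RevoScaleR", quietly = TRUE) && file.exists("foo.xdf")) {
  fooDf <- xdfToDataFrame("foo.xdf")
} else {
  # Stand-in data with the hypothetical columns, for trying the query
  fooDf <- data.frame(category = c("a", "b", "a"), amount = c(10, 20, 30))
}

# sqldf runs the query against the data frame named in the FROM clause
totals <- sqldf("SELECT category, SUM(amount) AS total
                 FROM fooDf GROUP BY category ORDER BY category")
print(totals)
```

With the stand-in data, totals has one row per category: a with 40 and b with 20.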

Properties

Article ID: 3104289 - Last Review: 29 Oct 2015 - Revision: 1
