Using ODBC connections in parallel code

Attempting to use multiple ODBC connections across parallel worker processes may fail, as in the following example:

loaddata <- function(cn){
  result <- sqlQuery(cn,'select * from boston')
  return(head(result))
}

library(RODBC)
cn1 <- odbcConnect("RevoTestDB", uid='RevoTester', pwd='RevoTester')
cn2 <- odbcConnect("RevoTestDB", uid='RevoTester', pwd='RevoTester')
cn3 <- odbcConnect("RevoTestDB", uid='RevoTester', pwd='RevoTester')
cn4 <- odbcConnect("RevoTestDB", uid='RevoTester', pwd='RevoTester')

rxSetComputeContext('localpar')
system.time({z <- rxExec(loaddata, rxElemArg(list(cn1,cn2,cn3,cn4)), packagesToLoad='RODBC')})

Error in, as.list(args)) :
  task 1 failed - "first argument is not an open RODBC channel"

The worker processes receive the connection objects, but the underlying ODBC connections are not open in those processes.

Connections are process-specific. Unless the workers inherit the parent process's state (as with multicore workers created by forking), they cannot use the parent's connections. To distribute ODBC computations on non-forked workers, establish the connection on each worker as part of the distributed task:

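loaddata <- function(){
  library(RODBC)
  # Open the connection inside the task so it belongs to the worker's own process
  cn <- odbcConnect("RevoTestDB", uid='RevoTester', pwd='RevoTester')
  on.exit(odbcClose(cn))  # close the connection when the task finishes
  result <- sqlQuery(cn,'select * from boston')
  return(head(result))
}

system.time({z <- rxExec(loaddata, packagesToLoad='RODBC')})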
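
The same pattern can be extended so that each worker runs a different query. The sketch below is illustrative only: it passes plain character strings (which serialize safely between processes) through rxElemArg instead of connection objects, and opens the connection inside the task. The function name loadquery and the queries list are hypothetical, and the DSN, credentials, and boston table are assumed to be the same ones used above.

```r
library(RevoScaleR)

# Hypothetical helper: receives a query string, not a connection,
# and opens its own connection inside the worker process.
loadquery <- function(query) {
  library(RODBC)
  cn <- odbcConnect("RevoTestDB", uid = "RevoTester", pwd = "RevoTester")
  on.exit(odbcClose(cn))  # release the connection when the task ends
  sqlQuery(cn, query)
}

# Hypothetical queries, one per task; both use only the table named above.
queries <- list(
  "select * from boston",
  "select count(*) from boston"
)

rxSetComputeContext("localpar")

# rxElemArg hands one element of 'queries' to each task.
z <- rxExec(loadquery, query = rxElemArg(queries), packagesToLoad = "RODBC")
```

Each element of z should hold the result of the corresponding query. Because only strings cross the process boundary, this approach does not depend on how the workers were created.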

Note: This is a "FAST PUBLISH" article created directly from within the Microsoft support organization. The information contained herein is provided as-is in response to emerging issues. As a result of the speed in making it available, the materials may include typographical errors and may be revised at any time without notice. See Terms of Use for other considerations.

Article ID: 3103824 - Last Review: 11/01/2015 14:37:00 - Revision: 1.0

Revolution Analytics
