QA: How can I randomly select data from an .xdf file?

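You can write an R transform function and pass it to the RevoScaleR 'rxDataStepXdf()' function through its 'transformFunc' argument. The resulting subset .xdf file can then be used with other RevoScaleR functions. The sample R script below creates a new .xdf file by randomly sampling a larger one: the transform function sets the hidden row selection variable, '.rxRowSelection', to a logical vector, and only the rows marked TRUE are written to the output file. Because each row is kept independently with probability 0.25, the sample contains approximately, not exactly, 25% of the rows.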

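# Create a transformFunc that keeps each row with probability 0.25,
# so roughly 25% of the data is selected at random
set.seed(13)
xform <- function(data) {
  # One random TRUE/FALSE per row in the current chunk; rows marked TRUE are kept
  data$.rxRowSelection <- as.logical(rbinom(length(data[[1]]), 1, .25))
  return(data)
}
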
# inFile must already hold the path to the existing source .xdf file
rxDataStepXdf(inFile=inFile, outFile="sampledData.xdf", transformFunc=xform, overwrite=TRUE)
# Check that the subsetting was done and that the row selection variable is not kept in the data set
rxGetInfoXdf(inFile)
rxGetInfoXdf("sampledData.xdf")

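To see what the selection vector looks like without RevoScaleR, the following base R sketch builds the same kind of logical mask on an in-memory list. The 'chunk' object and its column names are hypothetical stand-ins for one block of data as a transform function might receive it; this illustrates only the sampling idea, not how RevoScaleR applies the mask internally.

```r
# Base R sketch of the sampling mask; no RevoScaleR required.
set.seed(13)

# Hypothetical stand-in for one chunk of data: a named list of
# equal-length vectors (names and sizes are assumptions for illustration).
chunk <- list(id = 1:10000, value = rnorm(10000))

# One Bernoulli(0.25) draw per row, converted to TRUE/FALSE.
keep <- as.logical(rbinom(length(chunk[[1]]), 1, 0.25))

# Apply the mask to every column so that only rows marked TRUE remain.
sampled <- lapply(chunk, function(col) col[keep])

# The kept fraction is close to, but generally not exactly, 0.25.
print(mean(keep))
print(length(sampled$id))
```

Because every row is an independent draw, the size of the sample varies from run to run unless the seed is fixed. If an exact number of rows is required, a different approach is needed.
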
Properties

Article ID: 3104278 - Last Review: Oct 29, 2015 - Revision: 1

Revolution Analytics