Problem

How do I tell the RxTextData function to use ‘|’ (or another character) as the delimiter?

Solution

If your text data is not separated by commas or tabs, you must specify the delimiter using the columnDelimiters argument. (This is not actually an argument to rxImport, but to the underlying RxTextData data source object.) In normal usage, this argument is a single character, such as columnDelimiters="\t" for tab-delimited data or columnDelimiters="," for comma-delimited data. However, each column may be delimited by a different character; all the delimiters must be concatenated together into a single character string. For example, if you have one column delimited by a comma, a second by a plus sign, and a third by a new line, you would use the argument columnDelimiters=",+\n".

The data:

id|val
1|a
2|b

For the above data, how do I fix the code below so that ‘|’ is treated as the delimiter?

hdfsFS <- RxHdfsFileSystem(hostName="dummy", port="dummy")
txtSource <- RxTextData("directory value/ file_name in hdfs", fileSystem=hdfsFS)
airData <- rxImport(inData=txtSource, outFile="/tmp/test.xdf", stringsAsFactors=TRUE, missingValueString="M", rowsPerRead=200000, overwrite=TRUE)
rxSummary(~ id+val, data=airData)

To read pipe-delimited data, set the option delimiter="|" in your RxTextData() call:

txtSource <- RxTextData("directory value/ file_name in hdfs", fileSystem=hdfsFS, delimiter="|")
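
To try the delimiter setting without a cluster, the following minimal sketch writes the three sample rows to a temporary local file and imports them with delimiter="|". The temporary file names (sampleFile, xdfFile) are assumptions made for illustration and are not part of the original script. The sketch also assumes the default local file system. For HDFS, pass fileSystem=hdfsFS as in the call above.

```r
# Write the three sample rows from the question to a temporary local file.
# (sampleFile is a hypothetical name used only for this illustration.)
sampleFile <- tempfile(fileext = ".txt")
writeLines(c("id|val", "1|a", "2|b"), sampleFile)

# Run the import only where the RevoScaleR package is installed.
if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  library(RevoScaleR)

  # delimiter = "|" tells the text data source to split each line on the pipe.
  txtSource <- RxTextData(sampleFile, delimiter = "|")

  # Import into a temporary .xdf file; strings become factors so that
  # rxSummary can tabulate the val column.
  xdfFile <- tempfile(fileext = ".xdf")
  airData <- rxImport(inData = txtSource, outFile = xdfFile,
                      stringsAsFactors = TRUE, overwrite = TRUE)

  print(rxSummary(~ id + val, data = airData))
}
```

If the import works, rxSummary should report id and val as two separate variables rather than a single combined column.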
