This article describes how to run a Revolution R Enterprise (RRE) script on a Hadoop cluster from a Windows client outside the cluster, using ssh in a Cygwin environment.

  1. Install and configure Revolution R Enterprise 7.3 in the Hadoop cluster per the Revolution R Enterprise 7.3 Hadoop Configuration Guide. Verify that RRE works when a script is run from within the cluster, using the validation script from section 4 of that guide.

  2. Install Revolution R Enterprise for Windows 7.3 on the client Windows system.

  3. Install the Cygwin environment on the client Windows system, making sure to include the ssh client components. Verify that the R/Hadoop user can log in over ssh from the Windows client system.

  4. Configure passwordless ssh for the R/Hadoop user by setting up an ssh keypair for that user: the private key stays on the client, and the matching public key is authorized for the user on the Hadoop namenode. Information on doing this can be found here:

    http://inside.mines.edu/fs_home/gmurray/HowTo/sshNotes.html

    or get assistance from your IT group as needed to comply with security requirements. Save the private .pem key on the Windows client, for example as "C:\data\hdp.pem".

  5. Manually verify the passwordless login for the R user (for example, scott) from a Cygwin bash session to the namenode using the key:

    $ ssh -i c:/data/hdp.pem scott@<namenode hostname or ip>

  6. If the manual test login is successful, modify the Hadoop compute context used when running the script from within the cluster so that it includes the ssh connection information needed by the client. For example:

    Basic Hadoop compute context used when running the script from a cluster node:

    myHadoopCluster <- RxHadoopMR(consoleOutput = TRUE)

    cluster <- rxSetComputeContext(myHadoopCluster)


    Extended Hadoop compute context used when running the script from a Windows client via Cygwin ssh:

    mySshUsername <- "scott"
    mySshHostname <- "<namenode hostname or ip>"

    myShareDir <- paste("/var/RevoShare", mySshUsername, sep = "/")
    myHdfsShareDir <- paste("/user/RevoShare", mySshUsername, sep = "/")

    myHadoopCluster <- RxHadoopMR(
        hdfsShareDir = myHdfsShareDir,
        shareDir = myShareDir,
        sshUsername = mySshUsername,
        sshHostname = mySshHostname,
        sshSwitches = "-i c:\\data\\hdp.pem",
        consoleOutput = TRUE)

    cluster <- rxSetComputeContext(myHadoopCluster)

    The sshSwitches value may be used to pass other arguments to the ssh client as needed, such as -p <port> for a non-default ssh port.

  7. Test the R script from Revolution R Enterprise on the Windows client. The script should connect using the Cygwin ssh client in the background to submit the script for execution on the namenode.
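
The key setup and manual login check in steps 4 and 5 can be sketched as a small script run from a Cygwin bash session. This is only an illustration under stated assumptions, not a procedure from the Configuration Guide: the hostname namenode.example.com, the SSH_PORT variable, and the helper function names are all hypothetical, and the key path and user name are the article's examples. Your security requirements may call for a different key type or a passphrase-protected key.

```shell
#!/usr/bin/env bash
# Sketch of steps 4 and 5: create a keypair, authorize it on the namenode,
# then confirm that key-only login works. All values are placeholders.

KEY_FILE="${KEY_FILE:-c:/data/hdp.pem}"       # example private key path from the article
SSH_USER="${SSH_USER:-scott}"                 # example R/Hadoop user from the article
NAMENODE="${NAMENODE:-namenode.example.com}"  # hypothetical hostname; use your namenode
SSH_PORT="${SSH_PORT:-22}"                    # change only if sshd listens on another port

# Print the switches shared by the manual test and the sshSwitches argument.
ssh_switches() {
    if [ "$SSH_PORT" = "22" ]; then
        printf '%s\n' "-i $KEY_FILE"
    else
        printf '%s\n' "-i $KEY_FILE -p $SSH_PORT"
    fi
}

# Create the keypair; -N "" means no passphrase, so logins never prompt.
generate_keypair() {
    ssh-keygen -t rsa -b 4096 -N "" -f "$KEY_FILE"
}

# Append the public key to the user's authorized_keys on the namenode.
# This one call still asks for the account password.
authorize_key() {
    ssh -p "$SSH_PORT" "$SSH_USER@$NAMENODE" \
        'mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys' \
        < "$KEY_FILE.pub"
}

# BatchMode=yes makes ssh fail instead of prompting, so success here
# means the key alone was enough to log in.
check_login() {
    ssh -i "$KEY_FILE" -p "$SSH_PORT" -o BatchMode=yes "$SSH_USER@$NAMENODE" true
}

# Run the steps only when asked to, e.g. `bash setup_ssh.sh run`.
if [ "${1:-}" = "run" ]; then
    [ -f "$KEY_FILE" ] || generate_keypair
    authorize_key && check_login && echo "passwordless login OK: $(ssh_switches)"
fi
```

If the check succeeds, the string printed by ssh_switches is what would go into the sshSwitches argument in step 6; the article's R example writes the same key path with escaped backslashes.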

See the RevoScaleR Hadoop Getting Started Guide for more information.
