Q&A: We are having problems running LSF jobs on our cluster. How can we verify the setup?

Check the following items to verify that your cluster is set up properly to run LSF jobs on all of its nodes:
  • Passwordless ssh must be set up and working between all of the nodes of the cluster.
  • The shared directory that you are using must be visible to all nodes of the cluster. Mounts sometimes drop after a reboot or a DNS change. If that happens, it may be necessary to run 'sudo service nfs restart' on the host that exports the directory, then 'sudo mount -a' on all other nodes.
  • Ensure that you can run R on each node of your cluster.
  • Ensure that the cluster is operational and visible from your client by running the LSF commands 'bhosts' and 'lshosts' on that client.
  • Run the RevoScaleR function 'rxPingNodes()' in Revolution R to verify that all nodes are visible and operational on the cluster.
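The ssh, shared-directory, R and LSF checks above can be scripted. The following is a minimal sketch that is not part of the original article. It assumes bash and OpenSSH, and that R, 'bhosts' and 'lshosts' are on the PATH for non-interactive sessions. The script name check_cluster.sh, the SHARED_DIR variable and the node names are hypothetical placeholders. The 'rxPingNodes()' check still has to be run from within Revolution R.

```shell
#!/usr/bin/env bash
# Sketch of the cluster checks listed above (hypothetical script, not an official tool).
# Usage: SHARED_DIR=/path/to/share ./check_cluster.sh node1 node2 node3
set -u

# Assumed path of the shared (NFS) directory; paths containing spaces
# would need extra quoting because ssh joins remote arguments with spaces.
SHARED_DIR=${SHARED_DIR:-/shared}

# Run a command on a node without any interactive prompt. BatchMode=yes makes
# ssh fail instead of asking for a password, so a failure here means
# passwordless ssh is not set up (or the node is unreachable).
run_on() {
  local node=$1
  shift
  ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$node" "$@"
}

# Check one node from this client: ssh access, shared directory, and R.
check_node() {
  local node=$1 rc=0
  if ! run_on "$node" true >/dev/null 2>&1; then
    echo "$node: passwordless ssh FAILED"
    return 1
  fi
  run_on "$node" test -d "$SHARED_DIR" >/dev/null 2>&1 \
    || { echo "$node: shared directory $SHARED_DIR not visible"; rc=1; }
  run_on "$node" R --version >/dev/null 2>&1 \
    || { echo "$node: R not found or fails to start"; rc=1; }
  [ "$rc" -eq 0 ] && echo "$node: OK"
  return "$rc"
}

# Passwordless ssh must work between every pair of nodes, not only from
# this client, so hop from each node to every other node.
check_mesh() {
  local src dst rc=0
  for src in "$@"; do
    for dst in "$@"; do
      [ "$src" = "$dst" ] && continue
      run_on "$src" ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$dst" true >/dev/null 2>&1 \
        || { echo "ssh $src -> $dst FAILED"; rc=1; }
    done
  done
  return "$rc"
}

main() {
  if [ "$#" -eq 0 ]; then
    echo "usage: SHARED_DIR=/path $0 node1 [node2 ...]" >&2
    return 2
  fi
  local status=0 node
  # LSF view of the cluster from this client.
  bhosts || status=1
  lshosts || status=1
  for node in "$@"; do
    check_node "$node" || status=1
  done
  check_mesh "$@" || status=1
  return "$status"
}

main "$@"
```

A non-zero exit status means at least one check failed, and the printed lines name the node or node pair involved. The script checks the node-to-node ssh mesh explicitly because a client that can reach every node does not guarantee that the nodes can reach each other.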
Note: This is a "FAST PUBLISH" article created directly from within the Microsoft support organization. The information contained herein is provided as-is in response to emerging issues. As a result of the speed in making it available, the materials may include typographical errors and may be revised at any time without notice. See Terms of Use for other considerations.
Properties

Article ID: 3104173 - Last Review: 11/01/2015 11:17:00 - Revision: 1.0

Revolution Analytics

  • KB3104173