Wolfram Support Quick Answers: How do I use Mathematica in a managed high-performance cluster?

Read this article in: Deutsch, Español, Français 한국어, Português, Русский

Background Information

Larger computing clusters consist of many nodes, where each node is made up of many CPU cores. Moreover, special software such as Mathematica is often available on the cluster. Users request these resources by logging in to a head node and submitting batch jobs to the cluster manager. The job executes when the resources become available. Two common cluster managers include TORQUE and Slurm.

Parallelization in Mathematica uses the hub-and-spoke model in which a controlling kernel manages a number of subordinate kernels (subkernels). In a cluster environment, the client runs the controlling kernel and the hosts provide subkernels. The cluster manager determines which node acts as the client and which nodes act as hosts.

The benefits of running Mathematica on a cluster are twofold: the number of available CPU cores—even on a single node—is usually more than on a desktop computer, and the speed of each CPU core is individually faster than those on a desktop computer.

Running a Remote Mathematica Front End

A remote front end requires the user to remain in control of the job’s resources. This is known as an interactive session.

Though there is usually a benefit in CPU speed over using a desktop computer, the front end via an interactive session is slower than running a local version of the front end. This is because the front end runs on the cluster while the interface is forwarded to the remote user’s computer.

An interactive session is not intended to run CPU-intensive calculations, but rather it is used to test and diagnose code. It is typical to request only one node’s resources.

Log in to the head node via SSH with X Windows forwarding enabled.
Launch an interactive session, requesting all of the resources on a single node.
Start a Mathematica session.

From a Mathematica notebook, using the LaunchKernels[] command, or any other parallel functionality, will now include subkernels running from the cluster.

Running a Remote Mathematica Script

It is assumed that:

You are familiar with launching remote subkernels
The cluster is a Linux environment
The cluster uses a cloned file system such that Mathematica is run by the same executable on all nodes

If any of the above assumptions are not true, then the following will need to be modified, but the general outline remains:

Query the system to find the names of the nodes that are assigned to the job
Query the system to find how many cores per node are available
Manually launch remote kernels

Slurm

For Slurm, in a Mathematica or Wolfram Language script, you could have:

(* get host names *)
hosts = ReadList["!scontrol show hostname $SLURM_JOB_NODELIST", String]
masternode = Environment["SLURM_NODENAME"]
njobs = ToExpression[Environment["SLURM_CPUS_ON_NODE"]]
hosts = DeleteCases[hosts, masternode]; Length[hosts]

(* launch kernels on each host *)
kernels =
 KernelConfiguration[#, "KernelCount" -> njobs] & /@ hosts;
LaunchKernels[kernels]

In acquiring the node names, you can also use SLURM_NODELIST instead of SLURM_JOB_NODELIST for backward compatibility.

The returned format of the number of tasks per node is not suited for Wolfram Language interpretation. Thus, the expression is manually parsed to a list expression.

TORQUE

For TORQUE, in a Mathematica or Wolfram Language script, you could have:

(* get association of resources *)
hosts = Counts[ReadList[Environment["PBS_NODEFILE"], "String"]];

(* launch subkernels and connect them to the controlling Wolfram Kernel *)
kernels=
Map[KernelConfiguration[#,"KernelCount" -> hosts[#]] &, Keys[hosts]]
LaunchKernels[kernels]

PBS_NODEFILE is a system variable that lists each node name one time per available CPU core. Counts converts this list into an instance of Association, with the node name as the key and the number of CPU cores as the value. For simplicity, all subkernels are treated as remote kernels even though some would be running on the same node as the local instance of Mathematica.

Tidying Up after the Computation

When the parallel code is complete, it is good practice to close the Wolfram kernels:

CloseKernels[];

Contact Support

Whether you have a question about billing, activation or something more technical, we are ready to help you.

1-800-WOLFRAM (+1-217-398-0700 for international callers)

Customer Support

Monday–Friday
8am–5pm US Central Time

Product registration or activation
Pre-sales information and ordering
Help with installation and first launch

Advanced Technical Support (for eligible customers)

Monday–Thursday
8am–5pm US Central Time

Friday
8:30–10am & 11am–5pm US Central Time

Priority technical support
Product assistance from Wolfram experts
Help with Wolfram Language programming
Advanced installation support

More Learning

Tech Support

Wolfram Solutions

Wolfram Solutions For Education

Get Started

Grow Your Skills

Work with Us

Educational Programs for Adults

Educational Programs for Youth

Read

How do I use Mathematica in a managed high-performance cluster?

Background Information

Running a Remote Mathematica Front End

Running a Remote Mathematica Script

Slurm

TORQUE

Tidying Up after the Computation

Contact Support

1-800-WOLFRAM (+1-217-398-0700 for international callers)

Customer Support

Advanced Technical Support (for eligible customers)

Background Information

Running a Remote Mathematica Front End

Running a Remote Mathematica Script

Slurm

TORQUE

Tidying Up after the Computation

Related Articles

Contact Support

1-800-WOLFRAM (+1-217-398-0700 for international callers)

Customer Support

Advanced Technical Support (for eligible customers)