Device Selection

This section details how the solver selects which GPU devices to utilize during a simulation. The code includes logic to automatically detect available hardware, assign devices based on MPI ranks, or accept manual specifications via command-line arguments.

Device Specifiers

The device selection logic is controlled via the device_specifier string (often passed via the -dev flag). The following options are available:

| Option | Description |
| --- | --- |
| `all` | Uses all detected devices on the node. |
| `per-rank` | Assigns 1 GPU per MPI rank. Calculated as `node_rank % num_devices`. |
| `any` | Equivalent to `num-avail=1`. Selects the first available device. |
| `all-avail` | Uses all devices that are currently considered "available" (see Availability Logic). |
| `num-avail=n` | Uses the first `n` devices that are currently available. |
| `list=n0,n1,...` | Manually selects specific device IDs (e.g., `list=0,2`). |
| `n0,n1,...` | Shorthand for the `list=` option. |
| `default` | Context-dependent behavior. If running with MPI, defaults to `per-rank`. If running single-process, defaults to `num-avail=1`. |
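The dispatch on the specifier string can be sketched as follows. This is an illustrative Python sketch, not the solver's actual implementation; the function and parameter names (`parse_device_specifier`, `available`, etc.) are hypothetical.

```python
# Hypothetical sketch of the -dev specifier dispatch described above.
# `available` is a per-device list of booleans (see Availability Logic).

def parse_device_specifier(spec, num_devices, available, mpi_size=1, node_rank=0):
    """Return the list of device IDs selected by a -dev specifier string."""
    if spec == "default":
        # Context-dependent: per-rank under MPI, first available otherwise.
        spec = "per-rank" if mpi_size > 1 else "num-avail=1"
    if spec == "all":
        return list(range(num_devices))
    if spec == "per-rank":
        return [node_rank % num_devices]
    if spec == "any":
        spec = "num-avail=1"
    if spec == "all-avail":
        return [d for d in range(num_devices) if available[d]]
    if spec.startswith("num-avail="):
        n = int(spec.split("=", 1)[1])
        return [d for d in range(num_devices) if available[d]][:n]
    if spec.startswith("list="):
        spec = spec[len("list="):]
    # Bare "n0,n1,..." shorthand for list=
    return [int(tok) for tok in spec.split(",")]
```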

Availability Logic

When using specifiers containing avail (e.g., all-avail, num-avail), the code checks the current memory usage of the GPUs to determine if they are free to use.

A GPU is considered available if it currently has less than 3.0 GB of memory allocated. If a device exceeds this threshold, the selection logic skips it to avoid out-of-memory errors on shared workstations.
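The threshold check itself is simple; a minimal sketch, using the 3.0 GB cutoff stated above (the function names are illustrative, and how the solver queries per-device memory usage is not shown here):

```python
# The 3.0 GB availability threshold described above.
AVAIL_THRESHOLD_BYTES = 3.0 * 1024**3

def is_device_available(used_bytes):
    """A device is 'available' if it has less than 3.0 GB currently allocated."""
    return used_bytes < AVAIL_THRESHOLD_BYTES

def available_devices(used_bytes_per_device):
    """Filter device IDs down to those under the memory threshold."""
    return [i for i, used in enumerate(used_bytes_per_device)
            if is_device_available(used)]
```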


Generally speaking, if running interactively, you should be able to run either champs+ (no device specifier) or champs+ -dev all (to run on all GPUs on the machine). If running on HPC, the default champs+ will automatically allocate GPUs to individual MPI ranks based on the run configuration set in the job scheduler.

Otherwise, there are two primary ways to run the solver, depending on your hardware environment (local workstation vs. HPC cluster).

1. Multithreaded Mode (Workstation)

Best for: Running interactively on a single workstation with one or more GPUs.

In this mode, you run a single MPI rank (mpi_size = 1). You can control device selection manually using the options listed above.

  • Single GPU: Use -dev default or -dev any.
  • Multi-GPU: Use -dev all to utilize all GPUs on the machine via multiple threads within a single process.

2. Distributed Mode (HPC / Cluster)

Best for: Large-scale simulations across multiple nodes or large multi-GPU nodes.

In this mode, you run multiple MPI ranks (mpi_size > 1).

  • Constraint: You cannot manually specify device lists (e.g., list=0,1) in this mode. Doing so will throw an exception.
  • Behavior: The system forces the per-rank logic.
  • Logic: Each rank identifies its local index on the node and selects the GPU corresponding to node_rank % device_count. This ensures a 1-to-1 mapping between MPI ranks and GPUs.
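The modulo mapping above can be written out directly. This is a sketch of the described logic, not the solver's code; `node_rank` is the rank's local index on its node, not its global MPI rank.

```python
def assign_device(node_rank, device_count):
    """Map a node-local rank index to a GPU, wrapping if ranks exceed GPUs."""
    return node_rank % device_count
```

For example, 8 ranks on a 4-GPU node map to devices 0, 1, 2, 3, 0, 1, 2, 3, giving a 1-to-1 mapping whenever ranks-per-node equals the device count.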

HPC Configuration Note

When running on High Performance Computing (HPC) systems, correct device visibility depends heavily on the job scheduler (e.g., Slurm, PBS) and MPI process binding.

The solver assumes that the compute environment provides the correct visible devices to each process. If multiple ranks contend for the same GPU, or ranks fail to detect any devices:

  1. Check your scheduler script (e.g., #SBATCH --gpus-per-task).
  2. Verify your MPI binding flags.
  3. Consult your system administrator.
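As an illustration of item 1, a minimal Slurm batch script might look like the following. The resource values are placeholders for your own job configuration, and the `srun` line assumes champs+ is on your PATH; consult your site's documentation for the flags your scheduler actually supports.

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --gpus-per-task=1

# One GPU visible per task; the solver's per-rank logic then maps each
# rank's node-local index onto its visible device.
srun champs+
```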

The internal per-rank logic is designed to work with standard MPI configurations, but cannot override system-level hardware masking.