# Device Selection

This section describes how the solver selects which GPU devices to use during a simulation. The code can automatically detect available hardware, assign devices based on MPI ranks, or accept manual specifications via command-line arguments.
## Device Specifiers

Device selection is controlled via the `device_specifier` string (often passed via the `-dev` flag). The following options are available:
| Option | Description |
|---|---|
| `all` | Uses all detected devices on the node. |
| `per-rank` | Assigns one GPU per MPI rank, computed as `node_rank % num_devices`. |
| `any` | Equivalent to `num-avail=1`; selects the first available device. |
| `all-avail` | Uses all devices currently considered "available" (see Availability Logic). |
| `num-avail=n` | Uses the first `n` devices that are currently available. |
| `list=n0,n1,...` | Manually selects specific device IDs (e.g., `list=0,2`). |
| `n0,n1,...` | Shorthand for the `list=` option. |
| `default` | Context-dependent: with MPI, behaves as `per-rank`; in a single process, behaves as `num-avail=1`. |
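The dispatch described in the table above can be sketched roughly as follows. This is an illustrative Python model only, not the solver's actual code; the function name, parameters, and the `is_available` callback are all assumptions introduced for the sketch.

```python
def select_devices(spec, num_devices, mpi_size=1, node_rank=0,
                   is_available=lambda d: True):
    """Resolve a -dev specifier string to a list of device IDs (sketch)."""
    if spec == "default":
        # Context-dependent: per-rank under MPI, first available otherwise.
        spec = "per-rank" if mpi_size > 1 else "num-avail=1"
    if spec == "per-rank":
        return [node_rank % num_devices]
    if spec == "all":
        return list(range(num_devices))
    if spec == "any":
        spec = "num-avail=1"  # "any" is equivalent to num-avail=1
    avail = [d for d in range(num_devices) if is_available(d)]
    if spec == "all-avail":
        return avail
    if spec.startswith("num-avail="):
        return avail[: int(spec.split("=", 1)[1])]
    if spec.startswith("list="):
        spec = spec.split("=", 1)[1]
    # Bare "n0,n1,..." shorthand for list=
    return [int(tok) for tok in spec.split(",")]
```

For example, under this model `select_devices("default", 4, mpi_size=8, node_rank=5)` resolves via the `per-rank` branch to device `5 % 4 = 1`.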
## Availability Logic

When using specifiers containing `avail` (e.g., `all-avail`, `num-avail`), the code checks the current memory usage of the GPUs to determine whether they are free to use.

A GPU is considered available if it currently has less than 3.0 GB of memory allocated. If a device exceeds this threshold, the selection logic skips it to avoid out-of-memory errors on shared workstations.
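A minimal sketch of this threshold test, assuming the 3.0 GB cutoff stated above. The `query_used_memory` helper (which shells out to `nvidia-smi`) is an assumption for illustration; the solver queries device memory through its own means.

```python
import subprocess

# 3.0 GB availability cutoff, as described in the docs above.
AVAIL_THRESHOLD_BYTES = 3.0 * 1024**3

def is_available(used_bytes):
    """True if the device's current allocation is below the threshold."""
    return used_bytes < AVAIL_THRESHOLD_BYTES

def query_used_memory():
    """Per-device used memory in bytes via nvidia-smi (assumed helper)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    # nvidia-smi reports MiB with these flags; convert to bytes.
    return [int(line) * 1024**2 for line in out.splitlines() if line.strip()]
```

A device with, say, 4 GB already allocated would fail `is_available` and be skipped by the `avail`-based specifiers.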
## Recommended Execution Modes

Generally speaking, if running interactively, you should be able to run either `champs+` (no device specifier) or `champs+ -dev all` (running on all available GPUs). If running on HPC, the default `champs+` invocation will automatically allocate GPUs to individual MPI ranks based on the run configuration set in the job scheduler.
Otherwise, there are two primary ways to run the solver, depending on your hardware environment (local workstation vs. HPC cluster).
### 1. Multithreaded Mode (Workstation)

**Best for:** running interactively on a single workstation with one or more GPUs.

In this mode, you run a single MPI rank (`mpi_size = 1`). You can control device selection manually using the options listed above.
- **Single GPU:** Use `-dev default` or `-dev any`.
- **Multi-GPU:** Use `-dev all` to utilize all GPUs on the machine via multiple threads within a single process.
### 2. Distributed Mode (HPC / Cluster)

**Best for:** large-scale simulations across multiple nodes or large multi-GPU nodes.

In this mode, you run multiple MPI ranks (`mpi_size > 1`).
- **Constraint:** You cannot manually specify device lists (e.g., `list=0,1`) in this mode. Doing so will throw an exception.
- **Behavior:** The system forces the `per-rank` logic.
- **Logic:** Each rank identifies its local index on the node and selects the GPU corresponding to `node_rank % device_count`. This ensures a 1-to-1 mapping between MPI ranks and GPUs.
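The forced per-rank behavior can be sketched as below. This is an illustrative model, not the solver's code; in particular, how each rank learns its node-local index varies by MPI stack, so `node_rank` is taken here as a given input.

```python
def select_device_distributed(spec, node_rank, device_count):
    """Distributed-mode selection sketch: per-rank logic is forced.

    Manual device lists are rejected, mirroring the documented
    constraint that list= specifiers throw an exception under MPI.
    """
    if spec.startswith("list=") or "," in spec:
        raise ValueError("manual device lists are not allowed when mpi_size > 1")
    # 1-to-1 mapping of node-local rank index to visible GPU.
    return node_rank % device_count
```

With four ranks per node and four visible GPUs, node-local ranks 0..3 map to devices 0..3; a fifth rank on a node (index 4) would wrap back to device 0.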
When running on High Performance Computing (HPC) systems, correct device visibility depends heavily on the job scheduler (e.g., Slurm, PBS) and MPI process binding.

The solver assumes that the compute environment exposes the correct visible devices to each process. If ranks contend for the same GPU or fail to detect devices:
- Check your scheduler script (e.g., `#SBATCH --gpus-per-task`).
- Verify your MPI binding flags.
- Consult your system administrator.
The internal per-rank logic is designed to work with standard MPI configurations, but cannot override system-level hardware masking.