# Base compilers on NERSC systems

## Introduction
There are several options for compilers that can be used on NERSC compute systems. Some of the compilers are open-source products, while others are commercial. These compilers may have different features, optimize some codes better than others, and/or support different architectures or standards. It is up to the user to decide which compiler is best for their particular application.
These base compilers are loaded into the user environment via the programming environment modules. They can then be invoked through compiler wrappers (recommended) or on their own. All compilers on NERSC machines are able to compile codes written in C, C++, or Fortran, and provide support for OpenMP.
There are several vendor-provided base compilers available on Perlmutter, with varying levels of support for GPU code generation: Cray, GNU, AOCC (AMD Optimizing C/C++ Compiler), and NVIDIA. NERSC also provides LLVM compilers on Perlmutter.
LLVM compilers not compatible with all vendor software
The LLVM compilers are not supported by HPE Cray and therefore are not compatible with all of the same software and libraries that the vendor-provided compiler suites are, but may nevertheless be useful for users who require an open-source LLVM-based compiler toolchain.
Below is a table listing the available compilers on Perlmutter, with the default compilers indicated.
| Compilers | Perlmutter |
|---|---|
| Intel | ✓ |
| GNU | ✓ (Default) |
| Cray | ✓ |
| NVIDIA | ✓ |
| AOCC | ✓ |
| LLVM | ✓ (Provided by NERSC) |
All vendor-supplied compilers are provided via the "programming
environments" that are accessed via the module utility. Each
programming environment contains the full set of compatible compilers
and libraries. To change from one compiler suite to another, you change
the programming environment via the module swap command. For
example, the following command changes from the GNU programming
environment to the Cray environment. Since Perlmutter uses
Lmod, loading rather than explicit
swapping works there as well.
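A sketch of the two approaches:

```shell
# change from the GNU to the Cray programming environment
module swap PrgEnv-gnu PrgEnv-cray

# with Lmod on Perlmutter, an explicit load also works
module load PrgEnv-cray
```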
## Programming environment for using GPUs on Perlmutter
To compile a CUDA source code in any of the supported programming
environments, the cudatoolkit module is required to make the
CUDA Toolkit accessible. The toolkit includes GPU-accelerated
libraries, debugging and optimization tools, a C/C++ compiler, and
a runtime library to build and deploy your application. For information
about the CUDA Toolkit, see the
documentation. Note
that this module is not loaded by default.
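For example:

```shell
module load cudatoolkit
```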
To set the NVIDIA GPUs as the OpenMP and OpenACC offloading target
while using the Cray compiler wrappers, use the
compiler flag -target-accel=nvidia80 or set the environment
variable CRAY_ACCEL_TARGET to nvidia80. To set the acceleration
target to host CPUs instead, use the -target-accel=host flag, set
the environment variable to host, or load the craype-accel-host
module.
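These options can be sketched as follows (the source file name is hypothetical):

```shell
# Option 1: pass the accelerator target flag to the Cray wrapper
cc -fopenmp -target-accel=nvidia80 -o my_code.ex my_code.c

# Option 2: set the environment variable instead
export CRAY_ACCEL_TARGET=nvidia80
cc -fopenmp -o my_code.ex my_code.c

# For host-CPU targets:
module load craype-accel-host
```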
Do not use base compilers' target flag with the Cray compiler wrappers
The base compiler's target flag (e.g., NVIDIA's -target=gpu)
will not work with the Cray compiler wrappers.
### Using compatible gcc for CUDA compiler drivers with PrgEnv-gnu
When using the PrgEnv-gnu environment in conjunction with the
cudatoolkit module (i.e., if compiling any application for both host
and device side), one must note that not every version of gcc is
compatible with every version of nvcc.
Older versions of the cudatoolkit may not support the default GCC
compiler (see document outlining supported host compilers for each
nvcc installation). For
older versions, one can use the cpe-cuda module available on the
system to automatically downgrade the gcc version or manually load
the version of GCC that is supported by the older cudatoolkit.
If using the cpe-cuda module, it must be loaded after loading the PrgEnv-gnu:
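For example:

```shell
module load PrgEnv-gnu
module load cpe-cuda
```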
## Compilers

### Intel
The Intel compiler suite is available via the PrgEnv-intel module,
which will load the intel module for Intel base compilers.
The base compilers in this suite are:
- C: icc
- C++: icpc
- Fortran: ifort
See the full documentation of the Intel
compilers.
Additionally, compiler documentation is provided through man pages
(e.g., man icpc) and through the -help flag to each compiler
(e.g., ifort -help).
#### OpenMP and OpenACC
To enable OpenMP, use the -qopenmp flag.
The Intel compilers do not support OpenACC.
### GNU
The GCC compiler suite is available via the PrgEnv-gnu module,
which will load the gcc module for the GNU base compilers.
The base compilers in this suite are:
- C: gcc
- C++: g++
- Fortran: gfortran
See the full documentation of the GCC
compilers. Additionally, compiler
documentation is provided through man pages (e.g., man g++) and
through the --help flag to each compiler (e.g., gfortran --help).
#### Backward Compatibility
For backward compatibility, the following tips may be helpful for compiling older codes (that worked on Cori) with the newer GCC compiler versions on Perlmutter:
- Fortran: Try -fallow-argument-mismatch first, followed by the more extensive flag -std=legacy to reduce strictness.
- C/C++: Look for flags that reduce strictness, such as -fpermissive.
- C/C++: -Wpedantic can warn you about lines that break code standards.
#### OpenMP and OpenACC
To enable OpenMP for CPU code, use the -fopenmp flag.
OpenMP/OpenACC offloading to GPUs not supported yet
Offloading to GPUs with OpenMP/OpenACC is not supported in the
PrgEnv-gnu environment on Perlmutter at the moment. The
offloading-related information below is provided for future
reference only and may be updated.
GCC has support for OpenMP and OpenACC offloading to GPUs. OpenMP
offloading with gcc looks something like:
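A sketch of such a command (the source file name is hypothetical; the -foffload syntax follows GCC's nvptx offloading conventions):

```shell
gcc -fopenmp -foffload=nvptx-none="-Ofast -lm -misa=sm_80" -o my_offload.ex my_offload.c
```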
where -misa=sm_80 targets the NVIDIA A100 GPU. The extra compile
flags -Ofast -lm are passed when building a binary for the
architecture.
Note that, if the Cray compiler wrapper cc is used instead,
replace this with the -target-accel=nvidia80 flag.
OpenMP/OpenACC GPU offload support in GCC is limited
The GCC compiler's offload capabilities for GPU code generation may be limited in terms of both functionality and performance. Users are advised to try other compiler suites, which also provide Fortran compilers with OpenMP offload capability.
#### Mixture of C/C++/Fortran and CUDA codes
The programming environment supports a mixture of C/C++/Fortran and CUDA codes. CUDA and CPU codes should be in separate files, and Cray compiler wrapper commands must be used at link time:
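For example, with hypothetical source files main.c and gpu_kernel.cu:

```shell
module load cudatoolkit
nvcc -c gpu_kernel.cu
cc -c main.c
cc main.o gpu_kernel.o    # link with the Cray wrapper
```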
Compatibility between nvcc host compiler and gcc compiler
To make the above work, make sure your compiler is compatible by viewing the host compiler support policy for cudatoolkit 12.4.
### Cray
The HPE Cray compiler suite is available via the PrgEnv-cray
module, which will load the cce module for the Cray base
compilers. The base compilers in this suite are:
- C: cc
- C++: CC
- Fortran: ftn
Full documentation of the Cray compilers is provided in the
HPE Cray Clang C and C++ Quick
Reference
for the C/C++ compilers, and the HPE Cray Fortran Reference
Manual
for the Fortran compiler. Additionally, compiler documentation is
provided through man pages (e.g., man clang or man crayftn)
or the help page (cc -help, etc.) and users may wish to read the
online Cray Compiler Environment
documentation.
Cray base compilers and Cray compiler wrappers are not the same
It is easy to confuse the Cray base compilers and the compiler wrappers
that wrap all compilers, since their names are identical. The underlying
compiler that is currently loaded is based on the programming environment
that has been loaded; for example, if PrgEnv-gnu has been loaded, then
invoking cc ultimately invokes gcc, not the Cray C compiler.
Major changes to Cray compilers starting in version 9.0
Version 8.7.9 of the Cray compiler (CCE) is the last version based on the old compiler environment and default settings. Starting in version 9.0, Cray made major changes to the C/C++ compilers, and smaller changes to the Fortran compiler. In particular:
- The C/C++ compilers have been replaced with LLVM and clang, with some additional Cray enhancements. This means that nearly all of the compiler flags have changed, and some capabilities available in CCE 8 and previous versions are no longer available in CCE 9. It may also result in performance differences in code generated using CCE 8 vs CCE 9, due to the two versions using different optimizers.
- OpenMP has been disabled by default in the C, C++, and
  Fortran compilers. This behavior is more consistent with
  other compilers. To enable OpenMP, one can use the following
  flags:
    - C/C++: -fopenmp
    - Fortran: -h omp
Cray provides a migration guide for users switching from CCE 8 to CCE 9.
For users who are unable to migrate their workflows to the
clang/LLVM-based CCE 9 C/C++ compilers, Cray has simultaneously
released a CCE 9 "classic" version, which continues to use the same
compiler technology in CCE 8 and older versions. This version of
CCE is available as the module cce/<version>-classic. However,
users should be aware that "classic" CCE is now considered "legacy,"
and that all future versions of CCE are based on clang/LLVM. See
the Cray Classic C and C++ Reference Manual.
#### OpenMP and OpenACC
To enable OpenMP for CPU code, use the -fopenmp flag.
OpenMP/OpenACC offloading to GPUs not supported yet
Offloading to GPUs with OpenMP/OpenACC is not supported in the
PrgEnv-cray environment on Perlmutter at the moment. The
offloading-related information below is provided for future
reference only and may be updated.
The Cray compilers have a mature OpenMP offloading implementation.
Compiling codes using OpenMP offload capabilities on Perlmutter requires different flags for C and C++ codes than for Fortran codes. The C and C++ compilers are based on clang, and thus use similar flags that one would use for clang to generate OpenMP offload code:
```shell
cc -fopenmp -target-accel=nvidia80 -o my_openmp_code.ex my_openmp_code.c
CC -fopenmp -target-accel=nvidia80 -o my_openmp_code.ex my_openmp_code.cpp
```
For Fortran codes, the flag is different; either the environment
variable CRAY_ACCEL_TARGET must be set to nvidia80 at compile
time, or the -target-accel=nvidia80 compiler flag must be used.
Then, build as follows:
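A sketch, using a hypothetical source file name:

```shell
export CRAY_ACCEL_TARGET=nvidia80
ftn -h omp -o my_openmp_code.ex my_openmp_code.f90
```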
Only the Fortran compiler supports OpenACC.
The compiler flag for enabling OpenACC in Fortran codes is -h acc.
To offload to GPUs, use the -target-accel=nvidia80 compiler flag,
or set the CRAY_ACCEL_TARGET environment variable to nvidia80.
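For example, a hypothetical OpenACC Fortran code could be built as:

```shell
ftn -h acc -target-accel=nvidia80 -o my_acc_code.ex my_acc_code.f90
```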
Explicitly set the target to host CPUs when compiling OpenMP/OpenACC code for the host on Perlmutter
Due to an issue with the PrgEnv-cray compiler wrappers,
you must add the -target-accel=host compiler option or load
the craype-accel-host module in order to successfully
compile any OpenMP/OpenACC code for the host.
#### Mixture of C/C++/Fortran and CUDA codes
The programming environment allows a mixture of C/C++/Fortran and CUDA codes. In this case CUDA and CPU codes should be in separate files. Cray compiler wrapper commands must be used at link time, and CUDA runtime must be included:
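For example (file names hypothetical; CUDATOOLKIT_HOME is assumed to point at the CUDA installation provided by the cudatoolkit module):

```shell
module load cudatoolkit
nvcc -c gpu_kernel.cu
CC -c main.cpp
CC main.o gpu_kernel.o -L${CUDATOOLKIT_HOME}/lib64 -lcudart
```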
### NVIDIA
The NVIDIA compiler suite is available via the PrgEnv-nvidia
module, which will load the nvidia module for the NVIDIA base
compilers. The base compilers in this suite are:
- CUDA compiler drivers:
    - CUDA C/C++: nvcc
    - CUDA Fortran: nvfortran
- HPC compilers: for host multithreading and GPU offloading with
  OpenMP, OpenACC, C++17 Parallel Algorithms and Fortran's
  DO CONCURRENT; part of the NVIDIA HPC SDK:
    - C: nvc
    - C++: nvc++
    - Fortran: nvfortran
The CUDA compiler drivers are used to compile CUDA codes. Below
is an example that compiles a hello-world CUDA code, helloworld.cu,
to generate an executable helloworld:
```shell
$ cat helloworld.cu
#include <stdio.h>

__global__ void helloworld() {
    printf("Hello, World!\n");
}

int main() {
    helloworld<<<1,1>>>();
    cudaDeviceSynchronize();
    return 0;
}

$ nvcc -o helloworld helloworld.cu
```
Note
If you see a warning message about executable stacks like the one below:

```
/usr/bin/ld: warning: /tmp/pgcudafatvEZ0-TH7-jvs.o: missing .note.GNU-stack section implies executable stack
/usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
```

add the -Wl,-znoexecstack or -Wl,--no-warn-execstack flag to the link command.
#### OpenMP, OpenACC and CUDA
If OpenMP and CUDA code coexist in the same program, the OpenMP
runtime and the CUDA runtime use the same CUDA context on each GPU.
To enable this coexistence, use the compilation and linking option
-cuda, as shown below.
```shell
$ cat cuda_interop.cpp    # offload code calling a function in a CUDA code
...
#pragma omp target data map(from:array2D[0:M][0:N])
{
    ...
    #pragma omp target data use_device_ptr(p)
    {
        add_i_slice(p, i, N);
    }
    ...
}
...

$ cat interop_kernel.cu   # CUDA code where the called function is defined
...
__global__ void add_kernel(int *slice, int t, int n)
{
    ...
}

void add_i_slice(int *slice, int i, int n)
{
    add_kernel<<<n/128, 128>>>(slice, i, n);
}
...

$ nvc++ -Minfo -mp -target=gpu -c cuda_interop.cpp
$ nvcc -c interop_kernel.cu
$ nvc++ -mp -target=gpu -cuda interop_kernel.o cuda_interop.o
```
where -mp enables OpenMP and -target=gpu offloads the OpenMP
constructs to GPUs.
Note that, in the above non-MPI code example, the HPC compiler
nvc++ is used, but the Cray compiler wrapper, CC,
can be used instead. In that case, drop the -target=gpu flag from
the CC commands as the offload target is correctly set by the
craype-accel-nvidia80 module. MPI codes must be compiled with
the Cray compiler wrapper if Cray MPI is to be used.
The HPC compilers support OpenMP and OpenACC offloading. Invoking OpenACC in the HPC compilers, for example, looks like:
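A sketch of such invocations (file names hypothetical):

```shell
nvc++ -acc -target=gpu -Minfo=acc -o my_acc_code.ex my_acc_code.cpp
nvfortran -acc -target=gpu -Minfo=acc -o my_acc_code.ex my_acc_code.f90
```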
where the -acc flag enables OpenACC for GPU execution, and
-Minfo=acc prints diagnostic information to STDERR about
whether the compiler was able to generate GPU code successfully.
Note that, when the HPE Cray compiler wrappers are used, replace
the -target=gpu flag with -target-accel=nvidia80.
C++17 introduced parallel STL algorithms ("pSTL"), such that standard
C++ code can express parallelism when using many of the STL algorithms.
The NVIDIA HPC compilers support GPU-accelerated pSTL algorithms,
which can be activated by invoking nvc++ with the flag -stdpar=gpu.
See the documentation regarding pSTL for the HPC
SDK.
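For example, a hypothetical pSTL code could be compiled with:

```shell
nvc++ -stdpar=gpu -o my_pstl_code.ex my_pstl_code.cpp
```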
GPU acceleration of Fortran's DO CONCURRENT is also enabled with
the -stdpar option. When the flag is specified, the compiler
parallelizes DO CONCURRENT loops and offloads them to the GPU.
All data movement between host memory and GPU device
memory is performed implicitly and automatically under the control
of CUDA Unified Memory. It is also possible to target a multi-core
CPU with -stdpar=multicore. For more info, check the NVIDIA blog,
Fortran Standard
Parallelism.
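For example (file name hypothetical):

```shell
nvfortran -stdpar=gpu -o my_dc_code.ex my_dc_code.f90        # offload to GPU
nvfortran -stdpar=multicore -o my_dc_code.ex my_dc_code.f90  # multicore CPU
```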
The NVIDIA HPC SDK provides cuTENSOR extensions so that some Fortran
intrinsic math functions can be accelerated on GPUs. Accelerated
functions include MATMUL, TRANSPOSE, and several others. The
nvfortran compiler provides access to these GPU-accelerated functions
via the module cutensorEx. See the documentation about the
cutensorEx module in
nvfortran.
CUDA Math libraries (cuBLAS, cuFFT, cuFFTW, cuSOLVER, etc.) can be
linked easily by specifying the name of the library with the
-cudalib flag:
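For example, to link cuFFT into a hypothetical Fortran code:

```shell
nvfortran -target=gpu -cudalib=cufft -o my_fft_code.ex my_fft_code.f90
```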
Note again that, when the HPE Cray compiler wrapper ftn is used,
replace the -target=gpu flag with -target-accel=nvidia80.
Full documentation of the NVIDIA compilers can be found in the NVIDIA HPC Compilers User's Guide and the CUDA C++ Programming Guide.
Please check the NVIDIA HPC SDK - OpenMP Target Offload Training, December 2020 for useful information on the HPC compilers.
### AOCC
The AOCC (AMD Optimizing C/C++ Compiler) compiler suite is based
on LLVM and includes many optimizations for the AMD processors. It
supports Flang as the Fortran front-end compiler. The AOCC suite
is available via the PrgEnv-aocc module, which will load the
aocc module for the AOCC base compilers. The base compilers
in this suite are:
- C: clang
- C++: clang++
- Fortran: flang
Full documentation of the AOCC compilers is provided at AOCC webpage, where you can find user manuals and a quick reference guide: AOCC User Guide, Clang – the C, C++ Compiler, Flang – the Fortran Compiler and Compiler Options Reference Guide for AMD EPYC 7xx3 Series Processors.
#### OpenMP and OpenACC
The compilers can generate OpenMP parallel code for the host
CPU only, and do not support offloading to NVIDIA GPUs. To enable
OpenMP, add the compiler flag -fopenmp for C and C++ and -mp
for Fortran:
```shell
clang -fopenmp -o my_openmp_code.ex my_openmp_code.c
clang++ -fopenmp -o my_openmp_code.ex my_openmp_code.cpp
flang -mp -o my_openmp_code.ex my_openmp_code.f90
```
When using the HPE Cray compiler wrappers, add the target flag
-target-accel=nvidia80 for offloading to GPUs.
OpenACC is not supported.
#### Mixture of C/C++/Fortran and CUDA codes
The programming environment allows a mixture of C/C++/Fortran and CUDA codes. In this case CUDA and CPU codes should be in separate files. Cray compiler wrapper commands must be used at link time, and CUDA runtime must be included:
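For example (file names hypothetical; CUDATOOLKIT_HOME is assumed to point at the CUDA installation provided by the cudatoolkit module):

```shell
module load cudatoolkit
nvcc -c gpu_kernel.cu
CC -c main.cpp
CC main.o gpu_kernel.o -L${CUDATOOLKIT_HOME}/lib64 -lcudart
```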
### LLVM
The LLVM core libraries and compilers are built locally by NERSC, not provided by HPE Cray. They are compiled against the GCC compiler suite and thus cannot be used with the Intel or HPE Cray programming environments.
The LLVM/clang compiler is a valid CUDA compiler. One can
replace NVIDIA's nvcc command with clang --cuda-gpu-arch=<arch>,
where <arch> is sm_80. If using clang as
a CUDA compiler, one usually will also need to add the
-I/path/to/cuda/include and -L/path/to/cuda/lib64 flags manually,
since nvcc includes them implicitly.
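A sketch of such an invocation (CUDATOOLKIT_HOME is assumed to point at the CUDA installation provided by the cudatoolkit module):

```shell
clang++ --cuda-gpu-arch=sm_80 -I${CUDATOOLKIT_HOME}/include \
    -L${CUDATOOLKIT_HOME}/lib64 -lcudart -o helloworld helloworld.cu
```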
For documentation of the LLVM compilers, see LLVM,
Clang, and
Flang websites.
Additionally, compiler documentation is provided through man pages
(e.g., man clang) and through the -help flag to each compiler
(e.g., clang -help).
Note
When using LLVM Flang in a CMake project, CMake version 3.28.0 or above is required to correctly identify the compiler. Please check the version of CMake you are using if you are facing issues when building a CMake project with LLVM Flang as the Fortran compiler.
## Common compiler options
Below is a table documenting common flags for each of the compilers.
| | Intel | GNU | Cray | NVIDIA | AOCC | LLVM | comment |
|---|---|---|---|---|---|---|---|
| Overall optimization | `-O<n>`, `-Ofast` | `-O<n>`, `-Ofast` | `-O<n>`, `-Ofast` | `-O<n>` | `-O<n>`, `-Ofast` | | Replace `<n>` with 1, 2, 3, etc. |
| Enable OpenMP | `-qopenmp` | `-fopenmp` | `-fopenmp` for C/C++ with CCE 9.0 or later; `-h omp` otherwise | `-mp[=multicore*\|[no]align]` (*: default) | C/C++: `-fopenmp`; Fortran: `-mp` | `-fopenmp` | OpenMP enabled by default in Cray. |
| Enable OpenMP offload | N/A | N/A | `-fopenmp` for C/C++ with CCE 9.0 or later; `-h omp` otherwise | `-mp=gpu` | N/A | `-fopenmp -fopenmp-targets=nvptx64` | |
| Enable OpenACC | N/A | `-fopenacc` | Fortran: `-h acc` | `-acc` | N/A | N/A | OpenACC not supported by clang/clang++. |
| Free-form Fortran | `-free` | `-ffree-form` | `-f free` | `-Mfree` | `-Mfreeform` | | Also determined by file suffix (.f, .F, .f90, etc.) |
| Fixed-form Fortran | `-fixed` | `-ffixed-form` | `-f fixed` | `-Mfixed` | `-Mfixed` | | Also determined by file suffix (.f, .F, .f90, etc.) |
| Debug symbols | `-g` | `-g` | N/A | HPC compilers: `-g`, `-gopt`; CUDA: `-g` (or `--debug`) for host code and `-G` (or `--device-debug`) for device code | `-g` | `-g` | Debug symbols enabled by default in Cray. |