scrontab (Slurm crontab)
Traditional cron functionality has been replaced at NERSC
with the Slurm crontab tool called scrontab. This combines the same
functionality as cron with the resiliency of the batch system. Jobs are run on
a pool of login nodes, so unlike with regular cron, a single node going down won't
keep your scrontab job from running. You can also find and modify your scrontab
job on any login node.
You can edit your scrontab script with scrontab -e.
Once you save your script, it will automatically be scheduled by the
batch system. By default, vi is the editor for scrontab. To use a
different editor, set the EDITOR environment
variable (e.g. export EDITOR=/global/common/software/nersc/bin/emacs).
You can view your existing scripts with scrontab -l.
Example scrontab Script
Each script should include traditional Slurm flags like -A and
-t. Here's an example scrontab job script that will run every three
hours (note the #SCRON --open-mode=append line which will tell Slurm
to append any new output to the output file):
#SCRON -q cron
#SCRON -C cron
#SCRON -A <account>
#SCRON -t 00:30:00
#SCRON -o output-%j.out
#SCRON --open-mode=append
0 */3 * * * <full_path_to_your_script>
scrontab times are in UTC
Currently, scrontab times on Perlmutter are in UTC.
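Because the schedule fields are read as UTC, a local wall-clock time has to be converted before it goes into the scrontab file. The following is a minimal sketch, assuming GNU date is available; the helper name to_utc_cron_fields and the US Pacific offsets (-0800 in winter, -0700 in summer) are illustrative assumptions, not part of scrontab.

```shell
# Convert a local time with an explicit numeric UTC offset into the
# "minute hour" fields of a cron schedule, expressed in UTC.
# Assumes GNU date (the -d option); to_utc_cron_fields is a hypothetical helper.
to_utc_cron_fields() {
    # $1: timestamp with offset, e.g. "2024-01-15 09:00 -0800"
    # %-M and %-H print minute and hour without zero padding.
    date -u -d "$1" '+%-M %-H'
}

to_utc_cron_fields "2024-01-15 09:00 -0800"   # 09:00 at UTC-8 -> prints: 0 17
to_utc_cron_fields "2024-07-15 09:00 -0700"   # 09:00 at UTC-7 -> prints: 0 16
```

Note that a job pinned to a fixed UTC hour will shift by one hour in local time whenever daylight saving time changes.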
Long-Running scrontab Jobs
Projects often need long-running processes to manage their work at
NERSC (e.g. a listener process to facilitate external data
movement). For now, we support these via the workflow
QOS, which allows a much longer run
time. However, jobs in this QOS may be interrupted by maintenance
or by login nodes going offline. Since it's generally desirable to have
these jobs restart as soon as possible, we recommend that you set the
start time to be fairly frequent (e.g., once an hour) and add the
singleton dependency to that scrontab job's flags:
#SCRON --qos=workflow
#SCRON --account=<account>
#SCRON --time=30-00:00:00
#SCRON --dependency=singleton
#SCRON --name=my_data_movement_helper
0 * * * * <full_path_to_your_script>
This means Slurm will check every hour whether an instance of your job is running, and if not, it will start it.
Use singleton for long-running jobs
You must use --dependency=singleton for long-running jobs to prevent Slurm
from starting multiple instances of the same job every time your
scrontab file is edited.
Monitoring Your scrontab Jobs
You can monitor your scrontab jobs with squeue, for example squeue --me -q cron -O JobID,EligibleTime.
This will show the next time the batch system will run your job. If
the scrontab job is set to repeat, the system will automatically
reschedule the next job. Additionally, if you modify your scrontab
job, Slurm will automatically cancel the old job and resubmit a new
one.
Canceling a scrontab job
To remove a scrontab job from your running jobs, you can edit the scrontab
file with scrontab -e and comment out all the lines associated with
the entry.
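As a sketch, commenting out the three-hourly example entry from earlier on this page could look like the following. The extra leading # in front of each #SCRON line is an assumption about how to keep those option lines from being picked up by a later entry; the placeholders are the same as in the original example.

```shell
# #SCRON -q cron
# #SCRON -C cron
# #SCRON -A <account>
# #SCRON -t 00:30:00
# #SCRON -o output-%j.out
# #SCRON --open-mode=append
# 0 */3 * * * <full_path_to_your_script>
```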
Using scancel on a scrontab job
The scancel command will return an error when attempting
to remove a job started with scrontab:
perlmutter$ scancel 555
scancel: error: Kill job error on job id 555: Cannot scancel a scrontab job without the --hurry flag, or modify scrontab jobs through scontrol
If you cancel a scrontab job with the --hurry flag, the entry in the
scrontab file will be prepended with #DISABLED. These comments
will need to be removed before the job can start again.
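To find entries that were disabled this way, one option is to filter the scrontab listing for lines that begin with #DISABLED. This is a minimal sketch; list_disabled_entries is a hypothetical helper name, and it only relies on the #DISABLED prefix described above, not on the exact text that follows it.

```shell
# Print only the scrontab lines that start with #DISABLED.
# Reads scrontab text on stdin, so it can be fed from `scrontab -l`.
# list_disabled_entries is a hypothetical helper, not a Slurm command.
list_disabled_entries() {
    # `|| true` keeps the exit status at 0 when nothing is disabled.
    grep '^#DISABLED' || true
}

# Typical use on a login node:
#   scrontab -l | list_disabled_entries
```

Any lines it prints can then be restored by removing the #DISABLED prefix with scrontab -e.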
Using scrontab to submit other batch jobs
scrontab can be used to submit batch jobs at regular intervals, often as part
of a larger workflow. It is important to note that scrontab jobs set certain
Slurm-related environment variables which may be inherited by batch jobs
submitted from the scrontab job.
A notable example is that scrontab jobs set a default
SLURM_MEM_PER_CPU=2048 which can cause errors when inherited into batch jobs,
often of the form srun: error: Unable to create step for job <id>: More processors requested than permitted.
A known workaround is to clear that specific environment variable
in the scrontab file before the submission command (for example, with
unset SLURM_MEM_PER_CPU;), or to use unset ${!SLURM_@}; to unset all
Slurm-related environment variables in the file.
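The unset ${!SLURM_@}; approach can also live in a small wrapper script that the scrontab entry calls. The sketch below assumes the wrapper runs under bash, since the ${!SLURM_@} expansion is bash-specific. The function name clear_slurm_env is hypothetical, and the sbatch line is left as a commented placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper script called from a scrontab entry.

clear_slurm_env() {
    # ${!SLURM_@} expands to the names of all shell variables whose
    # names begin with SLURM_ (bash only); unset then removes them, so
    # they are not inherited by anything launched afterwards.
    unset ${!SLURM_@}
}

clear_slurm_env

# Submit the real batch job with a clean Slurm environment (placeholder path):
# sbatch <full_path_to_your_batch_script>
```

Clearing every SLURM_ variable is the broader of the two workarounds. If the batch job needs some inherited setting, unsetting only SLURM_MEM_PER_CPU may be the safer choice.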