Running simulations in the databar
Page last edited by Mikkel Strange (mikst) 30/01-2018
The high-performance computing nodes
When logging in on the Linux nodes in the databar, you share the node with other interactive users. Running large DFT calculations directly on the command line may be slow, as you will be competing for CPU resources with your fellow students. For large calculations, it is beneficial to submit the calculation to the "HPC nodes" (HPC means High-Performance Computing). On the HPC nodes you also have the possibility of running a job in parallel, that is, on multiple CPUs. The typical workflow is described below.
Before submitting the script: testing it
You can test your script on the interactive nodes before you submit it with the command
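The exact invocation depends on how GPAW is installed in the databar; a plausible form of the dry-run test (for a single CPU) is:

```shell
# Run the script in dry-run mode: GPAW initializes, estimates memory use,
# and checks the input, but does not start the actual calculation.
python3 myscript.py --dry-run
```

Here myscript.py stands for your own script name.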
(that is two minus signs before dry-run). If you plan to run on four CPUs, you should write
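For the four-CPU case, the dry-run should be told how many CPUs you intend to use; a plausible form (again, the wrapper may differ on your system) is:

```shell
# Dry-run as if the job were parallelized over 4 CPUs, so the .txt file
# shows how GPAW plans to distribute the calculation.
python3 myscript.py --dry-run=4
```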
In both cases the script will run until the first GPAW calculation begins. Instead of running the first GPAW calculation, GPAW will initialize, estimate its memory use, and check that everything is reasonable. It will begin writing to the usual .txt file (GPAW normally writes to a file with the extension .txt while it runs, so you can follow the progress). You can look through the .txt file for problems; it will, for example, contain information about how GPAW plans to parallelize your calculation (if you asked for that).
Submitting the script
Submit the script to the HPC queue with the command gpaw-qsub. To run on a single CPU, use
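The command name gpaw-qsub is given above; the bare single-CPU invocation presumably looks like this (script name is your own):

```shell
# Submit myscript.py to the HPC queue, running on a single CPU.
gpaw-qsub myscript.py
```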
To run on multiple CPUs (for example 4):
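The option for requesting multiple CPUs is an assumption here (check the local documentation or `gpaw-qsub --help` for the exact spelling); a sketch for four CPUs:

```shell
# Submit myscript.py to the HPC queue on 4 CPUs.
# NOTE: the "-p 4" option name is an assumption; your local gpaw-qsub
# wrapper may use a different flag for the CPU count.
gpaw-qsub -p 4 myscript.py
```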
Note that small calculations will run slower, or fail altogether, if you specify too many CPUs. It also increases the risk that your job is queued instead of executing immediately.
Monitoring progress
You can see the queue with the command qstat (queue status). The following is an example of a job being submitted and the queue checked. Bold marks what the user types:
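An illustrative session could look as follows (the job-id, user name, and queue name are made up; the column layout follows the usual qstat format):

```shell
# Submit the job, then check the queue status.
$ gpaw-qsub myscript.py
12345.hpc-master
$ qstat
Job ID           Name          User      Time Use S Queue
---------------- ------------- --------- -------- - -----
12345.hpc-mas    myscript.py   s123456   00:00:10 R hpc
```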
The letter R in the second-last column means that the job is running. Other possibilities include Q for queued and C for completed.
While your job runs, the output that would otherwise have been printed in the terminal goes into two files called myscript.py.oNNNNN and myscript.py.eNNNNN, where NNNNN is the job-id number. The first contains ordinary output (in Unix called standard output); the second contains error messages (in Unix called standard error). So everything you print with the print function in Python goes into the .oNNNNN file. Unfortunately, that file is buffered, which means that output is not written to the file immediately, but only in blocks of a few kilobytes at a time; the rest arrives when the job completes. This makes the file less useful for monitoring while the job runs.
GPAW normally also writes a .txt file during the calculation. You can monitor it with the command
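Assuming your script writes its GPAW output to a file named myscript.txt (substitute your own file name), the monitoring command is:

```shell
# Print the last 100 lines of the GPAW output file and keep following it
# as new lines are written.
tail -n 100 -f myscript.txt
```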
where the option -n 100 means "write the last 100 lines", and -f means "continue to follow the file", i.e. continue to print lines that appear in the file.
Cancel a job
If a job malfunctions, please cancel it with the qdel command.
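For example, to cancel the job with the made-up id 12345 from the qstat listing:

```shell
# Cancel (delete) the queued or running job with job-id 12345.
qdel 12345
```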
The number is the job-id shown by the qstat command.