
Running simulations in the databar
Page last edited by Mikkel Strange (mikst) 30/01-2018

IMPORTANT:

Remember to log in to a Linux node if you do not use ThinLinc to access the databar!

linuxsh -X
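
If you log in over ssh from your own machine instead, the sequence below is a sketch of the usual setup; treat the host name as an assumption and check the databar documentation if it does not work (replace username with your own user name):

ssh -X username@login.gbar.dtu.dk
linuxsh -X

The -X option forwards graphical windows (X11), so graphical tools such as the ASE GUI can open on your own screen.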

The high-performance computing nodes

When logging in on the Linux nodes in the databar, you are sharing the node with other interactive users.  Running large DFT calculations directly on the command line may be slow, as you will be competing for CPU resources with your fellow students.  For large calculations, it is beneficial to submit the calculation to the "HPC nodes".  HPC means High Performance Computing.  On the HPC nodes, you also have the possibility to run a job in parallel, that is, on multiple CPUs.

The typical work flow is

  • Prepare your Python script (a minimal sketch follows this list).
  • Test that it appears to work.
  • Submit your script to the HPC queue.
  • Wait for your job to complete (usually it starts immediately, but occasionally it will be queued).
  • See the results.
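
A complete script can be quite short.  The following is only a sketch to show the shape of such a script, assuming GPAW and ASE are installed; the water molecule, the box size and the file names are illustrative:

from ase.build import molecule
from gpaw import GPAW

# Illustrative system: a water molecule centered in a box with 3 Angstrom of vacuum.
atoms = molecule('H2O')
atoms.center(vacuum=3.0)

# Attach a GPAW calculator; the log is written to h2o.txt while the job runs.
atoms.calc = GPAW(mode='fd', txt='h2o.txt')

# Asking for the energy triggers the actual DFT calculation.
print('Energy:', atoms.get_potential_energy(), 'eV')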

Testing the script before submitting it.

You can test your script on the interactive nodes before you submit with the command

python myscript.py --gpaw dry-run=1

(note the two minus signs before gpaw).  If you plan to run on four CPUs, you should write

python myscript.py --gpaw dry-run=4

In both cases the script will run until the first GPAW calculation begins.  Instead of running that calculation, GPAW will initialize, estimate its memory use, and check that everything is reasonable.  It will begin writing to the usual .txt file (GPAW normally writes to a file with the extension .txt while it runs, so you can follow the progress).  Look through the .txt file for problems; it will, for example, contain information about how GPAW plans to parallelize your calculation (if you asked for that).
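
If the calculator in your script writes to h2o.txt, as in the sketch above, you can for example skim the memory estimate after a dry run; the section heading below is what recent GPAW log files use, but check your own .txt file if it differs:

grep -A 5 'Memory estimate' h2o.txt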

Submitting the script.

Submit the script to the HPC queue with the command gpaw-qsub.  To run on a single CPU, use

gpaw-qsub myscript.py

To run on multiple CPUs (for example 4):

gpaw-qsub -p 4 myscript.py

Note that small calculations will run slower or fail altogether if you specify too many CPUs.  Asking for many CPUs also increases the risk that your job is queued instead of starting immediately.

Monitoring progress.

You can see the queue with the command qstat (queue status).  The following is an example of a job being submitted and the queue being checked.  The commands after the $ prompt are what the user types:

~/tmp/test
n-62-24-13(jasc) $ gpaw-qsub -p 4 bandstructure.py
723462.hpc-fe1
~/tmp/test
n-62-24-13(jasc) $ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
723444.hpc-fe1             linuxsh          jasc            00:00:00 R app            
723462.hpc-fe1             bandstructure.py jasc                   0 R hpc   
The letter R in the second-last column means that the job is running.  Other possibilities include Q for queued and C for completed.
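
On a busy day the full queue listing can be long.  Assuming the batch system accepts the standard PBS/Torque options (which the hpc-fe1 job ids suggest), you can restrict the listing to your own jobs:

qstat -u jasc

Replace jasc with your own user name.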
 
While your job runs, the output that would otherwise have been printed in the terminal goes into two files called myscript.py.oNNNNN and myscript.py.eNNNNN, where NNNNN is the job-id number.  The first contains ordinary output (called standard output on Unix), the second contains error messages (called standard error).  So everything you print with Python's print goes into the .oNNNNN file.  Unfortunately, that file is buffered: output is not written immediately, but in blocks of a few kilobytes at a time, with the rest arriving when the job completes.  This makes the file less useful for monitoring a running job.
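
For the bandstructure.py job submitted in the example above (job-id 723462), you would inspect the files like this once output has been flushed:

cat bandstructure.py.o723462
cat bandstructure.py.e723462

An empty .e file is good news: no error messages.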
 
GPAW normally also writes a .txt file during the calculation.  You can monitor it with the command

tail -n 100 -f whatever.txt

where the option -n 100 means "write the last 100 lines", and -f means "continue to follow the file", i.e. keep printing new lines as they appear.  Press Ctrl-C to stop following.
 
 
Cancel a job

If a job malfunctions, cancel it with the qdel command:

qdel 723462

The number is the job id shown by qstat.

Support: +45 45 25 74 43