Cray system

From Hanlon Financial Systems Lab Web Encyclopedia

Cray System

The ISTeC Cray HPC System is a model XE6 supercomputer. An overview of the Cray XE6 system architecture is shown below.

[Figure: Cray.png, overview of the Cray XE6 system architecture]

The Cray CPUs are 16-core AMD Interlagos processors, two of which make up a single compute node; thus, a compute node contains 32 CPU cores. There are two primary partitions: a service partition and a compute partition. The service partition includes a login node and several system administration nodes. The compute partition includes the compute nodes and a high-speed interconnect network. The compute nodes and service nodes access 32 TB of RAID disk space through a high-speed Fibre file system. The remaining elements in the architecture diagram are used primarily for system administration.

There are 84 compute nodes in total (2,688 cores) on the Cray, split into 4 interactive compute nodes (128 cores) and 80 batch compute nodes (2,560 cores). Small, interactive jobs can be run on the interactive compute nodes, but large, long-running jobs should be run on the batch compute nodes.

Users log in directly to the login node in the service partition, which in turn accesses the compute nodes through the high-speed Gemini network. Users cannot log in to the compute nodes; instead, they must submit jobs to them from the login node using the aprun command. The login node runs a full version of SUSE Enterprise Linux, whereas the compute nodes run the Cray Linux Environment (CLE), a stripped-down, more efficient operating system based on SUSE Enterprise Linux version 11.0 x86_64.
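As a rough illustration of the launch step, the following Python sketch assembles an aprun command line for a job that fills whole nodes. It assumes the commonly documented aprun options `-n` (total number of processing elements) and `-N` (processing elements per node); the helper `build_aprun_command` and the executable name `./my_app` are hypothetical and are not taken from the ISTeC documentation. Consult the Cray User Guide for the options actually recommended on this system.

```python
CORES_PER_NODE = 32  # two 16-core Interlagos processors per compute node


def build_aprun_command(executable, total_pes, pes_per_node=CORES_PER_NODE):
    """Return an aprun argument list (hypothetical helper, not a Cray tool).

    total_pes    -> value passed to aprun -n (total processing elements)
    pes_per_node -> value passed to aprun -N (processing elements per node)
    """
    if total_pes < 1:
        raise ValueError("total_pes must be at least 1")
    if not 1 <= pes_per_node <= CORES_PER_NODE:
        raise ValueError("pes_per_node must be between 1 and 32")
    return ["aprun", "-n", str(total_pes), "-N", str(pes_per_node), executable]


if __name__ == "__main__":
    # Two fully used nodes: 64 processing elements, 32 per node.
    print(" ".join(build_aprun_command("./my_app", total_pes=64)))
```

The printed string is meant to be typed (or placed in a batch script) on the login node; the sketch only builds the text of the command and does not launch anything itself.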

The Cray XE6 is designed to run large, parallel jobs efficiently. One consequence is that an entire node, consisting of 32 cores, is the smallest resource that can be allocated in the compute partition. Stated differently, only a single job can run on a given node at a time. Running a scalar job on a single core therefore ‘wastes’ the other 31 cores on the node, so users are instructed to parallelize their jobs across at least 32 cores. The exception to this rule of using all of the cores on a node arises when the memory architecture is the performance bottleneck. In such cases, it may be advantageous to spread the job across multiple nodes, each with fewer than 32 cores in use, so as to make effective use of the memory bandwidth. Even then, the code should still be run in parallel.
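The whole-node allocation rule can be made concrete with a little arithmetic. The sketch below is only an illustration: the function `node_usage` is a hypothetical helper, and it assumes that processes are packed onto nodes at a fixed number per node. It reports how many nodes a job occupies and how many of the reserved cores sit idle.

```python
import math

CORES_PER_NODE = 32  # a node is the smallest allocatable unit


def node_usage(total_processes, processes_per_node=CORES_PER_NODE):
    """Return (nodes_occupied, idle_cores) for a job (hypothetical helper)."""
    if total_processes < 1:
        raise ValueError("total_processes must be at least 1")
    if not 1 <= processes_per_node <= CORES_PER_NODE:
        raise ValueError("processes_per_node must be between 1 and 32")
    nodes = math.ceil(total_processes / processes_per_node)
    reserved_cores = nodes * CORES_PER_NODE  # whole nodes are always reserved
    return nodes, reserved_cores - total_processes


if __name__ == "__main__":
    print(node_usage(1))       # scalar job on one core
    print(node_usage(64))      # 64 processes on fully used nodes
    print(node_usage(64, 16))  # same job, half-populated nodes
```

A scalar job occupies 1 node and leaves 31 cores idle, while 64 processes at 32 per node fill 2 nodes exactly. Running the same 64 processes at 16 per node occupies 4 nodes and leaves 64 cores idle, which is the trade made deliberately when memory bandwidth, not core count, limits performance.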

User Guide

The ISTeC Cray website is http://istec.colostate.edu/.
It is updated periodically with new information about the Cray.

File:Cray User Guide.pdf