The NeSI Auckland equipment (including the Pan cluster) is migrating to the new Tamaki Data Centre to make room for expansion.
The timeline so far:
- 7 December - Friday
  - 09:00 Login node closed and job queues drained
  - 11:00 Power off all equipment
  - 12:00 Disconnecting cables
  - 17:00 Trucks move equipment
  - 19:00 Equipment braced in TDC
  - 20:00 Cabling commences
- 8 December - Saturday
  - 17:00 Power-up of racks
  - 21:30 Network functionality restored
- 9 December - Sunday
  - 08:30 IB working (with only one lost node at the moment)
  - 08:30 3 GPFS servers found
  - 13:00 Storage looking ok
  - 13:00 Temperatures holding
  - 14:00 LoadLeveler testing commences
  - 16:00 Benchmarks run - looking good
- 10 December - Monday
  - Most of the day spent fault finding and checking functionality
  - 17:00 Pan cluster available for researchers
Changes to multithreaded and parallel job descriptions
The LoadLeveler configuration has changed, and multithreaded and mixed jobs must now be defined slightly differently. A multithreaded job needs job_type set to serial, and the number of threads requested is set with the parallel_threads= directive.
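The original sample for this item is not reproduced here, so the following is only a minimal sketch of what such a job command file might look like. The job name, the thread count of 4, the use of OpenMP and the application name are assumptions for illustration; only job_type and parallel_threads come from the text above.

```shell
#!/bin/bash
# Sketch of a LoadLeveler job command file for a multithreaded job.
# job_name and the thread count are illustrative values, not site defaults.
# @ job_name         = threaded_example
# @ job_type         = serial
# @ parallel_threads = 4
# @ queue

# Keep the runtime thread count in step with parallel_threads above.
# OMP_NUM_THREADS applies only if the application uses OpenMP (an assumption).
THREADS=4
export OMP_NUM_THREADS="${THREADS}"
echo "threads requested: ${OMP_NUM_THREADS}"

# Hypothetical application; replace with your own binary.
# ./my_threaded_app
```

Keeping the thread count in one shell variable makes it easier to notice when the directive and the runtime setting drift apart.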
Mixed jobs (those that use MPI with multiple threads per rank) should also use the parallel_threads directive to reserve the number of threads per rank.
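As a rough sketch of a mixed job, the file below requests 8 MPI ranks with 4 threads each. The job_type value, the use of total_tasks on its own, the rank and thread counts and the job name are all assumptions; check the Pan cluster documentation for the values the site expects. Only parallel_threads is taken from the text above.

```shell
#!/bin/bash
# Sketch of a LoadLeveler job command file for a mixed MPI + threads job.
# job_type = parallel and total_tasks are assumed here, not confirmed for Pan.
# @ job_name         = hybrid_example
# @ job_type         = parallel
# @ total_tasks      = 8
# @ parallel_threads = 4
# @ queue

# Mirror the directive values so the script can report the total core request.
RANKS=8
THREADS_PER_RANK=4
export OMP_NUM_THREADS="${THREADS_PER_RANK}"
TOTAL_CORES=$((RANKS * THREADS_PER_RANK))
echo "ranks=${RANKS} threads_per_rank=${THREADS_PER_RANK} cores=${TOTAL_CORES}"

# Launch your MPI application here (the launcher depends on the MPI stack in use).
```

The total core request is the number of ranks multiplied by the threads per rank, so this example asks for 32 cores.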
CPU affinity binding
The new configuration makes affinity control available. With affinity-based scheduling, LoadLeveler knows exactly which cores have been allocated to a user, and those cores can be bound to the user's application. To use affinity control, launch the application through the smpexec program.
smpexec also offers the -s option. With -s, smpexec prints application statistics once the application finishes, including the maximum amount of memory used and the number of threads the application spawned.
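A launch line inside a job script might look like the sketch below. The exact command-line form (option first, then the application) and the application name are assumptions; only the smpexec program and its -s option are taken from the text above.

```shell
#!/bin/bash
# Hypothetical application name; replace with your own binary.
APP="./my_app"

# Launch through smpexec so the allocated cores are bound to the application.
# -s asks smpexec to print statistics when the application finishes.
LAUNCH_CMD="smpexec -s ${APP}"
echo "launch command: ${LAUNCH_CMD}"

# Only run on a system where smpexec and the application actually exist.
if command -v smpexec >/dev/null 2>&1 && [ -x "${APP}" ]; then
    ${LAUNCH_CMD}
fi
```

The guard lets the same script be checked on a workstation without smpexec installed; on the cluster the guard can be dropped and the launch line used directly.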
Number of cores per node is reduced
One effect of affinity scheduling is a reduced number of cores per node. One core per node is now reserved for I/O operations, so the maximum number of allocatable cores is 11 on Westmere nodes and 15 on Sandy Bridge nodes.
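These limits can be turned into a quick check before submitting a job. The helper below is a minimal sketch: the function names and the node-type labels westmere and sandybridge are hypothetical, and only the limits of 11 and 15 come from the text above.

```shell
#!/bin/bash
# Print the maximum number of allocatable cores for a node type.
# Returns non-zero for an unknown node type.
max_threads_for() {
    case "$1" in
        westmere)    echo 11 ;;
        sandybridge) echo 15 ;;
        *)           return 1 ;;
    esac
}

# Succeed if the requested thread count fits on one node of the given type.
# Usage: fits_on_node <node_type> <threads>
fits_on_node() {
    _max=$(max_threads_for "$1") || return 2
    [ "$2" -ge 1 ] && [ "$2" -le "${_max}" ]
}

# A request that would have fitted before the change no longer does.
if fits_on_node westmere 12; then
    echo "12 threads fit on a Westmere node"
else
    echo "12 threads do not fit on a Westmere node"
fi
```

A job that previously asked for every core on a node would need its thread count lowered by one to remain schedulable on a single node.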
llstatus output interpretation
Because of the new way of defining multithreaded jobs, llstatus output can now be misleading: it may show a very low number of tasks while jobs are being held because of resource constraints. The reason is that llstatus reports only running tasks, not allocated cores. To assess core availability, use the llstatus -R command, which displays the list of available/accessible cores per node.
New documentation on how to install and use the NeSI Tools software suite has been created and posted at the following link: https://wiki.auckland.ac.nz/display/CERES/NeSI+Pan+Cluster+Login
This software allows remote computers (Windows, Mac and UNIX) to log in to and use the Pan cluster. The suite includes MobaXterm, Grisu Template Client and Gricli. The guides cover installation and uploading and downloading files to and from the Pan cluster.
On December 7 at 9 am, the NeSI Pan cluster will be shut down and relocated to the new Tamaki Data Centre. We aim to have all services available no later than the end of Monday December 10. The relocation will make room for additional hardware as part of the University of Otago's recent co-investment in NeSI. Over the next few weeks, an additional 63 nodes will be made available to researchers, increasing the total capacity of Pan to over 3,000 CPU cores. The expansion will also increase the number of available NVIDIA 'Tesla' M2090 GPUs to a total of 32. As part of this addition, three high memory nodes (20 CPU cores, 512 GB RAM) will also be made available to researchers.
Looking forward, the Centre for eResearch is planning another investment in the NeSI Pan cluster for 2013 Q1 and would appreciate any feedback and suggestions regarding hardware needs. We are considering purchasing new GPUs based on NVIDIA's Kepler architecture as well as the recently released Intel Xeon Phi accelerator. These devices will primarily be of interest to researchers who develop their own code, but we would also like to know about compatible applications that you would like to use if these accelerators were made available. Please send any feedback to email@example.com