separated some material, moved data access to the second part
updated info on HPCW/SCW
better formatting
added filezilla screenshots
colinsauze committed Dec 5, 2017
1 parent a4ba7c0 commit a09a23a
Showing 9 changed files with 291 additions and 29 deletions.
48 changes: 27 additions & 21 deletions lessons/0.HPC-introduction.md
---
title: "HPC Introduction"
author: "Bob Freeman"
date: "Monday, August 17, 2015"
title: "SCW Introduction"
author: "Colin Sauze"
date: "November 2017"
---

## Objectives
* Be comfortable creating a batch script and submitting one
* Know how to get info about jobs and to control them

Topics to cover:

* SCW background and project objectives
* Job efficiency metrics
* How to copy files
* Job arrays
* Optimising CPU usage, GNU Parallel
* Using modules
* Installing packages with pip

Good practice:

* Don't run jobs on the login/head nodes.
* Don't run too many jobs at once (25/50 job limit).
* Don't use all the disk space.
* Use all the cores on the node; GNU Parallel can force this (see the sketch below).
* Make jobs that last at least a few minutes.

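As a taster of the GNU Parallel point above, here is a minimal sketch of a batch script that keeps every core on a node busy. The program name `my_program`, the `data/*.txt` input files and the commented-out module line are made-up placeholders for illustration; the core count matches the 12-core Westmere nodes described later.

~~~
#!/bin/bash
#SBATCH --job-name=parallel-example
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=12    # all 12 cores of a Westmere node
#SBATCH --time=00:30:00

# module load parallel        # if your site provides GNU Parallel as a module

# Run one copy of my_program per input file; GNU Parallel starts a new
# task on a core as soon as the previous one finishes, so all 12 cores
# stay busy until the work runs out.
parallel -j $SLURM_CPUS_PER_TASK ./my_program {} ::: data/*.txt
~~~
{: .bash}
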
## Prior Knowledge/Prerequisites

* Basic use of the Linux command line, as covered in the Software Carpentry Introduction to the Unix Shell Lesson.
* [Managing jobs and getting job information](#managing-jobs-and-getting-job-information)
* [Best Practices](#best-practices)
* [Distributed System Definitions and stacks:](#distributed-system-definitions-and-stacks)

* [Resources:](#resources)

## Cluster basics

Clusters, otherwise known as high-performance computing (HPC) or high-throughput computing systems, are large collections of relatively normal computers linked together through an "interconnect".

These tools are becoming the <em>de facto</em> standard in most research disciplines today.

### What are some of the reasons to use a cluster?

HPC Wales ran from 2010 to 2015 and provided clusters in Aberystwyth, Bangor, Ca…

### SCW

Super Computing Wales (SCW) is a new project to replace HPC Wales. It started in 2017 and runs until 2020. It will include new systems in Cardiff and Swansea, but these haven't been installed yet; they are due in February 2018.

### How to get access?

Email [email protected] with completed project and account forms.
Everyone on this course should have a training account already; you might need a "real" account to run any serious jobs.

### Logging in

Fair Use/Responsibilities: https://rc.fas.harvard.edu/resources/responsibilities
* Operating system (OS): the basic software layer that allows execution and management of applications
* Physical machine: the hardware (processors, memory, disk and network)



## Resources:
105 changes: 105 additions & 0 deletions lessons/1.logging-in.md
---
title: "Logging in to SCW"
author: "Colin Sauze"
date: "November 2017"
---



### Logging in

Your username is usually `firstname.surname`. You should have been emailed details of your login prior to this workshop.


~~~
$ ssh [email protected]
~~~
{: .bash}

Windows users should use PuTTY and enter login.hpcwales.co.uk in the hostname box.


### What's available?

The `hpcwhosts` command will list the available clusters.

~~~
$ hpcwhosts
~~~
{: .bash}

~~~
HPC Wales Clusters Available
Phase System Location & Type Login Node(s)
------------------------------------------------------------------
1 Cardiff High Throughput cwl001 cwl002 cwl003
1 Bangor Medium Processing bwl001 bwl002
2 Swansea Capability/Capacity/GPU ssl001 ssl002 ssl003
2 Cardiff Capacity/GPU csl001 csl002
~~~
{: .output}


|Cluster|Number of Nodes|Cores per node|Architecture|RAM|Other|
|---|---|---|---|---|---|
|Cardiff High Throughput|162|12|Westmere|36GB||
|Cardiff High Throughput|4|2|Nehalem|128GB||
|Cardiff High Throughput|1|8|Nehalem|512GB||
|Cardiff Capacity|384|16|Sandy Bridge|64GB||
|Cardiff GPU|16|16|Sandy Bridge|64GB|Nvidia Tesla M GPU|
|Swansea Capability|16|16|Sandy Bridge|128GB||
|Swansea Capability|240|16|Sandy Bridge|64GB||
|Swansea Capacity|128|16|Sandy Bridge|64GB||
|Swansea GPU|16|16|Sandy Bridge|64GB|Nvidia Tesla M2090 (512 core, 6GB RAM)|
|Bangor|54|12|Westmere|36GB||

Total: 15,520 cores, 304.7 TFLOPS (trillion floating point operations per second).


#### SCW vs HPCW

SCW is still in the process of being purchased. It will probably use newer Intel Xeon processors, each approximately double the speed of a Sandy Bridge processor, with an expected total speed of around 700 TFLOPs. The top500 list (https://www.top500.org/list/2017/11/?page=4) ranks the world's 500 fastest computers; the November 2017 list has a roughly 700 TFLOPs system at position 383.


### Slurm

Slurm is the management software used on HPC Wales. It lets you submit (and monitor or cancel) jobs to the cluster and chooses where to run them.

Other clusters might run different job management software such as LSF, Sun Grid Engine or Condor, although they all operate along similar principles.
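As a preview of what's to come, a minimal Slurm batch script and the commands to submit, monitor and cancel it might look like the sketch below. The file name `hello.sh`, the job name and the job ID 12345 are made-up placeholders.

~~~
#!/bin/bash
#SBATCH --job-name=hello     # name shown in the queue
#SBATCH --ntasks=1           # a single task
#SBATCH --time=00:05:00      # wall-time limit of 5 minutes

echo "Hello from $(hostname)"
~~~
{: .bash}

Save this as ```hello.sh``` and drive it with Slurm's command-line tools:

~~~
$ sbatch hello.sh      # submit the script; Slurm prints a job ID
$ squeue -u $USER      # list your queued and running jobs
$ scancel 12345        # cancel the job with ID 12345
~~~
{: .bash}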


### How busy is the cluster?

The ```sinfo``` command tells us the state of the cluster. It lets us know what nodes are available, how busy they are and what state they are in.

Clusters are sometimes divided up into partitions. This might separate some nodes which are different to the others (e.g. they have more memory, GPUs or different processors).
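Running ```sinfo``` with no arguments prints a summary of every partition and its nodes:

~~~
$ sinfo
~~~
{: .bash}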

~~~
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
work* up infinite 2 drain* bwc[022,052]
work* up infinite 1 down* bwc016
work* up infinite 13 mix bwc[001-002,010-012,031-036,050-051]
work* up infinite 38 alloc bwc[003-009,013-015,017-021,023-030,037-049,053-054]
long up infinite 2 drain* bwc[022,052]
long up infinite 1 down* bwc016
long up infinite 13 mix bwc[001-002,010-012,031-036,050-051]
long up infinite 38 alloc bwc[003-009,013-015,017-021,023-030,037-049,053-054]
~~~
{: .output}

* work* means this is the default partition.
* AVAIL tells us if the partition is available.
* TIMELIMIT tells us if there's a time limit for jobs.
* NODES is the number of nodes in this partition.
* STATE shows what the nodes are doing: drain means the node will become unavailable once its current jobs end, down means it is off or unreachable, alloc means it is fully allocated, and mix means some of its cores are allocated and others are free.



**Exercises**
* Login to the login node.
* Run hpcwhosts and pick a host to login to.
* Login to that host.
* Run the sinfo command and discuss with your neighbour what you think might be going on.
* Try using sinfo --long; does this give any more insight?

167 changes: 159 additions & 8 deletions lessons/2.moving-data.md
---
title: "Filesystems and Storage"
author: "Colin Sauze"
date: "November 2017"
questions:
- "What is the difference between scratch and home?"
objectives:
- "Understand the difference between home and scratch directories"
- "Understand how to copy files between your computer and your SCW home/scratch directories"
keypoints:
- "scratch and home are per site, no common storage between sites."
- "scratch is faster and has no quotas, its not backed up. home is slower, smaller but backed up"
---

# Filesystems and Storage

## What is a filesystem?
Storage on most compute systems is not what or where you think it is! Physical disks are bundled together into a virtual volume; this virtual volume may represent one filesystem, or may be divided up, or partitioned, into multiple filesystems. Your directories then reside within one of these filesystems. Filesystems are accessed over the network through mount points.

![Filesystem definition diagram](images/filesystems-generic.png)
There are multiple storage/filesystem options available for you to do your work. The most common are:
* home: where you land when you first login. 50 GB per user. Slower access, backed up. Used to store your work long term.
* scratch: temporary working space. Faster access, not backed up. No quota, but old files might get deleted. DON'T STORE RESULTS HERE! Copy anything you want to keep back to home (see the sketch below).

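For example, a job might work in scratch for speed and copy its results home at the end. This is a sketch only; it assumes your scratch directory lives at ```/scratch/firstname.surname``` (here the placeholder user jane.doe), so check the path on your site:

~~~
# work in fast scratch space while the job runs
cd /scratch/jane.doe/myjob

# ... run the analysis here ...

# copy the results somewhere safe (home is backed up, scratch is not)
cp -r results ~/myjob-results
~~~
{: .bash}
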
Here's a synopsis of filesystems on HPC Wales:

![Odyssey filesystems](images/filesystems-odyssey.jpg)

**Important! Ensure that you don't store anything longer than necessary on scratch; doing so can negatively affect other people's jobs on the system.**


# Accessing your filestore

## Where is my filestore located?

Both scratch and home filestores are provided on a per-site basis. There are file servers in Bangor, Cardiff and Swansea, each serving home and scratch directories to all the compute nodes at that site. When you login to a cluster your home directory will be the home directory for that site, and you won't have direct access to files you created at a different site.

## How much quota do I have left on my home directory?

Login to a head node (e.g. cwl001, bwl001, ssl001 or csl001) and run the ```myquota``` command. This will tell you how much space is left in your home directory.

~~~
$ myquota
~~~
{: .bash}

~~~
Disk quotas for group colin.sauze (gid 16782669):
Filesystem blocks quota limit grace files quota limit grace
cfsfs001-s03:/nfsshare/exports/space03
192M 51200M 53248M 2529 500k 525k
~~~
{: .output}


## How much scratch have I used?

The ```df``` command tells you how much disk space is left. The ```-h``` argument makes the output easier to read: it gives human-readable units like M, G and T for Megabyte, Gigabyte and Terabyte instead of giving output in bytes. By default ```df``` will give us the free space on all the drives on a system, but we can ask for just the scratch drive by adding ```/scratch``` as an argument after the ```-h```.

~~~
$ df -h /scratch
~~~
{: .bash}

~~~
Filesystem Size Used Avail Use% Mounted on
mds001.hpcwales.local@tcp:mds002.hpcwales.local@tcp:/scratch
170T 149T 19T 90% /scratch
~~~
{: .output}

## Copying data from your PC to HPCW/SCW

You can copy files to/from your HPCW/SCW home and scratch drives using the secure copy protocol (SCP) or secure file transfer protocol (SFTP) and connecting to the host ```scp.hpcwales.co.uk``` or ```sftp.hpcwales.co.uk```. You will find your home and scratch directories in the following locations:

|Directory|Description|
|---|---|
|/hpcw/cf/firstname.surname/|Cardiff Home Directory|
|/hpcw/sw/firstname.surname/|Swansea Home Directory|
|/hpcw/ba/firstname.surname/|Bangor Home Directory|
|/hpcw/cf-scratch/firstname.surname|Cardiff Scratch|
|/hpcw/sw-scratch/firstname.surname|Swansea Scratch|
|/hpcw/ba-scratch/firstname.surname|Bangor Scratch|


### Copying data using the command line

Use the ```sftp``` command to connect to the system. This takes an argument of the username, followed by an @ symbol and then the hostname (scp.hpcwales.co.uk). Optionally, you can specify which directory to start in by putting a ```:``` symbol after the hostname and adding the directory name. The command below will start in ```/hpcw/ba/jane.doe/```; if you don't specify a directory then the Cardiff home directory is used.

~~~
sftp [email protected]:/hpcw/ba/jane.doe/
~~~
{: .bash}


~~~
Welcome to HPC Wales & Supercomputing Wales
This system is for authorised users, if you do not
have authorised access please disconnect immediately.
Password:
Connected to scp.hpcwales.co.uk.
Changing to: /hpcw/ba/jane.doe/
sftp> ls
~~~
{: .output}
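
Once connected you can use ```get``` to download files and ```put``` to upload them. A sketch with made-up file names:

~~~
sftp> get results.tar.gz     # download a remote file to your computer
sftp> put mydata.csv         # upload a local file to the remote directory
sftp> quit
~~~
{: .bash}

The same copy can be done in a single command with ```scp```, run from your own computer (again, the file name is a placeholder and jane.doe stands in for your username):

~~~
$ scp mydata.csv jane.doe@scp.hpcwales.co.uk:/hpcw/ba/jane.doe/
~~~
{: .bash}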


The ```sftp``` and ```scp``` commands should be available on all Linux and Mac systems. On Windows they can be made available by installing the Windows Subsystem for Linux (Windows 10 only) or the GitHub command line (CHECK ME).
Aberystwyth University Windows desktops already have it installed in ......


### Copying data using Filezilla

Filezilla is a graphical SCP/SFTP client available for Windows, Mac and Linux. You can download it from https://filezilla-project.org/download.php?type=client

Open FileZilla, click on the File menu and choose ```Site Manager```.

![Transferring files using FileZilla](images/filezilla1.png)

A new site will appear under "My Sites". Name this site "HPC Wales" by clicking on Rename, then enter "scp.hpcwales.co.uk" as the Host, your username as the User and choose "Ask for password" as the Logon Type. Then click Connect. You should now be prompted for your password; go ahead and enter your HPC Wales password and click OK.

![Transferring files using FileZilla](images/filezilla2.png)

You should now see some files in the right-hand side of the window. These are on the remote system; the list on the left-hand side shows your local system.

![Transferring files using FileZilla](images/filezilla3.png)

Files can be transferred by dragging and dropping them from one side to the other. Alternatively, you can right click on a remote file and choose "Download", or on a local file and choose "Upload".

![Transferring files using FileZilla](images/filezilla4.png)
![Transferring files using FileZilla](images/filezilla5.png)

You can change directory on the remote host by typing a path into the "Remote site:" box. For example type in ```/hpcw/sw/user.name``` (where user.name is your username) to access your Swansea home directory.

![Transferring files using FileZilla](images/filezilla6.png)



**Exercises**

> ## Using the `df` command.
> 1. Login to the Cardiff head node (`ssh cwl001` or `ssh cwl002` or `ssh cwl003`)
> 2. Run the command `df -h`.
> 3. How much space does /scratch have left?
> 4. Logout from the Cardiff cluster by typing `exit` and login to the Swansea head node (ssl001, ssl002 or ssl003).
> 5. Run `df -h` again; how much space does /scratch in Swansea have left?
> 6. If you had to run a large job requiring 10TB of scratch space, where would you run it?
{: .challenge}

> ## Using the `myquota` command.
> 1. Login to a system of your choice (try cwl001, bwl001 or ssl001)
> 2. Run the `myquota` command.
> 3. How much space have you used and how much do you have left?
> 4. If you had a job that resulted in 60GB of files would you have enough space to store them?
> 5. Try a different system and compare the amount of free space.
{: .challenge}

> ## Copying files.
> 1. Login to the Bangor system, by typing `ssh bwl001`.
> 2. Create a file called hello.txt by using the nano text editor (or the editor of your choice) and typing `nano hello.txt`. Enter some text into the file, then press Ctrl+X and answer Y to save it and exit.
> 3. Use either Filezilla or SCP/SFTP to copy the file to your computer. The file will be in /hpcw/ba/user.name/hello.txt.
> 4. Create a file on your computer using a text editor. Copy that file to your Bangor home directory using Filezilla or SCP/SFTP and examine its contents with nano on the Bangor system.
Binary file added lessons/images/filezilla1.png
Binary file added lessons/images/filezilla2.png
Binary file added lessons/images/filezilla3.png
Binary file added lessons/images/filezilla4.png
Binary file added lessons/images/filezilla5.png
Binary file added lessons/images/filezilla6.png
