Kkmattil sdsi #2637

Open
wants to merge 14 commits into
base: master
from
109 changes: 109 additions & 0 deletions docs/data/sensitive-data/sd-connect-and-a-commands.md
@@ -0,0 +1,109 @@
# Using SD Connect service with a-commands

SD Connect is part of the CSC sensitive data services, which provide a free-of-charge sensitive data processing environment for
academic research projects at Finnish universities and research institutes. SD Connect adds an automatic encryption layer to the Allas object storage system of CSC, so that it can be used for securely storing sensitive data. Data stored in SD Connect can also be accessed from SD Desktop secure virtual desktops.

In most cases SD Connect is used through the [SD Connect Web interface](https://sd-connect.csc.fi), but in some cases command line tools
provide a more efficient way to manage data in SD Connect.

In this document we describe how you can use the a-commands provided by [allas-cli-utils](https://github.com/CSCfi/allas-cli-utils) to upload data to and download data from SD Connect. These tools are available on the CSC supercomputers (Puhti, Mahti and LUMI), and they can also be installed on local Linux and Mac machines.

Note that Allas itself does not separate data stored with SD Connect from other data stored in
Allas. Data buckets can contain a mixture of SD Connect data, other encrypted data and normal data,
and it is up to the user to know the type of the data. It is therefore a good idea to keep SD Connect data
in buckets and folders that don't contain other types of data.


## Opening connection to SD Connect

To open an SD Connect compatible Allas connection, you must add the option *--sdc* to the configuration command. On CSC supercomputers the connection is opened with the commands:

```text
module load allas
allas-conf --sdc
```
In local installations the connection is typically opened with commands like:

```text
export PATH=/some-local-path/allas-cli-utils:$PATH
source /some-local-path/allas-cli-utils/allas_conf -u your-csc-account --sdc
```

The setup process first asks for your CSC password (Haka or Virtu passwords can't be used here).
After that you select the CSC project to be used. This is the normal login process for Allas.
However, when SD Connect is enabled, the process also asks you to give the *SD Connect API token*. This
token must be retrieved from the [SD Connect web interface](https://sd-connect.csc.fi). Note that the tokens
are project specific. Make sure you have selected the same SD Connect project both on the command line and in the web
interface.

In the web interface the token can be created using the dialog that opens when you select *Create API tokens* from the *Support* menu.

Copy the token, paste it to the command line and press Enter.

The SD Connect compatible Allas connection is now valid for the next eight hours, and you can use commands like
*a-list* and *a-delete* to manage both normal Allas objects and SD Connect objects.
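
Because the connection expires after eight hours, long upload sessions may need a re-authentication. The following is a minimal sketch of how a script could keep track of the remaining time. The function `sdc_minutes_left` and the variable `SDC_VALIDITY_SECONDS` are hypothetical helpers, not part of allas-cli-utils, and the sketch assumes that you record the start time yourself right after running the configuration command.

```bash
# Hypothetical helper, not part of allas-cli-utils.
# Assumption: the connection is valid for eight hours from the moment
# the configuration command finished, as described above.
SDC_VALIDITY_SECONDS=$((8 * 60 * 60))

# Usage: sdc_minutes_left START_EPOCH [NOW_EPOCH]
# Prints the number of full minutes left before the connection expires.
sdc_minutes_left() {
    start=$1
    now=${2:-$(date +%s)}
    left=$(( start + SDC_VALIDITY_SECONDS - now ))
    # Never report a negative remaining time
    [ "$left" -gt 0 ] || left=0
    echo $(( left / 60 ))
}

# Example (run right after opening the connection):
#   SDC_START=$(date +%s)
#   ...
#   sdc_minutes_left "$SDC_START"
```

If the value gets close to zero, open the connection again before starting a new large transfer.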


## Data upload

Data can be uploaded to SD Connect by using the command *a-put* with the option *--sdc*.
For example, to upload file *my-secret-table.csv* to location *2000123-sens/dataset2* in Allas, use the command:

```text
a-put --sdc my-secret-table.csv -b 2000123-sens/dataset2
```

This will produce the SD Connect object: 2000123-sens/dataset2/my-secret-table.csv.c4gh

All other a-put options and features can be used too. For example, directories are
stored as tar files unless the *--asis* option is used.

Command:

```text
a-put --sdc my-secret-directory -b 2000123-sens/dataset2
```

Will produce the SD Connect object: 2000123-sens/dataset2/my-secret-directory.tar.c4gh
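
The naming rule in the two examples above can be summarised as a small shell function. This is only a sketch: `sdc_object_name` is a hypothetical helper, not part of allas-cli-utils, and it assumes the default behaviour described above (the *--asis* option is not used).

```bash
# Hypothetical helper, not part of allas-cli-utils.
# Predicts the object name that "a-put --sdc SRC -b DEST" is described
# to produce, assuming the --asis option is NOT used.
# Usage: sdc_object_name SRC DEST
sdc_object_name() {
    src=${1%/}     # local path without a trailing slash
    dest=${2%/}    # bucket or bucket/folder
    name=$(basename "$src")
    if [ -d "$src" ]; then
        # Directories are packed into a tar file before encryption
        printf '%s/%s.tar.c4gh\n' "$dest" "$name"
    else
        # Single files only get the .c4gh suffix
        printf '%s/%s.c4gh\n' "$dest" "$name"
    fi
}

# Example:
#   sdc_object_name my-secret-table.csv 2000123-sens/dataset2
```

Knowing the object name in advance is useful when you later want to check with *a-list* that the upload succeeded.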

For massive data uploads, you can use *allas-dir-to-bucket* in combination with option *--sdc*.

```text
allas-dir-to-bucket --sdc my-secret-directory 2000123-new-sens
```

The command above will copy all the files from directory my-secret-directory to bucket 2000123-new-sens in SD Connect compatible format.


## Data download

Data can be downloaded from Allas with the command *a-get*. If the SD Connect connection is enabled, a-get will automatically try to decrypt objects with the suffix *.c4gh*.

So for example command:

```text
a-get 2000123-sens/dataset2/my-secret-table.csv.c4gh
```

Will produce local file: my-secret-table.csv

And similarly command:

```text
a-get 2000123-sens/dataset2/my-secret-directory.tar.c4gh
```

Will produce local directory: my-secret-directory
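
The mapping from object name to local name in the examples above can be sketched as follows. The function `sdc_local_name` is a hypothetical helper, not part of allas-cli-utils. It assumes that the object was uploaded with the naming rules described in this document, so a file that was itself a tar archive before upload would be misinterpreted as a directory.

```bash
# Hypothetical helper, not part of allas-cli-utils.
# Predicts the local file or directory name that a-get is described to
# produce for an SD Connect object name.
# Usage: sdc_local_name OBJECT
sdc_local_name() {
    name=$(basename "$1")
    name=${name%.c4gh}   # drop the encryption suffix
    name=${name%.tar}    # drop the tar suffix used for directories
    printf '%s\n' "$name"
}

# Example:
#   sdc_local_name 2000123-sens/dataset2/my-secret-directory.tar.c4gh
```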

Note that this automatic decryption works only for files that have
been stored using the new SD Connect that was taken into use in October 2024.

For older SD Connect files and other Crypt4GH encrypted files, you must still
provide the matching secret key with the option *--sk*:

```text
a-get --sk my-key.sec 2000123-sens/old-date/sample1.txt.c4gh
```

Unfortunately, there is no easy way to know which encryption method has been used for
a .c4gh file stored in Allas.
111 changes: 111 additions & 0 deletions docs/data/sensitive-data/sd-connect-sharing-for-import.md
@@ -0,0 +1,111 @@
# Using SD Connect to receive sensitive research data

This document provides instructions on how a research group can use SD Connect to receive **sensitive data** from an external
data provider like a sequencing center. The procedure presented here is applicable in cases where the data will be analyzed in
SD Desktop or on a computer that has an internet connection.

In some sensitive data environments an internet connection is not available. In those cases, please check the alternative
approach, described in:

* [Using Allas to receive sensitive research data](./sequencing_center_tutorial.md)


## SD Connect

SD Connect is part of the CSC sensitive data services, which provide a free-of-charge sensitive data processing environment for
academic research projects at Finnish universities and research institutes. SD Connect adds an automatic encryption layer to the Allas object storage system of CSC, so that it can be used for securely storing sensitive data. SD Connect can be used for storing any kind of sensitive research data during the active working phase of a research project.
SD Connect is, however, not intended for data archiving. You must remove your data from SD Connect when the research project ends.

There are no automatic backup processes in SD Connect. On a technical level SD Connect is very reliable and fault-tolerant,
but if you, or some of your project members, remove or overwrite some data in SD Connect,
it is permanently lost. Thus, you might consider making a backup copy of your data in some other location.

Please check the [SD Connect documentation](./sd_connect.md) for more details about SD Connect.


## 1. Obtaining a storage space in SD Connect

If you are already using the SD Connect service, you can skip this chapter and start from chapter 2.
Otherwise, do the following steps to get access to SD Connect.


### 1.1. Create a user account

If you are not yet a CSC customer, register yourself with CSC. You can do these steps in
CSC’s customer portal [MyCSC](https://my.csc.fi).

Create a CSC account by logging in to MyCSC with Haka or Virtu. Remember to activate multi-factor
authentication for your CSC account in order to be able to use SD Connect.


### 1.2. Create or join a project

In addition to a CSC user account, users must either join an existing CSC computing project
or set up a new computing project. You can use the same project to access other
CSC services too, like SD Desktop, Puhti, or Allas.

If you are eligible to act as a [project manager](https://research.csc.fi/prerequisites-for-a-project-manager), you can create a new CSC project in MyCSC and apply for access to SD Connect.
Select 'Academic' as the project type. As a project manager, you can invite other users as members to your project.

If you wish to be joined to an existing project, please ask the project manager to add your CSC user account to the
project member list.

### 1.3. Add SD Connect access for your project

Add the _SD Connect_ service to your project in MyCSC. Only the project manager can add services.
After you have added SD Connect to the project, the other project members need to log in to
MyCSC and approve the terms of use for the service before getting access to SD Connect.

After these steps, your project has 10 TB of storage space available in SD Connect.
Please [contact CSC Service Desk](../../support/contact.md) if you need more storage space.


## 2. Creating a shared folder

### 2.1. Creating a new root folder in SD Connect

Once the service is enabled, you can log in to the [SD Connect interface](https://sd-connect.csc.fi).
After connecting, check that the **Current project** setting refers to the CSC project
that you want to use. After that you can click the **Create folder** button to
create a new folder to be shared with the data provider.

Avoid using spaces (use _ instead) and special characters in folder names, as they may cause problems in some cases.
Further, add some project-specific element, like the project acronym, to the name, as the root folder needs to have a unique name
among all root folders of all SD Connect and Allas projects.
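
The naming advice above can be applied mechanically. The sketch below builds a folder name from a project acronym and a free-text label. The function `make_folder_name` is a hypothetical helper, and the set of allowed characters (letters, digits, `_`, `.` and `-`) is a conservative assumption made for this example, not an official SD Connect rule. Uniqueness among all projects cannot be checked locally, so the acronym prefix only reduces the risk of a name collision.

```bash
# Hypothetical helper for illustration only.
# Assumption: letters, digits, "_", "." and "-" are treated as safe characters.
# Usage: make_folder_name ACRONYM LABEL
make_folder_name() {
    acronym=$1
    label=$2
    # Replace spaces with underscores, then drop all other characters
    # that are outside the conservative set
    clean=$(printf '%s' "$label" | tr ' ' '_' | tr -cd 'A-Za-z0-9_.-')
    printf '%s_%s\n' "$acronym" "$clean"
}

# Example:
#   make_folder_name myproj "seq run #3 (final)"
```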

### 2.2. Sharing the folder

For sharing you need to know the _Sharing ID_ string of the data producer. You should request this 32-character
random string from the data producer by email.

To do the sharing, go to the folder list in SD Connect and press the share icon of the folder you wish to share.
Then copy the Sharing ID of the data producer to the first field of the sharing tool and select **Collaborate** as the sharing permission type.

Now the sharing is done and you can send the name of the shared bucket to the data producer by email.
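
Since the Sharing ID is passed around by email, it is easy to lose a character when copying it. The following sketch checks only what this document states about the ID, namely that it is 32 characters long, and additionally rejects whitespace. The function `is_plausible_sharing_id` is a hypothetical helper, and passing this check does not prove that the ID belongs to an existing project.

```bash
# Hypothetical helper for illustration only.
# Checks that a string is exactly 32 characters long and has no whitespace.
# Usage: is_plausible_sharing_id ID   (exit status 0 = plausible)
is_plausible_sharing_id() {
    id=$1
    [ "${#id}" -eq 32 ] || return 1
    case $id in
        *[[:space:]]*) return 1 ;;
    esac
    return 0
}

# Example:
#   is_plausible_sharing_id "$ID_FROM_EMAIL" || echo "Check the Sharing ID"
```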


### 2.3. Revoke bucket sharing after data transport

Moving large datasets (several terabytes) to SD Connect can take a long time.
Once the producer tells you that all data has been imported to the shared folder in Allas, you can remove the external
access rights in the SD Connect interface. Click the _share_ icon of the shared
folder and press **Delete** next to the project ID of the data producer.


## 3. Using encrypted data

By default, data stored in SD Connect is accessible only to the members of the CSC project. However, project members can
share the folder with other CSC projects.

The project members can download the data to their own computers using the SD Connect web interface,
which automatically decrypts the data after downloading.

The data can be accessed in [SD Desktop](https://sd-desktop.csc.fi) too, using the _Data Gateway_
tool.

On Linux and Mac computers, you can install a local copy of the _allas-cli-utils_ tools, which provide command line
tools to download (_a-get_) and upload (_a-put --sdc_) data from and to SD Connect.

* [Using SD Connect data with a-commands](sd-connect-and-a-commands.md)


4 changes: 4 additions & 0 deletions docs/data/sensitive-data/sd-desktop-working.md
@@ -133,3 +133,7 @@ Read next:
- [How to import data for analysis in your desktop](./sd-desktop-access.md)
- [Customisation: adding software](./sd-desktop-software.md)
- [How to manage your virtual desktop (delete, pause, detach volume etc.)](./sd-desktop-manage.md)

## Submitting jobs from SD Desktop to HPC environments

- [How to use sdsi-client to submit batch jobs from SD Desktop to Puhti](./tutorials/sdsi.md)
18 changes: 10 additions & 8 deletions docs/data/sensitive-data/sequencing_center_tutorial.md
@@ -1,5 +1,9 @@
# Using Allas storage service to receive sensitive research data

This document provides an example of how a research group can use the Allas service to receive **sensitive data** from an external
data provider like a sequencing center. In many cases [SD Connect](sd-connect-sharing-for-import.md) provides an easier way to receive sensitive data, but in some cases SD Connect can't be used. For example, SD Connect cannot provide you with an encrypted file that you could later decrypt in an environment that does not have an internet connection.

## Allas

Allas storage service is a general purpose data storage service maintained by CSC.
It provides free-of-charge storage space for academic research projects at Finnish universities and research institutes.
@@ -10,9 +14,6 @@ There is no automatic backup processes in Allas. In technical level Allas is ver
but if you, or some of your project members, remove or overwrite some data in Allas,
it is permanently lost. Thus, you might consider making a backup copy of your data to some other location.

This document provides an example of how a research group can use Allas service to receive **sensitive data** from external
data provider like a sequencing center.

The steps 1 (Obtaining storage space in Allas), and 2 (Generating encryption keys) require some work,
but they need to be done only once. Once you have the keys in place you can move directly to step 3 when you
need to prepare a new shared bucket.
@@ -34,17 +35,18 @@ Create a CSC account by logging in to MyCSC with Haka or Virtu.

### Step 1.2. Create or join a project

In addition to CSC user account, new users must either join a CSC computing project

In addition to CSC user account, users must either join an existing CSC computing project
or set up a new computing project. You can use the same project to access other
CSC services too like Puhti, cPouta, or SD desktop.
CSC services too, like SD Desktop, SD Connect or Puhti.

Create a CSC project in MyCSC and apply access to Allas. See if you are eligible to act as a project manager.
If your work belongs to any of the free-of-charge use cases, select 'Academic' as the project type.
As a project manager, you can invite other users as members to your project.
If you are eligible to act as a [project manager](https://research.csc.fi/prerequisites-for-a-project-manager), you can create a new CSC project in MyCSC and apply for access to Allas.
Select 'Academic' as the project type. As a project manager, you can invite other users as members to your project.

If you wish to be joined to an existing project, please ask the project manager to add your CSC user account to the
project member list.


### Step 1.3. Add Allas access for your project

Add _Allas_ service to your project in MyCSC. Only the project manager can add services.