Merged
4 changes: 3 additions & 1 deletion _toc.yml
@@ -20,7 +20,9 @@ parts:
chapters:
- file: chapters/06_ace
- file: chapters/07_data_storage
- file: chapters/08_aws
- file: chapters/08_code_storage
- file: chapters/09_aws
- caption: Appendices
chapters:
- file: chapters/appendix/order_external_affiliate
- file: chapters/appendix/working_with_protected_data
25 changes: 0 additions & 25 deletions chapters/02_citi_training.md
@@ -58,28 +58,3 @@ standards of practice.

[belmont]: https://www.hhs.gov/ohrp/regulations-and-policy/belmont-report/
[tripod]: https://www.tripod-statement.org/


## Working with Protected Data

The University of California, UC Davis, and UCDH all have policies to protect
sensitive data such as personally identifiable information (PII) and protected
health information (PHI). Because these policies exist at several different
levels of the organization, they might seem confusing or intimidating, but
they're all meant to enforce the basic principle that there must be no
unauthorized access to protected data.

The systems and procedures described in this guide exist so that protected data
can be stored securely on servers owned by UCDH and located on UCDH property,
while still allowing authorized researchers to do their work even if they're
not physically at the hospital. Completing CITI training, as explained in the
preceding section, is a prerequisite to working with protected data.

The University of California classifies data into [4
categories][uc-protected-data], called **protection levels**, depending on the
potential impact of disclosure or compromise. Protection level 1 (P1) consists
of data that would have relatively low impact, such as data already publicly
available, while protection level 4 (P4) consists of highly sensitive data,
such as medical records and financial records.

[uc-protected-data]: https://security.ucdavis.edu/data-classification-four-protection-levels
17 changes: 17 additions & 0 deletions chapters/08_code_storage.md
@@ -0,0 +1,17 @@
# Code Storage and Sharing
If you care about creating code that will be usable by collaborators or your future self, then you should be using [git](https://git-scm.org) for version control and code management. A git repository can contain the entire history of development for a code project, as well as information about the reasons for changes and the ability to move back and forth through the project's history. The Datalab has a [workshop to teach users the basics of version control with git](https://ucdavisdatalab.github.io/workshop_reproducible_research/chapters/version-control/01_version-control-systems.html). Start there if you are new to version control.

:::{warning}
You should never put data into a Git repository! Handling data files slows down version tracking, and even if you delete data from the current version of the repository, it remains in the project history, where it is easy to forget about and later share inadvertently.
:::

## Gitlab for Collaboration
While having a project history is great, the greatest benefits of git for version control come when you couple it with a collaborative service like Gitlab, which lets you share your code with a selected team of people. The benefits of version control are multiplied when you work in a team because git integrates changes and messages from team members in a controllable way. And Gitlab provides features like issue tracking that allow a team to discuss and collaborate on code changes as they work toward a goal, like fixing a bug or creating a new analysis. Gitlab is very similar to Github, and we recommend that you visit the [Datalab's Github workshop](https://ucdavisdatalab.github.io/workshop_reproducible_research/chapters/version-control/03_remote-repositories.html#github) to learn how to collaborate via Gitlab.

:::{warning}
Remember the warning to not put data into a Git repository? Well, you should **definitely** never *ever* put data in a Gitlab repository! Data files slow down version tracking, and they remain in the history even if you later delete them from the repository. This makes them easy to forget about and later share inadvertently. That might expose sensitive information!

A good way to avoid accidentally adding data to the version history is to use the `.gitignore` file to ignore the `data/` folder. Then you just have to be careful that you only put data in the `data/` folder.
:::
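
The `.gitignore` approach from the warning above can be sketched as follows. This is a minimal demonstration under stated assumptions, not a prescribed workflow: it runs in a throwaway repository created with `mktemp`, and the file names `analysis.py` and `data/example.csv` are hypothetical placeholders.

```shell
# Work in a scratch repository so the demonstration touches nothing real.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet

# Hypothetical project contents: one code file and one data file.
mkdir -p data
echo "id,value" > data/example.csv
echo "print('analysis')" > analysis.py

# Ignore everything under data/ (the trailing slash matches a directory).
echo "data/" >> .gitignore

# Stage everything, then list what Git is actually tracking.
git add .
tracked="$(git ls-files)"
echo "$tracked"
```

The listing should include `.gitignore` and `analysis.py` but not `data/example.csv`. In a real project you would add the `data/` line to `.gitignore` before the first commit, since ignore rules do not remove files that are already tracked.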

There is a [private Gitlab hosted on UCDH premises](https://gitlab.ri.ucdavis.edu) that you should use for projects that might handle any sensitive code or data. Unlike most UCDH IT, the Gitlab access is not provided through Service Now. Instead, you will request an account and then contact the Gitlab admin on the Research Infrastructure team via email to request your account be approved. The admin is currently Chris Lambertus.
2 changes: 1 addition & 1 deletion chapters/08_aws.md → chapters/09_aws.md
@@ -27,4 +27,4 @@ The problem with EC2 is that the value it provides by efficient use of computati
## Databricks
In order for a typical researcher or small team to make use of EC2, they need a layer that automates EC2 configuration and control in response to demand from the analysis side. That layer is **Databricks**. If you truly need the power of EC2, then Databricks will save you a ton of time learning how to use EC2 and integrate it with your analysis. The user then focuses on writing R or Python code in a notebook to run their analysis.

UCDH

File renamed without changes.
File renamed without changes.
62 changes: 62 additions & 0 deletions chapters/appendix/working_with_protected_data.md
@@ -0,0 +1,62 @@
(appendix:working-with-protected-data)=
# Working with Protected Data

Anyone working with patients or research subjects has an ethical and legal
responsibility to act in their best interests. This includes protecting them
from exposure of their private data. As a result, the University of California,
UC Davis, and UCDH all have policies to protect sensitive data such as
personally identifiable information (PII) and protected health information
(PHI). Because these policies exist at several different levels of the
organization, they might seem confusing or intimidating, but they're all meant
to enforce the basic principle that there must be no unauthorized access
to protected data.

:::{seealso}
For more detail on the protections accorded to PII and PHI, check out UCDH's
[Privacy Policies and Guidelines][ucdh-privacy].

[ucdh-privacy]: https://health.ucdavis.edu/compliance/privacy/privacy-policies-and-guidelines
:::

The systems and procedures described in this guide exist so that protected data
can be stored securely on servers owned by UCDH and located on UCDH property,
while still allowing authorized researchers to do their work even if they're
not physically at the hospital. Completing {ref}`chapter:citi-training` is a
prerequisite to working with protected data.


## Protection Levels

The University of California classifies data into [4
categories][uc-protected-data], called **protection levels**, depending on the potential impact of disclosure or compromise. Protection level 1 (P1) consists of data that would have relatively low impact, such as data already publicly available, while protection level 4 (P4) consists of highly sensitive data, such as medical records and financial records.

[uc-protected-data]: https://security.ucdavis.edu/data-classification-four-protection-levels


## Sharing Data

The University of California has been working since 2018 to implement a
protocol for sharing health data. That effort generated the [2024 UC Health
Data Governance Task Force Report][uc-data-gov]. It is beyond the scope of this
document to describe the entire report, so let's just summarize a few key
points:

[uc-data-gov]: https://ucop.edu/uc-health/reports-resources/uchealth-data-governance-task-force-report_2024_final_06272024.pdf

- Sharing health data generally requires approval of your **health data
oversight committee** (HDOC). Each UC campus with a medical center (including
UC Davis) has established an HDOC, and there is also a central UC HDOC.
- HDOC decisions are supposed to be guided by justice. In general, this seems
to align with the previous principle of informed consent, but with some
greater scrutiny when research on health data may have social impacts or lead
to for-profit products or services.
- De-identifying data is not sufficient to allow sharing health data because it
can sometimes be easily re-identified by combining it with publicly or
commercially available data.
- There is an exception for research data to not be treated as health data,
which means it can be shared without seeking permission from the HDOC. This
exception applies to some data that were collected with informed consent for
a research project that was overseen by an Institutional Review Board (IRB).
- The exception *only* applies to data that were collected for no purpose other
than research. In general, this means that clinical data are excluded from
the exception.
2 changes: 1 addition & 1 deletion pixi.toml
@@ -1,4 +1,4 @@
[project]
[workspace]
authors = ["Wes B <[email protected]>"]
channels = ["conda-forge"]
description = "Add a short description here"