ucdavisdatalab · wes-brooks · Nov 5, 2025 · Oct 22, 2025 · Oct 22, 2025 · Oct 22, 2025
diff --git a/_toc.yml b/_toc.yml
@@ -20,7 +20,9 @@ parts:
     chapters:
     - file: chapters/06_ace
     - file: chapters/07_data_storage
-    - file: chapters/08_aws
+    - file: chapters/08_code_storage
+    - file: chapters/09_aws
   - caption: Appendices
     chapters:
       - file: chapters/appendix/order_external_affiliate
+      - file: chapters/appendix/working_with_protected_data
diff --git a/chapters/02_citi_training.md b/chapters/02_citi_training.md
@@ -58,28 +58,3 @@ standards of practice.
 
 [belmont]: https://www.hhs.gov/ohrp/regulations-and-policy/belmont-report/
 [tripod]: https://www.tripod-statement.org/
-
-
-## Working with Protected Data
-
-The University of California, UC Davis, and UCDH all have policies to protect
-sensitive data such as personally identifiable information (PII) and protected
-health information (PHI). Because these policies exist at several different
-levels of the organization, they might seem confusing or intimidating, but
-they're all meant to enforce the basic principle that that must there be no
-unauthorized access to protected data.
-
-The systems and procedures described in this guide exist so that protected data
-can be stored securely on servers owned by UCDH and located on UCDH property,
-while still allowing authorized researchers to do their work even if they're
-not physically at the hospital. Completing CITI training, as explained in the
-preceding section, is a prerequisite to working with protected data.
-
-The University of California classifies data into [4
-categories][uc-protected-data], called **protection levels**, depending on the
-potential impact of disclosure or compromise. Protection level 1 (P1) consists
-of data that would have relatively low impact, such as data already publicly
-available, while protection level 4 (P4) consists of highly sensitive data,
-such as medical records and financial records.
-
-[uc-protected-data]: https://security.ucdavis.edu/data-classification-four-protection-levels
diff --git a/chapters/08_code_storage.md b/chapters/08_code_storage.md
@@ -0,0 +1,17 @@
+# Code Storage and Sharing
+If you care about creating code that will be usable by collaborators or your future self, then you should be using [git](https://git-scm.org) for version control and code management. A git repository can contain the entire history of development for a code project, as well as information about the reasons for changes and the ability to move back and forth through the project's history. The Datalab has a [workshop to teach users the basics of version control with git](https://ucdavisdatalab.github.io/workshop_reproducible_research/chapters/version-control/01_version-control-systems.html). Start there if you are new to version control.
+
+:::{warning}
+You should never put data into a Git repository! Data files slow down version tracking, and even if you delete data from the current version of the repository, it remains in the project history, where it is easy to forget about and later share inadvertently.
+:::
+
+## Gitlab for Collaboration
+A project history is useful on its own, but the greatest benefits of git for version control come when you couple it with a collaborative service like Gitlab, which lets you share your code with a selected team of people. The benefits of version control are multiplied when you work in a team because git integrates changes and messages from team members in a controllable way. Gitlab also provides features like issue tracking that allow a team to discuss and collaborate on code changes as they work toward a goal, like fixing a bug or creating a new analysis. Gitlab is very similar to Github, so we recommend that you visit the [Datalab's Github workshop](https://ucdavisdatalab.github.io/workshop_reproducible_research/chapters/version-control/03_remote-repositories.html#github) to learn how to collaborate via Gitlab.
+
+:::{warning}
+Remember the warning not to put data into a Git repository? Well, you should **definitely** never *ever* put data in a Gitlab repository! Besides slowing down version tracking, data remains in the history even if you later delete it from the repository, which makes it easy to forget about and later share inadvertently. That might expose sensitive information!
+
+A good way to avoid accidentally adding data to the version history is to use the `.gitignore` file to ignore the `data/` folder. Then you just have to be careful that you only put data in the `data/` folder.
+:::
+
+There is a [private Gitlab hosted on UCDH premises](https://gitlab.ri.ucdavis.edu) that you should use for projects that might handle any sensitive code or data. Unlike most UCDH IT services, Gitlab access is not provided through ServiceNow. Instead, you request an account and then email the Gitlab admin on the Research Infrastructure team to ask that your account be approved. The admin is currently Chris Lambertus.
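The `.gitignore` tip in the new `chapters/08_code_storage.md` could be illustrated with a short script. The following is a minimal sketch and not part of this change: `ensure_ignored` is a hypothetical helper name, and it assumes the data folder is called `data/` and sits at the repository root, as the chapter text suggests.

```python
from pathlib import Path


def ensure_ignored(repo_root, pattern="data/"):
    """Add `pattern` to <repo_root>/.gitignore unless it is already listed.

    Hypothetical helper for illustration only. Returns True if the file
    was changed and False if the pattern was already present.
    """
    gitignore = Path(repo_root) / ".gitignore"
    # Read the existing rules, or start from an empty list for a new file.
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    if pattern in (line.strip() for line in lines):
        return False
    lines.append(pattern)
    gitignore.write_text("\n".join(lines) + "\n")
    return True
```

Calling `ensure_ignored(".")` from the repository root adds the rule once and does nothing on later calls. Note that `.gitignore` only affects untracked files, so it does not remove data that was already committed. To confirm that a given file is covered, `git check-ignore -v data/example.csv` reports the matching rule (the file name here is only an example).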
diff --git a/chapters/08_aws.md → chapters/09_aws.md b/chapters/08_aws.md → chapters/09_aws.md
@@ -27,4 +27,4 @@ The problem with EC2 is that the value it provides by efficient use of computati
 ## Databricks
 In order for a typical researcher or small team to make use of EC2, they need a layer that automates EC2 configuration and control in response to demand from the analysis side. That layer is **Databricks**. If you truly need the power of EC2, then Databricks will save you a ton of time learning how to use EC2 and integrate it with your analysis. The user then focuses on writing R or Python code in a notebook to run their analysis.
 
-UCDH 
+
diff --git a/chapters/09_citrix.md → chapters/10_citrix.md b/chapters/09_citrix.md → chapters/10_citrix.md
diff --git a/chapters/10_sharepoint.md → chapters/11_sharepoint.md b/chapters/10_sharepoint.md → chapters/11_sharepoint.md
diff --git a/chapters/appendix/working_with_protected_data.md b/chapters/appendix/working_with_protected_data.md
@@ -0,0 +1,62 @@
+(appendix:working-with-protected-data)=
+# Working with Protected Data
+
+Anyone working with patients or research subjects has an ethical and legal
+responsibility to act in their best interests. This includes protecting them
+from exposure of their private data. As a result, the University of California,
+UC Davis, and UCDH all have policies to protect sensitive data such as
+personally identifiable information (PII) and protected health information
+(PHI). Because these policies exist at several different levels of the
+organization, they might seem confusing or intimidating, but they're all meant
+to enforce the basic principle that there must be no unauthorized access
+to protected data.
+
+:::{seealso}
+For more detail on the protections accorded to PII and PHI, check out UCDH's
+[Privacy Policies and Guidelines][ucdh-privacy].
+
+[ucdh-privacy]: https://health.ucdavis.edu/compliance/privacy/privacy-policies-and-guidelines
+:::
+
+The systems and procedures described in this guide exist so that protected data
+can be stored securely on servers owned by UCDH and located on UCDH property,
+while still allowing authorized researchers to do their work even if they're
+not physically at the hospital. Completing {ref}`chapter:citi-training` is a
+prerequisite to working with protected data.
+
+
+## Protection Levels
+
+The University of California classifies data into [4
+categories][uc-protected-data], called **protection levels**, depending on the potential impact of disclosure or compromise. Protection level 1 (P1) consists of data that would have relatively low impact, such as data already publicly available, while protection level 4 (P4) consists of highly sensitive data, such as medical records and financial records.
+
+[uc-protected-data]: https://security.ucdavis.edu/data-classification-four-protection-levels
+
+
+## Sharing Data
+
+The University of California has been working since 2018 to implement a
+protocol for sharing health data. That effort generated the [2024 UC Health
+Data Governance Task Force Report][uc-data-gov]. It is beyond the scope of this
+document to describe the entire report, so let's just summarize a few key
+points:
+
+[uc-data-gov]: https://ucop.edu/uc-health/reports-resources/uchealth-data-governance-task-force-report_2024_final_06272024.pdf
+
+- Sharing health data generally requires approval of your **health data
+  oversight committee** (HDOC). Each UC campus with a medical center (including
+  UC Davis) has established an HDOC, and there is also a central UC HDOC.
+- HDOC decisions are supposed to be guided by justice. In general, this seems
+  to align with the previous principle of informed consent, but with some
+  greater scrutiny when research on health data may have social impacts or lead
+  to for-profit products or services.
+- De-identifying data is not sufficient to allow sharing health data because it
+  can sometimes be easily re-identified by combining it with publicly or
+  commercially available data.
+- There is an exception for research data to not be treated as health data,
+  which means it can be shared without seeking permission from the HDOC. This
+  exception applies to some data that were collected with informed consent for
+  a research project that was overseen by an Institutional Review Board (IRB).
+- The exception *only* applies to data that were collected for no purpose other
+  than research. In general, this means that clinical data are excluded from
+  the exception.
diff --git a/pixi.toml b/pixi.toml
@@ -1,4 +1,4 @@
-[project]
+[workspace]
 authors = ["Wes B <[email protected]>"]
 channels = ["conda-forge"]
 description = "Add a short description here"