ucdavisdatalab · elisehellwig · Oct 17, 2025 · Oct 20, 2025 · Oct 20, 2025 · Oct 20, 2025
diff --git a/chapters/02_primary_practices.md b/chapters/02_primary_practices.md
@@ -229,6 +229,94 @@ toy_data/       Very (very!) small pieces of data for dev testing
 If a directory contains many files or subdirectories, consider whether it's
 clearer to write a separate manifest specifically for that directory.
 
+### Document the Data
+
+In a perfect world, every data set would come with detailed documentation or
+**metadata** about how the data were collected, what assumptions were made,
+what biases might be present, any ethical concerns, the overall structure, what
+each observation means, what each feature means, and more. Good data
+documentation guides researchers towards appropriate, responsible use of the
+data.
+
+Collecting data as part of a project gives you and your collaborators control
+over how the data are documented, so you can ensure there are no gaps. If your
+project uses data collected earlier or by someone else, it's a good practice to
+fill gaps in the existing documentation with your own. Thorough documentation
+benefits not only other researchers but also future you: small details you
+notice and document about features could be important later in the project.
+
+:::{seealso}
+See DataLab's [README, Write Me! workshop reader][datalab-readme] for more
+about how to document data.
+:::
+
+[datalab-readme]: https://ucdavisdatalab.github.io/workshop_how-to-data-documentation/
+
+(create-data-dictionary)=
+#### Create a Data Dictionary
+
+A **data dictionary**, part of your metadata, is a document that explains what
+every field or element in your data set means, as well as any restrictions on
+its values. This includes things like the data type (e.g., number, date, text,
+boolean) and whether the field can be missing. The more information you
+include, the more helpful it will be down the line (see [Captain
+Obvious][captain_o]). Data dictionaries are one of the most efficient ways to
+communicate the structure and content of your data to collaborators,
+including future you! A very basic one could look like this:
+
+|Field Name |Field Description                         |Data Type   |Notes     |
+|-----------|------------------------------------------|------------|----------|
+|person_id  |Autogenerated by database                 |integer     |          |
+|name       |Legal full name (family name, given name) |string      |          |
+|occupation |A person's job or vocation                |string      |Must come from the Bureau of Labor Statistics Occupation List |
+|...        |...                                       |...         |...       |
+
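A data dictionary can also be encoded and used to check records automatically. The block below is a minimal Python sketch under stated assumptions, not part of the chapter's own tooling: the `FieldSpec` class, the `check_record` function, the choice to treat `occupation` as optional, and the two placeholder occupations are all hypothetical, chosen only to mirror the example table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldSpec:
    """One row of a data dictionary (hypothetical structure for illustration)."""
    name: str
    description: str
    dtype: type
    required: bool = True                # assumption: fields are required unless stated
    allowed: Optional[frozenset] = None  # None means any value of the right type


# Hypothetical dictionary mirroring the example table. The allowed occupations
# are placeholders, not the real Bureau of Labor Statistics list.
DATA_DICTIONARY = [
    FieldSpec("person_id", "Autogenerated by database", int),
    FieldSpec("name", "Legal full name (family name, given name)", str),
    FieldSpec("occupation", "A person's job or vocation", str,
              required=False, allowed=frozenset({"Statistician", "Librarian"})),
]


def check_record(record: dict, dictionary: list) -> list:
    """Return a list of problems found in one record; an empty list means it passes."""
    problems = []
    known = {spec.name for spec in dictionary}
    # Flag fields that the data dictionary does not describe.
    for extra in sorted(set(record) - known):
        problems.append(f"{extra}: not in data dictionary")
    for spec in dictionary:
        value = record.get(spec.name)
        if value is None:
            if spec.required:
                problems.append(f"{spec.name}: missing required value")
            continue
        if not isinstance(value, spec.dtype):
            problems.append(
                f"{spec.name}: expected {spec.dtype.__name__}, got {type(value).__name__}"
            )
        elif spec.allowed is not None and value not in spec.allowed:
            problems.append(f"{spec.name}: {value!r} is not an allowed value")
    return problems
```

Calling `check_record(row, DATA_DICTIONARY)` on each row of a data set gives a list of mismatches between the data and its documentation, which can be a prompt to fix either one.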
+
+
+If you aren't sure where to start with creating a data dictionary, DataLab has a
+[template][datalab_dd_template] you can use as a jumping-off point. If you
+prefer step-by-step instructions, Kristin Briney's [Create a Data Dictionary
+exercise][create_dd] might be for you. [Open Science Framework (OSF)][osf_dd] has
+resources on what details to add to your data dictionary, and the
+[USGS][usgs_dd] provides many examples of data dictionaries and how they are
+used in different contexts. If you are working with multiple data sets, make
+sure to clarify which data dictionary to use with each data set.
+
+If your data set looks less like a series of rows and columns and more like a
+long list of files, consider creating a **data inventory** instead. A data
+inventory should include the author or source, title, publication year (if
+published), and file name for each file, but can include more file metadata as
+necessary. A data inventory for a public domain fiction data set would look
+something like this:
+
+|Author               |Title                 |Year |Filename                                  |
+|---------------------|----------------------|-----|------------------------------------------|
+|Bronte, Charlotte    |Jane Eyre             |1847 |EN_1847_BronteCharlotte_JaneEyre.txt      |
+|Austen, Jane         |Sense and Sensibility |1811 |EN_1811_AustenJane_SenseandSensibility.txt|
+|Wollstonecraft, Mary |Maria                 |1798 |EN_1798_WollstonecraftMary_Maria.txt      |
+|...                  |...                   |...  |...                                       |
+
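When file names follow a consistent pattern, a first draft of a data inventory can be generated rather than typed by hand. The block below is a rough Python sketch, assuming every file is named `LANG_YEAR_Author_Title.ext` as in the example table; that pattern is inferred from the three file names shown, and the function names `inventory_row` and `write_inventory` are hypothetical.

```python
import csv
import io
from pathlib import PurePath

FIELDS = ["Author", "Title", "Year", "Filename"]


def inventory_row(filename: str) -> dict:
    """Split a file name of the assumed form LANG_YEAR_Author_Title.ext into fields."""
    parts = PurePath(filename).stem.split("_")
    if len(parts) != 4:
        raise ValueError(f"unexpected file name pattern: {filename}")
    _lang, year, author, title = parts
    # Author and title stay in the compact form used in the file name;
    # tidy them by hand if the inventory needs readable names.
    return {"Author": author, "Title": title, "Year": int(year), "Filename": filename}


def write_inventory(filenames, out) -> None:
    """Write one CSV row per file name to the open text stream `out`."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for name in sorted(filenames):
        writer.writerow(inventory_row(name))
```

Passing an open file or an `io.StringIO()` buffer as `out` produces a CSV that can then be reviewed and extended with fields such as provenance or license.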
+
+If you also need to keep track of things like the provenance or license
+associated with each file or data set, DataLab's
+[data inventory template][datalab_di_template] provides a comprehensive
+starting point.
+
+[osf_dd]: https://help.osf.io/article/217-how-to-make-a-data-dictionary
+[usgs_dd]: https://www.usgs.gov/data-management/data-dictionaries
+[captain_o]: https://dataedo.com/blog/captain-obivous-guide-to-column-descriptions-data-dictionary-best-practices
+[datalab_dd_template]: https://docs.google.com/spreadsheets/d/12N0hKyeT0ndZnt7rVZsz7LTW--BHhbb6TOegXEKQoxE/edit?usp=sharing
+[datalab_di_template]: https://docs.google.com/spreadsheets/d/1nUb-eu82Q7VplDpk0np5rYuaN52mYHLdql18pRD0i4Y/edit?usp=sharing
+[create_dd]: https://caltechlibrary.github.io/RDMworkbook/documentation.html#data-dictionary
+
+:::{seealso}
+See [OSF][osf_dd] and the [Research Data Management Workbook][create_dd] for how
+to create a data dictionary, UC Davis DataLab for [data
+dictionary][datalab_dd_template] and [data inventory][datalab_di_template]
+templates, and [USGS][usgs_dd] for examples.
+:::
+
 
 (workflows)=
 #### Workflows

diff --git a/chapters/03_secondary_practices.md b/chapters/03_secondary_practices.md
@@ -12,31 +12,6 @@ relevant, and we recommend you do too.
 Documentation
 -------------
 
-### Document the Data
-
-In a perfect world, every data set would come with detailed documentation or
-**metadata** about how the data were collected, what assumptions were made,
-what biases might be present, any ethical concerns, the overall structure, what
-each observation means, what each feature means, and more. Good data
-documentation guides researchers towards appropriate, responsible use of the
-data.
-
-Collecting data as part of a project gives you and your collaborators control
-over how the data are documented, so you can ensure there are no gaps. If your
-project uses data collected earlier or by someone else, it's a good practice to
-fill gaps in the existing documentation with your own. Thorough documentation
-isn't just beneficial to other researchers, it's also beneficial to future
-you---small details you notice and document about features could be important
-later in the project.
-
-:::{seealso}
-See DataLab's [README, Write Me! workshop reader][datalab-readme] for more
-about how to document data.
-:::
-
-[datalab-readme]: https://ucdavisdatalab.github.io/workshop_how-to-data-documentation/
-
-
 (document-every-experiment)=
 ### Document Every Experiment