
Commit 2bc1fe1

mv workflows:db_doc.qmd to db.qmd
1 parent 6c6ebce commit 2bc1fe1

2 files changed

+111
-1
lines changed

db.qmd

+31-1
@@ -103,7 +103,34 @@ Use Unicode (`utf-8` in Python or `UTF8` in Postgresql) encoding for all databas
103103
write_excel_csv(df, 'file.csv') # implicit
104104
```
105105

106-
## Describe tables and columns
106+
## Ingest datasets with documentation
107+
108+
Use Quarto documents with chunks of R code in the [workflows](https://github.com/CalCOFI/workflows/) GitHub repository to ingest datasets into the database. For example, see the [ingest_noaa-calcofi-db](https://calcofi.io/workflows/ingest_noaa-calcofi-db.html) workflow.
109+
110+
```{mermaid}
111+
%%| label: fig-db_doc
112+
%%| fig-cap: "Database documentation scheme."
113+
%%| file: diagrams/db_doc.mmd
114+
```
115+
116+
Google Drive \*.csv files are ingested with one **workflow** per **dataset** (a Quarto document in the GitHub repository [calcofi/workflows](https://github.com/calcofi/workflows)). Data definition CSV files (`tbls_redefine.csv`, `flds_redefine.csv`) are auto-generated if missing, then manually edited to rename and describe tables and fields. After the data for each table are ingested, extra metadata is added to the database `COMMENT`s as JSON elements (with links in Markdown), including at the ***table*** level:
117+
118+
- **description**: general description of the contents and of how each row is unique
119+
- **source**: CSV (linked to Google Drive source as markdown)
120+
- **source_created**: datetime stamp of when the source was created on Google Drive
121+
- **workflow**: HTML (rendered Quarto document on GitHub)
122+
- **workflow_ingested**: datetime of ingestion
123+
124+
And at the ***field*** level:
125+
126+
- **description**: general description of the field
127+
- **units**: using the International System of Units (SI) as much as possible
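
The table-level and field-level elements listed above can be illustrated with a minimal Python sketch that serializes a metadata dictionary to JSON and wraps it in a PostgreSQL `COMMENT ON` statement. This is not the code used by the ingest workflows (which are written in R); the helper names `md_link` and `comment_sql`, the table and column names, and the example URLs and timestamps are all hypothetical.

```python
import json


def md_link(label, url):
    """Format a markdown link, as used for the source and workflow elements."""
    return f"[{label}]({url})"


def comment_sql(target, name, metadata):
    """Build a COMMENT ON statement that stores metadata as JSON text.

    target is "TABLE" or "COLUMN"; name is assumed to be a trusted,
    already-quoted identifier (e.g. "larva" or "larva.temp_c").
    """
    if target not in ("TABLE", "COLUMN"):
        raise ValueError(f"unsupported target: {target}")
    payload = json.dumps(metadata, ensure_ascii=False)
    # Double any single quotes so the JSON is a valid SQL string literal.
    literal = payload.replace("'", "''")
    return f"COMMENT ON {target} {name} IS '{literal}';"


# Hypothetical table-level metadata with the five elements listed above.
table_meta = {
    "description": "Larval fish counts; one row per net tow and species.",
    "source": md_link("larva.csv", "https://example.org/larva.csv"),
    "source_created": "2024-01-01T00:00:00Z",
    "workflow": md_link("ingest_example", "https://example.org/ingest_example.html"),
    "workflow_ingested": "2024-01-02T00:00:00Z",
}

# Hypothetical field-level metadata with the two elements listed above.
field_meta = {"description": "Water temperature at tow depth", "units": "degC"}

print(comment_sql("TABLE", "larva", table_meta))
print(comment_sql("COLUMN", "larva.temp_c", field_meta))
```

Storing the elements as a single JSON string keeps them machine-readable, so a consumer can recover the individual elements with a JSON parser instead of splitting free text.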
128+
129+
These comments are then exposed by the API [db_tables](https://api.calcofi.io/db_tables) endpoint, which can be consumed and rendered into a searchable, tabular catalog with [calcofi4r::cc_db_catalog](https://calcofi.io/calcofi4r/reference/cc_db_catalog.html).
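
As a rough illustration of what "searchable catalog" means here, the following Python sketch filters CSV text such as the API might return. It is not the implementation of `calcofi4r::cc_db_catalog`; the function name `search_catalog` and the column names in the sample text are hypothetical, and the CSV response format is assumed.

```python
import csv
import io


def search_catalog(csv_text, term):
    """Return rows (as dicts) in which any column contains term, ignoring case."""
    reader = csv.DictReader(io.StringIO(csv_text))
    needle = term.lower()
    return [
        row
        for row in reader
        if any(isinstance(v, str) and needle in v.lower() for v in row.values())
    ]


# Hypothetical sample standing in for a downloaded /db_tables response.
sample = (
    "table,description\n"
    "larva,Larval fish counts per net tow\n"
    "net,Net tow metadata\n"
)

for row in search_catalog(sample, "larva"):
    print(row["table"], "-", row["description"])
```

In practice the CSV text would come from an HTTP request to the endpoint; the sample string keeps the sketch runnable offline.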
130+
131+
Additional workflows will publish the data to the various [Portals](https://calcofi.io/docs/portals.html) (ERDDAP, EDI, OBIS, NCEI) using the Ecological Metadata Language (EML) and the [EML](https://docs.ropensci.org/EML/) R package, pulling table and field definitions directly from the structured metadata in the database.
132+
133+
### OR Describe tables and columns directly
107134

108135
- Use the `COMMENT` clause to add descriptions to tables and columns, either through the GUI [pgadmin.calcofi.io](https://pgadmin.calcofi.io/) (by right-clicking on the table or column and selecting `Properties`) or with SQL. For example:
109136

@@ -117,6 +144,8 @@ Use Unicode (`utf-8` in Python or `UTF8` in Postgresql) encoding for all databas
117144
118145
- It is especially helpful to link to any _**workflows**_ that are responsible for the ingesting or updating of the input data.
119146

147+
### Display tables and columns with metadata
148+
120149
- These descriptions can be viewed in the CalCOFI **API** [api.calcofi.io](https://api.calcofi.io) as CSV tables (see code in [calcofi/api: `plumber.R`](https://github.com/CalCOFI/api/blob/8ad9d9ad62fd526d4b8da23357759f1ad196cb88/plumber.R#L916-L990)):
121150
- [api.calcofi.io`/db_tables`](https://api.calcofi.io/db_tables)\
122151
fields:\
@@ -145,3 +174,4 @@ Use Unicode (`utf-8` in Python or `UTF8` in Postgresql) encoding for all databas
145174

146175
- Use [`ST_Subdivide()`](https://postgis.net/docs/ST_Subdivide.html) when running spatial joins on large polygons.
147176

177+

diagrams/db_doc.mmd

+80
@@ -0,0 +1,80 @@
1+
flowchart TB
2+
%% Node definitions
3+
gd[("`<b>Source Data</b>
4+
Google Drive:
5+
calcofi/data/{dataset}/*.csv`")]
6+
iw["<b>Ingest Workflow</b>
7+
workflows: ingest_{dataset}.qmd"]
8+
dd["<b>Data Definitions</b>
9+
workflows: /ingest/{dataset}/:
10+
<ul>
11+
<li>tbls_redefine.csv</li>
12+
<li>flds_redefine.csv</li>
13+
</ul>"]
14+
db[("<b>Database</b>")]
15+
api["<b>API Endpoint</b>\n/db_tables"]
16+
catalog["<b>R Function</b>\ncalcofi4r::cc_db_catalog()"]
17+
eml["<b>Publish Workflow</b>
18+
workflows: publish_{dataset}_{portal}.qmd
19+
with {portal}s:
20+
<ul>
21+
<li>erddap</li>
22+
<li>edi</li>
23+
<li>obis</li>
24+
<li>ncei</li>
25+
</ul>"]
26+
27+
%% Edge definitions
28+
gd --> iw
29+
iw -->|"1. auto-generated"| dd
30+
dd -->|"2. manual edit"| iw
31+
iw -->|"3. data"| db
32+
iw --> comments
33+
comments -->|"4. metadata"| db
34+
db --> api
35+
api --> catalog
36+
db --> eml
37+
38+
%% Comments subgraph with internal nodes
39+
subgraph comments["<b>Database Comments</b>
40+
(stored as text in JSON format to differentiate elements)"]
41+
direction TB
42+
h["hideme"]:::hidden
43+
h~~~tbl
44+
h~~~fld
45+
tbl["per <em>Table</em>:
46+
<ul>
47+
<li>description</li>
48+
<li>source (<em>linked</em>)</li>
49+
<li>source_created (<em>datetime</em>)</li>
50+
<li>workflow (<em>linked</em>)</li>
51+
<li>workflow_ingested (<em>datetime</em>)</li>
52+
</ul>"]
53+
fld["per <em>Field</em>:
54+
<ul>
55+
<li>description</li>
56+
<li>units (SI)</li>
57+
</ul>"]
58+
end
59+
60+
%% Clickable links
61+
click gd "https://drive.google.com/drive/folders/1xxdWa4mWkmfkJUQsHxERTp9eBBXBMbV7" "calcofi folder - Google Drive"
62+
click api "https://api.calcofi.io/db_tables" "API endpoint"
63+
click catalog "https://calcofi.io/calcofi4r/reference/cc_db_catalog.html" "R package function"
64+
65+
%% Styling
66+
classDef source fill:#f9f9f9,stroke:#000,stroke-width:2px,color:#000
67+
classDef process fill:#a3e0f2,stroke:#000,stroke-width:2px,color:#000
68+
classDef eml fill:#F0FDF4,stroke:#22C55E,stroke-width:2px,color:#000,text-align:left
69+
classDef data fill:#ffbe75,stroke:#000,stroke-width:2px,color:#000
70+
classDef api fill:#9ad294,stroke:#000,stroke-width:2px,color:#000
71+
classDef meta fill:#c9a6db,stroke:#000,stroke-width:2px,color:#000,text-align:left
72+
classDef hidden display: none;
73+
74+
class gd source
75+
class dd,comments,tbl,fld meta
76+
class iw process
77+
class db data
78+
class api,catalog api
79+
class tbl,fld li
80+
class eml eml
