Standardizing attributions for display on scaife.perseus.org #2308

I've created this issue to track updates to the underlying attribution data that we now extract and display on [scaife.perseus.org](https://scaife.perseus.org/).

## Overview

I've extracted the existing attributions (from `respStmt` elements) and exported them to a Google Spreadsheet, [OGL - First1kGreek Attributions](https://docs.google.com/spreadsheets/d/1hn5BrO7vEGugon_G5_AAahNfwSaWyG_V5usxz6GRrL0/edit#gid=522035422). I can grant access to the appropriate people within OGL so they can perform bulk edits to the data.

Once the preferred edits have been made to the spreadsheet, I will use the spreadsheet to bulk update the underlying XML files with the new attribution information and open a pull request.

If this workflow works well, we can repeat it for other OGL repos (and ideally any other repos contributing texts to [scaife.perseus.org](https://scaife.perseus.org/)).

## Desired data model

Here are a few samples of what the updated `respStmt` elements will look like:



**Thibault Clérice, Lead Developer (University of Leipzig) 2015 - 2017**

From https://github.com/OpenGreekAndLatin/First1KGreek/blob/master/data/tlg0062/tlg001/tlg0062.tlg001.1st1K-grc1.xml#L28

to:

```xml
<respStmt>
  <resp from="2015" to="2017">Lead Developer</resp>
  <persName ref="https://orcid.org/0000-0003-1852-9204">Thibault Clérice</persName>
  <orgName>University of Leipzig</orgName>
</respStmt>
```

_Notes_:

- We use the `from` and `to` attrs to denote the timeframe of the `resp`.
- We set a person's ORCID in the `ref` attr of `persName`.



**Simona Stoyanova, Project Manager (University of Leipzig), 2015, Project Assistant (University of Leipzig), 2013-2014**

From https://github.com/OpenGreekAndLatin/First1KGreek/blob/master/data/stoa0146d/stoa001/stoa0146d.stoa001.opp-grc1.xml#L47 

to:

```xml
<respStmt>
  <resp when="2015">Project Manager</resp>
  <persName>Simona Stoyanova</persName>
  <orgName>University of Leipzig</orgName>
</respStmt>
<respStmt>
  <resp from="2013" to="2014">Project Assistant</resp>
  <persName>Simona Stoyanova</persName>
  <orgName>University of Leipzig</orgName>
</respStmt>
```

_Notes_:

- We move from a single `respStmt` containing two `resp` elements to a 1:1 relationship between `respStmt` and `resp`.
- The `when` attr (a single year) and the `from` / `to` attrs (a range) denote the timeframe of the `resp`.

**Gregory Crane, Leonard Muellner, Bruce Robertson, Published original versions of the electronic texts, Open Greek and Latin**

From https://github.com/OpenGreekAndLatin/First1KGreek/blob/3f5519b9a01ca4ff5eb56048868e83844e7755ab/data/tlg0093/tlg005/tlg0093.tlg005.1st1K-grc1.xml#L12

to:

```xml
<respStmt>
  <resp>Published original versions of the electronic texts</resp>
  <persName role="principal">Gregory Crane</persName>
  <orgName ref="https://www.opengreekandlatin.org">Open Greek and Latin</orgName>
</respStmt>
<respStmt>
  <resp>Published original versions of the electronic texts</resp>
  <persName role="principal">Leonard Muellner</persName>
  <orgName ref="https://www.opengreekandlatin.org">Open Greek and Latin</orgName>
</respStmt>
<respStmt>
  <resp>Published original versions of the electronic texts</resp>
  <persName role="principal">Bruce Robertson</persName>
  <orgName ref="https://www.opengreekandlatin.org">Open Greek and Latin</orgName>
</respStmt>
```

_Notes_:

- We move from a single `respStmt` containing multiple `persName` elements to a 1:1 relationship between `respStmt` and `persName`.
- We also include `orgName` in each `respStmt`
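
The 1:1 split described in these notes could be sketched as follows. This is only an illustration, not the script used for the update: `split_by_person` is a hypothetical helper, the input is a namespace-free snippet written for the example (the real files may declare the TEI namespace and be structured differently), and it assumes the original `respStmt` has a single `resp`.

```python
import copy
import xml.etree.ElementTree as ET


def split_by_person(resp_stmt):
    """Return one new respStmt per persName found in resp_stmt.

    Assumes resp_stmt has a single resp; that resp and every orgName
    are copied into each new statement.
    """
    resp = resp_stmt.find("resp")
    orgs = resp_stmt.findall("orgName")
    statements = []
    for person in resp_stmt.findall("persName"):
        stmt = ET.Element("respStmt")
        # Order the children as in the samples: resp, persName, orgName.
        for child in [resp, person, *orgs]:
            if child is not None:
                stmt.append(copy.deepcopy(child))
        statements.append(stmt)
    return statements


# Hypothetical combined input, written for this example only.
combined = ET.fromstring(
    "<respStmt>"
    "<resp>Published original versions of the electronic texts</resp>"
    "<persName>Gregory Crane</persName>"
    "<persName>Leonard Muellner</persName>"
    "<persName>Bruce Robertson</persName>"
    "<orgName>Open Greek and Latin</orgName>"
    "</respStmt>"
)
for stmt in split_by_person(combined):
    print(ET.tostring(stmt, encoding="unicode"))
```

The original element is left untouched, so the caller decides whether to replace it in the tree.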


## Implementation

### Extraction process

Each row in the `attributions-data` worksheet corresponds to a set of URNs extracted from the underlying XML files.

The `key` and `urn` fields must not be modified; they will be used to perform the bulk update.
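
For readers who want a feel for the extraction, here is a rough sketch of grouping identical attributions across files so that each group carries a set of URNs. It is not the actual extraction script: the function names are hypothetical, it assumes the files use the TEI namespace `http://www.tei-c.org/ns/1.0`, and it does not attempt to reproduce how the `key` field is generated.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"  # assumed namespace of the source files


def _texts(parent, tag):
    """Collect the whitespace-normalized text of each direct child named tag."""
    return tuple(
        " ".join("".join(el.itertext()).split())
        for el in parent.findall(TEI + tag)
    )


def attributions(xml_text):
    """Yield (resp texts, persName texts, orgName texts) for each respStmt."""
    root = ET.fromstring(xml_text)
    for stmt in root.iter(TEI + "respStmt"):
        yield (_texts(stmt, "resp"), _texts(stmt, "persName"), _texts(stmt, "orgName"))


def group_by_attribution(documents):
    """Map each distinct attribution to the sorted list of URNs that carry it.

    documents: dict of URN -> TEI XML string.
    """
    grouped = defaultdict(set)
    for urn, xml_text in documents.items():
        for attribution in attributions(xml_text):
            grouped[attribution].add(urn)
    return {attr: sorted(urns) for attr, urns in grouped.items()}
```

Each entry of the returned dict would then correspond to one spreadsheet row in this simplified picture.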

### Editing attribution data in the spreadsheet

I made an initial pass to clean up the data. This involved fixing small typos in organization names, normalizing names (Mt. Allison vs Mount Allison, etc.) and restructuring data to fit the desired model (see [Desired data model](#desired-data-model) above).

The `unique-*` worksheets show the unique values for `resp`, `orgName` and `persName`.

Ideally, we can standardize on "Proofreading" vs "proofreader" vs "Proofreading and CTS conversion" as appropriate. If proofreading and CTS conversion are two distinct responsibilities for a given text, I would suggest:

1) Add a new row beneath "Proofreading and CTS conversion"

2) Edit the original `resp` to `Proofreading`

3) Set the `resp` in the new row to `CTS conversion`

4) Copy the other relevant fields (`orgName` and `persName`) to the new row

5) Leave a comment on the row so I can ensure that the `urn` and `key` fields are also populated.
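
These steps are meant to be done by hand in the spreadsheet, but the intended result can be sketched in code. Everything here is an assumption made for illustration: rows are modeled as plain dicts, `split_combined_resp` is a hypothetical helper, and the `comment` column is invented to stand in for the spreadsheet comment.

```python
def split_combined_resp(row, first="Proofreading", second="CTS conversion"):
    """Return two rows that replace one row whose resp combines two roles."""
    # Steps 1-2: keep the original row and narrow its resp to the first role.
    original = dict(row, resp=first)
    # Steps 3-5: the new row gets the second role and the person/org fields.
    # urn and key are deliberately left empty and flagged for follow-up.
    added = {
        "resp": second,
        "persName": row.get("persName", ""),
        "orgName": row.get("orgName", ""),
        "urn": "",
        "key": "",
        "comment": "New row: urn and key still need to be populated",
    }
    return [original, added]
```

The input row is not mutated; `dict(row, resp=first)` builds a copy with only `resp` changed.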

There are also several instances where slight variants of a person's name are used, or where `resp` possibly contains data better suited to `orgName`.

We **should not** delete any rows; if there are duplicate rows in the spreadsheet, we'll use the `urn` and `key` fields to de-duplicate data.
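
As a rough idea of what that de-duplication might look like, here is a minimal sketch. It assumes rows are dicts, that the pair (`urn`, `key`) identifies a row, and that the first occurrence wins; none of these details are settled, and `dedupe_rows` is a hypothetical name.

```python
def dedupe_rows(rows):
    """Keep the first row seen for each (urn, key) pair, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        identity = (row["urn"], row["key"])
        if identity not in seen:
            seen.add(identity)
            unique.append(row)
    return unique
```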

### Bulk update process

Once edits have been finalized in the spreadsheet, I'll use the `urn` and `key` fields to map the edits back to the desired data model (described above).
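
To make the target shape concrete, here is a sketch of turning one spreadsheet row into a `respStmt` matching the samples in the data model. The column names (`when`, `from`, `to`, `persName_ref`) are assumptions about how the spreadsheet might be laid out, `build_resp_stmt` is a hypothetical helper, and namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET


def build_resp_stmt(row):
    """Build a respStmt element from a dict of spreadsheet values."""
    stmt = ET.Element("respStmt")
    resp = ET.SubElement(stmt, "resp")
    resp.text = row["resp"]
    # Copy whichever timeframe attributes are present and non-empty.
    for attr in ("when", "from", "to"):
        if row.get(attr):
            resp.set(attr, row[attr])
    if row.get("persName"):
        person = ET.SubElement(stmt, "persName")
        person.text = row["persName"]
        if row.get("persName_ref"):
            person.set("ref", row["persName_ref"])
    if row.get("orgName"):
        ET.SubElement(stmt, "orgName").text = row["orgName"]
    return stmt


example = build_resp_stmt({
    "resp": "Lead Developer",
    "from": "2015",
    "to": "2017",
    "persName": "Thibault Clérice",
    "persName_ref": "https://orcid.org/0000-0003-1852-9204",
    "orgName": "University of Leipzig",
})
print(ET.tostring(example, encoding="unicode"))
```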

I will also reorder the roles so that the desired "proofreading / conversion" role(s) come before any other roles.
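
One simple way to get that ordering is a stable sort, sketched below. The matching rule (a case-insensitive substring check against `PRIORITY_TERMS`) is an assumption, and `order_roles` is a hypothetical helper operating on rows modeled as dicts.

```python
PRIORITY_TERMS = ("proofread", "conversion")  # assumed matching rule


def is_priority(resp_text):
    """True if the resp text looks like a proofreading / conversion role."""
    text = resp_text.lower()
    return any(term in text for term in PRIORITY_TERMS)


def order_roles(rows):
    """Return rows with priority roles first, otherwise in original order."""
    # sorted() is stable and False sorts before True, so priority rows move
    # to the front while relative order within each group is preserved.
    return sorted(rows, key=lambda row: not is_priority(row["resp"]))
```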

I'll open a PR and link it back to this issue. Once the PR is merged, the updated attributions will be made available on [scaife.perseus.org](https://scaife.perseus.org/).

## Closing thoughts
I'm not sure if there is a "template" for future XML files, but I would also be happy to take the examples in [Desired data model](#desired-data-model) above and integrate them into that template.

As long as the XML files have a `respStmt` with a `resp` and at least one of `persName` or `orgName`, we can extract attributions for display on scaife.perseus.org.
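
That minimum requirement can be expressed as a small check. This is a sketch only: `can_extract` is a hypothetical helper, it works on namespace-free elements for brevity, and it tests only for the presence of the child elements, not for their content.

```python
import xml.etree.ElementTree as ET


def can_extract(resp_stmt):
    """True if resp_stmt has a resp and at least one persName or orgName."""
    has_resp = resp_stmt.find("resp") is not None
    has_name = (
        resp_stmt.find("persName") is not None
        or resp_stmt.find("orgName") is not None
    )
    return has_resp and has_name


ok = ET.fromstring(
    "<respStmt><resp>Proofreading</resp><orgName>University of Leipzig</orgName></respStmt>"
)
missing_name = ET.fromstring("<respStmt><resp>Proofreading</resp></respStmt>")
print(can_extract(ok), can_extract(missing_name))
```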

