When new complaints are uploaded through the UI, they should update the database #55

@adesca

Overarching goal: Data that has been uploaded should be augmented with enough database data that it can be added as a row to the database

Things to keep in mind:

  • Any errors produced in this story will be shown on the frontend, so every error must be recorded in a form that the frontend can list

Data flow steps:

  • Augment transformed data - The transformed allegation data is augmented with existing database data so that each row of transformed data has enough information to be a valid row in the database

  • Check augmented data - Check if the data to load already exists in the database

    • 1. Check each newly transformed allegation row to see if it matches an existing allegation based on crid, add1, and add2:
      • If a row matches on all three, check whether all of the columns match.
      • If all of the columns match, it's a duplicate and can be discarded
      • If any of the columns don't match, save the ones that don't match to a file titled "changed allegations" under errors/
      • If a row matches on crid but does not match on add1 or add2, save it to a file titled "crid mismatch" under errors/
      • If none of the unique identifiers match, this is a new allegation; move on to step 2
    • 2. Group the victim, complainant, civilian-witness, and cpd-witness transformed rows that reference the crid of this allegation
    • 3. Create a csv for all of the victims that reference the allegation
    • 4. Create a csv for all of the complainants that reference the allegation
    • 5. Create a csv for all of the civilian-witnesses that reference the allegation
    • 6. Create a csv for all of the cpd-witnesses that reference the allegation
    • 7. Save these created csvs to Google Cloud Storage under augmented-data/
  • Load non-conflicting augmented data

    • 1. Update the data_victim table for each victim row from the previous section
    • 2. Update the data_complainants table for each complainant row from the previous section
    • 3. Do the same with civilian witnesses
    • 4. This step is different: update the policewitness table with a reference to the officer witness
      • Using first name, last name, middle initial and birth year, determine the matching officer row in data_officer
        • If you cannot find an officer that matches, record all of the officer information in a file named "missing-officer" under errors/, along with the allegation's crid
        • If you find an officer row that matches, verify that all columns of that row match the input cpd-witness information.
          • If they all match then save a new row to data_policewitness
          • If any columns do not match, record all of the cpd-witness information in a file named "officer changed" under errors/
            • The cpd-witness data will not have all of the columns of the officer row; missing columns don't need to match.
        • If multiple officers match, save all matching officers in the file "missing-officer" under errors/, along with the allegation's crid
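
Step 1 of "Check augmented data" can be sketched as a small classifier. This is only an illustration, not the project's implementation: it assumes rows are plain dicts, and the names `classify_allegation`, `Outcome`, and `KEY_COLUMNS` are hypothetical. The issue does not say what should happen when crid is absent from the database but add1 or add2 happen to match, so the sketch treats any row with an unknown crid as new.

```python
from enum import Enum

# Identifying columns named in step 1.
KEY_COLUMNS = ("crid", "add1", "add2")


class Outcome(Enum):
    DUPLICATE = "duplicate"          # every column matches: discard the row
    CHANGED = "changed allegations"  # keys match, some other column differs
    CRID_MISMATCH = "crid mismatch"  # crid matches, add1 or add2 does not
    NEW = "new"                      # crid not found: continue with step 2


def classify_allegation(new_row, existing_rows):
    """Classify one transformed allegation row against existing rows (dicts)."""
    same_crid = [r for r in existing_rows if r["crid"] == new_row["crid"]]
    if not same_crid:
        return Outcome.NEW  # assumption: an unknown crid means a new allegation

    same_key = [r for r in same_crid
                if all(r[k] == new_row[k] for k in KEY_COLUMNS)]
    if not same_key:
        return Outcome.CRID_MISMATCH

    # Compare only the columns the new row carries.
    for row in same_key:
        if all(row.get(col) == value for col, value in new_row.items()):
            return Outcome.DUPLICATE
    return Outcome.CHANGED
```

The values of `CHANGED` and `CRID_MISMATCH` double as the file titles under errors/ named in the list above.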

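The officer lookup in step 4 of "Load non-conflicting augmented data" could look roughly like the following. It is a sketch under assumptions: rows are dicts, the column names in `MATCH_COLUMNS` and the `id` and `officer_id` keys are hypothetical stand-ins for the real data_officer and data_policewitness columns, and the function only decides where a row should go rather than writing anything.

```python
# Hypothetical column names for first name, last name, middle initial, birth year.
MATCH_COLUMNS = ("first_name", "last_name", "middle_initial", "birth_year")


def match_cpd_witness(witness, officers):
    """Return (destination, payload) for one cpd-witness row.

    destination is "data_policewitness" or the title of a file under errors/.
    """
    candidates = [o for o in officers
                  if all(o.get(c) == witness.get(c) for c in MATCH_COLUMNS)]

    if not candidates:
        # No officer found: record the witness information with the crid.
        return "missing-officer", {"crid": witness["crid"], "witness": witness}
    if len(candidates) > 1:
        # Ambiguous match: record every matching officer with the crid.
        return "missing-officer", {"crid": witness["crid"], "officers": candidates}

    officer = candidates[0]
    # Columns the witness row lacks do not need to match, so compare only
    # the columns present in both rows.
    shared = [c for c in witness if c in officer]
    if all(witness[c] == officer[c] for c in shared):
        return "data_policewitness", {"crid": witness["crid"],
                                      "officer_id": officer["id"]}
    return "officer changed", {"witness": witness}
```

Returning a destination instead of writing files keeps the decision logic separate from storage, so the same function can feed both the errors/ files and the database load.
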
Negative cases

  • When an augmented transformed row matches an existing database row:

    • If all of the data is the same, then disregard this row
    • If any of the fields don't match the database row, save a file under errors/ that provides the original row and the new row, and specifies which fields are different. The file should be titled "conflict-[table name]" where [table name] is the name of the database table that the conflict is with
  • When a transformed row cannot be augmented:

    • Save the row in a file under errors/ that contains the row and specifies the problem
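
The first negative case can be illustrated with a small comparison helper. This is a sketch only: `compare_with_existing` and the keys of the returned record are hypothetical, and it assumes rows are dicts and that only columns present in the new row are compared.

```python
def compare_with_existing(table_name, existing_row, new_row):
    """Return None for an identical row, otherwise a conflict record."""
    different = sorted(col for col in new_row
                       if existing_row.get(col) != new_row[col])
    if not different:
        return None  # all of the data is the same: disregard the row
    return {
        "file": f"conflict-{table_name}",  # file title under errors/
        "original_row": existing_row,
        "new_row": new_row,
        "different_fields": different,
    }
```

Because the record lists the differing fields explicitly, the frontend can display it without recomputing the comparison.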

Note, the following tables are deliberately excluded from database updating because they use a data source whose transformation hasn't been implemented yet:

  • data_policeunit - Describes police units as a unit, but the 060 foia responses return information about individual officers
  • data_officerbadgenumber - This data is synthesized from existing database data; it should have a special upload step that verifies the numbers we receive, due to the high potential for conflict
  • data_officer - should be updated through a roster foia response in order to ensure we get the best data
  • data_allegationcategory - This is used to classify the type of allegation, and should thus not be updated from foia responses except to insert new rows. This will receive its own story
