Overarching goal: A user should be able to trigger a process in the server that pulls data from the COPA website and imports new Allegations to the database.
Things to keep in mind:
Goals:
The business need:
From Rajiv:
The primary purpose of this COPA Data Portal data capture step is to create incomplete/phantom complaint records in our database (for new complaints since our last successful FOIA response) so that we can have some matching data for the new documents that are being picked up by our crawlers/scrapers (https://cpdp.co/crawlers and https://cpdp.co/documents).
The second purpose is to compare against the data that we have received via FOIA responses to see whether we are missing any records (i.e., were any responsive complaint records omitted from our original dataset, and if so, which ones).
The third purpose is to compare different versions/snapshots of the portal data over time and see what’s changing: are new records just being added on to the end, or are older records being added, removed, or altered?
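The snapshot comparison described in the third purpose could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: it assumes each snapshot has already been loaded into a dict keyed by complaint id, and the function name diff_snapshots is hypothetical.

```python
from typing import Dict, Mapping, Set

Row = Mapping[str, str]


def diff_snapshots(old: Mapping[str, Row], new: Mapping[str, Row]) -> Dict[str, Set[str]]:
    """Classify complaint ids as added, removed, or altered between two snapshots.

    Assumption: both snapshots are dicts keyed by complaint id, with each
    value holding that row's published fields.
    """
    old_ids, new_ids = set(old), set(new)
    # A record counts as altered if it is in both snapshots but any field differs.
    altered = {rid for rid in old_ids & new_ids if dict(old[rid]) != dict(new[rid])}
    return {
        "added": new_ids - old_ids,
        "removed": old_ids - new_ids,
        "altered": altered,
    }
```

Checking whether the "added" ids are all newer than the previous snapshot's latest complaint would then distinguish records appended to the end from older records surfacing late.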
From Basecamp:
The Civilian Office of Police Accountability (COPA) has just posted a new live data feed to the City's Open Data Portal that goes back 10 years. Here are a few early questions to investigate.
- Are there CRs that appear here during the comparable time period (i.e., before October 2016) that don't appear in our FOIA'd datasets (which were produced in October 2016)? If so, how many, and are there any revealing common characteristics amongst them to suggest why they may have been excluded from the dataset we received in response to our FOIA requests but not from this public release on the City's public data portal? More likely is the inverse, i.e., complaints that we know of through our FOIA request but that were excluded from the City's public data portal even during the overlapping time period of November 2007 – November 2016.
- For all the CRs that exist both in the City Data Portal and in our FOIA'd datasets, how many rows have conflicting values for the dynamic data fields, such as CURRENT_STATUS (which we expect to change over time for open cases), and for data fields that we might not expect to change, such as COMPLAINT_DATE? What can we learn from any patterns amongst these kinds of unexpected discrepancies, particularly when they occur in cases that are already closed?
- Are there any reasons not to import all these data and overwrite the conflicting fields in our existing dataset with more "up-to-date" information from the City's data portal (of course, any new CRs would be missing all officer-identifying data and other fields that are not being published to the data portal, until our next FOIA request)? The City Data Portal has a relatively robust API and supports numerous open standards for public APIs. Can we do all this importing and merging programmatically and run it on the Civis Platform on a routine basis? Is there any equivalent to cron built into the Platform? Apart from sanity checks, what kinds of issues will we run into that require human intervention/judgment (no officer-identifying data also means no officer profile matching challenges)?
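One way to approach the import-and-merge questions above is to build a merge plan first and apply it afterwards, so that unexpected discrepancies are routed to a person instead of being overwritten silently. The sketch below is illustrative only. It assumes both datasets are dicts keyed by CR id; the names MergePlan, plan_merge, and DYNAMIC_FIELDS are hypothetical; and treating only CURRENT_STATUS as safe to overwrite is an assumption based on the fields named in these notes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Mapping, Tuple

# Assumption: fields we expect to change over time and may overwrite freely.
DYNAMIC_FIELDS = {"CURRENT_STATUS"}

Row = Mapping[str, str]


@dataclass
class MergePlan:
    # CR ids present in the portal but not in our data (phantom records to create).
    create: List[str] = field(default_factory=list)
    # Per-CR overwrites of dynamic fields: {cr_id: {field_name: new_value}}.
    update: Dict[str, Dict[str, str]] = field(default_factory=dict)
    # Conflicts in fields we do not expect to change: (cr_id, field, ours, theirs).
    review: List[Tuple[str, str, str, str]] = field(default_factory=list)


def plan_merge(existing: Mapping[str, Row], portal: Mapping[str, Row]) -> MergePlan:
    """Decide what to create, overwrite, or flag, without modifying anything."""
    plan = MergePlan()
    for cr_id, row in portal.items():
        ours = existing.get(cr_id)
        if ours is None:
            plan.create.append(cr_id)
            continue
        for name, theirs in row.items():
            # Skip fields we do not track and fields that already agree.
            if name not in ours or ours[name] == theirs:
                continue
            if name in DYNAMIC_FIELDS:
                plan.update.setdefault(cr_id, {})[name] = theirs
            else:
                plan.review.append((cr_id, name, ours[name], theirs))
    return plan
```

Keeping the plan separate from its application means a scheduled run can apply the create and update entries automatically while leaving the review list for human judgment.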
Goals:
... current category column with a reference to the data_allegationcategory table for that particular category