
serotracker/sars-cov-2-data


SARS-CoV-2 Serosurveillance Data by SeroTracker

SeroTracker synthesizes findings from thousands of COVID-19 seroprevalence studies worldwide, providing a data platform and interactive dashboard for SARS-CoV-2 serosurveillance. The dataset in this repository contains our collection of serosurveillance studies.

Download

Our complete SARS-CoV-2 dataset can be downloaded programmatically from the following URL: https://raw.githubusercontent.com/serotracker/sars-cov-2-data/main/serotracker_dataset.csv
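As a minimal sketch of a programmatic download, the snippet below fetches the CSV with Python's standard library and parses it into one dictionary per row. The function name `load_dataset` is hypothetical and not part of any SeroTracker tooling; it only assumes the file is UTF-8 encoded CSV with a header row.

```python
import csv
import io
import urllib.request

DATASET_URL = (
    "https://raw.githubusercontent.com/serotracker/"
    "sars-cov-2-data/main/serotracker_dataset.csv"
)


def load_dataset(url: str = DATASET_URL) -> list[dict[str, str]]:
    """Download the CSV and return one dict per row (one row per estimate).

    Hypothetical helper; assumes a UTF-8 CSV with a header row.
    """
    with urllib.request.urlopen(url) as response:
        # "utf-8-sig" also strips a byte-order mark if one is present.
        text = response.read().decode("utf-8-sig")
    # csv.DictReader handles quoted fields, including header names with commas.
    return list(csv.DictReader(io.StringIO(text)))
```

Calling `load_dataset()` with no arguments fetches the full file; every value is returned as a string, so numeric and date columns still need converting before analysis.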

Our dataset can also be downloaded directly from our Airtable view. This option does not allow for programmatic downloads.

Data Architecture

Our data is arranged hierarchically: an article or document providing seroprevalence data (“source”), the “studies” reported within a given source, and the seroprevalence “estimates” derived from each studied population. Our CSV file contains one row per estimate, so multiple rows may be associated with the same SARS-CoV-2 seroprevalence study and source.
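Because the CSV is flat (one row per estimate), the source → study → estimate hierarchy has to be rebuilt by grouping rows. The sketch below assumes the column names "Source Name", "Rapid Review Study Name (Text)" and "Prevalence Estimate Name" from the Oct 27, 2021 Change Log entry; confirm them against the Data Dictionary before relying on it. `build_hierarchy` is a hypothetical helper.

```python
from collections import defaultdict

# Column names taken from the Oct 27, 2021 Change Log (assumed still current).
SOURCE_COL = "Source Name"
STUDY_COL = "Rapid Review Study Name (Text)"
ESTIMATE_COL = "Prevalence Estimate Name"


def build_hierarchy(rows):
    """Nest flat estimate rows as {source: {study: [estimate names]}}."""
    tree = defaultdict(lambda: defaultdict(list))
    for row in rows:
        tree[row[SOURCE_COL]][row[STUDY_COL]].append(row[ESTIMATE_COL])
    # Convert the nested defaultdicts to plain dicts for clean printing.
    return {source: dict(studies) for source, studies in tree.items()}
```

Applied to the parsed CSV rows, `len(build_hierarchy(rows))` gives the number of sources, and the length of each inner dict gives the number of studies extracted from that source.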

A single source may contain multiple studies based on our definition of a serosurvey. Articles that do not report an overall seroprevalence estimate and instead provide information on two or more distinct cohorts (different sample frames, different samples at different time points, or non-overlapping geographic regions) without a pooled estimate are considered to contain multiple studies. In these instances, we extract a study record for each set of data and link the extracted estimates to those individual studies.
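The splitting rule above can be sketched as a small decision function. This is a simplified illustration, not SeroTracker's extraction tooling: the `Cohort` type and `count_studies` are hypothetical, and it treats two cohorts as distinct whenever their sample frame, sampling period or region differ (differing region strings stand in for non-overlapping regions).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cohort:
    """Hypothetical description of one cohort reported in a source."""
    sample_frame: str
    sampling_period: str
    region: str


def count_studies(cohorts: list[Cohort], reports_pooled_estimate: bool) -> int:
    """Return how many study records this simplified rule creates for a source."""
    # Frozen dataclasses compare by value, so identical cohorts collapse here.
    distinct_cohorts = set(cohorts)
    # An overall or pooled estimate, or fewer than two distinct cohorts,
    # means the source is recorded as a single study.
    if reports_pooled_estimate or len(distinct_cohorts) < 2:
        return 1
    return len(distinct_cohorts)
```

For example, a source reporting blood donors sampled in two different months, with no pooled estimate, would yield two study records under this sketch.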

Each study contains a summary estimate of the seroprevalence of the entire population studied (largest denominator reported). For every study, we identify and extract the ‘primary estimate’: the summary estimate that most accurately reflects the seroprevalence at the respective geography and time point. Any other summary estimates are extracted as subgroup estimates. For more information on how we determine the primary estimate, refer to our prioritization protocol.
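To work only with primary estimates, one option is to keep a single row per study. The sketch below assumes, without confirmation from this README, that primary estimates are marked by the value "Primary Estimate" in the "Sub-grouping Variable" column; check the Data Dictionary for the actual label. `primary_estimates` is a hypothetical helper.

```python
# Assumption: this label marks primary estimates; verify in the Data Dictionary.
PRIMARY_LABEL = "Primary Estimate"
STUDY_COL = "Rapid Review Study Name (Text)"
SUBGROUP_VAR_COL = "Sub-grouping Variable"


def primary_estimates(rows):
    """Map each study name to its primary-estimate row (hypothetical helper)."""
    primaries = {}
    for row in rows:
        if row[SUBGROUP_VAR_COL] != PRIMARY_LABEL:
            continue
        study = row[STUDY_COL]
        # Each study should have exactly one primary estimate; flag duplicates.
        if study in primaries:
            raise ValueError(f"multiple primary estimates for {study!r}")
        primaries[study] = row
    return primaries
```

Raising on duplicates is a design choice: it surfaces rows that break the one-primary-estimate-per-study expectation instead of silently keeping one of them.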

A study may also contain several stratifying subgroup estimates, such as age and sex. In general population sample frames, we extract subgroup estimates for age, sex, race, ethnicity, COVID-19 vaccination status, statistical analysis (population, test adjustment), and geographical area. We document other subgroups present in a given study through the ‘subgroup variables available’ field. Our full list of subgroup options can be found here. For more information on our general population sample frame classification, see here. No subgroup estimates are extracted from special population studies.
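Subgroup estimates for one study can be pulled out by filtering on the subgrouping variable. This sketch assumes the column names "Sub-grouping Variable", "Sub-group specific category (clean)" and "Serum positive prevalence" from the Oct 27, 2021 Change Log, and that a subgroup value such as "Age" appears verbatim in the data; both are assumptions to check against the full subgroup list. Prevalence values are left as strings because their exact format is not described here.

```python
# Column names from the Oct 27, 2021 Change Log (assumed still current).
STUDY_COL = "Rapid Review Study Name (Text)"
SUBGROUP_VAR_COL = "Sub-grouping Variable"
SUBGROUP_CAT_COL = "Sub-group specific category (clean)"
PREVALENCE_COL = "Serum positive prevalence"


def subgroup_table(rows, study, variable):
    """Return {subgroup category: prevalence string} for one study and variable."""
    table = {}
    for row in rows:
        if row[STUDY_COL] == study and row[SUBGROUP_VAR_COL] == variable:
            table[row[SUBGROUP_CAT_COL]] = row[PREVALENCE_COL]
    return table
```

For a special population study, this returns an empty dict for every variable, since no subgroup estimates are extracted from those studies.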

Additional Information

Please see our Data Dictionary for explanations of our variables, data types, and descriptions as well as insight into how our data is collected by our research team.

To keep up to date with important changes to our dataset, please consult our Change Log regularly.

For more information about how we collect, extract and use our data, please see the Data page on our website.

If you have a SARS-CoV-2 seroprevalence study that has not yet been captured by serotracker.com, please submit the source using this form. Our research team will review each submission to evaluate whether it meets our inclusion criteria.

If you are open to being contacted about your use case for our data, please fill out this form. This helps us create datasets that are most useful to you.

To cite our data, please cite this Lancet Infectious Diseases article.

Change Log

  • Mar 14, 2022: We added the alpha_3_code column which displays the three-letter codes defined in ISO 3166-1 to represent countries, dependent territories, and special areas of geographical interest.
  • Oct 27, 2021: We created serotracker_dataset.csv containing the following columns: Prevalence Estimate Name, Rapid Review Study Name (Text), Source Name, Publication Date, Grade of Estimate Scope, Country, State/Province, City, Sampling Start Date, Sampling End Date, Sample Frame (groups of interest), Sample Frame (age), Age Minimum, Age Maximum, Sub-grouping Variable, Subgroup category for analysis, Sub-group specific category (clean), Denominator Value, Serum positive prevalence, Serum pos prevalence, 95pct CI Lower, Serum pos prevalence, 95pct CI Upper, Test Adjustment, Population Adjustment, Adjust serum positive prevalence, Adjusted serum pos prevalence, 95pct CI Lower, Adjusted serum pos prevalence, 95pct CI Upper, Adjusted sensitivity, Adjusted specificity, Independent evaluation type, Sampling Method, Test Manufacturer, Test Type, Test Validation, Isotype(s) Reported, Antibody target, Specimen Type, Sensitivity, Specificity, Overall Risk of Bias (JBI), JBI 1, JBI 2, JBI 3, JBI 4, JBI 5, JBI 6, JBI 7, JBI 8, JBI 9, Source Type, First Author Full Name, Lead Institution, UNITY: Criteria, URL, Date Created, Last modified time, Data Quality Status.
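The alpha_3_code column makes it straightforward to select estimates for one country. Several column names listed above appear to contain commas (for example "Serum pos prevalence, 95pct CI Lower"), so the file should be read with a real CSV parser rather than by splitting lines on commas. The helper `estimates_for_country` below is a hypothetical sketch that takes the CSV text directly.

```python
import csv
import io


def estimates_for_country(csv_text, alpha_3_code):
    """Return the rows whose alpha_3_code matches, e.g. "CAN" for Canada."""
    # The csv module respects quoted header names that contain commas.
    reader = csv.DictReader(io.StringIO(csv_text))
    code = alpha_3_code.upper()
    # Files created before the column was added simply yield no matches.
    return [row for row in reader if row.get("alpha_3_code", "") == code]
```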

About

Data on SARS-CoV-2 serosurveillance up to 2024.
