Data Name | Description |
---|---|
BroadbandNow Open-Data | Zipcode Competition & Pricing Data |
US Broadband Usage Percentages Dataset | Estimated broadband usage rates by zipcode |
Assorted Statistics pulled from ACS/Census | Assorted economic and demographic statistics by census tract pulled from 2019 ACS-5 using the Census API. |
Missouri Census Data Center Geocorr | A tool used to generate a dataset mapping zipcodes to US Census tracts. The columns afact & afact2 were generated using weighting of the 2010 census population. afact is the proportion of the source (zipcode) in the target (census tract). afact2 is the proportion of the target (census tract) in the source (zipcode). |
Rural-Urban Commuting Area Codes | Rural-Urban Commuting Area (RUCA) codes for each zip code |
EBB Program Data | Total enrolled households in the EBB program by zipcode |
Gazetteer Files | The U.S. Gazetteer Files provide a listing of all geographic areas for selected geographic area types. The files include geographic identifier codes, names, area measurements, and representative latitude and longitude coordinates. |
Form 477 Broadband Deployment Data - December 2019 (version 1) | Fixed Broadband Deployment Data. Info about the data is here |
National Broadband Map Indicators of Need | A csv export of census tract level data used to power the Indicators of Broadband Need map with the user guide |
Gallardo, R. (2020). Digital Divide Index. Purdue Center for Regional Development. Retrieved from Digital Divide Index (DDI): http://pcrd.purdue.edu/ddi
File Name | Description |
---|---|
broadband_now/ | Files downloaded from the BroadbandNow Open-Data |
ebb/ | Files generated from the EBB Program Data |
USBroadbandUsagePercentages-master | Files downloaded from the US Broadband Usage Percentages Dataset |
zip_conversion_data/ | Files generated using the Missouri Census Data Center Geocorr to map zipcodes to US census tracts using 2010 Census population weighting |
census_data.csv | Data pulled from the Census API using the code in census_scripts/. |
merged_broadband.csv | A single dataset combining all of the broadband datasets, generated by running eda.ipynb. |
merged_census_broadband.csv | Interim dataset that merged broadband data to Census data using county codes. Generated from census_broadband_merge.ipynb |
relabeled_census.csv | census_data.csv cleaned up to calculate different statistics and re-labelled with user friendly strings. Generated from census_data_cleaning.ipynb |
weighted_merged_all.csv | This dataset merged the broadband data with the census data by tract, appropriately weighting tracts by the afact column. Generated from census_broadband_merge-copy.ipynb. Census -1 and -666666 values replaced with np.nan |
zips_rural_urban.xlsx | Dataset from Rural-Urban Commuting Area Codes |
aggregated_fcc_by_tract.csv | A dataset created using the FCC.ipynb notebook to add a tract geoid, aggregate provider count by tract, speed, and technology type, and correct tract ids with geography changes since 2010 tract code definitions. |
fcc_census.csv | A merged dataset using aggregated_fcc_by_tract.csv and relabeled_census.csv. Final/recommended dataset to use for the project. |
fcc_census_2.csv | A merged dataset that added a few more Census columns and the Ookla columns to fcc_census.csv |
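
As a rough illustration of the afact weighting and sentinel replacement described for weighted_merged_all.csv, the sketch below averages a tract-level value up to zip level. The tiny data frames, the column names `zip` and `tract`, and the helper `weighted_zip_average` are hypothetical; the actual logic lives in census_broadband_merge-copy.ipynb and may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical crosswalk rows: afact is the share of each zip's 2010
# population that falls in the given tract.
crosswalk = pd.DataFrame({
    "zip": ["01001", "01001", "01002"],
    "tract": ["25013813204", "25013813205", "25015820300"],
    "afact": [0.75, 0.25, 1.0],
})

# Hypothetical tract-level values; -666666 stands in for a Census
# "no estimate" sentinel.
census = pd.DataFrame({
    "tract": ["25013813204", "25013813205", "25015820300"],
    "median_income": [60000.0, 40000.0, -666666.0],
})

def weighted_zip_average(crosswalk, census, column):
    """Average a tract-level column up to zip level, weighting by afact."""
    merged = crosswalk.merge(census, on="tract", how="left")
    # Treat Census sentinel values as missing before averaging.
    merged[column] = merged[column].replace([-1, -666666], np.nan)
    valid = merged.dropna(subset=[column])
    weighted_sum = (valid[column] * valid["afact"]).groupby(valid["zip"]).sum()
    weight_total = valid.groupby("zip")["afact"].sum()
    # Zips whose tracts are all missing become NaN after reindexing.
    result = (weighted_sum / weight_total).reindex(crosswalk["zip"].unique())
    return result.rename(column)

zip_income = weighted_zip_average(crosswalk, census, "median_income")
print(zip_income)
```

Dividing by the sum of the weights that remain after dropping missing values keeps the average on the right scale when some tracts in a zip have no estimate.
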
The table below is a brief overview of the data columns found in the current final dataset: fcc_census.csv
Column Name | Definition |
---|---|
tract_geoid | An 11-digit GEOID code that uniquely identifies this Census tract from other Census tracts in the US. This code may start with 0 so it is best to read in with: pd.read_csv("../data/fcc_census.csv", converters = {"tract_geoid" : lambda x: str(x)}) so that pandas does not strip the 0 |
All_Provider_Count | A count of the unique provider IDs that service the area |
All_Providers | A set of the unique IDs that service the area. You can use the fcc_names.csv to map provider ids to the provider name or the doing business as name. |
MaxAdDown | The maximum download speed of all max speeds provided in any blocks in the tract |
MaxAdUp | The maximum upload speed of all max speeds provided in any blocks in the tract |
AllMaxAdDown | The set of all max download speeds offered in any blocks in the tract |
AllMaxAdUp | The set of all max upload speeds offered in any blocks in the tract |
Wired_Provider_Count | A count of unique provider IDs that service the area and are considered Wired technology type. This refers to a reported TechCode of 10-50 (i.e. DSL, Copper, Cable Modem, or Fiber) |
Satellite_Provider_Count | A count of unique provider IDs that service the area and are considered Satellite technology type. This refers to a reported TechCode of 60 (i.e. Satellite) |
Fixed_Wireless_Provider_Count | A count of unique provider IDs that service the area and are considered Fixed Wireless technology type. This refers to a reported TechCode of 70 (i.e. Terrestrial Fixed Wireless) |
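All_Provider_Count_25 | A count of all unique provider IDs that service the area and have a max speed over 25 Mbps |
All_Provider_Count_100 | A count of all unique provider IDs that service the area and have a max speed over 100 Mbps |
Fixed_Wireless_Provider_Count_25 | A count of unique Fixed Wireless provider IDs that service the area and have a max speed over 25 Mbps |
Wired_Provider_Count_25 | A count of unique Wired provider IDs that service the area and have a max speed over 25 Mbps |
Satellite_Provider_Count_25 | A count of unique Satellite provider IDs that service the area and have a max speed over 25 Mbps |
Fixed_Wireless_Provider_Count_100 | A count of unique Fixed Wireless provider IDs that service the area and have a max speed over 100 Mbps |
Wired_Provider_Count_100 | A count of unique Wired provider IDs that service the area and have a max speed over 100 Mbps |
Satellite_Provider_Count_100 | A count of unique Satellite provider IDs that service the area and have a max speed over 100 Mbps |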
NAME | The full name of the census tract |
median_age_overall | median age of all residents in the census tract |
median_age_male | median age of all male residents in the census tract |
median_age_female | median age of all female residents in the census tract |
employment_rate | % of the population that is employed in the census tract |
median_income | median income of all residents in the census tract |
total_households | total number of households in the census tract |
ave_household_size | average household size in the census tract |
ave_family_size | average family size in the census tract |
total_population | total population size in the census tract |
median_house_value | median house value in the census tract |
pct_white | percent non-Hispanic/Latino white in the census tract |
pct_hisp_latino | percent Hispanic/Latino of any race in the census tract |
pct_black | percent non-Hispanic/Latino black in the census tract |
pct_native | percent non-Hispanic/Latino American Indian and Alaska Native in the census tract |
pct_asian | percent non-Hispanic/Latino Asian in the census tract |
pct_hi_pi | percent non-Hispanic/Latino Native Hawaiian and Other Pacific Islander in the census tract |
pct_other_race | percent non-Hispanic/Latino and some other race alone in the census tract |
pct_two+_race | percent non-Hispanic/Latino two or more races alone in the census tract |
pct_rent_burdened | percent of the population that pays more than 30% of their income on rent in the census tract |
poverty_rate | percent of the population in poverty in the census tract |
pct_pop_bachelors+ | percent of the population with at least a Bachelor's degree in the census tract |
pct_pop_hs+ | percent of the population with at least a HS diploma in the census tract |
pct_internet | percent of the population with an internet subscription in the census tract |
pct_internet_dial_up | percent of the population with a dial-up internet subscription in the census tract |
pct_internet_broadband_any_type | percent of the population with a broadband (>25 Mbps) internet subscription of any type in the census tract |
pct_internet_cellular | percent of the population with an internet subscription and a cellular data plan in the census tract |
pct_only_cellular | percent of the population with only a cellular data plan as the internet subscription in the census tract |
pct_internet_broadband_fiber | percent of the population with a broadband internet subscription such as cable, fiber optic, or DSL in the census tract |
pct_internet_broadband_satellite | percent of the population with a satellite internet subscription in the census tract |
pct_internet_only_satellite | percent of the population with satellite internet service and no other type of internet subscription in the census tract |
pct_internet_other | percent of the population with some other internet service and no other type of internet subscription in the census tract |
pct_internet_no_subscrp | percent of the population with internet access without a subscription in the census tract |
pct_internet_none | percent of the population with no internet access in the census tract |
pct_computer | percent of the population that has a computer in the census tract |
pct_computer_with_dialup | percent of the population that has a computer with only a dial-up internet subscription in the census tract |
pct_computer_with_broadband | percent of the population that has a computer with a broadband internet subscription in the census tract |
pct_computer_no_internet | percent of the population that has a computer without an internet subscription in the census tract |
pct_no_computer | percent of the population with no computer in the census tract |
GEOID | This is the same as the tract_geoid and could be dropped. |
ALAND | The square meters of land in the tract |
AWATER | The square meters of water in the tract |
ALAND_SQMI | The square mileage of land in the tract |
AWATER_SQMI | The square mileage of water in the tract |
population_density | the population divided by the square mileage (i.e. people/sq mile) |
pct_pop_ged | the percent of population that got a GED or alternative credential |
pct_pop_some_college | the percent of population that attended some college but was awarded no degree |
pct_pop_associates | the percent of population that got an Associate's degree |
pct_pop_foreign_born | the percent of population that was born outside the US |
pct_pop_ssi_households | the percent of households that received Supplemental Security Income (SSI), cash public assistance income, or Food Stamps/SNAP in the past 12 months, by household type for children under 18 years in households |
pct_pop_lt_10k | the percent of population with household income less than $10K |
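pct_pop_10k_thru_15k | the percent of population with household income from $10K through $15K |
pct_pop_15k_thru_20k | the percent of population with household income from $15K through $20K |
pct_pop_20k_thru_25k | the percent of population with household income from $20K through $25K |
pct_pop_25k_thru_30k | the percent of population with household income from $25K through $30K |
pct_pop_30k_thru_35k | the percent of population with household income from $30K through $35K |
pct_pop_35k_thru_40k | the percent of population with household income from $35K through $40K |
pct_pop_40k_thru_45k | the percent of population with household income from $40K through $45K |
pct_pop_45k_thru_50k | the percent of population with household income from $45K through $50K |
pct_pop_50k_thru_60k | the percent of population with household income from $50K through $60K |
pct_pop_60k_thru_75k | the percent of population with household income from $60K through $75K |
pct_pop_75k_thru_100k | the percent of population with household income from $75K through $100K |
pct_pop_100k_thru_125k | the percent of population with household income from $100K through $125K |
pct_pop_125k_thru_150k | the percent of population with household income from $125K through $150K |
pct_pop_150k_thru_200k | the percent of population with household income from $150K through $200K |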
pct_pop_gt_200k | the percent of population with household income greater than $200K |
pct_pop_lt_5 | the percent of population that are younger than 5 years old |
pct_pop_5_to_9 | the percent of population that are ages 5 through 9 years old |
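pct_pop_10_to_14 | the percent of population that are ages 10 through 14 years old |
pct_pop_15_to_19 | the percent of population that are ages 15 through 19 years old |
pct_pop_20_to_24 | the percent of population that are ages 20 through 24 years old |
pct_pop_25_to_29 | the percent of population that are ages 25 through 29 years old |
pct_pop_30_to_34 | the percent of population that are ages 30 through 34 years old |
pct_pop_35_to_39 | the percent of population that are ages 35 through 39 years old |
pct_pop_40_to_44 | the percent of population that are ages 40 through 44 years old |
pct_pop_45_to_49 | the percent of population that are ages 45 through 49 years old |
pct_pop_50_to_54 | the percent of population that are ages 50 through 54 years old |
pct_pop_55_to_59 | the percent of population that are ages 55 through 59 years old |
pct_pop_60_to_64 | the percent of population that are ages 60 through 64 years old |
pct_pop_65_to_69 | the percent of population that are ages 65 through 69 years old |
pct_pop_70_to_74 | the percent of population that are ages 70 through 74 years old |
pct_pop_75_to_79 | the percent of population that are ages 75 through 79 years old |
pct_pop_80_to_84 | the percent of population that are ages 80 through 84 years old |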
pct_pop_gt_85 | the percent of population that are older than 85 years old |
pct_pop_disability | the percent of population with a disability |
pct_pop_households_with_kids | the percent of households with children under 18 living at the household |
pct_health_ins_children | the percent of children < 18 that have health insurance |
pct_health_ins_19_64 | the percent of people ages 19-64 that have health insurance |
pct_health_ins_65+ | the percent of people older than 65 that have health insurance |
Ookla Median Download Speed (Mbps) | |
Ookla Median Upload Speed (Mbps) | |
DDI | The DDI from the Purdue Digital Divide Index measures primarily physical access/adoption and socioeconomic characteristics that may limit motivation, skills, and usage. Due to data limitations it was designed as a descriptive and pragmatic tool and is not intended to be comprehensive. |
INFA | INFA from the Purdue Digital Divide Index score groups five variables related to broadband infrastructure and adoption: (1) percentage of total 2018 population without access to fixed broadband of at least 100 Mbps download and 20 Mbps upload as of December 2019; (2) percent of homes without a computing device (desktops, laptops, smartphones, tablets, etc.); (3) percent of homes with no internet access (have no internet subscription, including cellular data plans or dial-up); (4) median maximum advertised download speeds; and (5) median maximum advertised upload speeds. |
SE | SE from the Purdue Digital Divide Index score groups five variables known to impact technology adoption: (1) percent population ages 65 and over; (2) percent population 25 and over with less than high school; (3) individual poverty rate; (4) percent of noninstitutionalized civilian population with a disability; and (5) a brand new digital inequality or internet income ratio measure (IIR). In other words, these variables indirectly measure adoption since they are potential predictors of lagging technology adoption or reinforcing existing inequalities that also affect adoption. |
pct_pop_income_lt_50k | The percent of the population with incomes < $50K |
pct_pop_income_lt_30k | The percent of the population with incomes < $30K |
pct_pop_income_gt_100k | The percent of the population with incomes > $100K |
pct_ages_gt_50 | The percent of the population older than 50 |
pct_ages_lt_19 | The percent of the population younger than 19 |
ruca_metro | The tract has a RUCA code of 1, 2, or 3, corresponding to a metropolitan area |
ruca_micro | The tract has a RUCA code of 4, 5, or 6, corresponding to a micropolitan area |
ruca_small_town | The tract has a RUCA code of 7, 8, or 9, corresponding to a small town area |
ruca_rural | The tract has a RUCA code of 10, corresponding to a rural area |
Comcast_present | Comcast is a provider of at least some areas in the tract |
ATT_present | AT&T is a provider of at least some areas in the tract |
HughesNet_present | Hughes Net is a provider of at least some areas in the tract |
GCI_Comm_Corp_present | GCI Communications Corporation is a provider of at least some areas in the tract |
ViaSat_present | ViaSat is a provider of at least some areas in the tract |
VSAT_present | VSAT is a provider of at least some areas in the tract |
Century_Link_present | Century Link is a provider of at least some areas in the tract |
Spectrum_present | Spectrum is a provider of at least some areas in the tract |
Crown_Castle_present | Crown Castle is a provider of at least some areas in the tract |
Etheric_present | Etheric is a provider of at least some areas in the tract |
Frontier_Communications_present | Frontier Communications is a provider of at least some areas in the tract |
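
The provider-count columns above group Form 477 TechCode values into Wired (10-50), Satellite (60), and Fixed Wireless (70). A minimal sketch of that grouping follows. The input frame, its column names (`provider_id`, `tech_code`, `max_ad_down`), and the helper `tech_category` are assumptions for illustration, not the contents of FCC.ipynb, and "over 25 Mbps" is read here as a strict inequality.

```python
import pandas as pd

def tech_category(tech_code):
    """Map a Form 477 TechCode to the category names used in this table."""
    if 10 <= tech_code <= 50:
        return "Wired"
    if tech_code == 60:
        return "Satellite"
    if tech_code == 70:
        return "Fixed_Wireless"
    return "Other"

# Hypothetical block-level rows; GEOIDs are kept as strings so the
# leading zero survives.
blocks = pd.DataFrame({
    "tract_geoid": ["01001020100"] * 4 + ["01001020200"],
    "provider_id": [1, 1, 2, 3, 2],
    "tech_code": [42, 50, 60, 70, 60],
    "max_ad_down": [100.0, 940.0, 25.0, 10.0, 25.0],
})
blocks["category"] = blocks["tech_code"].map(tech_category)

# Count unique provider IDs per tract and technology category.
counts = (
    blocks.groupby(["tract_geoid", "category"])["provider_id"]
    .nunique()
    .unstack(fill_value=0)
    .add_suffix("_Provider_Count")
)
counts["All_Provider_Count"] = (
    blocks.groupby("tract_geoid")["provider_id"].nunique()
)

# Providers whose max advertised download speed exceeds 25 Mbps.
fast = blocks[blocks["max_ad_down"] > 25]
counts["All_Provider_Count_25"] = (
    fast.groupby("tract_geoid")["provider_id"].nunique()
    .reindex(counts.index, fill_value=0)
)
print(counts)
```

Counting with `nunique` rather than `count` matters because the same provider can appear in many blocks of one tract.
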
The table below is a brief overview of the data columns found in the previously final dataset: weighted_merged_all.csv
Column Name | Definition |
---|---|
Zip | US Zipcode. Note that US Zipcodes have 5 digits. Leading 0s are dropped from zipcodes that start with 0 when the column is read as a number, so if you see fewer than 5 digits, pad the value with leading 0s until the full zipcode is 5 digits long. |
WiredCount_2020 | Number of Wired (Cable, Copper, DSL, Fiber) Providers present in a zip code in 2020 |
Fwcount_2020 | Number of Fixed Wireless Providers (WISPs) present in a zip code in 2020 |
AllProviderCount_2020 | Number of Providers of any technology present in a zip code in 2020 |
Wired25_3_2020 | Number of Wired (Cable, Copper, DSL, Fiber) Providers present in a zip code offering speeds of at least 25 Mbps Download / 3 Mbps Upload in 2020 |
Wired100_3_2020 | Number of Wired (Cable, Copper, DSL, Fiber) Providers present in a zip code offering speeds of at least 100 Mbps Download / 3 Mbps Upload in 2020 |
All25_3_2020 | Number of Providers (any technology) present in a zip code offering speeds of at least 25 Mbps Download / 3 Mbps Upload in 2020 |
All100_3 | Number of Providers (any technology) present in a zip code offering speeds of at least 100 Mbps Download / 3 Mbps Upload |
TestCount | Number of M-Lab Speed Tests Conducted in Zip, rolling 12 months |
AverageMbps | Average Download Speed via M-Lab Speed Tests, rolling 12 months |
FastestAverageMbps | Fastest Average (90th Percentile) Download Speed via M-Lab Speed Tests, rolling 12 months |
%Access to Terrestrial Broadband | Percent of the Zip's Population that has Access to Terrestrial (Wired + Fixed Wireless) Broadband (25 Mbps Download / 3 Mbps Upload) |
Lowest Priced Terrestrial Broadband Plan | The Lowest Regular Monthly Priced Terrestrial (Wired + Fixed Wireless) Residential Standalone-Internet Broadband (25 Mbps Download / 3 Mbps Upload) Plan available in the zip |
WiredCount_2015 | Number of Wired (Cable, Copper, DSL, Fiber) Providers present in a zip code in 2015 |
Fwcount_2015 | Number of Fixed Wireless Providers (WISPs) present in a zip code in 2015 |
AllProviderCount_2015 | Number of Providers of any technology present in a zip code in 2015 |
Wired25_3_2015 | Number of Wired (Cable, Copper, DSL, Fiber) Providers present in a zip code offering speeds of at least 25 Mbps Download / 3 Mbps Upload in 2015 |
Wired100_3_2015 | Number of Wired (Cable, Copper, DSL, Fiber) Providers present in a zip code offering speeds of at least 100 Mbps Download / 3 Mbps Upload in 2015 |
All25_3_2015 | Number of Providers (any technology) present in a zip code offering speeds of at least 25 Mbps Download / 3 Mbps Upload in 2015 |
All100_3.1 | Number of Providers (any technology) present in a zip code offering speeds of at least 100 Mbps Download / 3 Mbps Upload |
Total_Enrolled_Households | Total Number Of Enrolled Households in the EBB Subsidy |
ST | Two letter state abbreviation where the zip code is located |
COUNTY NAME | County name where the zip code is located |
BROADBAND USAGE | Estimated % of the population in the zipcode that is using the internet at broadband speed. |
ERROR RANGE (MAE)(+/-) | Mean absolute error (MAE). The non-private broadband coverage estimate will be, on average, within the MAE range. |
ERROR RANGE (95%)(+/-) | 95th percentile error range. 95% of the time, the non-private broadband coverage estimate for zip codes with a similar number of households will be within the 95th percentile error range. |
MSD | Mean signed deviation (MSD). It offers an estimate of the bias introduced by the process. |
ZIP_TYPE | Distinguishes “Zip Code Areas” and “Post Offices or large volume customers” |
RUCA1 | Primary RUCA code |
RUCA2 | Secondary RUCA code |
median_age_overall | Weighted average of the median age of all residents in the zip code |
median_age_male | Weighted average of the median age of all male residents in the zip code |
median_age_female | Weighted average of the median age of all female residents in the zip code |
employment_rate | Weighted average of the % of the population that is employed in the zip code |
median_income | Weighted average of the median income of all residents in the zip code |
total_households | Weighted average of the total number of households in the zip code |
ave_household_size | Weighted average of the average household size in the zip code |
ave_family_size | Weighted average of the average family size in the zip code |
total_population | Weighted average of the total population size in the zip code |
median_house_value | Weighted average of the median house value in the zip code |
pct_white | Weighted average of the percent non-Hispanic/Latino white in the zip code |
pct_hisp_latino | Weighted average of the percent Hispanic/Latino of any race in the zip code |
pct_black | Weighted average of the percent non-Hispanic/Latino black in the zip code |
pct_native | Weighted average of the percent non-Hispanic/Latino American Indian and Alaska Native in the zip code |
pct_asian | Weighted average of the percent non-Hispanic/Latino Asian in the zip code |
pct_hi_pi | Weighted average of the percent non-Hispanic/Latino Native Hawaiian and Other Pacific Islander in the zip code |
pct_other_race | Weighted average of the percent non-Hispanic/Latino and some other race alone in the zip code |
pct_two+_race | Weighted average of the percent non-Hispanic/Latino two or more races alone in the zip code |
pct_rent_burdened | Weighted average of the percent of the population that pays more than 30% of their income on rent in the zip code |
poverty_rate | Weighted average of the percent of the population in poverty in the zip code |
pct_pop_bachelors+ | Weighted average of the percent of the population with at least a Bachelor's degree in the zip code |
pct_pop_hs+ | Weighted average of the percent of the population with at least a HS diploma in the zip code |
pct_internet | Weighted average of the percent of the population with an internet subscription in the zip code |
pct_internet_dial_up | Weighted average of the percent of the population with a dial-up internet subscription in the zip code |
pct_internet_broadband_any_type | Weighted average of the percent of the population with a broadband (>25 Mbps) internet subscription of any type in the zip code |
pct_internet_cellular | Weighted average of the percent of the population with an internet subscription and a cellular data plan in the zip code |
pct_only_cellular | Weighted average of the percent of the population with only a cellular data plan as the internet subscription in the zip code |
pct_internet_broadband_fiber | Weighted average of the percent of the population with a broadband internet subscription such as cable, fiber optic, or DSL in the zip code |
pct_internet_broadband_satellite | Weighted average of the percent of the population with a satellite internet subscription in the zip code |
pct_internet_only_satellite | Weighted average of the percent of the population with satellite internet service and no other type of internet subscription in the zip code |
pct_internet_other | Weighted average of the percent of the population with some other internet service and no other type of internet subscription in the zip code |
pct_internet_no_subscrp | Weighted average of the percent of the population with internet access without a subscription in the zip code |
pct_internet_none | Weighted average of the percent of the population with no internet access in the zip code |
pct_computer | Weighted average of the percent of the population that has a computer in the zip code |
pct_computer_with_dialup | Weighted average of the percent of the population that has a computer with only a dial-up internet subscription in the zip code |
pct_computer_with_broadband | Weighted average of the percent of the population that has a computer with a broadband internet subscription in the zip code |
pct_computer_no_internet | Weighted average of the percent of the population that has a computer without an internet subscription in the zip code |
pct_no_computer | Weighted average of the percent of the population with no computer in the zip code |
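
The note on the Zip column describes restoring dropped leading zeros. One way to do that in pandas is sketched below; the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical Zip values after pandas parsed them as integers.
df = pd.DataFrame({"Zip": [1001, 501, 90210]})

# Convert to strings and left-pad with zeros to five characters.
df["Zip"] = df["Zip"].astype(str).str.zfill(5)
print(df["Zip"].tolist())
```

If the file on disk still contains the zeros, reading it with `pd.read_csv(path, dtype={"Zip": str})` (where `path` is a placeholder) avoids the problem up front.
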
Below is more information on the RUCA codes RUCA1 & RUCA2
Primary RUCA Codes, 2010
Code | Description |
---|---|
1 | Metropolitan area core: primary flow within an urbanized area (UA) |
2 | Metropolitan area high commuting: primary flow 30% or more to a UA |
3 | Metropolitan area low commuting: primary flow 10% to 30% to a UA |
4 | Micropolitan area core: primary flow within an Urban Cluster of 10,000 to 49,999 (large UC) |
5 | Micropolitan high commuting: primary flow 30% or more to a large UC |
6 | Micropolitan low commuting: primary flow 10% to 30% to a large UC |
7 | Small town core: primary flow within an Urban Cluster of 2,500 to 9,999 (small UC) |
8 | Small town high commuting: primary flow 30% or more to a small UC |
9 | Small town low commuting: primary flow 10% to 30% to a small UC |
10 | Rural areas: primary flow to a tract outside a UA or UC |
99 | Not coded: Census tract has zero population and no rural-urban identifier information |
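
The ruca_metro, ruca_micro, ruca_small_town, and ruca_rural flags in fcc_census.csv follow the primary code ranges listed above. A small sketch of that mapping, assuming a column named `RUCA1` that holds the primary code; the function name `ruca_category` and the sample values are illustrative only.

```python
import pandas as pd

def ruca_category(code):
    """Return the broad area type for a primary RUCA code."""
    if 1 <= code <= 3:
        return "metro"
    if 4 <= code <= 6:
        return "micro"
    if 7 <= code <= 9:
        return "small_town"
    if code == 10:
        return "rural"
    return None  # 99 (not coded) or anything unexpected

# Hypothetical primary codes, one per area type plus a not-coded row.
df = pd.DataFrame({"RUCA1": [1, 5, 8, 10, 99]})
df["category"] = df["RUCA1"].map(ruca_category)

# One boolean flag per category, mirroring the ruca_* columns.
for name in ["metro", "micro", "small_town", "rural"]:
    df[f"ruca_{name}"] = df["category"] == name
print(df)
```

Rows coded 99 end up with all four flags set to False, so they can be filtered out or handled separately.
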
Secondary RUCA Codes, 2010
Code | Description |
---|---|
1 | Metropolitan area core: primary flow within an urbanized area (UA) |
1.0 | No additional code |
1.1 | Secondary flow 30% to 50% to a larger UA |
2 | Metropolitan area high commuting: primary flow 30% or more to a UA |
2.0 | No additional code |
2.1 | Secondary flow 30% to 50% to a larger UA |
3 | Metropolitan area low commuting: primary flow 10% to 30% to a UA |
3.0 | No additional code |
4 | Micropolitan area core: primary flow within an Urban Cluster of 10,000 to 49,999 (large UC) |
4.0 | No additional code |
4.1 | Secondary flow 30% to 50% to a UA |
5 | Micropolitan high commuting: primary flow 30% or more to a large UC |
5.0 | No additional code |
5.1 | Secondary flow 30% to 50% to a UA |
6 | Micropolitan low commuting: primary flow 10% to 30% to a large UC |
6.0 | No additional code |
7 | Small town core: primary flow within an Urban Cluster of 2,500 to 9,999 (small UC) |
7.0 | No additional code |
7.1 | Secondary flow 30% to 50% to a UA |
7.2 | Secondary flow 30% to 50% to a large UC |
8 | Small town high commuting: primary flow 30% or more to a small UC |
8.0 | No additional code |
8.1 | Secondary flow 30% to 50% to a UA |
8.2 | Secondary flow 30% to 50% to a large UC |
9 | Small town low commuting: primary flow 10% to 30% to a small UC |
9.0 | No additional code |
10 | Rural areas: primary flow to a tract outside a UA or UC |
10.0 | No additional code |
10.1 | Secondary flow 30% to 50% to a UA |
10.2 | Secondary flow 30% to 50% to a large UC |
10.3 | Secondary flow 30% to 50% to a small UC |
99 | Not coded: Census tract has zero population and no rural-urban identifier information |