Broadband Capstone Data Guide

Data Sources

Data Name Description
BroadbandNow Open-Data Zipcode Competition & Pricing Data
US Broadband Usage Percentages Dataset Estimated broadband usage rates by zipcode
Assorted Statistics pulled from ACS/Census Assorted economic and demographic statistics by census tract pulled from 2019 ACS-5 using the Census API.
Missouri Census Data Center Geocorr A tool used to generate a dataset mapping zipcodes to US Census tracts. The columns afact & afact2 were generated using weighting of the 2010 census population. afact is the proportion of the source (zipcode) in the target (census tract). afact2 is the proportion of the target (census tract) in the source (zipcode).
Rural-Urban Commuting Area Codes Rural-Urban Commuting Area (RUCA) codes for each zip code
EBB Program Data Total enrolled households in the EBB program by zipcode
Gazetteer Files The U.S. Gazetteer Files provide a listing of all geographic areas for selected geographic area types. The files include geographic identifier codes, names, area measurements, and representative latitude and longitude coordinates.
Form 477 Broadband Deployment Data - December 2019 (version 1) Fixed Broadband Deployment Data. Info about the data is here
National Broadband Map Indicators of Need A csv export of census tract level data used to power the Indicators of Broadband Need map with the user guide
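
As an illustration of how the afact column can be used, the sketch below rolls a tract-level statistic up to zipcodes with an afact-weighted average. The toy rows, the column names zip and tract, and the helper zip_weighted_mean are assumptions made for this example; they are not taken from the project notebooks.

```python
import pandas as pd

# Hypothetical crosswalk rows: one row per (zipcode, tract) pair.
# afact is the share of the zipcode's population that lives in the tract.
crosswalk = pd.DataFrame({
    "zip": ["63101", "63101", "63102"],
    "tract": ["29510125500", "29510125600", "29510125600"],
    "afact": [0.75, 0.25, 1.0],
})

# Hypothetical tract-level statistic.
tract_stats = pd.DataFrame({
    "tract": ["29510125500", "29510125600"],
    "median_age_overall": [30.0, 40.0],
})


def zip_weighted_mean(crosswalk, tract_stats, value_col):
    """Average a tract-level column per zipcode, weighting each tract by afact."""
    merged = crosswalk.merge(tract_stats, on="tract", how="left")
    # Rows with a missing value should not contribute any weight.
    merged = merged.dropna(subset=[value_col])
    weighted = merged[value_col] * merged["afact"]
    totals = weighted.groupby(merged["zip"]).sum()
    weights = merged["afact"].groupby(merged["zip"]).sum()
    return (totals / weights).rename(value_col)


zip_means = zip_weighted_mean(crosswalk, tract_stats, "median_age_overall")
```

Dividing by the summed weights, rather than assuming they add up to 1, keeps the average sensible when some tracts are missing from the statistics table.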

Gallardo, R. (2020). Digital Divide Index. Purdue Center for Regional Development. Retrieved from Digital Divide Index (DDI): http://pcrd.purdue.edu/ddi

Data Files

File Name Description
broadband_now/ Files downloaded from the BroadbandNow Open-Data
ebb/ Files generated from the EBB Program Data
USBroadbandUsagePercentages-master Files downloaded from the US Broadband Usage Percentages Dataset
zip_conversion_data/ Files generated using the Missouri Census Data Center Geocorr to map zipcodes to US census tracts using 2010 Census population weighting
census_data.csv Data collected using code written in census_scripts/ to pull the Census API.
merged_broadband.csv Single dataset combining all of the broadband datasets. Generated by running eda.ipynb.
merged_census_broadband.csv Interim dataset that merged broadband data to Census data using county codes. Generated from census_broadband_merge.ipynb
relabeled_census.csv census_data.csv cleaned up to calculate different statistics and re-labelled with user friendly strings. Generated from census_data_cleaning.ipynb
weighted_merged_all.csv Broadband data merged with the census data by tract, with tracts weighted by the afact column. Generated from census_broadband_merge-copy.ipynb. The Census placeholder values -1 and -666666 are replaced with np.nan.
zips_rural_urban.xlsx Dataset from Rural-Urban Commuting Area Codes
aggregated_fcc_by_tract.csv A dataset created using the FCC.ipynb notebook. It adds a tract geoid, aggregates provider counts by tract, speed, and technology type, and corrects tract IDs for geography changes since the 2010 tract code definitions.
fcc_census.csv A merged dataset using aggregated_fcc_by_tract.csv and relabeled_census.csv. Final/recommended dataset to use for the project.
fcc_census_2.csv A merged dataset that added a few more Census columns and the Ookla columns to fcc_census.csv
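
The replacement of the Census placeholder values mentioned for weighted_merged_all.csv could look roughly like the sketch below. The sample frame and the name CENSUS_SENTINELS are made up for illustration, and the sketch assumes that -1 and -666666 never occur as real values in the affected columns.

```python
import numpy as np
import pandas as pd

# Placeholder values named in this guide; treated here as "missing".
CENSUS_SENTINELS = [-1, -666666]

# Hypothetical rows standing in for census columns.
df = pd.DataFrame({
    "median_income": [52000, -666666, 48000],
    "median_age_overall": [34.1, 29.8, -1],
})

# Swap every placeholder for NaN so that means and sums skip it.
cleaned = df.replace(CENSUS_SENTINELS, np.nan)
```

Without this step a single -666666 would drag an average far below any plausible value.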

Overview of Data Columns

The table below is a brief overview of the data columns found in the current final dataset: fcc_census.csv

Column Name Definition
tract_geoid An 11-digit GEOID code that uniquely identifies this Census tract among all Census tracts in the US. The code may start with 0, so read the column as a string, e.g. pd.read_csv("../data/fcc_census.csv", converters={"tract_geoid": str}), so that pandas does not strip the leading 0.
All_Provider_Count A count of the unique provider IDs that service the area
All_Providers A set of the unique IDs that service the area. You can use the fcc_names.csv to map provider ids to the provider name or the doing business as name.
MaxAdDown The highest of the maximum advertised download speeds offered in any block in the tract
MaxAdUp The highest of the maximum advertised upload speeds offered in any block in the tract
AllMaxAdDown The set of all max download speeds offered in any blocks in the tract
AllMaxAdUp The set of all max upload speeds offered in any blocks in the tract
Wired_Provider_Count A count of unique provider IDs that service the area and are considered Wired technology type. This refers to a reported TechCode of 10-50 (i.e. DSL, Copper, Cable Modem, or Fiber)
Satellite_Provider_Count A count of unique provider IDs that service the area and are considered Satellite technology type. This refers to a reported TechCode of 60 (i.e. Satellite)
Fixed_Wireless_Provider_Count A count of unique provider IDs that service the area and are considered Fixed Wireless technology type. This refers to a reported TechCode of 70 (i.e. Terrestrial Fixed Wireless)
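All_Provider_Count_25 A count of all unique provider IDs that service the area and have a max speed over 25 Mbps
All_Provider_Count_100 A count of all unique provider IDs that service the area and have a max speed over 100 Mbps
Fixed_Wireless_Provider_Count_25 A count of unique Fixed Wireless provider IDs that service the area and have a max speed over 25 Mbps
Wired_Provider_Count_25 A count of unique Wired provider IDs that service the area and have a max speed over 25 Mbps
Satellite_Provider_Count_25 A count of unique Satellite provider IDs that service the area and have a max speed over 25 Mbps
Fixed_Wireless_Provider_Count_100 A count of unique Fixed Wireless provider IDs that service the area and have a max speed over 100 Mbps
Wired_Provider_Count_100 A count of unique Wired provider IDs that service the area and have a max speed over 100 Mbps
Satellite_Provider_Count_100 A count of unique Satellite provider IDs that service the area and have a max speed over 100 Mbps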
NAME The full name of the census tract
median_age_overall median age of all residents in the census tract
median_age_male median age of all male residents in the census tract
median_age_female median age of all female residents in the census tract
employment_rate % of the population that is employed in the census tract
median_income median income of all residents in the census tract
total_households total number of households in the census tract
ave_household_size average household size in the census tract
ave_family_size average family size in the census tract
total_population total population size in the census tract
median_house_value median house value in the census tract
pct_white percent non-Hispanic/Latino white in the census tract
pct_hisp_latino percent Hispanic/Latino of any race in the census tract
pct_black percent non-Hispanic/Latino black in the census tract
pct_native percent non-Hispanic/Latino American Indian and Alaska Native in the census tract
pct_asian percent non-Hispanic/Latino Asian in the census tract
pct_hi_pi percent non-Hispanic/Latino Native Hawaiian and Other Pacific Islander in the census tract
pct_other_race percent non-Hispanic/Latino and some other race alone in the census tract
pct_two+_race percent non-Hispanic/Latino two or more races alone in the census tract
pct_rent_burdened percent of the population that pays more than 30% of their income on rent in the census tract
poverty_rate percent of the population in poverty in the census tract
pct_pop_bachelors+ percent of the population with at least a Bachelor's degree in the census tract
pct_pop_hs+ percent of the population with at least a HS diploma in the census tract
pct_internet percent of the population with an internet subscription in the census tract
pct_internet_dial_up percent of the population with a dial-up internet subscription in the census tract
pct_internet_broadband_any_type percent of the population with a broadband (>25mbps) internet subscription of any type in the census tract
pct_internet_cellular percent of the population with an internet subscription and a cellular data plan in the census tract
pct_only_cellular percent of the population with only a cellular data plan as the internet subscription in the census tract
pct_internet_broadband_fiber percent of the population with a broadband internet subscription such as cable, fiber optic, or DSL in the census tract
pct_internet_broadband_satellite percent of the population with a satellite internet subscription in the census tract
pct_internet_only_satellite percent of the population with satellite internet service and no other type of internet subscription in the census tract
pct_internet_other percent of the population with some other internet service and no other type of internet subscription in the census tract
pct_internet_no_subscrp percent of the population with internet access without a subscription in the census tract
pct_internet_none percent of the population with no internet access in the census tract
pct_computer percent of the population that has a computer in the census tract
pct_computer_with_dialup percent of the population that has a computer with a dial-up internet subscription alone in the census tract
pct_computer_with_broadband percent of the population that has a computer with a broadband internet subscription in the census tract
pct_computer_no_internet percent of the population that has a computer without an internet subscription in the census tract
pct_no_computer percent of the population with no computer in the census tract
GEOID This is the same as the tract_geoid and could be dropped.
ALAND The square meters of land in the tract
AWATER The square meters of water in the tract
ALAND_SQMI The square mileage of land in the tract
AWATER_SQMI The square mileage of water in the tract
population_density the population divided by the square mileage (i.e. people/sq mile)
pct_pop_ged the percent of population that got a GED or alternative credential
pct_pop_some_college the percent of population that attended some college but was awarded no degree
pct_pop_associates the percent of population that got an Associate's degree
pct_pop_foreign_born the percent of population that was born outside the US
pct_pop_ssi_households the percent of households that received Supplemental Security Income (SSI), cash public assistance income, or Food Stamps/SNAP in the past 12 months (from the ACS table on household type for children under 18 years in households)
pct_pop_lt_10k the percent of population with household income less than $10K
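pct_pop_10k_thru_15k the percent of population with household income from $10K through $15K
pct_pop_15k_thru_20k the percent of population with household income from $15K through $20K
pct_pop_20k_thru_25k the percent of population with household income from $20K through $25K
pct_pop_25k_thru_30k the percent of population with household income from $25K through $30K
pct_pop_30k_thru_35k the percent of population with household income from $30K through $35K
pct_pop_35k_thru_40k the percent of population with household income from $35K through $40K
pct_pop_40k_thru_45k the percent of population with household income from $40K through $45K
pct_pop_45k_thru_50k the percent of population with household income from $45K through $50K
pct_pop_50k_thru_60k the percent of population with household income from $50K through $60K
pct_pop_60k_thru_75k the percent of population with household income from $60K through $75K
pct_pop_75k_thru_100k the percent of population with household income from $75K through $100K
pct_pop_100k_thru_125k the percent of population with household income from $100K through $125K
pct_pop_125k_thru_150k the percent of population with household income from $125K through $150K
pct_pop_150k_thru_200k the percent of population with household income from $150K through $200K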
pct_pop_gt_200k the percent of population with household income greater than $200K
pct_pop_lt_5 the percent of population that are younger than 5 years old
pct_pop_5_to_9 the percent of population that are ages 5 through 9 years old
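pct_pop_10_to_14 the percent of population that are ages 10 through 14 years old
pct_pop_15_to_19 the percent of population that are ages 15 through 19 years old
pct_pop_20_to_24 the percent of population that are ages 20 through 24 years old
pct_pop_25_to_29 the percent of population that are ages 25 through 29 years old
pct_pop_30_to_34 the percent of population that are ages 30 through 34 years old
pct_pop_35_to_39 the percent of population that are ages 35 through 39 years old
pct_pop_40_to_44 the percent of population that are ages 40 through 44 years old
pct_pop_45_to_49 the percent of population that are ages 45 through 49 years old
pct_pop_50_to_54 the percent of population that are ages 50 through 54 years old
pct_pop_55_to_59 the percent of population that are ages 55 through 59 years old
pct_pop_60_to_64 the percent of population that are ages 60 through 64 years old
pct_pop_65_to_69 the percent of population that are ages 65 through 69 years old
pct_pop_70_to_74 the percent of population that are ages 70 through 74 years old
pct_pop_75_to_79 the percent of population that are ages 75 through 79 years old
pct_pop_80_to_84 the percent of population that are ages 80 through 84 years old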
pct_pop_gt_85 the percent of population that are older than 85 years old
pct_pop_disability the percent of population with a disability
pct_pop_households_with_kids the percent of households with children under 18 living at the household
pct_health_ins_children the percent of children < 18 that have health insurance
pct_health_ins_19_64 the percent of people ages 19-64 that have health insurance
pct_health_ins_65+ the percent of people older than 65 that have health insurance
Ookla Median Download Speed (Mbps)
Ookla Median Upload Speed (Mbps)
DDI The DDI from the Purdue Digital Divide Index measures primarily physical access/adoption and socioeconomic characteristics that may limit motivation, skills, and usage. Due to data limitations it was designed as a descriptive and pragmatic tool and is not intended to be comprehensive.
INFA INFA from the Purdue Digital Divide Index score groups five variables related to broadband infrastructure and adoption: (1) percentage of total 2018 population without access to fixed broadband of at least 100 Mbps download and 20 Mbps upload as of December 2019; (2) percent of homes without a computing device (desktops, laptops, smartphones, tablets, etc.); (3) percent of homes with no internet access (have no internet subscription, including cellular data plans or dial-up); (4) median maximum advertised download speeds; and (5) median maximum advertised upload speeds.
SE SE from the Purdue Digital Divide Index score groups five variables known to impact technology adoption: (1) percent population ages 65 and over; (2) percent population 25 and over with less than high school; (3) individual poverty rate; (4) percent of noninstitutionalized civilian population with a disability; and (5) a brand new digital inequality or internet income ratio measure (IIR). In other words, these variables indirectly measure adoption since they are potential predictors of lagging technology adoption or reinforcing existing inequalities that also affect adoption.
pct_pop_income_lt_50k The percent of the population with incomes < $50K
pct_pop_income_lt_30k The percent of the population with incomes < $30K
pct_pop_income_gt_100k The percent of the population with incomes > $100K
pct_ages_gt_50 The percent of the population older than 50
pct_ages_lt_19 The percent of the population younger than 19
ruca_metro The tract has a RUCA code of 1, 2, or 3, corresponding to a metropolitan area
ruca_micro The tract has a RUCA code of 4, 5, or 6, corresponding to a micropolitan area
ruca_small_town The tract has a RUCA code of 7, 8, or 9, corresponding to a small town area
ruca_rural The tract has a RUCA code of 10, corresponding to a rural area
Comcast_present Comcast is a provider of at least some areas in the tract
ATT_present AT&T is a provider of at least some areas in the tract
HughesNet_present HughesNet is a provider of at least some areas in the tract
GCI_Comm_Corp_present GCI Communications Corporation is a provider of at least some areas in the tract
ViaSat_present ViaSat is a provider of at least some areas in the tract
VSAT_present VSAT is a provider of at least some areas in the tract
Century_Link_present CenturyLink is a provider of at least some areas in the tract
Spectrum_present Spectrum is a provider of at least some areas in the tract
Crown_Castle_present Crown Castle is a provider of at least some areas in the tract
Etheric_present Etheric is a provider of at least some areas in the tract
Frontier_Communications_present Frontier Communications is a provider of at least some areas in the tract
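
A small sketch of the tract_geoid advice above: read the column as text, then slice out the state and county prefixes. An 11-digit tract GEOID is made up of a 2-digit state code, a 3-digit county code, and a 6-digit tract code. The inline sample CSV and the state_fips and county_fips column names are hypothetical and stand in for the real file.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for ../data/fcc_census.csv.
sample_csv = io.StringIO(
    "tract_geoid,All_Provider_Count\n"
    "01001020100,5\n"
    "29510125500,7\n"
)

# dtype=str keeps the leading zero that integer parsing would drop.
df = pd.read_csv(sample_csv, dtype={"tract_geoid": str})

# If a file was already saved with stripped zeros, zfill restores the width.
df["tract_geoid"] = df["tract_geoid"].str.zfill(11)

# State (2 digits) and state+county (5 digits) prefixes of the GEOID.
df["state_fips"] = df["tract_geoid"].str[:2]
df["county_fips"] = df["tract_geoid"].str[:5]
```

Keeping the GEOID as a string also makes it safe to use as a merge key against other Census files.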

The table below is a brief overview of the data columns found in the previously final dataset: weighted_merged_all.csv

Column Name Definition
Zip US Zipcode. Note that US Zipcodes have 5 digits. Leading 0s are stripped when the column is read as a number, so if you see fewer than 5 digits, prepend 0s until the full zipcode is 5 digits long.
WiredCount_2020 Number of Wired (Cable, Copper, DSL, Fiber) Providers present in a zip code in 2020
Fwcount_2020 Number of Fixed Wireless Providers (WISPs) present in a zip code in 2020
AllProviderCount_2020 Number of Providers of any technology present in a zip code in 2020
Wired25_3_2020 Number of Wired (Cable, Copper, DSL, Fiber) Providers present in a zip code offering speeds of at least 25 Mbps Download / 3 Mbps Upload in 2020
Wired100_3_2020 Number of Wired (Cable, Copper, DSL, Fiber) Providers present in a zip code offering speeds of at least 100 Mbps Download / 3 Mbps Upload in 2020
All25_3_2020 Number of Providers (any technology) present in a zip code offering speeds of at least 25 Mbps Download / 3 Mbps Upload in 2020
All100_3 Number of Providers (any technology) present in a zip code offering speeds of at least 100 Mbps Download / 3 Mbps Upload
TestCount Number of M-Lab Speed Tests Conducted in Zip, rolling 12 months
AverageMbps Average Download Speed via M-Lab Speed Tests, rolling 12 months
FastestAverageMbps Fastest Average (90th Percentile) Download Speed via M-Lab Speed Tests, rolling 12 months
%Access to Terrestrial Broadband Percent of the Zip's Population that has Access to Terrestrial (Wired + Fixed Wireless) Broadband (25 Mbps Download / 3 Mbps Upload)
Lowest Priced Terrestrial Broadband Plan The Lowest Regular Monthly Priced Terrestrial (Wired + Fixed Wireless) Residential Standalone-Internet Broadband (25 Mbps Download / 3 Mbps Upload) Plan available in the zip
WiredCount_2015 Number of Wired (Cable, Copper, DSL, Fiber) Providers present in a zip code in 2015
Fwcount_2015 Number of Fixed Wireless Providers (WISPs) present in a zip code in 2015
AllProviderCount_2015 Number of Providers of any technology present in a zip code in 2015
Wired25_3_2015 Number of Wired (Cable, Copper, DSL, Fiber) Providers present in a zip code offering speeds of at least 25 Mbps Download / 3 Mbps Upload in 2015
Wired100_3_2015 Number of Wired (Cable, Copper, DSL, Fiber) Providers present in a zip code offering speeds of at least 100 Mbps Download / 3 Mbps Upload in 2015
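All25_3_2015 Number of Providers (any technology) present in a zip code offering speeds of at least 25 Mbps Download / 3 Mbps Upload in 2015
All100_3.1 Number of Providers (any technology) present in a zip code offering speeds of at least 100 Mbps Download / 3 Mbps Upload. The .1 suffix is what pandas appends to a duplicated column name; judging by its position among the 2015 columns, this is likely the 2015 counterpart of All100_3.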
Total_Enrolled_Households Total Number Of Enrolled Households in the EBB Subsidy
ST Two letter state abbreviation where the zip code is located
COUNTY NAME County name where the zip code is located
BROADBAND USAGE Estimated % of the population in the zipcode that is using the internet at broadband speed.
ERROR RANGE (MAE)(+/-) Mean absolute error (MAE). The non-private broadband coverage estimate will be, on average, within the MAE error range.
ERROR RANGE (95%)(+/-) 95th percentile error range. 95% of the time, the non-private broadband coverage estimate for zip codes with a similar number of households will be within the 95th percentile error range.
MSD Mean signed deviation (MSD). It offers an estimate of the bias introduced by the process.
ZIP_TYPE Distinguishes “Zip Code Areas” and “Post Offices or large volume customers”
RUCA1 Primary RUCA code
RUCA2 Secondary RUCA code
median_age_overall Weighted average of the median age of all residents in the zip code
median_age_male Weighted average of the median age of all male residents in the zip code
median_age_female Weighted average of the median age of all female residents in the zip code
employment_rate Weighted average of the % of the population that is employed in the zip code
median_income Weighted average of the median income of all residents in the zip code
total_households Weighted average of the total number of households in the zip code
ave_household_size Weighted average of the average household size in the zip code
ave_family_size Weighted average of the average family size in the zip code
total_population Weighted average of the total population size in the zip code
median_house_value Weighted average of the median house value in the zip code
pct_white Weighted average of the percent non-Hispanic/Latino white in the zip code
pct_hisp_latino Weighted average of the percent Hispanic/Latino of any race in the zip code
pct_black Weighted average of the percent non-Hispanic/Latino black in the zip code
pct_native Weighted average of the percent non-Hispanic/Latino American Indian and Alaska Native in the zip code
pct_asian Weighted average of the percent non-Hispanic/Latino Asian in the zip code
pct_hi_pi Weighted average of the percent non-Hispanic/Latino Native Hawaiian and Other Pacific Islander in the zip code
pct_other_race Weighted average of the percent non-Hispanic/Latino and some other race alone in the zip code
pct_two+_race Weighted average of the percent non-Hispanic/Latino two or more races alone in the zip code
pct_rent_burdened Weighted average of the percent of the population that pays more than 30% of their income on rent in the zip code
poverty_rate Weighted average of the percent of the population in poverty in the zip code
pct_pop_bachelors+ Weighted average of the percent of the population with at least a Bachelor's degree in the zip code
pct_pop_hs+ Weighted average of the percent of the population with at least a HS diploma in the zip code
pct_internet Weighted average of the percent of the population with an internet subscription in the zip code
pct_internet_dial_up Weighted average of the percent of the population with a dial-up internet subscription in the zip code
pct_internet_broadband_any_type Weighted average of the percent of the population with a broadband (>25mbps) internet subscription of any type in the zip code
pct_internet_cellular Weighted average of the percent of the population with an internet subscription and a cellular data plan in the zip code
pct_only_cellular Weighted average of the percent of the population with only a cellular data plan as the internet subscription in the zip code
pct_internet_broadband_fiber Weighted average of the percent of the population with a broadband internet subscription such as cable, fiber optic, or DSL in the zip code
pct_internet_broadband_satellite Weighted average of the percent of the population with a satellite internet subscription in the zip code
pct_internet_only_satellite Weighted average of the percent of the population with satellite internet service and no other type of internet subscription in the zip code
pct_internet_other Weighted average of the percent of the population with some other internet service and no other type of internet subscription in the zip code
pct_internet_no_subscrp Weighted average of the percent of the population with internet access without a subscription in the zip code
pct_internet_none Weighted average of the percent of the population with no internet access in the zip code
pct_computer Weighted average of the percent of the population that has a computer in the zip code
pct_computer_with_dialup Weighted average of the percent of the population that has a computer with a dial-up internet subscription alone in the zip code
pct_computer_with_broadband Weighted average of the percent of the population that has a computer with a broadband internet subscription in the zip code
pct_computer_no_internet Weighted average of the percent of the population that has a computer without an internet subscription in the zip code
pct_no_computer Weighted average of the percent of the population with no computer in the zip code
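
The Zip column note above (restoring stripped leading zeros) can be handled with a small helper like the one sketched here. The function name pad_zip is hypothetical; it also handles the case where pandas has read the column as a float.

```python
def pad_zip(value):
    """Return a 5-character zipcode string, restoring stripped leading zeros."""
    text = str(value).strip()
    # A float-typed column renders 2138 as "2138.0"; drop the fractional part.
    if text.endswith(".0"):
        text = text[:-2]
    return text.zfill(5)


examples = [pad_zip(2138), pad_zip("501"), pad_zip(63101.0)]
```

With a DataFrame this would typically be applied as df["Zip"] = df["Zip"].map(pad_zip). Reading the column as a string in the first place avoids the problem entirely.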

More Info on RUCA Codes

Below is more information on the RUCA codes RUCA1 & RUCA2

Primary RUCA Codes, 2010

1 Metropolitan area core: primary flow within an urbanized area (UA)
2 Metropolitan area high commuting: primary flow 30% or more to a UA
3 Metropolitan area low commuting: primary flow 10% to 30% to a UA
4 Micropolitan area core: primary flow within an Urban Cluster of 10,000 to 49,999 (large UC)
5 Micropolitan high commuting: primary flow 30% or more to a large UC
6 Micropolitan low commuting: primary flow 10% to 30% to a large UC
7 Small town core: primary flow within an Urban Cluster of 2,500 to 9,999 (small UC)
8 Small town high commuting: primary flow 30% or more to a small UC
9 Small town low commuting: primary flow 10% to 30% to a small UC
10 Rural areas: primary flow to a tract outside a UA or UC
99 Not coded: Census tract has zero population and no rural-urban identifier information
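
The primary codes above group into the four ruca_* indicator columns described for fcc_census.csv (1-3 metro, 4-6 micro, 7-9 small town, 10 rural). A minimal sketch of that grouping follows; the function names ruca_category and ruca_flags and the "not_coded" label for code 99 are assumptions for illustration.

```python
def ruca_category(primary_code):
    """Map a primary RUCA code to the grouping used by the ruca_* columns."""
    code = int(primary_code)
    if 1 <= code <= 3:
        return "metro"
    if 4 <= code <= 6:
        return "micro"
    if 7 <= code <= 9:
        return "small_town"
    if code == 10:
        return "rural"
    if code == 99:
        # Zero-population tracts carry no rural-urban information.
        return "not_coded"
    raise ValueError(f"unknown primary RUCA code: {primary_code!r}")


def ruca_flags(primary_code):
    """Return the four boolean indicators for one primary RUCA code."""
    category = ruca_category(primary_code)
    names = ("metro", "micro", "small_town", "rural")
    return {f"ruca_{name}": category == name for name in names}
```

A tract with code 99 gets False for all four flags under this sketch.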

Secondary RUCA Codes, 2010

1 Metropolitan area core: primary flow within an urbanized area (UA)
1 No additional code
1.1 Secondary flow 30% to 50% to a larger UA
2 Metropolitan area high commuting: primary flow 30% or more to a UA
2 No additional code
2.1 Secondary flow 30% to 50% to a larger UA
3 Metropolitan area low commuting: primary flow 10% to 30% to a UA
3 No additional code
4 Micropolitan area core: primary flow within an Urban Cluster of 10,000 to 49,999 (large UC)
4 No additional code
4.1 Secondary flow 30% to 50% to a UA
5 Micropolitan high commuting: primary flow 30% or more to a large UC
5 No additional code
5.1 Secondary flow 30% to 50% to a UA
6 Micropolitan low commuting: primary flow 10% to 30% to a large UC
6 No additional code
7 Small town core: primary flow within an Urban Cluster of 2,500 to 9,999 (small UC)
7 No additional code
7.1 Secondary flow 30% to 50% to a UA
7.2 Secondary flow 30% to 50% to a large UC
8 Small town high commuting: primary flow 30% or more to a small UC
8 No additional code
8.1 Secondary flow 30% to 50% to a UA
8.2 Secondary flow 30% to 50% to a large UC
9 Small town low commuting: primary flow 10% to 30% to a small UC
9 No additional code
10 Rural areas: primary flow to a tract outside a UA or UC
10 No additional code
10.1 Secondary flow 30% to 50% to a UA
10.2 Secondary flow 30% to 50% to a large UC
10.3 Secondary flow 30% to 50% to a small UC
99 Not coded: Census tract has zero population and no rural-urban identifier information
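
Since each secondary code is the primary code plus an optional decimal part, a RUCA2 value can be split back into its primary code and its secondary-flow description. The sketch below builds its lookup from the list above; the names SECONDARY_FLOW and split_ruca2 are hypothetical.

```python
# Secondary-flow descriptions, keyed by the codes listed in this guide.
SECONDARY_FLOW = {
    "1.1": "Secondary flow 30% to 50% to a larger UA",
    "2.1": "Secondary flow 30% to 50% to a larger UA",
    "4.1": "Secondary flow 30% to 50% to a UA",
    "5.1": "Secondary flow 30% to 50% to a UA",
    "7.1": "Secondary flow 30% to 50% to a UA",
    "7.2": "Secondary flow 30% to 50% to a large UC",
    "8.1": "Secondary flow 30% to 50% to a UA",
    "8.2": "Secondary flow 30% to 50% to a large UC",
    "10.1": "Secondary flow 30% to 50% to a UA",
    "10.2": "Secondary flow 30% to 50% to a large UC",
    "10.3": "Secondary flow 30% to 50% to a small UC",
}


def split_ruca2(code):
    """Split a secondary RUCA code into (primary code, secondary description)."""
    # Normalise ints, floats, and strings to one decimal place, e.g. "4.0".
    text = f"{float(code):.1f}"
    primary = int(text.split(".")[0])
    if primary == 99:
        return primary, "Not coded"
    if text.endswith(".0"):
        return primary, "No additional code"
    if text not in SECONDARY_FLOW:
        raise ValueError(f"unknown secondary RUCA code: {code!r}")
    return primary, SECONDARY_FLOW[text]
```

For example, split_ruca2(10.3) gives the primary code 10 together with the small-UC secondary flow description.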