See the Technology Overview for details on the tools.
CRLite promises substantial compression of the dataset. In our staging environment, the binary form of all unexpired certificate serial numbers occupies about 16 GB of memory in Redis, and the hexadecimal form of all enrolled and unexpired certificate serial numbers occupies about 6.7 GB on disk, while the resulting binary Bloom filter cascade compresses to approximately 1.3 MB.
TODO: get a binary form of the 6.7 GB number
Bloom filters are probabilistic data structures with a false-positive rate due to hash collisions. However, if you know the entire set of data that might be tested against the filter, you can compute all of the false positives and build another layer to resolve them, and keep going until there are no false positives left. In practice, this converges in 25 to 30 layers, which results in substantial compression.
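To make the layering concrete, here is a toy sketch of a filter cascade in Python. It is illustrative only: it uses a single SHA-256-derived bit position per layer and naive sizing, not the MurmurHash3 hashing, sizing, or encoding of the real rust-cascade implementation.

```python
import hashlib

class ToyBloom:
    """A deliberately simple Bloom filter: one hashed bit position per element."""
    def __init__(self, items, size_bits, level):
        self.size = size_bits
        self.level = level
        self.bits = bytearray((size_bits + 7) // 8)
        for item in items:
            pos = self._pos(item)
            self.bits[pos // 8] |= 1 << (pos % 8)

    def _pos(self, item: bytes) -> int:
        # vary the hash per layer by mixing in the layer number
        digest = hashlib.sha256(bytes([self.level]) + item).digest()
        return int.from_bytes(digest[:8], "big") % self.size

    def __contains__(self, item: bytes) -> bool:
        pos = self._pos(item)
        return bool(self.bits[pos // 8] & (1 << (pos % 8)))

def build_cascade(revoked, not_revoked):
    """Alternate layers until a layer produces no false positives."""
    layers = []
    include, exclude = set(revoked), set(not_revoked)
    while include:
        layer = ToyBloom(include, size_bits=8 * len(include), level=len(layers))
        layers.append(layer)
        # whatever this layer wrongly matches becomes the set encoded by the next layer
        include, exclude = {x for x in exclude if x in layer}, include
    return layers

def is_revoked(layers, item: bytes) -> bool:
    """An item is revoked iff its lookup falls out of the cascade at an odd depth."""
    for depth, layer in enumerate(layers):
        if item not in layer:
            return depth % 2 == 1
    return len(layers) % 2 == 1

if __name__ == "__main__":
    revoked = {b"revoked-%d" % i for i in range(100)}
    valid = {b"valid-%d" % i for i in range(10_000)}
    cascade = build_cascade(revoked, valid)
    assert all(is_revoked(cascade, s) for s in revoked)
    assert not any(is_revoked(cascade, s) for s in valid)
    print(f"{len(cascade)} layers for {len(revoked)} revocations")
```

Because every false positive among the known universe is resolved by a later layer, queries for any certificate in that universe are exact; the probabilistic error only matters for certificates outside it.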
The key innovation for CRLite is that Certificate Transparency (CT) data can be used as a stand-in for "all the certificates in the Web PKI". It's reasonably easy to tell if a certificate is in Certificate Transparency: Was it delivered with a Signed Certificate Timestamp (SCT) from a CT log? Similarly, it's reasonably easy to tell that a certificate was known to a CT log at the time that the CRLite filter was constructed: Was the SCT at least one Maximum Merge Delay older than the CRLite filter?
The remaining question is whether the issuer is enrolled in the CRLite filter set, which is indicated by a flag carried alongside the Firefox Intermediate Preloading data.
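As a concrete illustration of these two checks, here is a hedged sketch; the function and parameter names are hypothetical and do not reflect Firefox's internal API.

```python
from datetime import datetime, timedelta

def crlite_can_judge(issuer_spki_sha256: bytes,
                     enrolled_issuers: set,
                     sct_timestamp: datetime,
                     log_max_merge_delay: timedelta,
                     filter_timestamp: datetime) -> bool:
    """CRLite applies only if the issuer is enrolled and the CT log was required
    to have merged the certificate before the CRLite filter was built."""
    issuer_enrolled = issuer_spki_sha256 in enrolled_issuers
    sct_old_enough = sct_timestamp + log_max_merge_delay <= filter_timestamp
    return issuer_enrolled and sct_old_enough
```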
We’re using MurmurHash3 because it’s fast and there’s no currently-known need for a cryptographically secure hash function. Even though Murmur is not designed to be cryptographically secure, the input data for Murmur includes a SHA256 hash of the issuer's Subject Public Key Information (SPKI) and the certificate's serial number.
The obvious threat model against the input data involves manipulating hashes through manipulation of certificate serial numbers, but serial numbers have requirements placed on them by the CA/Browser Forum Baseline Requirements, making them a difficult attack vector. Nevertheless, this is an area of active research.
Only a few hashes are needed for Firefox clients to check CRLite (one per layer), so if we need to move to a more secure hash function in the future, the majority of the additional complexity will land on the infrastructure side, which can more easily scale up.
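For illustration, here is roughly how that input key could be assembled and hashed. This assumes the third-party mmh3 Python package, and the exact byte layout and per-layer seeding used by rust-cascade may differ.

```python
import hashlib
import mmh3  # pip3 install mmh3

def crlite_key(issuer_spki_der: bytes, serial_der: bytes) -> bytes:
    # the hashed input combines the SHA-256 of the issuer's SPKI with the serial number
    return hashlib.sha256(issuer_spki_der).digest() + serial_der

def layer_hash(key: bytes, layer: int) -> int:
    # one fast, non-cryptographic MurmurHash3 evaluation per cascade layer
    return mmh3.hash(key, seed=layer, signed=False)
```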
They tend to be between 20 kB and 50 kB, in a form we call "stashes". You can use the crlite_status tool to investigate the sizes of recent runs. Similarly, you can use moz_crlite_query to read and evaluate certificates against the filter+stash sets.
You can see example output of the crlite-status tool, showing filter statistics by date, here: https://gist.github.com/jcjones/1fd9f63f93c7b85f87f4ac9b0f134905
All CAs that have fresh Certificate Revocation Lists (CRLs) encoded into their issued certificates get included in CRLite. Fresh means that the CRLs' signatures are valid and that they aren't past their NextUpdate time.
We initially thought we would hand-pick some issuing CAs, but automation was simpler.
Analysis of why issuers become unenrolled from CRLite is still ongoing, but the usual culprit in the logs is that the next CRL simply can't be downloaded by the CRLite aggregate-crls tooling, which has limited retry and resume functionality. Audit data is available via the crlite-status tool with the --crl option to analyze when issuers are enrolled or unenrolled in CRLite.
Firefox will use OCSP (stapled or actively queried) if the certificate's Signed Certificate Timestamps are too new for the current filter.
CRLite won't be used. If the issuer is truly unknown, Firefox will give an unknown-issuer warning as always; nothing there will change. If the issuer is not in the Mozilla Root Program, then it won't be eligible for CRLite.
CRLite will only run on issuers that are annotated as enrolled in CRLite in Firefox's Intermediate Preloading data. The list can be examined directly using your favorite JSON tooling at this URL: https://firefox.settings.services.mozilla.com/v1/buckets/security-state/collections/intermediates/records
For details on downloading the attached data file, see the Kinto Attachment plugin for Kinto, used by Firefox Remote Settings.
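For example, one could count the enrolled issuers with a few lines of Python. The crlite_enrolled field name here is an assumption about the record schema, so verify it against the actual records.

```python
import json
import urllib.request

URL = ("https://firefox.settings.services.mozilla.com/v1/buckets/security-state"
       "/collections/intermediates/records")

with urllib.request.urlopen(URL) as response:
    records = json.load(response)["data"]

# assumed field name; inspect a record to confirm the schema
enrolled = [r for r in records if r.get("crlite_enrolled")]
print(f"{len(enrolled)} of {len(records)} preloaded intermediates are enrolled")
```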
In the short term, we're interested in gathering telemetry on these cases, though no such telemetry is currently defined. That said, at Internet-scale, this is likely a common occurrence: Certificate Authorities generally have lag in updating revocation information, and there's no requirement that CRLs and OCSP update together.
If CRLite proves robust enough, in this scenario we would expect that the CRLite revocation would take precedence, and OCSP would never be checked.
The CRLite filters are published manually at Firefox Remote Settings. You can examine the data using JSON tooling at this URL: https://firefox.settings.services.mozilla.com/v1/buckets/security-state/collections/cert-revocations/records
For details on downloading the attached data file, see the Kinto Attachment plugin for Kinto, used by Firefox Remote Settings. But using jq and httpie, one can chain commands together to obtain the current filter:
base_url=$(http https://firefox.settings.services.mozilla.com/v1/ | jq -r '.capabilities.attachments.base_url')
path=$(http https://firefox.settings.services.mozilla.com/v1/buckets/security-state/collections/cert-revocations/records | jq -r '.data[0].attachment.location')
http --download --output filter.mlbf ${base_url}${path}
The production data is hosted in Google Cloud Storage in a bucket named crlite-filters-prod. The web interface for the files is accessible publicly here, though browsing it requires a Google login: https://console.cloud.google.com/storage/browser/crlite-filters-prod
The staging environment, which contains only a fraction of the WebPKI, is here: https://console.cloud.google.com/storage/browser/crlite-filters-stage
The Google gsutil tool is handy for downloading entire datasets (~7 GB each). These commands would download all the files:
mkdir crlite-dataset/
gsutil -m cp -r gs://crlite-filters-prod/20200101-0 crlite-dataset/
The known folder contains JSON files, named by the enrolled issuing CA, of all that CA's unexpired DER-encoded serial numbers. The revoked folder has files named in the same issuing-CA format, but containing the DER-encoded serial numbers of revoked certificates. The serials in revoked are not guaranteed to be a subset of known, as many are likely expired, so set math is required to derive the known revoked set from the two directories.
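Here is a sketch of that set math. The loader assumes, purely for illustration, that each per-issuer file parses as a JSON list of serial numbers; check the dataset itself for the real encoding before relying on it.

```python
import json
from pathlib import Path

def load_serials(path: Path) -> set:
    # assumed encoding: a JSON list of serial numbers per issuer file
    return set(json.loads(path.read_text()))

def known_revoked(dataset: Path) -> dict:
    """Per issuer, the revoked serials that are also still unexpired (known)."""
    result = {}
    for known_file in sorted((dataset / "known").iterdir()):
        revoked_file = dataset / "revoked" / known_file.name
        if revoked_file.exists():
            result[known_file.name] = load_serials(known_file) & load_serials(revoked_file)
    return result

# e.g. known_revoked(Path("crlite-dataset/20200101-0"))
```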
The mlbf folder contains the filter and its metadata as generated.
The log folder contains all the logs for the runs. As of this writing, many errors and warnings are still emitted that require bugfixing in one fashion or another. There are also many pointers to potential CRL problems with CAs, though few are compliance issues, and at least some are known to be innocent problems.
The crlite-status tool is probably what you're looking for. You can get it from PyPI:
pip3 install crlite-status
crlite_status 8
You'll need the crlite repository downloaded locally and the requirements.txt Python packages installed.
With a full dataset at hand from the above gsutil command:
python3 ~/git/crlite/create_filter_cascade/certs_to_crlite.py -knownPath ./20200101-0/known/ -revokedPath ./20200101-0/revoked/ my_filter_identifier
With sufficient memory, you'll get the output filter; it should be deterministic.
Firefox uses https://github.com/mozilla/rust-cascade. There's a simple Python tool for this called moz_crlite_query, which can be installed from PyPI. Keep in mind it requires Python 3.7+:
pip3 install moz_crlite_query
cat >/tmp/top4.txt <<EOF
apple.com
youtube.com
www.google.com:443
# This is definitely half of my top 8 spaces
www.blogger.com
EOF
moz_crlite_query --hosts mozilla.com firefox.com --hosts getfirefox.net --hosts-file /tmp/top4.txt
See the main README.md.
It's extremely inefficient, requiring so many OCSP queries. While the original paper's implementation did it, and so did casebenton/certificate-revocation-analysis (our initial proof of principle), downloading CRLs scales much better. If CRLite gains traction, OCSP bandwidth savings and speedups may prove to be reasons for CAs to issue CRLs.
They're binary-encoded flat lists of Issuer Subject Public Key Information hashes, followed by a list of serial numbers.
The read_keys.py script can read stash files.
Currently, CRLite uses a heuristic: end users collect stashes until the total size of the collected stashes would exceed the size of a new filter. At that point, the infrastructure switches over to a new filter and clears all existing stashes.
The contract between CRLite clients and the infrastructure allows the infrastructure to adjust this heuristic at will. Most likely, this will be modified over time to optimize client-side searches, as searching the stashes is slower than searching the Bloom filter cascade, and purely choosing to update the filter on file-size does not account for those speed differences.
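In code, the current heuristic amounts to something like the following sketch; the names are illustrative, not the actual infrastructure code.

```python
def should_cut_new_filter(published_stash_sizes: list, new_filter_size: int) -> bool:
    # once clients would download more stash bytes than one fresh filter,
    # publish a new filter and clear the accumulated stashes
    return sum(published_stash_sizes) > new_filter_size
```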
Mozilla monitors all the logs listed in Google's main list. There's a script in the repository, list_all_active_ct_logs, which can parse Google's list, but the actual production CRLite entry uses all logs in the list without filtering and is periodically updated. Issue #144 covers the idea of loading Google's list during ct-fetch startup as an optional step.
ct-fetch stores certificate serial numbers and CRL distribution points in the Redis database.
Serial numbers are stored as Redis sets, with keys named in the form serials::<expiration date and hour>::<issuer>, and each key's expiration is set to automatically expunge it upon reaching that expiration day-and-hour.
CRL distribution points are also stored as Redis sets, with keys in the form crls::<issuer>. CRL DPs do not expire; as they are discovered, CRLite assumes they will be updated until the retirement of the issuer.
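As an illustration of that layout, here is a hedged sketch using the redis-py client. The exact date-and-hour format inside the key is an assumption, and ct-fetch's own implementation may differ.

```python
from datetime import datetime
import redis  # pip3 install redis

r = redis.Redis()

def store_serial(issuer: str, serial_hex: str, expiry: datetime) -> None:
    # one set per issuer per expiration hour; the whole key expires at that hour
    key = f"serials::{expiry:%Y-%m-%d-%H}::{issuer}"
    r.sadd(key, serial_hex)
    r.expireat(key, int(expiry.timestamp()))

def store_crl_dp(issuer: str, crl_url: str) -> None:
    # CRL distribution points never expire; they are refetched until the issuer retires
    r.sadd(f"crls::{issuer}", crl_url)
```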