SSL certificates aren’t the most interesting dataset, but there are still some useful lessons to learn from gathering a big list of them.
Prior to this project, it never occurred to me that the “5” in “MD5” implied there were other versions. In this dataset I found a handful of sites that use MD2 in the “Signature algorithm” field of the certificate (and some that use MD5). I don’t know whether this means they are vulnerable to attack, but in both cases it certainly indicates that they depend on old technology.
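A minimal sketch of how such a check might look in Java (the platform used for the scan), assuming the certificates have already been collected as `X509Certificate` objects. `getSigAlgName()` is the standard JCA accessor and returns names such as `MD5withRSA`; the class name `SigAlgCheck` and the choice of which digests count as legacy are illustrative assumptions, not the scanner’s actual code.

```java
import java.security.cert.X509Certificate;
import java.util.Locale;

/** Hypothetical helper: flags certificate signatures that rely on MD2 or MD5. */
public final class SigAlgCheck {
    // JCA signature names look like "MD5withRSA"; which digests count as legacy is our assumption.
    private static final String[] LEGACY_DIGESTS = {"MD2", "MD5"};

    private SigAlgCheck() {}

    /** Returns the legacy digest used by the named algorithm, or null if none matches. */
    public static String legacyDigest(String sigAlgName) {
        String upper = sigAlgName.toUpperCase(Locale.ROOT);
        for (String digest : LEGACY_DIGESTS) {
            if (upper.startsWith(digest + "WITH")) {
                return digest;
            }
        }
        return null;
    }

    /** Same check for a parsed certificate, via the standard getSigAlgName() accessor. */
    public static String legacyDigest(X509Certificate cert) {
        return legacyDigest(cert.getSigAlgName());
    }

    public static void main(String[] args) {
        for (String name : new String[] {"MD2withRSA", "MD5withRSA", "SHA256withRSA"}) {
            System.out.println(name + " -> " + legacyDigest(name));
        }
    }
}
```

Matching on the name prefix keeps the check independent of the key type, so `MD5withRSA` and a hypothetical `MD5withDSA` would both be caught.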
CloudFlare is popular now as a CDN (thanks to its free plan and free SSL). Its certificates are actually issued by “COMODO RSA Extended Validation Secure Server CA”. The CloudFlare certificates themselves are multi-domain certificates served via SNI (i.e. a single certificate covers a list of many domains), and I found 14,882 of them in the dataset (18%).
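Counting how many domains a shared certificate covers comes down to reading its Subject Alternative Name extension. The sketch below assumes certificates are available as `X509Certificate` objects and keeps only the dNSName entries (GeneralName tag 2 in RFC 5280) from the standard `getSubjectAlternativeNames()` call. The class name and the `looksShared` threshold are hypothetical, not how the figure above was computed.

```java
import java.security.cert.CertificateParsingException;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical helper: reads dNSName entries from a certificate's SAN extension. */
public final class SanCount {
    // GeneralName tag for dNSName (RFC 5280); getSubjectAlternativeNames() reports it as 2.
    private static final int DNS_NAME = 2;

    private SanCount() {}

    /** Filters the [type, value] pairs returned by getSubjectAlternativeNames(). */
    public static List<String> dnsNames(Collection<List<?>> sans) {
        List<String> names = new ArrayList<>();
        if (sans == null) {
            return names; // null means the certificate has no SAN extension
        }
        for (List<?> entry : sans) {
            Object type = entry.get(0);
            Object value = entry.get(1);
            if (Integer.valueOf(DNS_NAME).equals(type) && value instanceof String) {
                names.add((String) value);
            }
        }
        return names;
    }

    /** Convenience overload for a parsed certificate. */
    public static List<String> dnsNames(X509Certificate cert) throws CertificateParsingException {
        return dnsNames(cert.getSubjectAlternativeNames());
    }

    /** Assumed heuristic: many hostnames on one certificate suggests a shared, CDN-style cert. */
    public static boolean looksShared(X509Certificate cert, int threshold)
            throws CertificateParsingException {
        return dnsNames(cert).size() >= threshold;
    }
}
```

For example, `SanCount.dnsNames(cert).size()` gives the number of hostnames a certificate lists; dozens of unrelated domains on one certificate likely point to a shared CDN plan.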
I did not find any Let’s Encrypt certificates. This may be because the service is newer, but it may also be that I scanned sites using them and the JDK I used rejected their certificates (I did not retain any that it considered invalid).
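One way to test the truststore explanation is to ask the JDK which root CAs its default trust manager accepts. This Java sketch uses only standard JSSE calls (`TrustManagerFactory`, `X509TrustManager.getAcceptedIssuers()`). The class name is hypothetical, and the names searched for in `main` (Let’s Encrypt’s own root, ISRG Root X1, and IdenTrust’s DST Root CA X3, which cross-signed it) are illustrative; check the exact subject names of the CA you care about.

```java
import java.security.KeyStore;
import java.security.KeyStoreException;
import java.security.NoSuchAlgorithmException;
import java.security.cert.X509Certificate;
import java.util.Locale;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

/** Hypothetical helper: inspects the roots trusted by the running JDK. */
public final class TrustStoreProbe {
    private TrustStoreProbe() {}

    /** Returns the issuers accepted by the JDK's default trust manager (usually cacerts). */
    public static X509Certificate[] defaultAcceptedIssuers()
            throws NoSuchAlgorithmException, KeyStoreException {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init((KeyStore) null); // null selects the JDK's default trust store
        for (TrustManager tm : tmf.getTrustManagers()) {
            if (tm instanceof X509TrustManager) {
                return ((X509TrustManager) tm).getAcceptedIssuers();
            }
        }
        return new X509Certificate[0];
    }

    /** True if any trusted root's subject contains the given text (case-insensitive). */
    public static boolean trustsSubjectContaining(String text)
            throws NoSuchAlgorithmException, KeyStoreException {
        String needle = text.toLowerCase(Locale.ROOT);
        for (X509Certificate ca : defaultAcceptedIssuers()) {
            String subject = ca.getSubjectX500Principal().getName().toLowerCase(Locale.ROOT);
            if (subject.contains(needle)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Illustrative root names; verify the exact subjects for your CA of interest.
        for (String root : new String[] {"ISRG Root X1", "DST Root CA X3"}) {
            System.out.println(root + ": " + trustsSubjectContaining(root));
        }
    }
}
```

If both lookups print `false` on the JDK that ran the scan, that JDK would have rejected Let’s Encrypt chains during the handshake, which would be consistent with finding none.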
I did notice a fair number of Chinese certificates (~560). One thing that pleased me is that every part of the infrastructure apparently supports Unicode without any effort on my part: I can see Chinese characters coming through cleanly.
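The post doesn’t say how Chinese certificates were identified; one plausible approach, assumed here, is to read the country (`C`) attribute of each certificate’s subject. The sketch parses an RFC 2253 distinguished name, such as the string returned by `getSubjectX500Principal().getName()`, with the standard `javax.naming.ldap.LdapName` class. Because Java strings are UTF-16 throughout, values containing Chinese characters should come back intact. The class name `DnAttributes` is hypothetical.

```java
import java.util.List;
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

/** Hypothetical helper: looks up one attribute (e.g. "C" or "O") in a distinguished name. */
public final class DnAttributes {
    private DnAttributes() {}

    /** Returns the value of the given attribute type, or null if the DN has none. */
    public static String attribute(String dn, String type) throws InvalidNameException {
        // LdapName parses RFC 2253 syntax and unescapes values; non-ASCII text is kept as-is.
        List<Rdn> rdns = new LdapName(dn).getRdns();
        for (Rdn rdn : rdns) {
            if (rdn.getType().equalsIgnoreCase(type)) {
                return rdn.getValue().toString();
            }
        }
        return null;
    }
}
```

A tally of Chinese certificates could then count subjects where `DnAttributes.attribute(subjectDn, "C")` equals `"CN"`.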
There is a notorious problem where newer GoDaddy certificates don’t work in older JDKs¹. Searching for SHA-1 turns up a few root certificates that also use SHA-1 (e.g. there are Facebook and Verizon certificates that might run into the same problem).
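A sketch of the SHA-1 search over a whole chain rather than just the leaf certificate. It reduces each certificate to a small hypothetical `CertInfo` view (subject, issuer, signature algorithm) so the logic can run without live certificates, and it labels a certificate as a root when its subject equals its issuer, which is a simplifying heuristic rather than a full signature check.

```java
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: finds SHA-1-signed certificates anywhere in a chain. */
public final class ChainSha1Scan {
    private ChainSha1Scan() {}

    /** Minimal, hypothetical view of one certificate in a chain. */
    public static final class CertInfo {
        final String subject;
        final String issuer;
        final String sigAlg;

        public CertInfo(String subject, String issuer, String sigAlg) {
            this.subject = subject;
            this.issuer = issuer;
            this.sigAlg = sigAlg;
        }

        /** Builds the view from a real certificate using standard accessors. */
        public static CertInfo of(X509Certificate cert) {
            return new CertInfo(
                    cert.getSubjectX500Principal().getName(),
                    cert.getIssuerX500Principal().getName(),
                    cert.getSigAlgName());
        }

        // Simplifying heuristic: treat subject == issuer as a self-signed root.
        boolean looksLikeRoot() {
            return subject.equals(issuer);
        }
    }

    /** Lists "index role: subject" for every certificate signed with SHA-1 (e.g. SHA1withRSA). */
    public static List<String> sha1Findings(List<CertInfo> chain) {
        List<String> findings = new ArrayList<>();
        for (int i = 0; i < chain.size(); i++) {
            CertInfo cert = chain.get(i);
            if (cert.sigAlg.toUpperCase(Locale.ROOT).startsWith("SHA1WITH")) {
                String role = cert.looksLikeRoot() ? "root" : (i == 0 ? "leaf" : "intermediate");
                findings.add(i + " " + role + ": " + cert.subject);
            }
        }
        return findings;
    }
}
```

In a live scan the chain could come from `HttpsURLConnection.getServerCertificates()`, with each entry cast to `X509Certificate` and passed through `CertInfo.of`.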
Gandi certificates were surprisingly unpopular: only 1% of the entire dataset. After GoDaddy’s elephant controversy a fair number of people moved their sites to Namecheap and Gandi, but the low share may also reflect where I obtained the domains.
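Shares like the 1% for Gandi or the 18% for CloudFlare are a simple tally over issuer names. A minimal sketch, assuming each certificate has already been reduced to one issuer string (for instance the issuer’s organization attribute); the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: tallies certificates per issuer and reports percentage shares. */
public final class IssuerShare {
    private IssuerShare() {}

    /** One input string per certificate, e.g. the issuer's O attribute. */
    public static Map<String, Integer> counts(List<String> issuers) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String issuer : issuers) {
            counts.merge(issuer, 1, Integer::sum);
        }
        return counts;
    }

    /** Share of the whole dataset held by one issuer, as a percentage (0 if the map is empty). */
    public static double sharePercent(Map<String, Integer> counts, String issuer) {
        int total = 0;
        for (int n : counts.values()) {
            total += n;
        }
        return total == 0 ? 0.0 : 100.0 * counts.getOrDefault(issuer, 0) / total;
    }
}
```

A `TreeMap` keeps issuers sorted by name, which makes the tally easy to eyeball when printed.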
As the amount of software deployed on the internet grows, interest in encryption and other security technologies will only increase. The ability to scan for problems is one way for researchers to track the rate at which security fixes are deployed across the internet, and it may even have some unconventional uses. For instance, there was a period when SEO specialists would find mistakes in people’s websites and email the owners in the hope of a reward (i.e. links). As the cost of security problems and awareness of the legal issues both grow, more companies are offering bug bounties, and scanning tools are one step toward improving the state of the internet.
Other essays in this series
- Part 1: Project Introduction
- Part 2: Lessons from the UI
- Part 3: Acquiring Data
- Part 4: Devops lessons
- Part 6: A look forward