{"id":3690,"date":"2016-04-22T11:47:56","date_gmt":"2016-04-22T11:47:56","guid":{"rendered":"http:\/\/www.garysieling.com\/blog\/?p=3690"},"modified":"2016-04-22T11:47:56","modified_gmt":"2016-04-22T11:47:56","slug":"ssl-certificate-search-abstract","status":"publish","type":"post","link":"https:\/\/www.garysieling.com\/blog\/ssl-certificate-search-abstract\/","title":{"rendered":"Project Write-up: SSL Certificate Search Engine (Part 1: Abstract)"},"content":{"rendered":"<p>This is the first of several essays on an ongoing research project: building a search engine to <a href=\"https:\/\/www.garysieling.com\/search\/ssl_search\">list X.509 (SSL) certificates<\/a>.  The first iteration of this project was a tool to search code to find engineers by their area of expertise<sup><a href=\"#footnote_0_3690\" id=\"identifier_0_3690\" class=\"footnote-link footnote-identifier-link\" title=\"https:\/\/www.garysieling.com\/blog\/discovering-senior-developers-from-source-code-history\">1<\/a><\/sup>, and a tool to search several cloud storage services at once (DropBox, OneDrive, and Google Docs). <\/p>\n<p>There are a ton of free analysis tools for looking at SSL certificates, but they are nearly worthless in a corporate environment because they can&#8217;t hit internal services. The value of a search engine is to find domains with certificates like one you have, and to find out how often features age out (SSL, for instance, being replaced by TLS).<\/p>\n<p>To find websites with HTTPS enabled, I started with a list of around 1 million domains and pulled the certificate for each from the &#8220;www&#8221; subdomain (note that each subdomain could have a different certificate). Certificates are well-structured text files with lots of attributes, which allowed me to experiment with the UI for facets.<\/p>\n<p>There are a couple of generic user interfaces for search engines, although most are for ElasticSearch, rather than Solr. 
The most well-known is Kibana<sup><a href=\"#footnote_1_3690\" id=\"identifier_1_3690\" class=\"footnote-link footnote-identifier-link\" title=\"https:\/\/www.elastic.co\/products\/kibana\">2<\/a><\/sup>. Kibana seems to be a clear derivative of the log monitoring tool Splunk<sup><a href=\"#footnote_2_3690\" id=\"identifier_2_3690\" class=\"footnote-link footnote-identifier-link\" title=\"http:\/\/www.splunk.com\/\">3<\/a><\/sup>. There is also a tool called SearchKit<sup><a href=\"#footnote_3_3690\" id=\"identifier_3_3690\" class=\"footnote-link footnote-identifier-link\" title=\"http:\/\/www.searchkit.co\/\">4<\/a><\/sup>, which looks nice and appears to be more of a UI control library.<\/p>\n<p>The principal goals of this iteration of the project are to set up features to support discovery by search engines, and to replicate the facet behavior used by Amazon and Newegg &#8211; in these two applications, the search facets represent a taxonomy of data. Each filter has checkboxes for different values (e.g. for hard drives, they have size ranges).  When you filter by one value (say &#8220;0-500 GB&#8221;) it filters the search results, but still allows you to add more values (&#8220;500-1000 GB&#8221; etc). While conceptually simple, this ended up requiring a lot of fiddling with the UI to get it to work consistently. <\/p>\n<p>The principal search experiment in this iteration is to build a &#8220;detail&#8221; concept into the rendering, so that you get an indexable profile page for each certificate. I set up server-side rendering with Node + React, and got Cloudflare in front (more effort than I expected but not a huge list of problems). 
To get Google to index these pages, you need to give it a sitemap, which is basically an XML file with URLs and how often you expect them to change &#8211; up to 50,000 URLs per file.<\/p>\n<p>In the essays in this series that follow, I will discuss what I learned from this iteration of this research project, and what I think the next areas to research are. Roughly speaking, this will be divided into UI considerations, data loading lessons, operations, and what I learned from poking around the actual dataset. I suspect there is an opportunity to build search tools that promote niche items based on some &#8220;interestingness&#8221; measures, and so some future research will hopefully start uncovering novel concepts. Since this project is effectively &#8220;done&#8221;, <a href=\"https:\/\/www.garysieling.com\/search\/ssl_search\">you can also poke around with what I have.<\/a><\/p>\n<h3>Other essays in this series<\/h3>\n<ul>\n<li><a href=\"https:\/\/www.garysieling.com\/blog\/project-write-ssl-certificate-search-engine-part-2-ui-lessons\">Part 2: Lessons from the UI<\/a><\/li>\n<li><a href=\"https:\/\/www.garysieling.com\/blog\/project-write-ssl-certificate\">Part 3: Acquiring Data<\/a><\/li>\n<li><a href=\"https:\/\/www.garysieling.com\/blog\/project-write-ssl-certificate-search-engine-part-4-devops\">Part 4: Devops lessons<\/a><\/li>\n<li><a href=\"https:\/\/www.garysieling.com\/blog\/project-write-ssl-certificate-search-engine-part-5-future-2\">Part 5: A look at the data<\/a><\/li>\n<li><a href=\"https:\/\/www.garysieling.com\/blog\/project-write-ssl-certificate-search-engine-future\">Part 6: A look forward<\/a><\/li>\n<\/ul>\n<ol class=\"footnotes\"><li id=\"footnote_0_3690\" class=\"footnote\">https:\/\/www.garysieling.com\/blog\/discovering-senior-developers-from-source-code-history<span class=\"footnote-back-link-wrapper\"> [<a href=\"#identifier_0_3690\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/span><\/li><li id=\"footnote_1_3690\" 
class=\"footnote\">https:\/\/www.elastic.co\/products\/kibana<span class=\"footnote-back-link-wrapper\"> [<a href=\"#identifier_1_3690\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/span><\/li><li id=\"footnote_2_3690\" class=\"footnote\">http:\/\/www.splunk.com\/<span class=\"footnote-back-link-wrapper\"> [<a href=\"#identifier_2_3690\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/span><\/li><li id=\"footnote_3_3690\" class=\"footnote\">http:\/\/www.searchkit.co\/<span class=\"footnote-back-link-wrapper\"> [<a href=\"#identifier_3_3690\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/span><\/li><\/ol>","protected":false},"excerpt":{"rendered":"<p>This is the write-up of the goals of a recently completed research project, a search engine to explore HTTPS certificates.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[22],"tags":[302,387,498,499,517,528],"aioseo_notices":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/3690"}],"collection":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/comments?post=3690"}],"version-history":[{"count":0,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/3690\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/media?parent=3690"}],"wp:term":[{"ta
xonomy":"category","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/categories?post=3690"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/tags?post=3690"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}