When Google crawls your site, its crawler identifies itself with a specific User-Agent string:
"HTTP/1.1" 304 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
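Counting requests like this one per day is one way to watch the crawl from the logs. The following is a minimal sketch, not the script used for this site. It assumes the server writes the common "combined" access log format, of which the line above shows only the tail; the function name and pattern are illustrative. Keep in mind that a User-Agent string can be spoofed, so a match only suggests that the request came from Google.

```python
import re
from collections import Counter

# Assumption: "combined" access log format, e.g.
# 192.0.2.1 - - [12/Mar/2013:06:25:24 +0000] "GET /a HTTP/1.1" 304 - "-" "<user agent>"
LOG_PATTERN = re.compile(
    r'\[(?P<day>\d{2}/\w{3}/\d{4}):[^\]]*\]'  # timestamp; keep only the date part
    r'.*"(?P<agent>[^"]*)"\s*$'               # the last quoted field is the User-Agent
)

def googlebot_hits_per_day(lines):
    """Return a Counter mapping 'DD/Mon/YYYY' to the number of Googlebot requests."""
    hits = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match and "Googlebot" in match.group("agent"):
            hits[match.group("day")] += 1
    return hits
```

Calling googlebot_hits_per_day(open("access.log")) (the file name is hypothetical) gives a per-day count that can be compared with the crawl statistics Google reports.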
From a software architecture perspective, I didn’t build this to be easily indexable, so I was forced to provide Google with a sitemap. In the future I want to provide better URL addressability into the application, so that people can post links to specific pieces of state.
Sitemaps are limited in both file size and number of URLs: at most 50,000 URLs and 10 MB per file. The 50,000-URL limit is much easier to hit. I submitted a sitemap with 40,000 URLs and found that Google crawled approximately 5,000 per day (based on what I saw in the logs). It took about a week for these pages to start showing up in search results. During that week, a few more became visible each day, in the tens to hundreds. At times the number went down (presumably as changes propagated through Google’s network).
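To make the URL limit concrete, here is a rough sketch of splitting a list of URLs into sitemap files that each stay under 50,000 entries. It is an illustration under assumptions rather than the generator used for this site: the function names are hypothetical, and only the required loc element of the sitemap protocol is written.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # the per-file URL limit mentioned above

def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield consecutive slices of `urls`, each holding at most `size` entries."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]

def render_sitemap(urls):
    """Render one sitemap document containing only the required <loc> elements."""
    # escape() turns &, < and > into XML entities so query strings stay valid XML.
    entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )
```

A list of 40,000 URLs fits in a single chunk. The size limit can be checked separately by measuring len(render_sitemap(chunk).encode("utf-8")) for each chunk before writing it out.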
Google’s reported crawl statistics approximately match my experience watching the logs:
Google Webmaster Tools shows the total number of pages indexed over time, which is quite interesting:
The count peaks at or above the number of posts I expected, and then drops. This may be due to errors when the Node server crashed, but I can’t be certain.
The actual traffic from this is low, as expected; this is the lowest of low-interest, long-tail traffic. Despite heavy competition from other sites, it does get some traffic (~300 visitors per month).