Data Warehousing, NoSQL, and the Cloud

With the advent of NoSQL, cloud computing, and slick new databases, we seem to have forgotten where we came from. I recently went to a conference on the open source search product Solr/Lucene. One of the keynote speakers, the Chief Data Scientist of Hortonworks, discussed what turned him to NoSQL databases: in his case, a failed project to track every click on walmart.com in Oracle.

For all its idiosyncrasies and irritations, Oracle (the database) is an incredibly powerful and versatile product, with a power most projects do not fully use. Hortonworks appears to be trying to follow the same path as Oracle (the company), from consulting company product to vast riches. Even though many projects do not tap the full feature set or power of Oracle, it is still preferred in some companies for the supposed safety of an expensive support contract. This is probably true of SQL Server as well, but I’m less experienced in that area. In the same way, I doubt many who use NoSQL solutions fully realize the power of the tools available, or the right place to use them.

I think it’s worth looking at how we got here. Oracle has traditionally sold single-purpose, high-powered, and very expensive machines, a poor fit for a scrappy web startup. The variety of configuration and installation options is overwhelming, to the point that they sell pre-built HP boxes with Oracle and the OS configured for you.

Through a maze of acquisitions, Oracle likely owns a company that can meet any need, if only you can figure out where to look on their website. When I last talked to their sales reps about a non-database product, their preferred pricing model was revenue sharing, which to me sounds like a terrible proposal, unless you own a company that exists to lose money.

If you pay enough, Oracle will assign someone to fix your problems. When I last worked on a data warehouse, I typically found a database defect every other week, some with patches available and some without. We were running a “small” data warehousing system, recording a couple hundred million records.
Other companies in the city were well known to have larger databases for various purposes. Had I wished for a different rendition of this project, I could well have moved to a payroll company, a grocery chain, or a computer manufacturer.

The challenge of building such a system is not unique to Oracle. Tuning queries on Postgres, in my experience, typically yields a performance improvement of two orders of magnitude, versus one in Oracle. This appears to be because Postgres, while generally a solid, cleanly designed product, lacks numerous micro-optimizations, so an untuned query starts further from its best case.

If you find this interesting and have read this far, stay tuned for more. You can subscribe using RSS or the email subscription in the upper right.