{"id":1824,"date":"2013-08-23T12:34:10","date_gmt":"2013-08-23T12:34:10","guid":{"rendered":"http:\/\/garysieling.com\/blog\/?p=1824"},"modified":"2013-08-23T12:34:10","modified_gmt":"2013-08-23T12:34:10","slug":"importing-data-from-solr-to-postgres-with-scala","status":"publish","type":"post","link":"https:\/\/www.garysieling.com\/blog\/importing-data-from-solr-to-postgres-with-scala\/","title":{"rendered":"Importing Data from Solr to Postgres with Scala"},"content":{"rendered":"<p>I suspect most people who set up Solr indexes pull data from a second system into Solr; having written a previous example where I <a href=\"http:\/\/garysieling.com\/blog\/converting-git-commit-history-to-a-solr-full-text-index\">pulled git data into a Solr index<\/a>, I copied this data into Postgres to allow comparing the behavior of two full-text indexers.<\/p>\n<p>This is a fairly simple process if you make a Postgres table that has columns named the same as Postgres &#8211; in my case I set them all to &#8220;character varying&#8221;, in the interest of avoiding type conversion issues. If the dataset isn&#8217;t too large, you can fix the types later this way (mine is about ~1.6 million rows). This streams the results from Solr, so that you don&#8217;t have to worry about memory (while this runs, it does creep up slowly, perhaps some work building up for the garbage collector)<\/p>\n<p>The code that results from this is pretty simple &#8211; very similar to the Java equivalent, but more compact and easier to read. One of the awesome things about Scala is that merely importing the JavaConversions namespace fixed a lot of autoboxing issues between Java and Scala, with the exception of the &#8220;Float&#8221; in streamDocListInfo.<\/p>\n<pre lang=\"Scala\">\nimport java.io._\nimport java.sql._\n\nimport scala.collection.JavaConversions._\nimport collection.immutable._\n\nimport org.apache.solr.client.solrj._\nimport org.apache.solr.client.solrj.impl._\nimport org.apache.solr.common._\nimport org.apache.solr.common.params._\n\nobject SolrToPostgres extends App {\n  val server = new HttpSolrServer(\n      \"http:\/\/localhost:8080\/solr\/collection1\")\n\n  val params = HashMap(  \n    \"collectionName\" -> \"collection1\",\n    \"q\" -> \"*:*\",\n    \"start\" -> \"0\",\n    \"rows\" -> Int.MaxValue.toString)\n    \n  val solrParams = new MapSolrParams(params)\n  \n  val connection = DriverManager.getConnection(\n      \"jdbc:postgresql:\/\/127.0.0.1:5432\/github\", \"postgres\", \"\")\n\n  val query = \"DELETE FROM data\"\n  val delete = connection.createStatement\n  delete.execute(query)\n  \n  val callback = new StreamingResponseCallback() {\n    def streamDocListInfo(arg0: Long, \n                          arg1: Long, \n                          arg2: java.lang.Float) = {\n      \n    }\n\n    def streamSolrDocument(doc: SolrDocument) = {\n      val query =\n        \"INSERT INTO \" +\n        \"data   (author, id, email, company, date, message) \" +\n        \"VALUES (?,      ?,  ?,     ?,       ?,    ?,     )\"\n      \n      try {\n        val s: PreparedStatement = connection.prepareStatement(query)\n        s.setString(1, doc.getFieldValue(\"author\").toString)\n        s.setString(2, doc.getFieldValue(\"id\").toString)\n        s.setString(3, doc.getFieldValue(\"email\").toString)\n        s.setString(4, doc.getFieldValue(\"company\").toString)\n        s.setString(5, doc.getFieldValue(\"date\").toString)\n        s.setString(6, doc.getFieldValue(\"message\").toString)\n        s.setString(7, doc.getFieldValue(\"name\").toString)\n        s.setString(8, doc.getFieldValue(\"github\").toString)\n        s.setString(9, doc.getFieldValue(\"search\").toString)\n        \n        s.execute\n        \n        s.close\n      }\n      catch\n      {\n        case e: Exception =>\n        println(e.getMessage)\n        println(e.getStackTrace)\n      }\n    } \n  }\n  \n  server.queryAndStreamResponse(solrParams, callback)\n  connection.close\n}\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>I suspect most people who set up Solr indexes pull data from a second system into Solr; having written a previous example where I pulled git data into a Solr index, I copied this data into Postgres to allow comparing the behavior of two full-text indexers. This is a fairly simple process if you make &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.garysieling.com\/blog\/importing-data-from-solr-to-postgres-with-scala\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Importing Data from Solr to Postgres with Scala&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[4],"tags":[300,437,480,517],"aioseo_notices":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/1824"}],"collection":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/comments?post=1824"}],"version-history":[{"count":0,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/1824\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/media?parent=1824"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/categories?post=1824"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/tags?post=1824"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}