{"id":5306,"date":"2016-11-28T01:30:10","date_gmt":"2016-11-28T01:30:10","guid":{"rendered":"http:\/\/www.garysieling.com\/blog\/?p=5306"},"modified":"2016-11-28T01:30:10","modified_gmt":"2016-11-28T01:30:10","slug":"scala-recursively-calling-span","status":"publish","type":"post","link":"https:\/\/www.garysieling.com\/blog\/scala-recursively-calling-span\/","title":{"rendered":"Scala: recursively calling span"},"content":{"rendered":"<p>Scala&#8217;s span method uses a predicate to break a list into two parts: the longest prefix whose elements all match the predicate, and the rest of the list.<\/p>\n<p>If you call it recursively on the remainder, you can break the whole list into contiguous groups.<\/p>\n<p>In this case, we&#8217;ll take a list of tuples, each pairing a word in a sentence with its part-of-speech tag (in the code below, the tag comes first and the word second).<\/p>\n<p>We want to find every contiguous run of words tagged as proper nouns (NNP) and join each run into a single string.<\/p>\n<pre lang=\"scala\">\ntype WordPOS = (String, String)\ntype TaggedSentence = List[WordPOS]\ntype GroupedPOS = List[TaggedSentence]\n\ndef outersplitter(results: TaggedSentence): GroupedPOS = {\n  type SideBySideSentence = List[(WordPOS, WordPOS)]\n\n  \/\/ Pair each word with its predecessor; the first word gets a dummy predecessor.\n  val cmp: SideBySideSentence =\n    results.zip(\n      (\"\", \"\") :: results\n    )\n\n  def removeIndex(x: ((WordPOS, WordPOS), Int)) = {\n    x._1\n  }\n\n  def splitter(results: SideBySideSentence): GroupedPOS = {\n    def next(cmp: List[((WordPOS, WordPOS), Int)]) = {\n      \/\/ Take elements while the tag matches the tag of the previous word.\n      \/\/ Index 0 always passes, because the head of a group differs from its predecessor.\n      val res = cmp.span(\n        (pairings) =&gt; (pairings._1._1._1 == pairings._1._2._1 || pairings._2 == 0)\n      )\n\n      Tuple2(res._1.map(removeIndex), res._2.map(removeIndex))\n    }\n\n    val iter = next(results.zipWithIndex)\n    val first = iter._1.map(_._1)\n\n    val second: 
GroupedPOS =\n      iter._2 match {\n        case List() =&gt; {\n          List[TaggedSentence]()\n        }\n        case _ =&gt; {\n          val nextList = results.drop(iter._1.size)\n          splitter(nextList)\n        }\n      }\n\n    first :: second\n  }\n\n  splitter(cmp)\n}\n\n\/\/ results is assumed to be the tagged sentence: a list of (tag, word) pairs.\nval sentencePairings = outersplitter(results)\n\nval possiblePeople =\n  sentencePairings.filter(\n    (aGrouping) =&gt; {\n      aGrouping.headOption.exists(_._1 == \"NNP\")\n    }\n  ).map(\n    _.map(\n      _._2\n    ).mkString(\" \")\n  )\n<\/pre>\n<p>The real pain is that, while doing the span, you need to track whether you&#8217;re at the head of the list: the first element of each group differs from its predecessor, so it has to be let through by its index.<\/p>\n<p>If you can find a way to make this more compact, please let me know!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Using &#8216;span&#8217; in Scala to group contiguous blocks of matching values in a list by a 
predicate<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[4],"tags":[385,480],"aioseo_notices":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/5306"}],"collection":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/comments?post=5306"}],"version-history":[{"count":0,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/5306\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/media?parent=5306"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/categories?post=5306"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/tags?post=5306"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}