Scala: Filter Strings and Lists with Regexes

Scala has a neat built-in method, .r, that turns a string into a regular expression. If you call it on an ordinary string literal whose pattern contains backslashes, you’ll get errors like so:

"\w+".r
<console>:1: error: invalid escape character
       "\w+".r
         ^

Thus, for many regexes it is preferable to use the triple-quoted (multi-line) string syntax, which takes backslashes literally, so no escaping is needed.

scala> """\w+""".r
res43: scala.util.matching.Regex = \w+
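
As a quick check of the point above, here is a small sketch comparing the two ways of writing the same pattern. The names escaped and raw are just illustrative; paste the lines into the REPL or a script.

```scala
// Ordinary string literal: every backslash has to be doubled.
val escaped = "\\w+".r

// Triple-quoted string literal: backslashes are taken literally.
val raw = """\w+""".r

// Both regexes hold the same pattern text, \w+
println(escaped.toString == raw.toString)
```

The doubled-backslash form still works, it is just harder to read once a pattern has several escapes in it.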

With the regex assigned to a value (here, val x = """\w+""".r), we can do some neat things, like find the first word in a sentence:

x.findFirstMatchIn("abc def")
res44: Option[scala.util.matching.Regex.Match] = Some(abc)
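
If you only need the matched text rather than a Match object, a few related methods on Regex may be handy. This is a minimal sketch; the value names (word, first, missing, all, span) are made up for the example.

```scala
val word = """\w+""".r

// findFirstIn returns the matched text itself, wrapped in an Option.
val first: Option[String] = word.findFirstIn("abc def")      // Some("abc")

// No match gives None rather than an exception.
val missing: Option[String] = word.findFirstIn("!!!")        // None

// findAllIn returns an iterator over every match.
val all: List[String] = word.findAllIn("abc def").toList     // List("abc", "def")

// findFirstMatchIn gives a Match, which also knows where the match sits.
val span = word.findFirstMatchIn("abc def").map(m => (m.start, m.end))  // Some((0, 3))
```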

We can also replace every match in the string. Here the pattern matches each word, so every word is swapped for "gary":

x.replaceAllIn("abc def", "gary")
res46: String = gary gary
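
Two variations that might be useful, sketched under the same assumptions as above (a \w+ regex, with illustrative value names): replacing only the first match, and computing each replacement from the match itself.

```scala
import scala.util.matching.Regex

val word = """\w+""".r

// Replace only the first match and leave the rest alone.
val onlyFirst = word.replaceFirstIn("abc def", "gary")        // "gary def"

// Build each replacement from the match, here by upper-casing it.
val shouted = word.replaceAllIn("abc def", (m: Regex.Match) => m.matched.toUpperCase)  // "ABC DEF"
```

One caveat: in a replacement string, $ and \ have special meaning, so a replacement taken from arbitrary input should be passed through Regex.quoteReplacement first.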

You can also apply a regex to every value in a list to keep just the items that match. For example, here we filter the list down to values consisting of a single word (the pattern is in an ordinary string this time, so the backslash is doubled):

List("multi word", "12345", "word")
  .filter(
    "^\\w+$".r
       .findFirstIn(_)
       .isDefined)
res2: List[String] = List(12345, word)

The underscore allows us to simplify the code, since the value is only used once in the test (this is equivalent to writing a lambda that looks like x => …findFirstIn(x) ).
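
To make the equivalence concrete, here is a sketch with both forms side by side. It also pulls the regex out into a value (singleWord, a name chosen for this example) so it is compiled once rather than for every element, which is an optional tweak and not something the original snippet does.

```scala
// Compiled once, outside the filter; triple quotes avoid the doubled backslash.
val singleWord = """^\w+$""".r

val items = List("multi word", "12345", "word")

// Placeholder syntax: the underscore stands for the current element.
val withUnderscore = items.filter(singleWord.findFirstIn(_).isDefined)

// The same test written as an explicit lambda.
val withLambda = items.filter(x => singleWord.findFirstIn(x).isDefined)

println(withUnderscore)  // List(12345, word)
println(withLambda)      // List(12345, word)
```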
