Catching MalformedInputException during scala.io.Source.fromFile

I'm using the scala.io.Source.fromFile method to read a CSV file. Sometimes the file will be in a different encoding. I'll allow the user to specify the file encoding, but if the user doesn't specify the proper encoding I'd like to catch the MalformedInputException and have my method return None (instead of Some[Iterator[String]]).

I'm using the onCodingException method of the Codec, but it doesn't seem to be applied. See my code below:

def readFileAsIterator(fileName: String,
                       encoding: Option[String] = Some(defaultEncoding)): Option[Iterator[String]] = {
  try {
    val codecType = encoding.getOrElse(defaultEncoding)
    implicit val codec = Codec(codecType)
    codec.onCodingException {
      case e: CharacterCodingException => {
        throw new MalformedInputException(2)
      }
    }
    val fileLines = io.Source.fromFile(fileName)(codec).getLines()
    Some(fileLines)
  } catch {
    case e: Exception => {
      None
    }
  }
}

Has anyone played around with this method and managed to make it work?

There are two things you should consider changing here:

1 - Return a Try[Iterator[String]] instead of Option[Iterator[String]].

2 - Make encoding a plain String with a default value.

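import java.nio.charset.{CharacterCodingException, MalformedInputException}
import scala.io.Codec
import scala.util.Try

def readFileAsIterator(fileName: String, encoding: String = "UTF-8"): Try[Iterator[String]] = Try {
  implicit val codec = Codec(encoding)
  codec.onCodingException {
    case e: CharacterCodingException =>
      throw new MalformedInputException(2)
  }
  // Note: getLines() is lazy, so the Try only captures errors raised while creating the iterator
  io.Source.fromFile(fileName)(codec).getLines()
}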


This

io.Source.fromFile(fileName)(codec).getLines()

returns an Iterator[String], which is lazy. So the exception happens while iterating, not immediately when the iterator is created. In general it is not possible to detect a wrong encoding without decoding the data first, so you need to either parse the file first to check that the encoding is right and then return a newly created iterator (not the one used for parsing!), or leave exception handling to the calling code that parses the data. A trade-off is also possible: read the first several lines and, if there are no coding exceptions, create a new iterator for the caller, with the understanding that in some cases the caller will still get an exception on a later, wrongly encoded line.
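To illustrate the "parse first" option, here is a minimal sketch that forces the whole file to be decoded inside a Try, so a decoding error surfaces as a Failure instead of escaping later during iteration. The name readAllLines is hypothetical, and the sketch assumes the file is small enough to hold in memory.

```scala
import scala.io.{Codec, Source}
import scala.util.Try

// Hypothetical helper (not from the question): decodes the whole file eagerly,
// so any decoding error is raised inside the Try rather than later, while the
// caller iterates.
def readAllLines(fileName: String, encoding: String = "UTF-8"): Try[Vector[String]] =
  Try {
    val source = Source.fromFile(fileName)(Codec(encoding))
    try source.getLines().toVector // forces decoding of every line
    finally source.close()
  }
```

A caller can then use readAllLines(path, enc).toOption to get the Option-based result the question asks for, at the cost of reading the file eagerly.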

Update

This is a response to your comment to me under another answer.

Check this:

def readFileAsIterator(fileName: String,
                       encoding: Option[String] = Some("IBM1098"),
                       touchIterator: Boolean = false): Option[Iterator[String]] = {
  try {
    val codecType = encoding.getOrElse("IBM1098")
    implicit val codec = Codec(codecType)
    codec.onCodingException {
      // UnmappableCharacterException is a subclass of CharacterCodingException,
      // so the more specific case has to come first to be reachable
      case e: java.nio.charset.UnmappableCharacterException => {
        throw new MalformedInputException(3)
      }
      case e: CharacterCodingException => {
        throw new MalformedInputException(2)
      }
    }
    if (!touchIterator) {
      // Lazy: nothing is decoded yet, so no coding exception can be thrown here
      Some(scala.io.Source.fromFile(fileName)(codec).getLines())
    } else {
      val i = scala.io.Source.fromFile(fileName)(codec).getLines()
      // hasNext forces the first buffer to be read and decoded
      if (i.hasNext) {
        Some(i)
      } else {
        None
      }
    }
  } catch {
    case e: Exception => {
      // log is assumed to be a logger defined elsewhere
      log.info("Handled exception in func", e)
      None
    }
  }
}

Call it twice on a file that causes the exception (in my case it was an UnmappableCharacterException): once with touching the iterator and once without, depending on the additional argument.

Under the hood you have an iterator, as I said. It is a lazy, buffered iterator, so it is initialized on the first call (in the modified method I force initialization with hasNext). I do not think it reads the whole file; it just buffers part of it (so it is effectively an automatic implementation of my "trade-off" case).
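The explicit version of that trade-off could look roughly like the sketch below: decode a limited number of lines with a throwaway Source, and only hand a fresh iterator to the caller if the probe succeeded. The names readFileAsIteratorProbed and probeLines are hypothetical, and the default of 100 lines is an arbitrary assumption.

```scala
import scala.io.{Codec, Source}
import scala.util.Try

// Hypothetical sketch of the "read several first lines" trade-off.
def readFileAsIteratorProbed(fileName: String,
                             encoding: String = "UTF-8",
                             probeLines: Int = 100): Option[Iterator[String]] = {
  // Decode up to probeLines lines with a throwaway Source; any exception
  // (coding error, missing file, unknown charset) makes the probe fail.
  val probeOk = Try {
    val probe = Source.fromFile(fileName)(Codec(encoding))
    try probe.getLines().take(probeLines).foreach(_ => ())
    finally probe.close()
  }.isSuccess

  // On success return a new iterator, not the one consumed by the probe.
  // The caller may still hit a coding exception on a later line, and the
  // underlying Source is never closed here (as in the original method).
  if (probeOk) Some(Source.fromFile(fileName)(Codec(encoding)).getLines())
  else None
}
```

This opens the file twice, which is the price of returning an untouched iterator to the caller.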


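I had the same error. I handled it using onMalformedInput() as shown below:

import java.nio.charset.CodingErrorAction
import scala.io.{Codec, Source}

implicit val codec = Codec("UTF-8")
// Replace undecodable input instead of throwing an exception
codec.onMalformedInput(CodingErrorAction.REPLACE)
codec.onUnmappableCharacter(CodingErrorAction.REPLACE)
for (line <- Source.fromFile("..").getLines()) {
  ...
}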
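As a self-contained variant of the same idea, the sketch below wraps the REPLACE configuration in a function and closes the source afterwards. The name readLenient is hypothetical. Note that this approach does not detect a wrong encoding; it silently substitutes the decoder's replacement character (normally U+FFFD) for bytes it cannot decode, so it fits cases where lossy reading is acceptable rather than the question's goal of returning None.

```scala
import java.nio.charset.CodingErrorAction
import scala.io.{Codec, Source}

// Hypothetical helper: reads every line, replacing undecodable input
// instead of throwing MalformedInputException.
def readLenient(fileName: String, encoding: String = "UTF-8"): Vector[String] = {
  val codec = Codec(encoding)
    .onMalformedInput(CodingErrorAction.REPLACE)
    .onUnmappableCharacter(CodingErrorAction.REPLACE)
  val source = Source.fromFile(fileName)(codec)
  try source.getLines().toVector
  finally source.close()
}
```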


Comments
  • Thank you for replying, Evgeny. At first I also thought it should behave as you explained, but I'm afraid the method does not behave that way. Debugging the code, I found that when this method is called with the wrong encoding, the iterator is filled with the exception, and when you try to read the iterator you get that exception (I guess the method does a first check on the file encoding).
  • It looks like you replied to me, but as a comment under another answer. I've updated my answer.