Hot questions for using Mockito in Apache Spark

Question:

I am trying to verify an interaction with a serializable mock that I am using in a Spark job. The mock is created with:

private val mockFoo = Mockito.mock(classOf[Foo], Mockito.withSettings().serializable())

I am then passing this to the constructor of my Spark job class and I am using it like this:

rdd.map(elem => foo.doSomethingWith(elem))

Everything works fine: the doSomethingWith call is ignored and there is no interaction with the real object. However, Spark serializes the mock to send it to a worker and deserializes it there before using it, and it does this even when I run the test locally. Serialization and deserialization create a new instance of the mock, and that is where the interaction happens. So when I call verify on the original mock instance I passed in, it fails, saying there were no interactions with this mock.

I understand why this is happening, and I confirmed it without Spark by manually serializing and deserializing a mock.

My question is, is there something I can do to verify the interaction?


Answer:

Apparently broadcasting the mocked object works (example below). Thanks chuckskull :)

In the test:

val mockFoo = Mockito.mock(classOf[Foo], Mockito.withSettings().serializable())
val bd = spark.broadcast(mockFoo)
val objUnderTest = new ClassUnderTest(bd)
val rdd = objUnderTest.methodUnderTest
// rdd assertions if needed
Mockito.verify(mockFoo).doSomethingWith(elem)

Production code:

class ClassUnderTest(bd: Broadcast[Foo]) {
  def methodUnderTest(): RDD[Int] = {
    rdd.map(elem => bd.value.doSomethingWith(elem))
  }
}
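
Why broadcasting helps is not spelled out above. One plausible reading, and it is only an assumption about Spark's local-mode behaviour, is that the broadcast handle is what gets serialized with the closure, while the value itself is looked up inside the same JVM, so the task ends up calling the original mock. The plain-Java sketch below imitates that idea with a hypothetical LocalHandle class and an AtomicInteger standing in for the mock. It uses no Spark or Mockito code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical handle: only the id is serialized, the value lives in a
// JVM-wide registry. This is an illustration, not Spark's Broadcast class.
class LocalHandle<T> implements Serializable {
    private static final long serialVersionUID = 1L;
    private static final Map<Long, Object> REGISTRY = new ConcurrentHashMap<>();
    private static final AtomicLong NEXT_ID = new AtomicLong();

    private final long id;

    LocalHandle(T value) {
        this.id = NEXT_ID.getAndIncrement();
        REGISTRY.put(id, value);
    }

    @SuppressWarnings("unchecked")
    T value() {
        return (T) REGISTRY.get(id);
    }
}

class HandleDemo {
    // Serializes and deserializes the handle, as if it travelled with a closure.
    static <T> LocalHandle<T> roundTrip(LocalHandle<T> handle) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(handle);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                LocalHandle<T> copy = (LocalHandle<T>) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        AtomicInteger recorder = new AtomicInteger();   // stand-in for the mock
        LocalHandle<AtomicInteger> shipped = roundTrip(new LocalHandle<>(recorder));

        shipped.value().incrementAndGet();              // "interaction" in the task

        System.out.println(shipped.value() == recorder); // true: same instance
        System.out.println(recorder.get());              // 1: visible on the original
    }
}
```

If that reading is right, the approach depends on the test and the tasks sharing one JVM, so it would not carry over to a real cluster.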

Question:

I am writing unit tests for a Java Spark program with Mockito, and I run into a problem when I try to define the behavior of a method on a mocked object, like this:

when(mock.method(someRDD)).thenReturn(0);

Since RDD doesn't override equals(), the stubbed behavior applies only when the RDD passed to the method is the same reference as "someRDD".

Is there any way to customize the equals() check that Mockito uses when matching arguments? Or should I use another mocking framework instead?


Answer:

You can write your own ArgumentMatcher to compare the passed argument with what you expect. Assuming it's just a straightforward comparison between the RDD's fields, you can use Mockito's built-in refEq matcher, which uses reflection and compares each field individually:

when(mock.method(refEq(someRDD))).thenReturn(0);
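
To make the difference between equals() and a field-by-field comparison concrete, here is a small plain-Java sketch. It is not Mockito's implementation of refEq. Range and fieldsEqual are hypothetical names, and the comparison is shallow (declared, non-static fields only).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Objects;

// Hypothetical value holder that, like RDD, does not override equals().
class Range {
    private final int start;
    private final int end;

    Range(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

class ReflectiveEquality {
    // True when both objects have the same class and equal values in every
    // declared, non-static field. Superclass fields are ignored in this sketch.
    static boolean fieldsEqual(Object a, Object b) {
        if (a == b) {
            return true;
        }
        if (a == null || b == null || a.getClass() != b.getClass()) {
            return false;
        }
        try {
            for (Field field : a.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers())) {
                    continue;
                }
                field.setAccessible(true);
                if (!Objects.equals(field.get(a), field.get(b))) {
                    return false;
                }
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Range first = new Range(0, 10);
        Range second = new Range(0, 10);
        System.out.println(first.equals(second));       // false: default equals() compares references
        System.out.println(fieldsEqual(first, second)); // true: same field values
    }
}
```

One caveat: if the compared objects carry a per-instance field such as an id, a comparison like this fails unless that field is skipped. refEq accepts a list of field names to exclude for that case.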

Question:

When trying to create a test for an application using Spark I face the following error:

java.io.InvalidClassException: java.lang.Void; local class name incompatible with stream class name "void"
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:620)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1843)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713)
    at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1678)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1518)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)

This only happens if I mock classes that have void methods which are invoked at some point while the unit under test runs.

E.g. my code is:

public class MyTest {

    private MyClass uut;

    private Writer writer;

    @Captor
    private ArgumentCaptor<Dataset<Row>> rowCaptor;

    @Before
    public void setUp() {
        initMocks(this);

        writer = mock(Writer.class);
        uut = new MyClass(writer);
    }

    @Test
    public void testSomething() {
        // given

        // when
        uut.process();

        // then
        verify(writer, times(2)).write(rowCaptor.capture());
        List<Dataset<Row>> result = rowCaptor.getAllValues();
        // ...
    }
}

Answer:

The problem seems to lie in the way Mockito serializes its internal proxy classes. It only has a negative effect if the tasks or jobs you run within Spark actually get serialized and deserialized.

In org.apache.spark.scheduler.ShuffleMapTask#runTask the task is deserialized. What Spark basically does at that point is:

new JavaDeserializationStream(new ByteBufferInputStream(ByteBuffer.wrap(this.taskBinary.value())), ClassLoader.getSystemClassLoader()).objIn.readObject()

which produces exactly the error message above, whereas

new ObjectInputStream(new ByteArrayInputStream(this.taskBinary.value())).readObject()

which would work and parse the object properly.

In particular there seems to be a mismatch between how Java / Spark expects void methods to be serialized vs. what Mockito actually does: "java.lang.Void" / "Void" vs. "void".
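
The mismatch can be reproduced with the JDK alone. The sketch below serializes the primitive class object void.class (the return type of a void method) and reads it back twice: once with a stock ObjectInputStream, and once with a hypothetical subclass, BoxedVoidInputStream, whose resolveClass maps the name "void" to java.lang.Void. That mapping is an assumption about what the failing deserializer did. The source above does not confirm it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;

class VoidResolutionDemo {
    // Hypothetical stream that resolves the primitive name "void" to the
    // boxed class java.lang.Void instead of void.class.
    static final class BoxedVoidInputStream extends ObjectInputStream {
        BoxedVoidInputStream(InputStream in) throws IOException {
            super(in);
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            if ("void".equals(desc.getName())) {
                return Void.class;
            }
            return super.resolveClass(desc);
        }
    }

    static byte[] serialize(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads with the stock stream, which knows how to resolve primitive names.
    static Object readWithDefault(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if reading with the boxed mapping is rejected by the JDK.
    static boolean failsWithBoxedVoid(byte[] data) {
        try (ObjectInputStream in = new BoxedVoidInputStream(new ByteArrayInputStream(data))) {
            in.readObject();
            return false;
        } catch (InvalidClassException e) {
            System.out.println(e.getMessage());
            return true;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(void.class);
        System.out.println(readWithDefault(data) == void.class);
        System.out.println(failsWithBoxedVoid(data));
    }
}
```

The stream names the class "void", while the resolved local class is named "Void". The JDK compares the two names case-sensitively and rejects the descriptor.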

Luckily Mockito lets you specify the way it serializes its mocks:

MockSettings mockSettings = Mockito.withSettings().serializable(SerializableMode.ACROSS_CLASSLOADERS);
writer = mock(Writer.class, mockSettings);

After this change the test should work.


Note that verify calls, for example, are tricky and will not work as expected if the mock was serialized, sent somewhere, deserialized and then used. Invocations on the deserialized copy are not visible to the original writer mock.
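
The effect is easy to see without Spark or Mockito. In this plain-Java sketch, CallRecorder is a hypothetical stand-in for a serializable mock that merely counts calls, and roundTrip imitates shipping it to a worker.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a serializable mock: it only counts calls.
class CallRecorder implements Serializable {
    private static final long serialVersionUID = 1L;
    private int calls = 0;

    void write(String row) {
        calls++;
    }

    int calls() {
        return calls;
    }
}

class RoundTripDemo {
    // Serializes and deserializes a value, producing an independent copy.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CallRecorder original = new CallRecorder();
        CallRecorder copy = roundTrip(original);

        copy.write("row");                         // the "worker" uses the copy

        System.out.println(original == copy);      // false
        System.out.println(original.calls());      // 0: nothing to verify here
        System.out.println(copy.calls());          // 1
    }
}
```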

Question:

I'm trying to mock up a Spark context to return a mocked RDD when newAPIHadoopFile is called.

I've set it up as follows:

val mockedOuterRdd = mock[RDD[(NullWritable, MyProtobuf)]]

mockedSc.newAPIHadoopFile(anyString, anyObject(), classOf[org.apache.hadoop.io.NullWritable],
  classOf[MyProtobuf], anyObject()) returns mockedOuterRdd

This is fine in the compiler, but when I run it I get

Invalid use of argument matchers!
5 matchers expected, 3 recorded:
-> at ...
This exception may occur if matchers are combined with raw values:
    //incorrect:
    someMethod(anyObject(), "raw String");
When using matchers, all arguments have to be provided by matchers.
For example:
    //correct:
    someMethod(anyObject(), eq("String by matcher"));

Is there a way I can use something like eq(...) (which I've tried, and it doesn't work) with classOf[...]?

I've tried using anyObject for the classes, but the type parameters of the RDD are inferred from them, so they need to be right.

Thanks for reading.


Answer:

eq works for me, though I am guessing you left out some details of your testing framework. I used ScalaTest, and when I tried to use eq the compiler kept complaining that it returned a Boolean instead of a Class. The likely cause is that an unqualified eq resolves to Scala's own AnyRef.eq (reference equality, which returns a Boolean) rather than to Mockito's generic eq matcher. So the solution is to use the fully qualified name:

mockedSc.newAPIHadoopFile(anyString, anyObject(),
  org.mockito.Matchers.eq(classOf[org.apache.hadoop.io.NullWritable]),
  org.mockito.Matchers.eq(classOf[MyProtobuf]), anyObject())
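
For anyone wondering where "5 matchers expected, 3 recorded" comes from: each matcher call records itself on the side and returns a placeholder, and the count is checked when the stubbed method is invoked. The toy model below is plain Java, not Mockito's code. MatcherStack, any, eq and bind are hypothetical names, and String.class and Integer.class stand in for the two classOf[...] arguments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;

// Toy model of matcher bookkeeping. Not Mockito's implementation.
class MatcherStack {
    private static final ThreadLocal<List<Predicate<Object>>> RECORDED =
            ThreadLocal.withInitial(ArrayList::new);

    // Records a matcher that accepts anything and returns a placeholder.
    static <T> T any() {
        RECORDED.get().add(arg -> true);
        return null;
    }

    // Records an equality matcher and returns a placeholder.
    static <T> T eq(T expected) {
        RECORDED.get().add(arg -> Objects.equals(arg, expected));
        return null;
    }

    // Stands in for the stubbed method: checks that either no matchers were
    // used or one was recorded per argument. Returns the matcher count.
    static int bind(Object... rawArgs) {
        List<Predicate<Object>> recorded = new ArrayList<>(RECORDED.get());
        RECORDED.get().clear();
        if (!recorded.isEmpty() && recorded.size() != rawArgs.length) {
            throw new IllegalStateException(
                    rawArgs.length + " matchers expected, " + recorded.size() + " recorded");
        }
        return recorded.size();
    }

    public static void main(String[] args) {
        // Every argument wrapped in a matcher: accepted.
        System.out.println(bind(any(), any(), eq(String.class), eq(Integer.class), any()));

        // Two raw class arguments mixed with three matchers: rejected.
        try {
            bind(any(), any(), String.class, Integer.class, any());
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The raw class arguments never register a matcher, which is why wrapping them in eq(...) is enough to satisfy the check.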