java.lang.OutOfMemoryError: Unable to acquire 100 bytes of memory, got 0


I'm invoking PySpark with Spark 2.0 in local mode with the following command:

pyspark --executor-memory 4g --driver-memory 4g

The input DataFrame is read from a TSV file and has 580K rows x 28 columns. I'm doing a few operations on the DataFrame and then trying to export it to a TSV file, which fails with the error below.

df.coalesce(1).write.save("sample.tsv", format="csv", header='true', delimiter='\t')

Any pointers on how to get rid of this error? I can display the DataFrame and count its rows without a problem.

The output DataFrame is 3100 rows with 23 columns.

Error:

Job aborted due to stage failure: Task 0 in stage 70.0 failed 1 times, most recent failure: Lost task 0.0 in stage 70.0 (TID 1073, localhost): org.apache.spark.SparkException: Task failed while writing rows
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:261)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:85)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Unable to acquire 100 bytes of memory, got 0
    at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:129)
    at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:374)
    at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:396)
    at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:94)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
    at org.apache.spark.sql.execution.WindowExec$$anonfun$15$$anon$1.fetchNextRow(WindowExec.scala:300)
    at org.apache.spark.sql.execution.WindowExec$$anonfun$15$$anon$1.<init>(WindowExec.scala:309)
    at org.apache.spark.sql.execution.WindowExec$$anonfun$15.apply(WindowExec.scala:289)
    at org.apache.spark.sql.execution.WindowExec$$anonfun$15.apply(WindowExec.scala:288)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:766)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:766)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:96)
    at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:95)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply$mcV$sp(WriterContainer.scala:253)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1325)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:258)
    ... 8 more

Driver stacktrace:

The problem for me was indeed coalesce(). Instead of exporting with coalesce(), I wrote the DataFrame to Parquet first with df.write.parquet("testP"), then read that file back and exported it with coalesce(1).

Hopefully it works for you as well.

I believe that the cause of this problem is coalesce(): although it avoids a full shuffle (as repartition would do), it has to shrink the data into the requested number of partitions.

Here, you are asking all the data to fit into a single partition, so one task (and only one task) has to process all the data, which may cause its container to run out of memory.

So, either ask for more partitions than 1, or avoid coalesce() in this case.


Otherwise, you could try the solutions provided in the links below to increase your memory configuration:

  1. Spark java.lang.OutOfMemoryError: Java heap space
  2. Spark runs out of memory when grouping by key
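
For reference, a configuration sketch (the 8g value is a placeholder to adjust for your machine, not a recommendation):

```
# Give the driver more heap at launch:
pyspark --driver-memory 8g

# Or set it persistently in spark-defaults.conf:
# spark.driver.memory  8g
```

Note that in local mode everything runs inside the driver JVM, so spark.driver.memory is the setting that actually matters; --executor-memory has no effect there.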

In my case, replacing coalesce(1) with repartition(1) worked.

As was stated in other answers, use repartition(1) instead of coalesce(1). The reason is that repartition(1) will ensure that upstream processing is done in parallel (multiple tasks/partitions), rather than on only one executor.

To quote the Dataset.coalesce() Spark docs:

However, if you're doing a drastic coalesce, e.g. to numPartitions = 1, this may result in your computation taking place on fewer nodes than you like (e.g. one node in the case of numPartitions = 1). To avoid this, you can call repartition(1) instead. This will add a shuffle step, but means the current upstream partitions will be executed in parallel (per whatever the current partitioning is).

In my case the driver was smaller than the workers; the issue was resolved by making the driver larger.
