Java: Memory efficient ByteArrayOutputStream


I've got a 40MB file on disk and I need to "map" it into memory using a byte array.

At first, I thought writing the file to a ByteArrayOutputStream would be the best way, but I find it takes about 160MB of heap space at some moment during the copy operation.

Does somebody know a better way to do this without using three times the file size of RAM?

Update: Thanks for your answers. I noticed I could reduce memory consumption a little by telling ByteArrayOutputStream to use an initial size a bit greater than the original file size (using the exact size with my code forces a reallocation; I've got to check why).

There's another high-memory spot: when I get the byte[] back with ByteArrayOutputStream.toByteArray. Taking a look at its source code, I can see it is cloning the array:

public synchronized byte[] toByteArray() {
    return Arrays.copyOf(buf, count);
}

I'm thinking I could just extend ByteArrayOutputStream and rewrite this method so as to return the original array directly. Is there any potential danger here, given that the stream and the byte array won't be used more than once?
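A minimal sketch of that subclassing idea (the class and method names are mine). It exposes the internal buffer without the defensive copy, with the caveat that buf's length is the current capacity rather than the byte count, so the array is only "exact" if the stream was constructed with the final size and never reallocated:

```java
import java.io.ByteArrayOutputStream;

// Sketch: expose the internal buffer without Arrays.copyOf(). Only safe if
// (a) the stream is never written to again, and (b) the stream was sized
// exactly, otherwise callers must respect size() rather than buf.length.
class DirectByteArrayOutputStream extends ByteArrayOutputStream {
    DirectByteArrayOutputStream(int exactSize) {
        super(exactSize);
    }

    // Returns the internal buffer itself -- no copy is made.
    byte[] rawBuffer() {
        return buf;
    }
}
```

If exactly exactSize bytes are written, rawBuffer() is the finished array; writing even one byte more triggers a reallocation and that invariant breaks.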

MappedByteBuffer might be what you're looking for.

I'm surprised it takes so much RAM to read a file in memory, though. Have you constructed the ByteArrayOutputStream with an appropriate capacity? If you haven't, the stream could allocate a new byte array when it's near the end of the 40 MB, meaning that you would, for example, have a full buffer of 39MB, and a new buffer of twice the size. Whereas if the stream has the appropriate capacity, there won't be any reallocation (faster), and no wasted memory.


ByteArrayOutputStream should be okay so long as you specify an appropriate size in the constructor. It will still create a copy when you call toByteArray, but that's only temporary. Do you really mind the memory briefly going up a lot?

Alternatively, if you already know the size to start with you can just create a byte array and repeatedly read from a FileInputStream into that buffer until you've got all the data.
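A sketch of that second alternative (the method name is mine); note that a single read() call is not guaranteed to fill the buffer, hence the loop:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

class FileReadUtil {
    // Reads the whole file into one pre-sized array: peak memory is the
    // file size itself, with no growing intermediate buffer.
    static byte[] readFully(File file) throws IOException {
        byte[] data = new byte[(int) file.length()];
        try (FileInputStream in = new FileInputStream(file)) {
            int offset = 0;
            while (offset < data.length) {
                int n = in.read(data, offset, data.length - offset);
                if (n < 0) {
                    throw new IOException("file truncated while reading");
                }
                offset += n;
            }
        }
        return data;
    }
}
```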


If you really want to map the file into memory, then a FileChannel is the appropriate mechanism.

If all you want to do is read the file into a simple byte[] (and don't need changes to that array to be reflected back to the file), then simply reading into an appropriately-sized byte[] from a normal FileInputStream should suffice.

Guava has Files.toByteArray() which does all that for you.
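If pulling in Guava just for this is unattractive, the standard library has offered the same one-liner since Java 7 (a sketch; the temp file here stands in for the real 40 MB file):

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Stand-in for the real file on disk.
Path path = Files.createTempFile("demo", ".bin");
Files.write(path, new byte[]{10, 20, 30});

// Standard-library equivalent of Guava's Files.toByteArray():
// allocates one byte[] of exactly the file length and fills it.
byte[] data = Files.readAllBytes(path);
```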


For an explanation of the buffer growth behavior of ByteArrayOutputStream, please read this answer.

In answer to your question, it is safe to extend ByteArrayOutputStream. In your situation, it is probably better to override the write methods so that the maximum additional allocation is limited to, say, 16MB. You should not override toByteArray to expose the protected buf[] member, because a stream is not just a buffer: a stream is a buffer with a position pointer and boundary protection. So it is dangerous to access and potentially manipulate the buffer from outside the class.
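One way to realise "limit the maximum additional allocation" without exposing buf is to drop the single growing array entirely. This is a sketch of the idea (names are mine, and it is not the Commons IO or Spring implementation): growing never copies existing data, and no single allocation exceeds the chunk size:

```java
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Keeps a list of fixed-size chunks instead of one doubling array. The one
// full-size array is created only once, in toByteArray().
class ChunkedByteArrayOutputStream extends OutputStream {
    private final int chunkSize;
    private final List<byte[]> chunks = new ArrayList<>();
    private int posInChunk;
    private int size = 0;

    ChunkedByteArrayOutputStream(int chunkSize) { // e.g. 16 * 1024 * 1024
        this.chunkSize = chunkSize;
        this.posInChunk = chunkSize; // forces the first allocation
    }

    @Override
    public void write(int b) {
        if (posInChunk == chunkSize) {
            chunks.add(new byte[chunkSize]);
            posInChunk = 0;
        }
        chunks.get(chunks.size() - 1)[posInChunk++] = (byte) b;
        size++;
    }

    public byte[] toByteArray() {
        byte[] out = new byte[size];
        int copied = 0;
        for (byte[] chunk : chunks) {
            int n = Math.min(chunkSize, size - copied);
            System.arraycopy(chunk, 0, out, copied, n);
            copied += n;
        }
        return out;
    }
}
```

A production version would also override write(byte[], int, int) to copy in bulk rather than byte-by-byte.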


If you have 40 MB of data I don't see any reason why it would take more than 40 MB to create a byte[]. I assume you are using a growing ByteArrayOutputStream which creates a byte[] copy when finished.

You can try the old read-the-file-at-once approach.

File file = ...; // the 40 MB file
DataInputStream is = new DataInputStream(new FileInputStream(file));
byte[] bytes = new byte[(int) file.length()];
is.readFully(bytes);
is.close();

Using a MappedByteBuffer is more efficient and avoids a copy of the data (and avoids using the heap much) provided you can use the ByteBuffer directly; however, if you have to end up with a byte[], it's unlikely to help much.
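A sketch of the mapping approach (the temp file here stands in for the real one). The mapped pages are backed by the OS file cache, so no file-sized byte[] lands on the Java heap unless you explicitly copy the buffer's contents out:

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// Stand-in for the real 40 MB file.
Path path = Files.createTempFile("demo", ".bin");
Files.write(path, new byte[]{1, 2, 3, 4});

byte first = 0;
try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r");
     FileChannel channel = raf.getChannel()) {
    // Map the whole file read-only; reads go straight to the mapped pages.
    MappedByteBuffer buffer =
            channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
    first = buffer.get(0);
}
```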

You could use a memory-mapped buffer. You could give a size hint when you allocate the ByteArrayOutputStream, e.g. ByteArrayOutputStream baos = new ByteArrayOutputStream((int) file.length()); Or you could dispense with the ByteArrayOutputStream entirely and read directly into a byte array.




Comments
  • Similar question stackoverflow.com/questions/964332/…
  • Thanks for your answer. I tried setting the appropriate capacity, and the result was the same. For this, I would prefer something based on streams, as it would be interesting for me to apply some filters. Nevertheless, if there is no other way, I'd try to use those MappedByteBuffers.
  • Yes, it's temporary, but I prefer not to use so much memory. I don't know how big are some files going to be, and this may be used in small machines, so I try to use as little memory as possible.
  • @user683887: Then how about creating the second alternative I presented? That will only require as much data as is required. If you need to apply filters, you could always read the file twice - once to work out what size you need, then again to actually read the data.
  • Guava is the best choice for this problem. Thanks.
  • Thanks, @Stephen. You were right, the additional heap usage was due to an incorrect initialization of BAOS size, as I described in my update. I'm using visualvm for measuring memory usage: not sure if it's the best approach.