Node.js: How to read a stream into a buffer?

I wrote a pretty simple function that downloads an image from a given URL, resizes it, and uploads it to S3 (using 'gm' and 'knox'), but I have no idea if I'm reading the stream into a buffer correctly. (Everything is working, but is it the correct way?)

Also, I want to understand something about the event loop: how do I know that one invocation of the function won't leak anything, or change the 'buf' variable of another already-running invocation (or is this scenario impossible because the callbacks are anonymous functions)?

var http = require('http');
var https = require('https');
var s3 = require('./s3');
var gm = require('gm');

module.exports.processImageUrl = function(imageUrl, filename, callback) {
    var client = http;
    if (imageUrl.substr(0, 5) == 'https') { client = https; }

    client.get(imageUrl, function(res) {
        if (res.statusCode != 200) {
            return callback(new Error('HTTP Response code ' + res.statusCode));
        }

        gm(res)
            .geometry(1024, 768, '>')
            .stream('jpg', function(err, stdout, stderr) {
                if (!err) {
                    var buf = new Buffer(0);
                    stdout.on('data', function(d) {
                        buf = Buffer.concat([buf, d]);
                    });

                    stdout.on('end', function() {
                        var headers = {
                            'Content-Length': buf.length
                            , 'Content-Type': 'Image/jpeg'
                            , 'x-amz-acl': 'public-read'
                        };

                        s3.putBuffer(buf, '/img/d/' + filename + '.jpg', headers, function(err, res) {
                            if (err) {
                                return callback(err);
                            } else {
                                return callback(null, res.client._httpMessage.url);
                            }
                        });
                    });
                } else {
                    callback(err);
                }
            });
    }).on('error', function(err) {
        callback(err);
    });
};

Overall I don't see anything that would break in your code.

Two suggestions:

The way you are combining Buffer objects is suboptimal, because it has to copy all the pre-existing data on every 'data' event. It would be better to push the chunks into an array and concat them all once at the end.

var bufs = [];
stdout.on('data', function(d){ bufs.push(d); });
stdout.on('end', function(){
  var buf = Buffer.concat(bufs);
});

For performance, I would look into whether the S3 library you are using supports streams. Ideally you wouldn't need to create one large buffer at all; instead you would just pass the stdout stream directly to the S3 library.
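
knox does expose a streaming upload via client.putStream, but it still wants a Content-Length header up front, which is the catch here since the resized size isn't known until gm finishes. As a rough sketch of what that path would look like if you could determine the length some other way (s3Client, knownLength and the bucket config below are placeholders, not from your code):

var knox = require('knox');

// Placeholder client config -- fill in your own credentials/bucket.
var s3Client = knox.createClient({ key: 'KEY', secret: 'SECRET', bucket: 'my-bucket' });

function uploadResizedStream(stdout, filename, knownLength, callback) {
    var headers = {
        'Content-Length': knownLength,   // S3 needs this before the upload starts
        'Content-Type': 'image/jpeg',
        'x-amz-acl': 'public-read'
    };
    // Pipe the gm output straight to S3 instead of buffering it in memory.
    s3Client.putStream(stdout, '/img/d/' + filename + '.jpg', headers, function(err, res) {
        if (err) return callback(err);
        callback(null, res);
    });
}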

As for the second part of your question, that isn't possible. When a function is called, it is allocated its own private context, and everything defined inside of that will only be accessible from other items defined inside that function.
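
A tiny illustration of that isolation (just a sketch, not part of your code): each call to the outer function closes over its own buf, so two in-flight invocations can't touch each other's data.

function makeAccumulator(label) {
    var buf = Buffer.alloc(0);            // private to this invocation
    return function append(chunk) {
        buf = Buffer.concat([buf, chunk]);
        console.log(label, buf.length);   // each closure tracks its own length
    };
}

var a = makeAccumulator('a');
var b = makeAccumulator('b');
a(Buffer.from('hello'));  // logs: a 5
b(Buffer.from('hi'));     // logs: b 2 -- b's buf is unaffected by a's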

Update

Dumping the file to the filesystem would probably mean less memory usage per request, but file IO can be pretty slow so it might not be worth it. I'd say that you shouldn't optimize too much until you can profile and stress-test this function. If the garbage collector is doing its job you may be overoptimizing.

With all that said, there are better ways anyway, so don't use files. Since all you want is the length, you can calculate that without needing to append all of the buffers together, so then you don't need to allocate a new Buffer at all.

var pause_stream = require('pause-stream');

// Your other code.

var bufs = [];
stdout.on('data', function(d){ bufs.push(d); });
stdout.on('end', function(){
  var contentLength = bufs.reduce(function(sum, buf){
    return sum + buf.length;
  }, 0);

  // Create a stream that will emit your chunks when resumed.
  var stream = pause_stream();
  stream.pause();
  while (bufs.length) stream.write(bufs.shift());
  stream.end();

  var headers = {
      'Content-Length': contentLength,
      // ...
  };

  s3.putStream(stream, ....);
});

You can easily do this using node-fetch if you are pulling from http(s) URIs.

From the readme:

fetch('https://assets-cdn.github.com/images/modules/logos_page/Octocat.png')
    .then(res => res.buffer())
    .then(buffer => console.log(buffer))
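
If what you have is a plain readable stream rather than an http(s) URL, the same idea works as a small promise-based helper (a sketch, not part of node-fetch):

// Collect any Readable stream into a single Buffer, promise-style.
function streamToBuffer(stream) {
    return new Promise(function(resolve, reject) {
        var chunks = [];
        stream.on('data', function(chunk) { chunks.push(chunk); });
        stream.on('error', reject);
        stream.on('end', function() { resolve(Buffer.concat(chunks)); });
    });
}

// Usage, e.g. with the gm stdout stream from the question:
// streamToBuffer(stdout).then(function(buf) { /* upload buf */ });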

I suggest loganfsmyth's method, using an array to hold the data.

var bufs = [];
stdout.on('data', function(d){ bufs.push(d); });
stdout.on('end', function(){
  var buf = Buffer.concat(bufs);
});

In my current working example, I am working with GridFS and npm's Jimp.

var bucket = new GridFSBucket(getDBReference(), { bucketName: 'images' });
var dwnldStream = bucket.openDownloadStream(info[0]._id); // original size

var data = [];
dwnldStream.on('data', function(chunk) {
    data.push(chunk);
});
dwnldStream.on('end', function() {
    var buff = Buffer.concat(data);
    console.log("buffer: ", buff);
    jimp.read(buff)
        .then(image => {
            console.log("read the image!");
            IMAGE_SIZES.forEach((size) => {
                resize(image, size);
            });
        });
});

I did some other research with a string method, but that did not work, perhaps because I was reading from an image file; the array method did work, though.

const DISCLAIMER = "DONT DO THIS";
var data = "";
stdout.on('data', function(d) {
    data += d;
});
stdout.on('end', function() {
    var buf = Buffer.from(data);
    // do work with the buffer here
});

When I did the string method I got this error from npm's Jimp:

buffer:  <Buffer 00 00 00 00 00>
{ Error: Could not find MIME for Buffer <null>

Basically, I think the type coercion from binary to string didn't work so well.
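
That matches what you'd expect: round-tripping binary data through a default (utf-8) string is lossy, so the image bytes get mangled before Jimp ever sees them. A quick way to convince yourself (hypothetical bytes, not taken from the actual image):

// JPEG data starts with 0xFF 0xD8 -- bytes that are not valid UTF-8 on their own.
var original = Buffer.from([0xff, 0xd8, 0xff, 0xe0]);
var roundTripped = Buffer.from(original.toString()); // toString() defaults to 'utf8'

console.log(original);                       // <Buffer ff d8 ff e0>
console.log(roundTripped);                   // replacement chars, e.g. <Buffer ef bf bd ...>
console.log(original.equals(roundTripped));  // false -- the bytes were corrupted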

I suggest keeping an array of buffers and concatenating them into the resulting buffer only once, at the end. It's easy to do manually, or you could use node-buffers. A rough comparison is sketched below.
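
To see why concatenating only once matters, here is a rough comparison you can run yourself (a sketch; the chunk size and count are made up):

var CHUNK = Buffer.alloc(4 * 1024);  // 4 KB chunks
var N = 1000;

console.time('concat per chunk');
var growing = Buffer.alloc(0);
for (var i = 0; i < N; i++) {
    growing = Buffer.concat([growing, CHUNK]);  // copies everything so far each time: O(n^2)
}
console.timeEnd('concat per chunk');

console.time('concat once');
var chunks = [];
for (var j = 0; j < N; j++) {
    chunks.push(CHUNK);
}
var once = Buffer.concat(chunks);               // single copy at the end: O(n)
console.timeEnd('concat once');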

I just want to post my solution. The previous answers were pretty helpful for my research. I use length-stream to get the size of the stream, but the problem here is that its callback is fired near the end of the stream, so I also use stream-cache to cache the stream and pipe it to the res object once I know the content-length. In case of an error, the stream's 'error' handler just passes the error to the callback:

var StreamCache = require('stream-cache');
var lengthStream = require('length-stream');

var _streamFile = function(res, stream, cb) {
    var cache = new StreamCache();

    var lstream = lengthStream(function(length) {
        res.header("Content-Length", length);
        cache.pipe(res);
    });

    stream.on('error', function(err) {
        return cb(err);
    });

    stream.on('end', function() {
        return cb(null, true);
    });

    return stream.pipe(lstream).pipe(cache);
};
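
For context, a hypothetical call site (app, getFileStream and the route are assumptions, not part of the original answer), e.g. in an Express handler:

// Hypothetical Express route using the helper above.
app.get('/files/:id', function(req, res) {
    var sourceStream = getFileStream(req.params.id); // assumed helper returning a Readable
    _streamFile(res, sourceStream, function(err) {
        if (err) {
            console.error('streaming failed:', err);
        }
    });
});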

Comments
  • it supports streams, but I need to know the Content-Length for the S3 headers, and that's impossible with streams
  • btw - what about the second part of the question?
  • is it a better practice to pipe the stream from 'gm' to a file and then open a stream from that file and upload to S3, using the file size as Content-Length? As far as I understand, this eliminates loading the entire file into memory like I'm doing now
  • just want to mention that the bufs.pop() call should be bufs.unshift(), or even easier just replace the entire while loop with a simple for loop.
  • @Bergur True, but then you have to maintain two separate accumulator variables. I prefer maintaining the single one and calculating the length later. I'm not convinced it would make an appreciable difference in performance or anything.
  • You can also abuse Response from node-fetch to get a buffer from any stream not just http: new Response(stream).buffer().
  • Response.buffer is not a function. So um... what? Edit: Response.arrayBuffer seems to work