A Guide to Node.js streams
Streams make a huge difference to memory consumption. Say you create a big file by writing a million lines to big.file in a loop through a writable stream; running that script generates a file of about 400 MB. Now suppose a Node server reads the whole file with the asynchronous fs.readFile method whenever it gets a request, and then sends it in the response. Everything looks fine at first, since the event loop is never blocked. But when you run the server and monitor its memory, you’ll see that it starts out with a normal amount, about 9 MB, and that as soon as a client connects the consumption jumps to almost 440 MB. The server has put the entire content of big.file in memory before writing it out to the response object, which is very inefficient. The response object is itself a writable stream, so if you have a readable stream that represents the content of big.file, you can achieve the same result without consuming about 400 MB of memory simply by piping one into the other. When a client asks for the big file, you no longer buffer it in memory; you stream it one chunk at a time. The memory usage grows by about 25 MB, and that’s it.
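As a rough sketch of the streaming version described above, the snippet below pipes a file into the response. The helper name streamFile, the path ./big.file and port 8000 are assumptions made for illustration, not part of the original example.

```javascript
const fs = require('fs');
const http = require('http');

// Hypothetical helper: stream the file at `filePath` into any writable `dest`.
function streamFile(filePath, dest) {
  const src = fs.createReadStream(filePath);
  // pipe() does not forward read errors, so close the destination ourselves.
  src.on('error', () => dest.end());
  // Only one chunk at a time is held in memory while the data flows.
  return src.pipe(dest);
}

// Each response receives the file chunk by chunk instead of as one large buffer.
const server = http.createServer((req, res) => streamFile('./big.file', res));
// server.listen(8000); // uncomment to try it; the port is an arbitrary choice
```

Because the helper accepts any writable stream, the same function can also send the file to process.stdout or to another file.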
Streams aren’t merely useful for transferring data between input sources and output destinations. Once a stream is opened and its data is exposed, developers can also transform that data before it reaches its destination. This is one of the biggest advantages of streams: because you read the data bit by bit, you can slot other programs in between the source and the destination.
The event-driven, asynchronous nature of Node.js makes it very good at handling I/O-bound tasks, and streams are available to any app that performs I/O operations. Now that we’ve covered some of the benefits of Node.js streams, it is easier to see how they can simplify I/O.
Let’s start with the basics.
If you have already worked with Node.js, you may have come across streams. Just like strings or arrays, they are collections of data. The difference is that the data might not be available all at once and doesn’t have to fit in memory. This makes streams really powerful when working with large amounts of data, or with data that arrives from an external source one chunk at a time.
Once a stream is opened, data flows in chunks from its origin to the process consuming it. Compared to loading data all at once, the biggest advantage is that the input can, in theory, be endless. The data is a sequence of elements made available over time. Input streams are also called readable streams, as they’re meant to read data from a source such as a file, data stored in memory, or an input device like a keyboard. Outbound streams, also known as writable streams, are destinations that receive the data coming across the stream: files, a place in memory, or output devices such as the command line, your screen, or a printer.
Readable and writable streams can be combined freely. File input can end up on the command line or be sent to a connected printer, and keyboard input can be stored directly in a file. No matter what the sources or destinations are, the interface stays the same.
A duplex stream is both readable and writable; a TCP socket is one example. A transform stream is a duplex stream that can modify or transform the data as it is written and read. Picture a transform stream as a function whose input is the writable part and whose output is the readable part.
But streams are not merely about working with big data. They also give us the power of composability in our code. All streams are instances of EventEmitter, so they emit events that can be used to read and write data. However, we can consume stream data in a simpler way by using the pipe method.
The pipe method
Piping is a great mechanism for reading data from a source and writing it to a destination without managing the flow yourself, so you don’t have to worry about the data flowing too fast or too slowly. You pipe the output of a readable stream, the source of the data, into the input of a writable stream, the destination. The source has to be a readable stream and the destination a writable one. Of course, either can also be a duplex or transform stream.
The pipe method is the easiest way to consume streams. Generally, it’s recommended to either use the pipe method or consume streams with events. You usually don’t need events when you’re using pipe, but events are the way to go if you need to consume streams in more custom ways. The events and functions of a stream are related because they are usually used together, and they can be combined for a customized use of streams.
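A minimal sketch of a single pipe, assuming a file-to-file copy as the use case. The helper name copyFile and its callback style are illustrative choices rather than anything prescribed by Node.js.

```javascript
const fs = require('fs');

// Hypothetical helper: copy `from` to `to`, then call `done` (with an error, if any).
function copyFile(from, to, done) {
  const source = fs.createReadStream(from);      // readable: the source of data
  const destination = fs.createWriteStream(to);  // writable: the destination
  source.on('error', done);
  destination.on('error', done);
  // 'finish' fires once everything has been flushed to the destination.
  destination.on('finish', () => done(null));
  source.pipe(destination);
}
```

It could be called as copyFile('input.txt', 'output.txt', (err) => { ... }), where both file names are placeholders.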
Besides reading from a readable source and writing to a writable destination, the pipe method manages a few things along the way, such as end-of-file and slowing the source down when the destination can’t keep up. Note that pipe does not forward errors from one stream to the next, so each stream in a chain still needs its own error handler. The pipe method also returns the destination stream, which means you can chain multiple streams together. Say you have an archive and want to decompress it. There are a number of ways to achieve this, but the easiest and cleanest is piping and chaining.
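The decompression case could be sketched as follows, assuming the archive is a gzip file. The helper name decompress is hypothetical, and error handling is left out to keep the chain visible.

```javascript
const fs = require('fs');
const zlib = require('zlib');

// Hypothetical helper: gunzip the file at `from` into `to`, then call `done`.
function decompress(from, to, done) {
  fs.createReadStream(from)
    .pipe(zlib.createGunzip())        // pipe() returns the gunzip stream...
    .pipe(fs.createWriteStream(to))   // ...so a second pipe can be chained on
    .on('finish', done);              // 'finish' belongs to the last stream
}
```

Each call to pipe returns its destination, which is why the event handler at the end of the chain is attached to the file write stream.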
A common way to read data from a stream is to listen for the data event and attach a callback. The readable stream emits a data event whenever a chunk of data is available, and your callback executes. Readable streams have two main modes that affect how we can consume them: paused mode and flowing mode. All readable streams start in paused mode by default, but they can easily be switched to flowing and back to paused when needed. Sometimes the switching happens automatically.
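A small sketch of event-based consumption follows. The helper name collect is an assumption for illustration; Readable.from is only used here to create a tiny in-memory source.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: gather every chunk of `readable` and hand them to `done`.
function collect(readable, done) {
  const chunks = [];
  // Attaching a 'data' handler switches the stream into flowing mode.
  readable.on('data', (chunk) => chunks.push(chunk));
  readable.on('end', () => done(null, chunks));
  readable.on('error', done);
}

collect(Readable.from(['a', 'b', 'c']), (err, chunks) => {
  console.log(chunks.join('')); // prints "abc"
});
```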
When a readable stream is in paused mode, you can call the read() method to read from it on demand. In flowing mode the data flows continuously, and we have to listen to events to consume it. Data can actually be lost in flowing mode if no consumer is available to handle it, which is why a readable stream in flowing mode needs a data event handler. In fact, merely adding a data event handler switches a paused stream into flowing mode. To switch back, call pause(); for backward-compatibility reasons, removing the data handler does not pause the stream on its own.
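Reading on demand in paused mode might look like the sketch below. The helper name readOnDemand is hypothetical; it relies on the readable event, which signals that read() has something to return.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: pull chunks with read() while the stream stays paused.
function readOnDemand(readable, done) {
  const chunks = [];
  readable.on('readable', () => {
    let chunk;
    // read() returns null once the internal buffer is empty.
    while ((chunk = readable.read()) !== null) {
      chunks.push(chunk);
    }
  });
  readable.on('end', () => done(chunks));
}

readOnDemand(Readable.from(['a', 'b', 'c']), (chunks) => {
  console.log(chunks.join('')); // prints "abc"
});
```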
You write data to a destination using writable streams. Like readable streams, they are EventEmitters and emit various events at various points. To write data, call write() on the stream instance. The function returns a Boolean that signals whether you should keep writing, not whether the write succeeded. If it returns true, you can keep writing more data. If it returns false, the chunk has still been accepted, but the stream’s internal buffer is full and you should stop writing for the moment. The writable stream lets you know when you can start writing again by emitting a drain event.
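The write-then-wait-for-drain pattern can be sketched like this. The helper name writeLines and the line format are made up for the example.

```javascript
const { Writable } = require('stream');

// Hypothetical helper: write `count` lines, pausing whenever write() returns false.
function writeLines(writable, count, done) {
  let i = 0;
  function writeSome() {
    while (i < count) {
      const canContinue = writable.write(`line ${i}\n`);
      i += 1;
      if (!canContinue) {
        // The buffer is full: resume only after the stream emits 'drain'.
        writable.once('drain', writeSome);
        return;
      }
    }
    writable.end(done); // the callback runs once the stream has finished
  }
  writeSome();
}
```

The Writable import is not needed by the helper itself; it is there so that a test destination can be built, for instance new Writable({ write(chunk, encoding, callback) { callback(); } }).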
Node.js streams have a reputation for being hard to understand. Over the years, developers have created lots of packages with the sole intention of making streams easier to work with, and today it’s far easier to find your way around them. These days, the problem is less that streams are misunderstood and more that most are not very well implemented.
There are two main tasks when talking about streams in Node.js: consuming them and implementing them. So far we’ve talked about consuming streams. Let’s look into implementing them.
IMPLEMENTING A WRITABLE STREAM
To implement a writable stream, you use the Writable constructor from the stream module. There are several ways to do it. You can extend the Writable constructor if you want, but the simpler constructor approach might serve you better: create an object from the Writable constructor and pass it a number of options. The only required option is a write function, which receives the chunk of data to be written, its encoding, and a callback to call once the chunk has been processed.
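A minimal sketch of the simpler constructor approach follows. The factory name createCollector and the idea of storing chunks in an array are assumptions chosen to keep the example self-contained.

```javascript
const { Writable } = require('stream');

// Hypothetical factory: a writable that appends every chunk, as text, to `store`.
function createCollector(store) {
  return new Writable({
    write(chunk, encoding, callback) {
      store.push(chunk.toString()); // chunks arrive as Buffers by default
      callback();                   // signal that this chunk has been handled
    },
  });
}

const parts = [];
const out = createCollector(parts);
out.write('hello ');
out.end('world', () => console.log(parts.join(''))); // prints "hello world"
```

Any readable stream can be piped into such an object, for example process.stdin.pipe(createCollector(parts)).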
IMPLEMENTING A READABLE STREAM
To implement a readable stream, construct an object from the Readable interface and implement a read() method in the stream’s configuration parameter. You can also push the data you want consumers to receive directly onto the stream, but that isn’t very efficient: if you push everything up front and then pipe the stream to process.stdout, all the data is sitting in the stream before any consumer has asked for it. Pushing data on demand, when a consumer asks for it, is the much better way. You do that by implementing the read() method in the configuration object.
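Pushing on demand could be sketched as below, assuming a stream that emits the letters A to Z. The factory name createAlphabet is hypothetical.

```javascript
const { Readable } = require('stream');

// Hypothetical factory: a readable that produces one letter per read() call.
function createAlphabet() {
  let code = 65; // character code of 'A'
  return new Readable({
    read() {
      if (code > 90) {
        this.push(null); // pushing null signals the end of the data
        return;
      }
      this.push(String.fromCharCode(code));
      code += 1;
    },
  });
}

// Letters are generated only as the consumer (stdout here) asks for them.
createAlphabet().pipe(process.stdout);
```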
IMPLEMENTING DUPLEX/TRANSFORM STREAMS
With a Duplex stream, you implement both the writable and the readable side on the same object, as if you were inheriting from both interfaces. It’s important to understand that the readable and writable sides of a duplex stream operate entirely independently of one another; a duplex stream merely groups two features into one object. For example, a single duplex stream can combine both methods to emit the letters A to Z on its readable side while echoing whatever is written to its writable side.
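The independence of the two sides can be illustrated with a sketch like the following. The factory name createLetterDuplex is an assumption, and the writable side records what it receives in an array instead of echoing it to the console so the result is easy to inspect.

```javascript
const { Duplex } = require('stream');

// Hypothetical factory: the writable side records input in `written`,
// while the readable side independently produces the letters A to Z.
function createLetterDuplex(written) {
  let code = 65; // character code of 'A'
  return new Duplex({
    write(chunk, encoding, callback) {
      written.push(chunk.toString());
      callback();
    },
    read() {
      // Nothing written to the stream ever shows up on the readable side.
      this.push(code <= 90 ? String.fromCharCode(code++) : null);
    },
  });
}
```

A possible use is process.stdin.pipe(createLetterDuplex(log)).pipe(process.stdout), where log is an array you supply: typed input lands in the array while the alphabet is printed.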
The more interesting duplex stream is the transform stream, because its output is computed from its input. For a transform stream, we don’t have to implement the write or read methods; we only implement a transform method, which combines both of them. It has the signature of the write method, and we can use it to push data as well.
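A minimal sketch of a transform stream, assuming upper-casing as the transformation. The factory name createUpperCase is made up for the example.

```javascript
const { Transform } = require('stream');

// Hypothetical factory: a transform stream that upper-cases whatever passes through.
function createUpperCase() {
  return new Transform({
    transform(chunk, encoding, callback) {
      // Same signature as write(); push() feeds the readable side.
      this.push(chunk.toString().toUpperCase());
      callback();
    },
  });
}

// Example wiring (left commented out so the snippet does not wait on stdin):
// process.stdin.pipe(createUpperCase()).pipe(process.stdout);
```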