Sept. 3, 2020, 1:49 p.m.
Data Streams in Node
By Maurice Ticas
How do you efficiently copy a large number of image files to a Node server?
I've been working on a photo app for over a month now. I've named it fotoshare. I want the app to be highly concurrent while copying and renaming many image files. I'm starting with a directory of 1,385 media files with a total size of 12 GB. With 2020 being something of a lost year, I haven't taken many photos and have missed capturing many moments. Nevertheless, my media directory has grown to its current size over the past three years and will only grow larger as time passes. I've been thinking about how best to implement a way to import and manage my photo assets in Node.
There are several ways to read files in Node. You can read a whole file into memory at once. You can work at a low level with read() and write(), which both use file descriptors. And there is the way of streams. Streams allow us to take incremental chunks of data into memory, sequentially process those chunks if needed, and write them out. We never need the entire file in memory while we do the processing.
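As a rough sketch of the difference (the file name here is just a placeholder, not part of fotoshare), the following contrasts fs.readFile, which buffers an entire file in memory, with fs.createReadStream, which delivers the same bytes as a sequence of chunks:

    const fs = require('fs');

    // Read the whole file into memory at once.
    fs.readFile('photo.jpg', (err, data) => {
      if (err) throw err;
      console.log(`read ${data.length} bytes in one buffer`);
    });

    // Read the same file as a stream of smaller chunks.
    const readable = fs.createReadStream('photo.jpg');
    let bytes = 0;
    readable.on('data', (chunk) => { bytes += chunk.length; });
    readable.on('end', () => console.log(`streamed ${bytes} bytes in chunks`));

For a 12 GB media directory, the streaming version keeps memory use bounded by the chunk size rather than the file size.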
There are a few ways to write code that handles streams in Node. You can pipe a readable stream to a writable stream using the pipe() method. The piped approach takes care of pressure from data backing up in the writable stream's internal buffer. David Flanagan's JavaScript: The Definitive Guide defines the term backpressure in this context as a "message from the [writable] stream that you have written data more quickly than it can be handled". A benefit to using pipes with streams is that we don't have to handle this backpressure in our own code. The pipe implementation takes care of it for us.
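Here is a minimal sketch of that piped approach, assuming placeholder source and destination paths rather than the actual fotoshare layout:

    const fs = require('fs');

    // Copy an image by piping a readable stream into a writable stream.
    // pipe() pauses the readable side whenever the writable buffer fills,
    // so backpressure is handled for us.
    const source = fs.createReadStream('originals/IMG_0001.jpg');
    const destination = fs.createWriteStream('fotoshare/IMG_0001.jpg');

    source.pipe(destination);
    destination.on('finish', () => console.log('copy complete'));
    source.on('error', (err) => console.error('read failed:', err));
    destination.on('error', (err) => console.error('write failed:', err));

Because pipe() throttles the readable side against the writable side, the copy never holds more than a few chunks in memory at a time.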
I started writing the code for fotoshare using Node version 10, but a move to Node version 12 or higher is needed if you want the async for await...of construct for handling streams. If you need to change your version of Node, use the Node Version Manager (nvm) to switch between versions.
This async for await...of approach allows you to write code that looks synchronous. It's a promise-based approach that treats a readable stream as an asynchronous iterator. The benefit is readability of the code, and backpressure can still be handled explicitly where you need it. This approach is better suited to cases where you need to process each chunk of the readable stream's data in some specific way.
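A small sketch of what that can look like on Node 12 or higher, again with placeholder paths and a hypothetical copyFile helper that is not part of fotoshare:

    const fs = require('fs');
    const { once } = require('events');

    // Copy a file chunk by chunk with for await...of.
    async function copyFile(src, dest) {
      const readable = fs.createReadStream(src);
      const writable = fs.createWriteStream(dest);
      for await (const chunk of readable) {
        // Each chunk could be renamed, hashed, or transformed here.
        if (!writable.write(chunk)) {
          // The writable buffer is full: wait for it to drain (backpressure).
          await once(writable, 'drain');
        }
      }
      writable.end();
      await once(writable, 'finish');
    }

    copyFile('originals/IMG_0001.jpg', 'fotoshare/IMG_0001.jpg')
      .then(() => console.log('done'))
      .catch(console.error);

Checking the return value of write() and waiting for the 'drain' event is how backpressure gets handled by hand in this style.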
As is the case with many Node APIs, using streams is heavily event-based. This is especially evident in the two modes for handling readable streams. A readable stream is either in paused mode or flowing mode, and each mode has its own API for handling the stream.
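As an illustration (not fotoshare code, and with a placeholder file name), flowing mode pushes chunks at you through 'data' events, while paused mode waits for you to pull them with read():

    const fs = require('fs');

    // Flowing mode: attaching a 'data' listener makes chunks arrive on their own.
    const flowing = fs.createReadStream('photo.jpg');
    flowing.on('data', (chunk) => console.log(`flowing: got ${chunk.length} bytes`));
    flowing.on('end', () => console.log('flowing: done'));

    // Paused mode: we pull chunks explicitly with read() when 'readable' fires.
    const paused = fs.createReadStream('photo.jpg');
    paused.on('readable', () => {
      let chunk;
      while ((chunk = paused.read()) !== null) {
        console.log(`paused: pulled ${chunk.length} bytes`);
      }
    });
    paused.on('end', () => console.log('paused: done'));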
The use of streams enables high input/output throughput without high memory cost. Samer Buna gives an excellent concrete example of the efficiency achieved when using streams to write a server that serves large static files. You won't be disappointed to read his freeCodeCamp contribution titled Node.js Streams: Everything you need to know.