Top 21 Node.js production best practices

Nisha Gopinath Menon

Even the most accomplished IT/DevOps engineer can't save a poorly written system. So code with the end in mind and plan for production from day one. That means saving no data locally on a specific web server, using caches heavily, gauging memory usage and leaks, minimizing the use of anonymous functions, using CI tools to detect failures before they reach production, logging wisely, and having an error-handling strategy in place from the very start. Below is a set of practices we've developed along the way that have served us well.

Monitoring
At the most fundamental level, monitoring means you can easily identify when bad things happen in production. When nailing down the requirements, start by defining the core set of metrics that must be watched to ensure a healthy state: server RAM, CPU, Node process RAM (less than 1.4GB), number of process restarts, number of errors in the last minute, average response time, and so on. Then go over some advanced features you might fancy and add them to your wish list.
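As a rough illustration of the core metrics listed above, the following sketch gathers a snapshot using only Node's built-in `os` and `process` APIs. The helper name `collectMetrics` and the field names are assumptions made for this example, not a standard schema; restart counts, error rates and response times would come from your process manager and request handlers.

```javascript
const os = require('node:os');

// Hypothetical helper: gathers a snapshot of some of the core health
// metrics. Field names are illustrative only.
function collectMetrics() {
  const memory = process.memoryUsage();
  return {
    timestamp: new Date().toISOString(),
    serverFreeRamBytes: os.freemem(),
    serverTotalRamBytes: os.totalmem(),
    // 1-minute load average; always 0 on Windows.
    cpuLoad1m: os.loadavg()[0],
    processRssBytes: memory.rss,
    processHeapUsedBytes: memory.heapUsed,
    processUptimeSeconds: process.uptime(),
  };
}
```

A caller could emit `JSON.stringify(collectMetrics())` on a timer and let an external monitoring system alert on the values.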
Delegate anything possible to a reverse proxy
Node is notably bad at CPU-intensive tasks like SSL termination and gzipping, so hand them to a 'real' middleware service like nginx. Otherwise, your single thread will be busy with networking tasks rather than your application core, and performance will degrade accordingly. It's very tempting to cargo-cult Express and use its rich middleware offering for networking-related tasks such as gzip encoding, serving static files, SSL termination and request throttling. Remind yourself that Node is not a web server. Because of its single-threaded model, this work keeps the CPU busy for long periods and eventually kills performance: as soon as any real volume of traffic hits your application, things start to go wrong. Don't cripple your application for convenience.
Make use of SSL
By encrypting data in transit, SSL gives you an extra layer of security. Most of the time, Node.js developers read the SSL key from a file in the Node server itself. Terminate SSL at the reverse proxy instead: install the certificate on the reverse proxy and let the outside world communicate with your Node app only through it.
Make use of smart logging to increase transparency
Logs can end up being either a mere warehouse of debug statements or the enabler of a detailed dashboard that tells the story of your app. Plan your logging platform from day one, covering how logs are collected, stored and analyzed, so that the desired information can readily be extracted. Otherwise, you could end up with a black box that is hard to wade through, and later find yourself rewriting every logging statement to add the missing information. The logging framework you choose also affects performance, so it's worth the effort.
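One common way to keep logs analyzable is to write each entry as a single JSON object per line. The sketch below is a minimal, hypothetical logger (`createLogger`, the level names and the field names are all assumptions for illustration); a production setup would more likely rely on an established logging library.

```javascript
// Hypothetical structured logger: one JSON object per line, so a log
// collector can parse and query entries without fragile regexes.
const LEVELS = { debug: 10, info: 20, warn: 30, error: 40 };

function createLogger({
  minLevel = 'info',
  write = (line) => process.stdout.write(line + '\n'),
} = {}) {
  const log = (level, message, fields = {}) => {
    if (LEVELS[level] < LEVELS[minLevel]) return null; // below threshold
    // Spread caller fields first so they cannot overwrite the core keys.
    const entry = { ...fields, time: new Date().toISOString(), level, message };
    const line = JSON.stringify(entry);
    write(line);
    return line;
  };
  return {
    debug: (message, fields) => log('debug', message, fields),
    info: (message, fields) => log('info', message, fields),
    warn: (message, fields) => log('warn', message, fields),
    error: (message, fields) => log('error', message, fields),
  };
}
```

For example, `createLogger().error('payment failed', { orderId: 42 })` writes one line that carries the order id alongside the timestamp and level.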
Lock dependencies
By default, NPM lets dependencies drift across environments even when your code is identical everywhere. When you install packages in different environments, it tries to fetch each package's latest patch version. As a result, QA may thoroughly test the code and approve a version that then behaves differently in production. Worse still, different servers in the same production cluster might run different code. You can overcome this with NPM config files that direct each environment to save the exact version of every package rather than the latest. For finer-grained control, use NPM shrinkwrap, which states exactly which packages and versions should be installed so that no environment is tempted to fetch newer versions. As of NPM 5, dependencies are locked by default.
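To see how much drift a project currently allows, a small script can list dependencies declared as ranges rather than exact versions. This is only a sketch: `findUnpinned` is a hypothetical helper, and its regular expression accepts plain `major.minor.patch` versions with optional prerelease or build suffixes while treating everything else (such as `^`, `~`, `*` or tags) as unpinned.

```javascript
// Hypothetical check: list dependencies whose declared version is a range
// or tag (e.g. "^4.17.1", "~1.2.0", "*", "latest"), not an exact version.
const EXACT_VERSION = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$/;

function findUnpinned(pkg) {
  const unpinned = [];
  for (const section of ['dependencies', 'devDependencies']) {
    for (const [name, range] of Object.entries(pkg[section] || {})) {
      if (!EXACT_VERSION.test(range)) unpinned.push(`${name}@${range}`);
    }
  }
  return unpinned;
}
```

Running it against a parsed package.json in CI is one option; setting `save-exact=true` in `.npmrc` and installing from the lockfile with `npm ci` address the same concern at install time.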
Ensure error management best practices are met
Error management has to be the most painful and time-consuming part of keeping Node.js environments stable. The lack of a solid strategy for error paths in asynchronous flows, combined with the 'one thread' model, is to blame. There's no way around it: you have to truly understand and tame the error management beast. Otherwise, you'll find errors disappearing without a trace, processes crashing merely because a user passed in invalid JSON, and stack traces revealed to the end user.
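The three failure modes mentioned above (lost errors, crashes on invalid JSON, leaked stack traces) can be illustrated with a small sketch. The names `ClientError`, `parseJsonBody` and `toHttpResponse` are hypothetical and exist only for this example; they separate errors caused by bad input from unexpected faults.

```javascript
// Hypothetical error type for failures caused by bad input, not by bugs.
class ClientError extends Error {
  constructor(message, statusCode = 400) {
    super(message);
    this.name = 'ClientError';
    this.statusCode = statusCode;
  }
}

// Parse untrusted JSON without letting a SyntaxError crash the process.
function parseJsonBody(raw) {
  try {
    return JSON.parse(raw);
  } catch (err) {
    throw new ClientError('Request body is not valid JSON');
  }
}

// Convert any error into a response that never leaks the stack trace.
function toHttpResponse(err, log = console.error) {
  if (err instanceof ClientError) {
    return { status: err.statusCode, body: { error: err.message } };
  }
  log(err); // full details, including the stack, go to the logs only
  return { status: 500, body: { error: 'Internal server error' } };
}
```

Errors that escape every handler can additionally be logged through `process.on('uncaughtException', ...)` and `process.on('unhandledRejection', ...)` before the process is restarted by its guard.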
Guard process uptime using the right tool
At the base level, Node processes must be guarded and restarted upon failure. For small apps and those that don't use containers, tools like PM2 are a good fit because they offer simplicity, restart capabilities and rich integration with Node. Others with strong Linux skills could use systemd and run Node as a service. If you use Docker or any other container technology, things get more interesting, since it is usually accompanied by cluster management tools that deploy, monitor and heal containers. Even with rich cluster management features such as container restart, there are good reasons to keep PM2 inside containers as the first guarding tier: it restarts a process much faster and provides Node-specific features, such as flagging to the code when the hosting container asks for a graceful restart. Others might choose to avoid unnecessary layers. See what fits you best, but understand the options you have. Running many instances without a solid strategy, with too many tools stacked together (Docker, cluster management, PM2), could lead to DevOps chaos.
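Whichever guard you choose, the process has to cooperate when it is asked to stop. The sketch below shows one way to handle a graceful restart request, assuming the guard signals the process with SIGTERM or SIGINT. `installGracefulShutdown` and its options are hypothetical names; `server` is anything with a Node-style `close(callback)` method, such as an `http.Server`.

```javascript
// Hypothetical helper: on SIGTERM/SIGINT, stop accepting new connections,
// let in-flight requests finish, then exit. `proc` is injectable so the
// logic can be exercised without touching the real process.
function installGracefulShutdown(server, { timeoutMs = 10000, proc = process } = {}) {
  let shuttingDown = false;
  const shutdown = () => {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    // Force a non-zero exit if connections refuse to drain in time.
    const timer = setTimeout(() => proc.exit(1), timeoutMs);
    timer.unref();
    server.close((err) => {
      clearTimeout(timer);
      proc.exit(err ? 1 : 0);
    });
  };
  proc.on('SIGTERM', shutdown);
  proc.on('SIGINT', shutdown);
  return shutdown;
}
```

The timeout is a design choice: without it, a single long-lived connection could keep the old process alive indefinitely and stall the restart.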
Utilize all CPU cores
In its basic form, a Node app runs on a single CPU core while all others remain idle, so you have to replicate the Node process to utilize all CPUs. For small to medium apps, you can use PM2 or Node Cluster. For a bigger app, consider replicating the process with a Docker cluster or with deployment scripts based on the Linux init system. Otherwise, your application is likely to utilize barely thirty percent of its available resources, or even less. A standard server has four or more CPU cores, yet a naive deployment of Node.js uses only one, even on AWS Beanstalk.
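For the Node Cluster option, a minimal sketch might look like the following. `startCluster` and `workerMain` are hypothetical names; the code assumes a Node version that provides `cluster.isPrimary` (Node 16 or later), and it restarts dead workers immediately without any backoff, which a real deployment would probably want to add.

```javascript
const cluster = require('node:cluster');
const os = require('node:os');

// Hypothetical bootstrap: the primary forks one worker per CPU core and
// replaces any worker that dies; each worker runs `workerMain`, which
// would typically start the HTTP server.
function startCluster(workerMain, workerCount = os.cpus().length) {
  if (cluster.isPrimary) {
    for (let i = 0; i < workerCount; i += 1) cluster.fork();
    cluster.on('exit', (worker, code, signal) => {
      console.error(`worker ${worker.process.pid} exited (${signal || code}), restarting`);
      cluster.fork(); // no backoff: a crash loop will respawn continuously
    });
  } else {
    workerMain();
  }
}
```

Workers started this way share the listening port, so each one can call `server.listen` on the same port inside `workerMain`.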
Discover errors and downtime using APM products
Application performance monitoring (APM) products proactively gauge the codebase and API, so they can go beyond traditional monitoring and measure the overall user experience across services and tiers. For instance, some APM products can highlight a transaction that loads too slowly on the end user's side and point to the root cause. Without them, you may spend much time measuring API performance and downtime and still find it hard to figure out which code parts are slowest in a real-world scenario and how they affect the UX.
Create a 'maintenance endpoint'
A maintenance endpoint is a secured HTTP API that is part of the app code; through it, the production team can view and invoke useful functionality. This endpoint comes into use when the conventional DevOps tools fail to gather a specific type of information or when you choose not to buy or install such tools. The standing rule is to use professional, external tools for maintaining and monitoring production, since they are more accurate and robust. That said, certain operations are easier to do in code. Otherwise, you'll find yourself performing many "diagnostic deploys", shipping code to production only to extract some information for diagnostic purposes.
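A maintenance endpoint could be sketched as a plain HTTP handler guarded by a shared secret. Everything specific here is an assumption for illustration: the `x-maintenance-token` header, the `handleMaintenance` name and the response fields. A real endpoint should also sit behind the private network and TLS.

```javascript
const crypto = require('node:crypto');

// Constant-time comparison so the token cannot be guessed via timing.
function tokenMatches(provided, expected) {
  const a = Buffer.from(String(provided || ''));
  const b = Buffer.from(String(expected));
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}

// Hypothetical handler: returns process diagnostics to authorized callers.
function handleMaintenance(req, res, expectedToken) {
  const provided = req.headers['x-maintenance-token'];
  if (!expectedToken || !tokenMatches(provided, expectedToken)) {
    res.writeHead(401, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ error: 'unauthorized' }));
    return;
  }
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify({
    pid: process.pid,
    uptimeSeconds: process.uptime(),
    memory: process.memoryUsage(),
  }));
}
```

The handler can be mounted on an `http.Server` or an Express route, with the expected token read from an environment variable rather than from source code.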
Tick the obvious security boxes
Node embodies some unique security challenges, and a truly 'secured' system demands a far more extensive security analysis than the basics listed here. The basics include creating a private network (VPC, VPN) for SSH access to your systems, guarding business transactions with SSL/TLS, avoiding SQL injection attacks by using stored procedures or parameterized queries, sending careful HTTP headers and using cookies securely. Hardcore security checks need a professional's expertise.
Get your frontend assets out of Node
Serve frontend content with dedicated middleware because, owing to its single-threaded model, Node's performance suffers when it deals with many static files. Otherwise, your single Node thread will be busy streaming hundreds of HTML, image, React and Angular files instead of devoting its resources to serving dynamic content.
Kill your servers almost every day, be stateless.
Store any type of data in external data stores. Use a 'serverless' platform that explicitly enforces stateless behavior, or 'kill' your servers periodically. Otherwise, a failure on one server causes application downtime, when it should mean nothing more than killing a single faulty machine. That should be incentive enough to practice this religiously. Moreover, reliance on a specific server makes elastic scale-out more challenging.
Measure and guard the memory usage
Node.js has a complicated relationship with memory. In smaller applications you can gauge memory periodically with shell commands, but in larger applications consider entrusting this to a robust monitoring system. Otherwise, you could wake up one morning to find that your process has leaked a hundred megabytes of memory.
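Where a full monitoring system is not yet in place, a process can watch its own memory. The sketch below samples the resident set size and applies a deliberately simple heuristic: it flags a possible leak when memory grew on every one of the last few samples and the total growth passed a threshold. The function names, window size and threshold are assumptions, and real leaks are rarely this tidy.

```javascript
// Hypothetical heuristic: true when RSS grew on each of the last `window`
// intervals and the total growth is at least `minGrowthBytes`.
function looksLikeLeak(samples, { window = 5, minGrowthBytes = 50 * 1024 * 1024 } = {}) {
  if (samples.length < window + 1) return false;
  const recent = samples.slice(-(window + 1));
  for (let i = 1; i < recent.length; i += 1) {
    if (recent[i] <= recent[i - 1]) return false;
  }
  return recent[recent.length - 1] - recent[0] >= minGrowthBytes;
}

// Sample process memory periodically and report suspicious growth.
function startMemoryWatch({ intervalMs = 60000, onLeak = console.warn, ...options } = {}) {
  const samples = [];
  const timer = setInterval(() => {
    samples.push(process.memoryUsage().rss);
    if (samples.length > 100) samples.shift(); // keep the history bounded
    if (looksLikeLeak(samples, options)) {
      onLeak('RSS keeps growing', samples[samples.length - 1]);
    }
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for monitoring
  return () => clearInterval(timer);
}
```

`startMemoryWatch()` returns a function that stops the sampling, which is useful in tests and during shutdown.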
Assign ‘Transaction Id’ to each log statement
Looking at a production error log without any context makes it far harder to zero in on the issue. So assign the same identifier, transaction-id: {some value}, to every log entry written within a single request. The context then becomes clear when you inspect errors in the logs. Unfortunately, this isn't easy to achieve in Node because of its async nature.
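One way to carry an identifier across asynchronous calls is Node's built-in `AsyncLocalStorage` from the `async_hooks` module. The following is a minimal sketch under that assumption; `withTransaction` and `formatLogEntry` are hypothetical helper names, and the id defaults to a random UUID.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');
const { randomUUID } = require('node:crypto');

const requestContext = new AsyncLocalStorage();

// Run `fn`, and every async operation it starts, with a transaction id.
function withTransaction(fn, transactionId = randomUUID()) {
  return requestContext.run({ transactionId }, fn);
}

// Hypothetical formatter: reads the id from the current async context,
// so callers never have to pass it through their function arguments.
function formatLogEntry(message) {
  const context = requestContext.getStore();
  return JSON.stringify({
    transactionId: context ? context.transactionId : null,
    message,
  });
}
```

In an HTTP server, wrapping each request as `withTransaction(() => handle(req, res))` would give every entry logged while handling that request the same id.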
Use tools that automatically detect vulnerabilities
Even the most reputable dependencies have known vulnerabilities from time to time that put a system at risk. So employ community and commercial tools that continuously check for vulnerabilities and give you a heads-up; some can even patch them immediately. Keeping your code clean of vulnerabilities without dedicated tools means constantly following online publications about new threats, which can get quite tedious.
Design automated, atomic and zero-downtime deployments
Teams that deploy often stand a lower chance of facing severe production issues. Automated, fast deployments that require no service downtime or risky manual steps improve the deployment process significantly. You can achieve this using Docker combined with CI tools, which have become the industry standard for streamlined deployment.
Set the environment variable to production
Set the NODE_ENV environment variable to 'development' or 'production' to flag whether production optimizations should be activated. Many NPM packages optimize their code for production based on the current environment, so omitting this simple property can degrade performance considerably. For example, when Express is used for server-side rendering, omitting NODE_ENV makes it slower by a factor of three.
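Application code can follow the same convention that packages use and read NODE_ENV once at startup. The sketch below shows a hypothetical `loadConfig` helper; the specific settings it derives (log level, template caching, stack trace exposure) are illustrative assumptions rather than anything NODE_ENV mandates.

```javascript
// Hypothetical config loader: derive settings from NODE_ENV at startup.
function loadConfig(env = process.env) {
  const nodeEnv = env.NODE_ENV || 'development'; // unset is treated as development
  const isProduction = nodeEnv === 'production';
  return {
    nodeEnv,
    isProduction,
    logLevel: isProduction ? 'info' : 'debug',
    cacheTemplates: isProduction,
    exposeStackTraces: !isProduction,
  };
}
```

On a server the variable is typically set by the process manager or container definition, for example by starting the app with `NODE_ENV=production node server.js`, where `server.js` stands in for your entry point.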
Use an LTS release of Node.js
Make sure you're using an LTS version of Node.js so that you receive critical bug fixes, security updates and performance improvements. Otherwise, newly discovered bugs or vulnerabilities could be used to exploit an application running in production, and your application may become unsupported by various modules and harder to maintain.
Bump your NPM version in each deployment
Increase the package.json version whenever a new version is released, so that it's clear in production which version is deployed. This matters even more in microservice environments, where different servers might hold different versions. The command "npm version" can do this for you automatically. Otherwise, developers often hunt for a production bug in a distributed system only to realize that the presumed version is not deployed where they were looking.
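To make the deployed version visible at runtime, the app can read it from package.json and log it at startup or return it from a health or maintenance endpoint. This is a small sketch; `readAppVersion` is a hypothetical helper, and it assumes package.json is shipped with the deployed code in the working directory unless another path is given.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: read the deployed name and version from
// package.json so they can be logged or exposed for diagnostics.
function readAppVersion(packageJsonPath = path.join(process.cwd(), 'package.json')) {
  const pkg = JSON.parse(fs.readFileSync(packageJsonPath, 'utf8'));
  return { name: pkg.name, version: pkg.version };
}
```

Running `npm version patch` (or `minor` or `major`) before a release updates the version that this helper reports.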
Check your monitoring against real chaos
Unpredictable things happen in production. To name a few: servers get killed, the event loop gets blocked, SSL certificates may be revoked, DNS records may change, and more. These sound like rare cases, but they do happen, and picking up the pieces afterwards is a mammoth task. The only way to truly mitigate this risk is to simulate these chaotic conditions, verify that they are at least reported, and check whether the application can survive them. A popular tool for chaos generation is Netflix's Chaos Monkey. Don't be complacent about this: Murphy's law could hit your production too.