How I built an API microservice that could handle up to 500-1,000 RPS with minimal latency
Summary
I decided to refactor and improve upon my System Design Capstone project from Hack Reactor back in 2020. Though I learned a lot from my initial attempt 3 years ago, there was a ton more I wanted to do and learn, such as implementing a caching strategy, using a load balancer to distribute traffic, refactoring my project from JavaScript to TypeScript, and using Swagger UI to provide a documented front-end for my API. Since I'm currently on the job hunt, I figured: why not dedicate a 2-week sprint to seeing how much I could do and learn with this project?
Goal
The goal of this project was to work with an inherited legacy front-end e-commerce web portal, Pulsar Shop, and build out a specific API microservice to be used by the front-end client. The microservice was to be a RESTful API that would need to handle web-scale traffic on all 4 read routes.
For this project, web-scale traffic was defined as:
The ability to handle a minimum of 100 requests/second (RPS)
<2000ms of latency
<1% error rate
After various optimizations, this API microservice handles real-world traffic loads of up to 500 RPS on 2 of the 4 read routes, and up to 1,000 RPS on the other 2, with an error rate of 0%.
Before I started coding or designing any part of my system, I thought it would be wise to start with a high-level list of objectives I wanted to achieve with this project. I figured that having this list would naturally serve as milestones toward the completion of the project and help me feel I was making incremental progress.
Objectives
Transform the existing application data and load it into the database
Containerize application for later deployment in AWS
Design and build an API server to provide data to the client in the format specified by the API documentation
Define API routes and 5 architecture layers: app → routes → controllers → model → persistence
Figure out and optimize SQL implementation details for queries
Optimize my service by analyzing query times and server responses
All read queries should run in under 50ms, ideally 10ms
Deploy and scale this service to support (a minimum of) 100 requests per second on EC2 using a t2.micro
Stretch goal of 200 RPS and up (might need to add second t2.micro instance)
Super stretch goal of 500/1000 RPS and up (might need to add second t2.micro instance)
Deploy service and integrate it successfully with the FEC web application
Measure and improve the performance of service at scale
System Architecture
Overview
Tech Stack
I opted for 2 deployment methods for testing purposes:
The first method utilized Docker-Compose to spin up one container each for:
Node Server
Postgres DB
Redis DB
Nginx Load-balancer
This allowed for a fast and consistent developer experience for both local testing and production testing. Additionally, this allowed my entire service to be easily deployed to one AWS EC2 t2.micro instance.
The second method consisted of spinning up 4 EC2 t2.micro instances, one each for:
Node server 1
Node server 2
1 Nginx Load Balancing server
1 PostgreSQL DB w/Redis server
This allowed for a more highly available system that could withstand higher load, since I wasn't load testing just one EC2 instance but distributing traffic among 4. Ultimately, this became my official system architecture.
Routes built
| Method | Endpoint | Purpose | Response Code |
| --- | --- | --- | --- |
| GET | /products/list | Retrieves the list of products. | 200 |
| GET | /products/:product_id | Returns all product-level information for a specified product by ID. | 200 |
| GET | /products/:product_id/styles | Returns all the styles available for the given product by ID. | 200 |
| GET | /products/:product_id/related | Returns the IDs of all the products related to the product specified by ID. | 200 |
| PUT | /products/:product_id | Updates a product by ID. | 200 |
| POST | /products | Creates a new product. | 201 |
| DELETE | /products/:product_id | Deletes a product by ID. | 200 |
Development Process
Schema Design
Since I had 6 CSV files (`products`, `features`, `styles`, `related`, `photos`, and `skus`) that made up all of my product information to work with, and I knew the data shape the front-end was expecting, the first order of business was to create a data model that would conform to the data shape expected by the front end.
Database Choice
Given that the data naturally lent itself to a relationship-driven schema, I knew a relational DB would be a good choice. I ultimately chose PostgreSQL for its powerful aggregate functions, as well as its ability to build JSON objects directly within queries.
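As an illustration (not the project's actual query), a model-layer function might lean on `json_build_object` and `json_agg` to return a nested object directly from Postgres. The table and column names below, and the `../database` module exporting a shared `pg` Pool, are assumptions for the sketch:

```typescript
// Hypothetical model-layer query; `pool` is a shared pg Pool and the
// column names are illustrative, not the project's actual schema.
import { pool } from '../database';

export async function getProductWithFeatures(productId: number) {
  const sql = `
    SELECT json_build_object(
      'id', p.id,
      'name', p.name,
      'features', COALESCE(
        json_agg(json_build_object('feature', f.feature, 'value', f.value))
          FILTER (WHERE f.id IS NOT NULL),
        '[]'::json
      )
    ) AS product
    FROM products p
    LEFT JOIN features f ON f.product_id = p.id
    WHERE p.id = $1
    GROUP BY p.id;
  `;
  const { rows } = await pool.query(sql, [productId]);
  return rows[0]?.product;
}
```

Building the JSON shape inside the query means the Node layer can hand the row straight back to the client with no reshaping.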
ETL Process
An ETL process was required before beginning the project, since all of the data had to be sanitized and normalized, and it represented ~5GB of data, or 49M+ records. Opening each CSV directly proved impossible, given that each one had anywhere from 1 million to 26 million rows and easily overwhelmed the RAM on my machine. Therefore, I had to do in-depth research on Node Streams to:
Create a readable Stream from each CSV, since it was impossible to open all of them in memory
Pipe in a CSV parser with specific configurations to open and read the CSV
Pipe in a custom transformer (extending Node's Transform class) to normalize all rows and columns
Pipe to a writable Stream that writes out my clean CSV files (a sketch of this pipeline is shown below)
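A minimal sketch of that pipeline, assuming the `csv-parse` and `csv-stringify` packages (the project's actual transformer logic is more involved; the normalization shown here is illustrative):

```typescript
// Hypothetical ETL stream pipeline: raw CSV -> parser -> custom Transform -> clean CSV.
import { createReadStream, createWriteStream } from 'fs';
import { Transform, TransformCallback } from 'stream';
import { pipeline } from 'stream/promises';
import { parse } from 'csv-parse';
import { stringify } from 'csv-stringify';

// Custom transformer extending Node's Transform class to normalize each row.
class NormalizeRows extends Transform {
  constructor() {
    super({ objectMode: true });
  }
  _transform(row: string[], _encoding: BufferEncoding, done: TransformCallback): void {
    // Illustrative normalization: trim whitespace and blank out literal "null" values.
    const clean = row.map((col) => (col.trim() === 'null' ? '' : col.trim()));
    done(null, clean);
  }
}

export async function cleanCsv(inputPath: string, outputPath: string): Promise<void> {
  await pipeline(
    createReadStream(inputPath),  // readable stream from the raw CSV
    parse(),                      // CSV parser in stream mode
    new NormalizeRows(),          // normalize all rows and columns
    stringify(),                  // serialize rows back to CSV
    createWriteStream(outputPath) // writable stream for the clean CSV
  );
}
```

Because every stage is a stream, only a small window of rows lives in memory at any time, which is what makes the multi-gigabyte files workable.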
Once the "Extraction" and "Transformation" parts of the process were done, I needed to build out an automated way to "Load" my ~5GBs worth of data into Docker so that it would be accessible within the running instance of Postgres to then copy it into the database. Through a lot of trial and error, I found that I could utilize Docker's /docker-entrypoint-initdb.d
entrypoint and updated my Dockerfile to load in 3 scripts:
One to init my database schema
Another to copy the CSV files into their appropriate tables
And the last one to create the indexes (this script was added later, once I had figured out which indexes to create during optimization; see the sketch after this list)
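A rough sketch of what that Dockerfile looks like. The file names and paths are illustrative, not the project's actual ones; the key idea is that the official Postgres image runs anything placed in `/docker-entrypoint-initdb.d/` (in alphabetical order) on first startup:

```dockerfile
# Hypothetical database Dockerfile; script and directory names are illustrative.
FROM postgres:14

# 1. Initialize the schema, 2. COPY the cleaned CSVs into their tables,
# 3. create the indexes (added later, during optimization).
COPY ./db/init/01-schema.sql    /docker-entrypoint-initdb.d/
COPY ./db/init/02-load-data.sql /docker-entrypoint-initdb.d/
COPY ./db/init/03-indexes.sql   /docker-entrypoint-initdb.d/

# The cleaned CSVs must also be reachable inside the container so the
# COPY FROM statements in the load script can find them.
COPY ./db/data/ /csv-data/
```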
With my ETL process finally complete, I was able to run my service locally using Docker-Compose!
API Design
As I began to design my API (the routes, how to respond to them, and how to interact with my database), I decided to follow a layered architecture approach in my codebase. This would allow me to cleanly divide concerns and responsibilities into different files and directories, permitting direct communication only between certain files.
The Application layer would handle the basic setup of my server and the connection to my routes, as well as any middlewares and Swagger specs later on.
The Routes layer defined all of my routes and served as the connection to the Controllers layer; it is also where I handled input validation using `express-validator`.
The Controllers layer is where all of the service's business logic lives for each endpoint; it decides which HTTP codes to return to the client and serves as the connection to the Model layer.
The Model layer housed all of my logic for interacting with my Postgres database (and the Redis cache later on).
The Persistence layer is where my database connections live: the `/database` directory, outside of the `/server` directory, instantiates and exports the `Pg.Pool` interface as well as the `Redis.Client` interface.
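A minimal sketch of that Persistence layer, assuming connection details come from environment variables (the real file almost certainly differs in detail):

```typescript
// Hypothetical /database/index.ts: single place that creates and exports
// the Postgres pool and Redis client for the rest of the service.
import { Pool } from 'pg';
import { createClient } from 'redis';

// Shared connection pool for the whole service.
export const pool = new Pool({
  host: process.env.PGHOST,
  database: process.env.PGDATABASE,
  user: process.env.PGUSER,
  password: process.env.PGPASSWORD,
  max: 20, // upper bound on clients checked out of the pool at once
});

// Shared Redis client, used later for the cache-aside logic in the Model layer.
export const redisClient = createClient({ url: process.env.REDIS_URL });
redisClient.connect().catch((err) => console.error('Redis connection error', err));
```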
Performance Tuning + Optimizations Part 1 (Locally)
Initial Benchmark
An initial test using Postman on the most computationally heavy query showed a response time of >1 minute! With this benchmark in mind, my initial goal was to get the SQL query itself down to under 50ms, with a stretch goal of 10ms.
Client vs Pool
I initially used the `Pg.Client` class from node-postgres (the `pg` package) since it was simple to set up while I was making one-off manual tests with Postman. However, according to the docs, by utilizing a Client instead of a Pool I was in danger of quickly exhausting available, idle clients, which could cause my server to time out with an error or hang indefinitely. Additionally, when using `Pg.Pool`, you must release each client back to the pool after its request, thereby ensuring an available client at scale unless the machine itself is overwhelmed.
This change from `Pg.Client` to `Pg.Pool` led to a 375% increase in performance.
Indexing
Once I had learned to properly query my database using connection pooling, I moved on to the biggest optimization of my queries: indexing. However, I knew I couldn't just index everything, since the overhead cost of maintaining those indexes would negate any performance gains I was hoping to achieve. With that mindset, I started to think about how to identify what to index, and Postgres' `EXPLAIN ANALYZE` command came to my rescue!
I wanted to understand the planning and execution times of my queries for each route (see the sketch after this list for one way to capture these numbers from Node):
GET `/products/list`:
GET `/products/:product_id`:
GET `/products/:product_id/styles`:
GET `/products/:product_id/related`:
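One way to capture those numbers from Node rather than a psql session is a small helper like this hypothetical one; prefixing a query with `EXPLAIN ANALYZE` makes Postgres return the plan and timing instead of the data:

```typescript
// Hypothetical helper for inspecting a query plan; `pool` is the shared pg Pool.
import { pool } from '../database';

export async function explainQuery(sql: string): Promise<void> {
  const { rows } = await pool.query(`EXPLAIN ANALYZE ${sql}`);
  // Each row contains a single "QUERY PLAN" column; the final lines report
  // "Planning Time" and "Execution Time" in milliseconds.
  rows.forEach((row) => console.log(row['QUERY PLAN']));
}
```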
After analyzing each of my queries, I realized that sequential scans were happening on every query that wasn't looking a product up by ID, and these were severely limiting my performance. I identified that this was happening in 5 specific relationships (the corresponding indexes are sketched after this list):
`features` and `product_id`
`styles` and `product_id`
`photos` and `styles_id`
`skus` and `styles_id`
`related` and `product_id`
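The five indexes boil down to statements like the following. The index and column names are assumptions based on the relationships above, and in the project they live in the third init SQL script rather than application code; a programmatic version might look like this:

```typescript
// Hypothetical index creation; in the real project these statements live in an
// init SQL script run by the database container on first startup.
import { pool } from '../database';

const indexStatements = [
  'CREATE INDEX IF NOT EXISTS idx_features_product_id ON features (product_id);',
  'CREATE INDEX IF NOT EXISTS idx_styles_product_id   ON styles (product_id);',
  'CREATE INDEX IF NOT EXISTS idx_photos_styles_id    ON photos (styles_id);',
  'CREATE INDEX IF NOT EXISTS idx_skus_styles_id      ON skus (styles_id);',
  'CREATE INDEX IF NOT EXISTS idx_related_product_id  ON related (product_id);',
];

export async function createIndexes(): Promise<void> {
  for (const statement of indexStatements) {
    await pool.query(statement); // IF NOT EXISTS keeps this idempotent
  }
}
```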
Once I had created these 5 indexes, I saw the following performance increases:
495% improvement in query time on GET `/products/list`:
7,565% improvement in query time on GET `/products/:product_id`:
1,014,070% improvement in query time on GET `/products/:product_id/styles`:
47,757% improvement in query time on GET `/products/:product_id/related`:
All my queries were WELL BELOW the stated goal of 50ms, and even faster than the stretch goal of 10ms. It was time to stress test locally before deploying and testing on an EC2 instance!
Artillery
Using Artillery to stress test my service, I was able to successfully go up to 100 RPS, but eventually crashed out at 1000 RPS as can be seen below:
100 RPS:
1,000 RPS:
Performance Tuning + Optimizations Part 2 (Deployment) w/Loader.io
Cache
For caching, I decided to use the "lazy-loading" cache-aside strategy, placing my Redis cache on the same server as my Postgres DB:
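A minimal sketch of the cache-aside read path in the Model layer, assuming the shared `pool` and `redisClient` from the persistence layer (the key format, TTL, and query are illustrative):

```typescript
// Hypothetical cache-aside ("lazy loading") read: check Redis first,
// fall back to Postgres on a miss, then populate the cache for next time.
import { pool, redisClient } from '../database';

const ONE_HOUR_IN_SECONDS = 60 * 60; // illustrative TTL

export async function getProductStyles(productId: number): Promise<unknown> {
  const cacheKey = `styles:${productId}`;

  // 1. Try the cache first.
  const cached = await redisClient.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // 2. Cache miss: query Postgres.
  const { rows } = await pool.query(
    'SELECT * FROM styles WHERE product_id = $1',
    [productId],
  );

  // 3. Lazily populate the cache so subsequent reads are served from Redis.
  await redisClient.set(cacheKey, JSON.stringify(rows), { EX: ONE_HOUR_IN_SECONDS });
  return rows;
}
```

The "lazy" part is that nothing is cached until a client actually asks for it, so the cache only ever holds data that is being read.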
Load Balancer
Once I had seen improved performance with my caching strategy, I decided to add a load balancer, because the more RPS I pushed, the higher my latency climbed; I reasoned that the load on a single server was simply too much. I realized that Docker-Compose has a pretty nifty feature for scaling out any service via the `docker-compose up --scale <service-name>=<replicas>` syntax, so I scaled up my Node containers within my one EC2 instance.
For load balancing I relied on Nginx's default round-robin strategy:
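The Nginx side is a plain reverse proxy; with no balancing directive in the `upstream` block, Nginx falls back to round-robin. A hypothetical sketch (upstream hosts and ports are assumptions, not the project's actual config):

```nginx
# Hypothetical nginx.conf excerpt; upstream hosts/ports are illustrative.
upstream api_servers {
    # No balancing method specified, so Nginx defaults to round-robin.
    server api1:3000;
    server api2:3000;
}

server {
    listen 80;

    location / {
        proxy_pass http://api_servers;
    }
}
```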
One EC2 Instance running Dockerized service
| Route | Redis Caching Only (topped out at) | Redis + Nginx (topped out at) |
| --- | --- | --- |
| GET Products List | 750 RPS, 223ms latency | 750 RPS, 1,809ms latency |
| GET Product by ID | 200 RPS, 14ms latency | 200 RPS, 15ms latency |
| GET Product Styles | 200 RPS, 17ms latency | 100 RPS, 18ms latency |
| GET Related Product IDs | 750 RPS, 1,686ms latency | 500 RPS, 1,634ms latency |
4 EC2 Instances (2 servers, 1 Postgres + Redis server, 1 Nginx load balancer)
| Route | Redis Caching Only (topped out at) | Redis + Nginx (topped out at) |
| --- | --- | --- |
| GET Products List | 1,000 RPS, 758ms latency | 1,000 RPS, 16ms latency |
| GET Product by ID | 200 RPS, 14ms latency | 500 RPS, 15ms latency |
| GET Product Styles | 200 RPS, 16ms latency | 500 RPS, 19ms latency |
| GET Related Product IDs | 1,000 RPS, 1,837ms latency | 1,000 RPS, 30ms latency |
Results Observed
After comparing the performance of my Dockerized service on one EC2 instance against the 4 distributed EC2 instances, I came away with 3 primary conclusions:
While I was easily able to handle at least 200 RPS after implementing Redis and scaling up my Docker containers behind Nginx, my biggest bottleneck became the t2.micro's limited CPU: I actually started seeing worse results after adding Nginx on the single instance. Docker made it super easy to deploy and even scale my service, but it was simply too much load for one machine.
Therefore, I made the right call to horizontally scale my service across 4 EC2 t2.micro machines, and this is where I saw the greatest gains: throughput increased 2.5x (from 200 RPS to 500 RPS) on my 2 most computationally costly read routes (GET Product by ID and GET Product Styles) and 1.33x (from 750 RPS to 1,000 RPS) on my two other read routes (GET Products List and GET Related Product IDs).
While my performance was boosted by horizontally scaling out with multiple EC2 machines, cost was a factor I hadn't paid much attention to until I was hit with a $27 bill for operating those machines in the month since I published my project. While this isn't breaking the bank, at scale it could become cost-prohibitive, and I can see why a more practical approach might be to go back to the containerized version of this app on one machine. Though I would lose some performance, the cost-benefit analysis says the tradeoff is worthwhile at my project's scale. I might also end up changing the cloud provider hosting this project to bring my cost down to zero, and may write a follow-up post detailing the process.
Future Improvements
If I had more time, I would have been interested in testing on a larger machine (t2.small, medium, or even large) with more vCPUs so that I could utilize the Node cluster module to run my service across multiple CPU cores.
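As a rough sketch of that idea, the built-in `cluster` module can fork one worker process per CPU core, with each worker running the Express app (a bare HTTP server stands in for it here):

```typescript
// Hypothetical sketch of clustering the service across all available CPU cores.
import cluster from 'cluster';
import { cpus } from 'os';
import http from 'http';

if (cluster.isPrimary) {
  // Fork one worker per core; the primary process only manages workers.
  cpus().forEach(() => cluster.fork());
} else {
  // Each worker would normally start the Express app; a stub server stands in here.
  http
    .createServer((_req, res) => res.end('ok'))
    .listen(3000, () => console.log(`Worker ${process.pid} listening on 3000`));
}
```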
Contact
Cesar Roman - LinkedIn - cesardroman18@gmail.com