How I built an API microservice that could handle up to 500-1,000 RPS with minimal latency




If you're interested in interacting with my API through Swagger UI, you can find the live link to the deployed service here.
If you'd like to check out my code, feel free to check out my GitHub repo!
If you want to read an exhaustive breakdown of my daily processes, hurdles, and successes, check out my Notion Engineering Journal!


I decided to refactor and improve upon my System Design Capstone project from Hack Reactor back in 2020. Though I learned a lot from my initial attempt 3 years ago, there was a ton more I wanted to do and learn, such as implementing a caching strategy, using a load balancer to distribute traffic, refactoring my project from JavaScript to TypeScript, and using Swagger UI to provide a documented front end for my API. Since I'm currently on the job hunt, I figured, why not dedicate a 2-week sprint to seeing how much I could do and learn with this project?


The goal of this project was to work with an inherited legacy front-end e-commerce web portal, Pulsar Shop, and build out a specific API microservice to be used by the front-end client. The microservice was to be a RESTful API that would need to handle web-scale traffic on all 4 read routes.

  • For this project, web-scale traffic was defined as:

    • The ability to handle a minimum of 100 requests/second (RPS)

    • <2000ms of latency

    • <1% error rate

After various optimizations, this API microservice handles real-world traffic loads of up to 500 RPS on 2 of the 4 read routes, and up to 1,000 RPS on the other 2, with an error rate of 0%.

Before I started coding or designing any part of my system, I thought it would be wise to start with a high-level list of objectives I wanted to achieve with this project. I figured that having this list would naturally serve as milestones toward the completion of the project and help me feel I was making incremental progress.


  • Transform the existing application data and load it into the database

  • Containerize the application for later deployment on AWS

  • Design and build an API server to provide data to the client in the format specified by the API documentation

    • Define API routes and 5 architecture layers: app → routes → controllers → model → persistence

    • Figure out and optimize SQL implementation details for queries

  • Optimize my service by analyzing query times and server responses

    • All read queries should run in under 50ms, ideally 10ms

    • Deploy and scale this service to support (a minimum of) 100 requests per second on EC2 using a t2.micro

    • Stretch goal of 200 RPS and up (might need to add a second t2.micro instance)

    • Super stretch goal of 500-1,000 RPS and up (might need additional instances)

  • Deploy service and integrate it successfully with the FEC web application

  • Measure and improve the performance of service at scale

System Architecture


Tech Stack

I opted for 2 deployment methods for testing purposes:

The first method utilized Docker-Compose to spin up containers that consisted of one container each for:

  • Node Server

  • Postgres DB

  • Redis DB

  • Nginx Load-balancer

    This allowed for a fast and consistent developer experience for both local testing and production testing. Additionally, this allowed my entire service to be easily deployed to one AWS EC2 t2.micro instance.

The second method consisted of spinning up 4 EC2 t2.micro instances, one each for:

  • Node server 1

  • Node server 2

  • 1 Nginx Load Balancing server

  • 1 PostgreSQL DB w/Redis server

    This allowed for a more highly available system that could withstand higher loads, since I wasn't load testing a single EC2 instance but distributing the traffic among 4. Ultimately this became my official system architecture.

Routes built

| Method | Endpoint | Purpose | Response Code |
| --- | --- | --- | --- |
| GET | /products/list | Retrieves the list of products. | 200 |
| GET | /products/:product_id | Returns all product-level information for a specified product by ID. | 200 |
| GET | /products/:product_id/styles | Returns all the styles available for the given product by ID. | 200 |
| GET | /products/:product_id/related | Returns the IDs of all the products related to the product specified by ID. | 200 |
| PUT | /products/:product_id | Updates a product by ID. | 200 |
| POST | /products | Creates a new product. | 201 |
| DELETE | /products/:product_id | Deletes a product by ID. | 200 |

Development Process

Schema Design

Since I had 6 CSV files (products, features, styles, related, photos, and skus) that made up all of my product information to work with, and I knew the data shape the front-end was expecting, the first order of business was to create a data model that would conform to the data shape expected by the front end.

Database Choice

Given that the data naturally lent itself to a relational schema, a relational database was the clear choice. I ultimately chose PostgreSQL for its powerful aggregate functions and its ability to build JSON objects directly within queries.
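As an illustration of what that looks like (table and column names here are assumed, not the project's actual schema), Postgres can assemble a nested JSON response shape in a single query using json_build_object and json_agg:

```sql
-- Illustrative sketch: build the nested product shape the front end
-- expects directly in the database, avoiding per-row assembly in Node.
SELECT json_build_object(
  'id', p.id,
  'name', p.name,
  'features', (
    SELECT json_agg(json_build_object('feature', f.feature, 'value', f.value))
    FROM features f
    WHERE f.product_id = p.id
  )
) AS product
FROM products p
WHERE p.id = $1;
```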

ETL Process

An ETL process was required before beginning the project since all of the data had to be sanitized and normalized, and represented ~5GBs worth of data or 49M+ records. Opening each CSV to work on proved impossible, given that each CSV had anywhere from 1 million to 26 million rows and easily overwhelmed the RAM on my machine. Therefore, I was forced to do in-depth research on Node Streams to:

  • Create a readable Stream from each CSV, since it was impossible to open all of them in memory

  • Pipe in a CSV parser with specific configurations to open and read the CSV

  • Pipe in a custom transformer function (extending the Transform class in Node) to normalize all rows and columns

  • Pipe to the writable Stream with my clean CSV files
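The transform step above can be sketched like this. The normalization rules here are illustrative stand-ins (trimming whitespace and blanking literal "null" fields); the real rules depend on each CSV's quirks:

```typescript
import { Transform, TransformCallback } from "node:stream";

// Hypothetical normalization: trim each column and blank out "null".
function normalizeRow(line: string): string {
  return line
    .split(",")
    .map((col) => {
      const v = col.trim();
      return v === "null" ? "" : v;
    })
    .join(",");
}

// Custom transformer extending Node's Transform class, meant to sit
// between the CSV parser and the writable output stream.
class RowNormalizer extends Transform {
  _transform(chunk: Buffer, _enc: BufferEncoding, cb: TransformCallback): void {
    cb(null, normalizeRow(chunk.toString("utf8")) + "\n");
  }
}
```

In the real pipeline this would be wired as `fs.createReadStream(raw).pipe(csvParser).pipe(new RowNormalizer()).pipe(fs.createWriteStream(clean))`, so no file is ever fully loaded into memory.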

Once the "Extraction" and "Transformation" parts of the process were done, I needed to build out an automated way to "Load" my ~5GBs worth of data into Docker so that it would be accessible within the running instance of Postgres to then copy it into the database. Through a lot of trial and error, I found that I could utilize Docker's /docker-entrypoint-initdb.d entrypoint and updated my Dockerfile to load in 3 scripts:

  • One to init my database schema

  • Another to copy the CSV files into their appropriate tables

  • And the last one to create the indexes, but that was included later once I had figured out which indexes to create when I began my optimization
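A sketch of what that Dockerfile setup can look like (paths and filenames here are illustrative, not the project's actual ones). Postgres runs everything in /docker-entrypoint-initdb.d in alphabetical order on the container's first start, so numeric prefixes enforce the schema → copy → index order:

```dockerfile
FROM postgres:15

# Cleaned CSVs the copy script will load into their tables
COPY ./data/clean/ /data/

# Init scripts run alphabetically on first start
COPY ./db/01-init-schema.sql     /docker-entrypoint-initdb.d/
COPY ./db/02-copy-csvs.sh        /docker-entrypoint-initdb.d/
COPY ./db/03-create-indexes.sql  /docker-entrypoint-initdb.d/
```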

With my ETL process finally complete, I was able to run my service locally using Docker-Compose!

API Design

As I began to design my API (the routes, how to respond to them, and how to interact with my database), I found that I wanted to follow a layered architecture approach in my codebase. This would let me divide concerns and responsibilities into different files and directories, allowing direct communication only between certain files.


  • The Application layer would handle the basic setup of my server and the connection to my routes, as well as any middlewares and swagger specs later on.

  • The Routes layer defined all of my routes and served as the connection to the Controllers layer, as well as where I handled input validation using express-validator.

  • The Controllers layer is where all of the service's business logic lives for each endpoint; it decides which HTTP codes to return to the client and serves as the connection to the Model layer.

  • The Model layer housed all of my logic for interacting with my Postgres database (and Redis cache later on).

  • The Persistence layer is where my database access lives: the /database directory, outside of the /server directory, instantiates and exports the Pg.Pool interface as well as the Redis.Client interface.
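The layered flow can be sketched framework-free like this. Names and data are illustrative stand-ins (synchronous functions stand in for the async database calls), not the actual project code; the point is that each layer only talks to the layer directly below it:

```typescript
// Persistence layer: a stand-in for the exported Pg.Pool interface.
interface Product { id: number; name: string }
const fakeTable: Product[] = [{ id: 1, name: "Pulsar Widget" }];

// Model layer: the only code that touches the database.
function findProductById(id: number): Product | undefined {
  return fakeTable.find((row) => row.id === id);
}

// Controller layer: business logic and HTTP status decisions.
function getProductController(id: number): { status: number; body: unknown } {
  const product = findProductById(id);
  return product
    ? { status: 200, body: product }
    : { status: 404, body: { error: "Product not found" } };
}

// The Routes layer would map GET /products/:product_id to this
// controller, and the Application layer would mount the router
// along with any middleware and the Swagger spec.
```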

Performance Tuning + Optimizations Part 1 (Locally)

Initial Benchmark

An initial test using Postman on my most computationally heavy query showed a response time of over 1 minute! With this benchmark in mind, my initial goal was to optimize the SQL query itself to under 50ms, with a stretch goal of 10ms.


Client vs Pool

I initially used the Pg.Client class from node-pg since it was simple to set up for one-off manual tests in Postman. However, according to the docs, using a Client instead of a Pool risks quickly exhausting available, idle clients, which could cause my server to time out with an error or hang indefinitely. With Pg.Pool, each client is released back to the pool after a request, ensuring an available client at scale unless the machine itself is overwhelmed.

This change from Pg.Client to Pg.Pool led to a 375% performance improvement.



Indexing

Once I had learned to properly query my database using connection pooling, I moved on to the biggest optimization of my queries: indexing. However, I knew I couldn't just index everything, since the overhead of creating and maintaining those indexes would negate the performance gains I was hoping for. With that mindset, I started thinking about how to identify what to index, and Postgres' EXPLAIN ANALYZE came to my rescue!

I wanted to understand the planning and execution times of my queries for each read route: GET /products/list, GET /products/:product_id, GET /products/:product_id/styles, and GET /products/:product_id/related.
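As a sketch of the technique (the product ID here is made up), EXPLAIN ANALYZE actually runs the query and reports the planner's chosen scan type along with planning and execution times:

```sql
-- Illustrative only: shows how Postgres executes the features lookup.
-- Before indexing, output like "Seq Scan on features" flags the problem.
EXPLAIN ANALYZE
SELECT *
FROM features
WHERE product_id = 1;
```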

After analyzing each of my queries, I realized that sequential scans were happening in every query that wasn't a simple product lookup by ID, and they were severely limiting my performance. I identified 5 specific relationships where this was happening:

  • features and product_id

  • styles and product_id

  • photos and styles_id

  • skus and styles_id

  • related and product_id
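A sketch of the corresponding DDL (index names here are illustrative; column names follow the relationships listed above):

```sql
CREATE INDEX idx_features_product_id ON features (product_id);
CREATE INDEX idx_styles_product_id   ON styles (product_id);
CREATE INDEX idx_photos_styles_id    ON photos (styles_id);
CREATE INDEX idx_skus_styles_id      ON skus (styles_id);
CREATE INDEX idx_related_product_id  ON related (product_id);
```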

Once I had created these 5 indexes, I saw the following performance increases:

  • a 495% improvement in query time on GET /products/list

  • a 7,565% improvement in query time on GET /products/:product_id

  • a 1,014,070% improvement in query time on GET /products/:product_id/styles

  • a 47,757% improvement in query time on GET /products/:product_id/related

All my queries were WELL BELOW the stated goal of 50ms, and even beat the stretch goal of 10ms. Now it was time to stress test locally before deploying and testing on an EC2 instance!


Using Artillery to stress test my service locally, I was able to sustain 100 RPS successfully, but the service crashed out at 1,000 RPS.
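A minimal Artillery config along these lines drives a constant arrival rate at one route (the target, duration, and rate shown are illustrative, not my actual test values):

```yaml
config:
  target: "http://localhost:3000"
  phases:
    - duration: 60      # sustain the load for 60 seconds
      arrivalRate: 100  # 100 new virtual users (requests) per second
scenarios:
  - flow:
      - get:
          url: "/products/list"
```

Run with `artillery run load-test.yml`.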


Performance Tuning + Optimizations Part 2 (Deployment)


Redis Caching

For my caching strategy, I chose the lazy-loading (cache-aside) pattern, placing my Redis cache on the same server as my Postgres DB.
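The cache-aside pattern can be sketched like this; a Map stands in for Redis here (the real service would use a Redis client's get/set, typically with a TTL), and the DB call is a placeholder:

```typescript
// In-memory stand-in for the Redis cache.
const cache = new Map<string, string>();

// Placeholder for the real Postgres query.
function queryDb(key: string): string {
  return `db-value-for-${key}`;
}

// Cache-aside ("lazy loading"): check the cache first, fall back to
// the DB on a miss, and populate the cache on the way out.
function getWithCacheAside(key: string): { value: string; hit: boolean } {
  const cached = cache.get(key);
  if (cached !== undefined) return { value: cached, hit: true }; // cache hit
  const value = queryDb(key); // cache miss: go to the DB
  cache.set(key, value);      // lazily populate the cache
  return { value, hit: false };
}
```

The tradeoff of lazy loading is that the first request for each key always pays the full DB cost; only repeat reads get the fast path.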

Load Balancer

Once I had seen improved performance from my caching strategy, I decided to add a load balancer, since the higher the RPS I pushed, the higher my latency climbed; I reasoned that the load on the single server was too much. Docker Compose has a pretty nifty feature for scaling out any service using the docker-compose up --scale <service-name>=<count> syntax, which let me scale up my containers within the one EC2 instance.

For load balancing, I utilized Nginx's default round-robin strategy.
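A minimal nginx.conf sketch of that setup (upstream name, hostnames, and ports are illustrative; these blocks would live inside the http context). Round-robin is what nginx does by default when an upstream lists multiple servers:

```nginx
upstream api_servers {
    server server1:3000;  # requests alternate between these
    server server2:3000;  # two Node containers/instances
}

server {
    listen 80;
    location / {
        proxy_pass http://api_servers;
    }
}
```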

One EC2 Instance running Dockerized service

GET Products List:

  • Redis caching only: topped out at 750 RPS with 223ms latency

  • Redis + Nginx: topped out at 750 RPS with 1,809ms latency

GET Product by ID:

  • Redis caching only: topped out at 200 RPS with 14ms latency

  • Redis + Nginx: topped out at 200 RPS with 15ms latency

GET Product Styles:

  • Redis caching only: topped out at 200 RPS with 17ms latency

  • Redis + Nginx: topped out at 100 RPS with 18ms latency

GET Related Products:

  • Redis caching only: topped out at 750 RPS with 1,686ms latency

  • Redis + Nginx: topped out at 500 RPS with 1,634ms latency

4 EC2 Instances (2 servers, 1 Postgres + Redis server, 1 Nginx load balancer)

GET Products List:

  • Redis caching only: topped out at 1,000 RPS with 758ms latency

  • Redis + Nginx: topped out at 1,000 RPS with 16ms latency

GET Product by ID:

  • Redis caching only: topped out at 200 RPS with 14ms latency

  • Redis + Nginx: topped out at 500 RPS with 15ms latency

GET Product Styles:

  • Redis caching only: topped out at 200 RPS with 16ms latency

  • Redis + Nginx: topped out at 500 RPS with 19ms latency

GET Related Products:

  • Redis caching only: topped out at 1,000 RPS with 1,837ms latency

  • Redis + Nginx: topped out at 1,000 RPS with 30ms latency

Results Observed

After comparing the results from my Dockerized service on one EC2 instance vs the 4 distributed EC2 instances' performance, I came away with 3 primary conclusions:

  1. While I was easily able to handle at least 200 RPS by implementing Redis and scaling up my Docker containers behind Nginx, my biggest bottleneck became the t2.micro's limited CPU: results actually got worse after adding Nginx. While Docker made it super easy to deploy and even scale my service, it was too much load for the one machine.

  2. Therefore, horizontally scaling my service across 4 EC2 t2.micro machines proved to be the right call, and this is where I saw the greatest gains: throughput went from 200 RPS to 500 RPS (a 2.5x improvement) on my 2 most computationally costly read routes (GET Product by ID and GET Product Styles), and from 750 RPS to 1,000 RPS (a 33% improvement) on the two other read routes (GET Products List and GET Related Product IDs).

  3. While my performance was boosted by horizontally scaling across multiple EC2 machines, cost was a factor I hadn't paid much attention to until I was hit with a $27 bill for the month since publishing my project. While that isn't breaking the bank, at scale it could become cost-prohibitive, and I can see why a more practical approach might be to go back to the containerized version of this app on one machine. Though I'd lose some performance, the cost-benefit analysis at my project's scale makes that tradeoff worthwhile. I might also move the project to a different cloud provider to reduce my cost to zero, and if so, write a follow-up post detailing the process.

Future Improvements

If I had more time, I would have been interested in testing on a larger machine (t2.small, t2.medium, or even t2.large) with more vCPUs, so that I could use Node's cluster module to run my service across multiple processes.
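A minimal sketch of that idea. The CLUSTER_DEMO guard and the workerCount helper are my own illustrative additions; the primary process forks one worker per CPU and replaces any worker that dies, while each worker would run the actual HTTP server:

```typescript
import cluster from "node:cluster";
import os from "node:os";

// One worker per CPU core, but never fewer than 1.
function workerCount(cpus: number = os.cpus().length): number {
  return Math.max(1, cpus);
}

// Guarded so that merely loading this file does not fork anything.
if (process.env.CLUSTER_DEMO && cluster.isPrimary) {
  for (let i = 0; i < workerCount(); i++) cluster.fork();
  cluster.on("exit", (worker) => {
    console.log(`worker ${worker.process.pid} exited; forking a replacement`);
    cluster.fork(); // simple self-healing: replace dead workers
  });
} else if (process.env.CLUSTER_DEMO) {
  // In the real service this branch would call the Express app's listen().
  console.log(`worker ${process.pid} would start the HTTP server here`);
}
```

Note that cluster gives multiple processes (not threads) sharing one listening port, which is what lets a multi-vCPU machine serve requests in parallel from a single instance.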


Cesar Roman - LinkedIn