As you start to build out integrations with different APIs, the idea of being a “good API consumer” will become more important.
APIs, like any resource, are not limitless. Behind them are servers that need protection from being overwhelmed, networks that need protection from being flooded, and potentially millions of other users that also need to access the data.
You need to access any API you integrate with responsibly. Rate limiting is the key mechanism that nudges you toward this responsibility: rate limits are controls imposed on the frequency of requests that a client can make to an API. These limits exist to prevent the problems above and to make sure the API can serve everyone smoothly.
In this guide we want to take you through how you can be a good API consumer, adhere to rate limits, and get the data you need from your integrations.
Rate limits matter for both the API provider and you, and the consequences of exceeding them can be severe. In the best case, your responses will simply be slowed. In the worst case, you’ll either lose access to the API or find yourself with a massive bill.
To avoid these consequences, it's important to understand and respect the rate limits set by the API provider. For the most part, these are laid out in the docs, in your account, or in the headers of each response.
Such a page will also give the headers the API sends with information about the user’s current limits, and the error messages a user will see when they pass their limit:
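For instance, a rate-limited response might look something like this (the exact headers and error body vary by provider; this is illustrative):

```
HTTP/1.1 429 Too Many Requests
Retry-After: 60

{ "message": "Rate limit exceeded. Try again in 60 seconds." }
```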
Before we get into some of the techniques you can use to deal with rate limits, we need to know how to recognize them.
The standard HTTP status code for indicating that the user has sent too many requests in a given amount of time is 429 Too Many Requests.
Thus, this is the sad path that you have to initially test for whenever making an API request to an integration:
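A minimal sketch of that check (the URL and function names here are illustrative, not from any specific API):

```javascript
// Throws if the response hit the rate limit, otherwise passes it through.
function checkForRateLimit(response) {
  // 429 is the standard "Too Many Requests" status code.
  if (response.status === 429) {
    throw new Error("Rate limit exceeded: 429 Too Many Requests");
  }
  return response;
}

async function getData(url) {
  const response = checkForRateLimit(await fetch(url));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```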
What you do next will depend on whether the API gives any additional information. If an API responds with 429, it may also include a Retry-After header indicating how long (in seconds) the client should wait before making a new request. You can then use this value to wait for the API to resume responding:
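A sketch of honoring Retry-After, assuming a fetch-compatible runtime (Node 18+ or the browser); the function name is illustrative:

```javascript
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithRetryAfter(url) {
  const response = await fetch(url);

  if (response.status === 429) {
    const retryAfter = response.headers.get("Retry-After");
    if (retryAfter) {
      // Retry-After is given in seconds; wait that long, then try again.
      await wait(Number(retryAfter) * 1000);
      return fetchWithRetryAfter(url);
    }
    throw new Error("Rate limited, and no Retry-After header was sent");
  }
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```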
This function first checks whether the server's response includes a "Retry-After" header. If it does, it waits the specified number of seconds before trying the request again. If there's no "Retry-After" header, or the request fails with a status other than 429, it throws an error.
Not all APIs include a "Retry-After" header in their responses, so always check the API's docs to understand how it implements rate limiting.
In addition to the 429 status code, some APIs will also respond with the exact details of their rate-limiting in the headers:
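For example, headers along these lines (the names and the exact set vary by provider):

```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1692000000
```

Here `Limit` is the number of requests allowed per window, `Remaining` is how many you have left, and `Reset` is when the window restarts (often a Unix timestamp).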
You can use this information to adjust how you are calling the API:
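A sketch, assuming the X-RateLimit-* header names shown above; adjust to whatever names the API you're calling actually documents:

```javascript
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// How long until the limit resets, given the reset time in epoch seconds.
const msUntilReset = (resetEpochSeconds, now = Date.now()) =>
  Math.max(0, resetEpochSeconds * 1000 - now);

async function fetchRespectingHeaders(url) {
  const response = await fetch(url);

  if (response.status === 429) {
    const reset = Number(response.headers.get("X-RateLimit-Reset"));
    // Wait until the API says the window resets, then try again.
    await wait(msUntilReset(reset));
    return fetchRespectingHeaders(url);
  }
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```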
Here we’re adjusting our setTimeout delay in accordance with these headers, waiting until the API’s rate-limit window resets before trying again.
What if you don’t have any more details about the rate-limiting and are just getting 429s all the way down?
Then you need to retry. With basic retries, if the server responds with 429 you’re going to just retry a MAX_RETRIES number of times before throwing an error:
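A naive retry sketch; MAX_RETRIES, the delay, and the function names are illustrative values, not anything mandated by a particular API:

```javascript
const MAX_RETRIES = 5;
const RETRY_DELAY_MS = 1000;
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithRetries(url, retries = 0) {
  const response = await fetch(url);

  if (response.status === 429) {
    if (retries >= MAX_RETRIES) {
      throw new Error("Max retries exceeded");
    }
    console.log(`Rate limited. Retry ${retries + 1} of ${MAX_RETRIES}...`);
    // Wait a fixed second, then try the same request again.
    await wait(RETRY_DELAY_MS);
    return fetchWithRetries(url, retries + 1);
  }
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```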
Here, we use the built-in setTimeout function to create a delay. If the status is 429, we increment our retry counter, wait for a certain amount of time, and then retry the request. If we exceed the maximum number of retries or get a different kind of error, we throw an error. Here’s what you’d see in the console:
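With illustrative values (five retries, a one-second delay), the output might look like:

```
Rate limited. Retry 1 of 5...
Rate limited. Retry 2 of 5...
Rate limited. Retry 3 of 5...
Rate limited. Retry 4 of 5...
Rate limited. Retry 5 of 5...
Error: Max retries exceeded
```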
This is fine, but naive. If you were making Too Many Requests a second ago, you probably still are now. Constantly battering an API like this is bad form, and exactly the behavior that will lead to throttling or suspension.
A better option is a technique known as exponential backoff: a strategy for spacing out retries after API failures. The principle is to progressively lengthen the wait time between retries, multiplying the delay by a set factor each time, to reduce the load on the system and increase the chances that a subsequent attempt succeeds.
Here’s how you can implement exponential backoff:
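A sketch, doubling the delay on each retry; the base delay and retry cap are illustrative:

```javascript
const MAX_RETRIES = 5;
const BASE_DELAY_MS = 1000;
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay doubles on each retry: 1s, 2s, 4s, 8s, 16s.
const backoffDelay = (retries) => BASE_DELAY_MS * 2 ** retries;

async function fetchWithBackoff(url, retries = 0) {
  const response = await fetch(url);

  if (response.status === 429) {
    if (retries >= MAX_RETRIES) {
      throw new Error("Max retries exceeded");
    }
    await wait(backoffDelay(retries));
    return fetchWithBackoff(url, retries + 1);
  }
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```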
On every retry we are increasing our delay by a power of two, so the delay gets progressively longer each time:
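With a one-second base delay (an illustrative value), the waits look like:

```
retry 1 → wait 1s
retry 2 → wait 2s
retry 3 → wait 4s
retry 4 → wait 8s
retry 5 → wait 16s
```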
With this technique we aren’t hammering the server every second. Instead, we’re calling it less and less frequently until we’re back under our rate limit.
Exponential backoff is a good approach to gracefully dealing with rate limits. But the implementation above has a huge flaw–the thundering herd problem.
Imagine you’re an API that goes down briefly–just a few seconds. During those few seconds, all your consumers start getting errors. Being good API consumers, they all have exponential backoff implemented. The problem is that because their errors all started at the same time, their backoff delays are synced: every consumer retries 2, 4, 8, and so on seconds after the initial error. These synchronized retries from multiple clients can overload the system and stop it from coming back online. Hence, a “thundering herd.”
The answer here is to combine exponential backoff with a random "jitter" factor to avoid syncing across clients:
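A sketch of backoff with jitter; the constants and function names are illustrative:

```javascript
const MAX_RETRIES = 5;
const BASE_DELAY_MS = 1000;
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// A random delay between delay/2 and delay, so clients don't sync up.
function jitteredDelay(retries) {
  const delay = BASE_DELAY_MS * 2 ** retries;
  return delay / 2 + Math.random() * (delay / 2);
}

async function fetchWithJitter(url, retries = 0) {
  const response = await fetch(url);

  if (response.status === 429) {
    if (retries >= MAX_RETRIES) {
      throw new Error("Max retries exceeded");
    }
    const randomDelay = jitteredDelay(retries);
    await wait(randomDelay);
    return fetchWithJitter(url, retries + 1);
  }
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```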
In this code, randomDelay is a random value between delay/2 and delay. The random factor helps spread out the retry attempts from different clients:
Some version of the code above implementing exponential backoff with jitter should be boilerplate for any API you are integrating with.
As soon as you call one API, you’ll want to call two. Modern programming is API-based so you are often fetching data from multiple sources. If you are building out integrations for your software, you might have to integrate with dozens of endpoints.
For instance, if you need to integrate with a wide range of go-to-market tools, you’ll have a HubSpot integration, an Outreach integration, a Pipedrive integration, a Zoho integration, and so on. A fundamental aspect of programming is DRY–don’t repeat yourself. You don’t want to have to write the above retry logic for each of these integrations. You want to write one that can handle everything.
This is challenging (it’s the challenge we set up Vessel to solve). Though it would be nice if every API returned the same errors and headers, they don’t. For instance:
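A few real-world variations (as documented by each provider, though always double-check their current docs):

```
GitHub:     X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset
Twitter:    x-rate-limit-limit, x-rate-limit-remaining, x-rate-limit-reset
IETF draft: RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset
Others:     only a Retry-After header on a 429, or no headers at all
```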
How do we handle all of these together? To deal with varying rate-limit headers across different APIs, you can define a set of known rate-limit header patterns and then search for those patterns in the response headers. Here’s how you could structure this:
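A sketch of normalizing headers into one shape. The header names here are real ones you’ll see in the wild, but the set of parsers is deliberately small; `getRateLimitInfo` and the `{ remaining, reset }` shape are our own conventions:

```javascript
// Each parser maps one provider's header names onto a common shape:
// { remaining: requests left, reset: epoch seconds when the window resets }.
const rateLimitParsers = {
  "x-ratelimit-remaining": (headers) => ({
    remaining: Number(headers.get("x-ratelimit-remaining")),
    reset: Number(headers.get("x-ratelimit-reset")),
  }),
  "x-rate-limit-remaining": (headers) => ({
    remaining: Number(headers.get("x-rate-limit-remaining")),
    reset: Number(headers.get("x-rate-limit-reset")),
  }),
  "retry-after": (headers) => ({
    remaining: 0,
    reset: Date.now() / 1000 + Number(headers.get("retry-after")),
  }),
};

// Look for a known header pattern and parse it into the common shape.
function getRateLimitInfo(headers) {
  for (const [pattern, parse] of Object.entries(rateLimitParsers)) {
    if (headers.has(pattern)) {
      return parse(headers);
    }
  }
  return null; // no recognizable rate-limit headers
}
```

Fetch’s `Headers` object lower-cases names for you, so `getRateLimitInfo(response.headers)` works directly, and the result can feed straight into the backoff logic above.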
The getRateLimitInfo function looks for known rate limit header patterns in the response headers and, if it finds a match, uses the corresponding function to extract the rate limit information.
This approach provides a flexible way to handle varying rate limit headers. You can easily add more patterns to the rateLimitParsers dictionary as you encounter new ones.
If you are building integrations with an API, it’s because that API and the data it returns are important to you. Use it responsibly and efficiently: know the provider’s limits before you hit them, check every response for 429s, back off (with jitter) when you’re told to slow down, and share one well-tested retry layer across all your integrations.
This does require more work on your end. Above, we’ve gone from basically a single-line fetch to dozens of lines with error handling, backoff algorithms, and other logic. But this is also better for you. You’ll get better, more efficient, more consistent responses from the API, making it easier to build the rest of your application and, ultimately, making your application better for your users.