As you start to build out integrations with different APIs, the idea of being a “good API consumer” will become more important.
APIs, like any resource, are not limitless. Behind them are servers that need protection from being overwhelmed, networks that need protection from being flooded, and potentially millions of other users that also need to access the data.
You need to access any API you integrate with responsibly. Rate limiting is the key mechanism that nudges you toward this responsibility: rate limits are controls imposed on the frequency of requests that a client can make to an API. These limits exist to prevent the problems above and to make sure the API can serve everyone smoothly.
In this guide we want to take you through how you can be a good API consumer, adhere to rate limits, and get the data you need from your integrations.
Rate limits matter for both the API provider and you, and the consequences of exceeding them can be severe. In the best case, your responses will simply be slowed. In the worst case, you’ll either lose access to the API or find yourself with a massive bill.
To avoid these consequences, it's important to understand and respect the rate limits set by the API provider. For the most part, these are laid out in the docs, in your account, or in the headers of each response.
Such a page will also give the headers the API sends with information about the user’s current limits, and the error messages a user will see when they pass their limit:
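For instance, a rate-limited response might look something like this (the exact headers and error body vary by provider; this is illustrative):

```
HTTP/1.1 429 Too Many Requests
Retry-After: 60

{ "message": "Rate limit exceeded. Try again in 60 seconds." }
```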
Before we get into some of the techniques you can use to deal with rate limits, we need to know how to recognize them.
The standard HTTP status code for indicating that the user has sent too many requests in a given amount of time is 429 Too Many Requests.
Thus, this is the sad path that you have to initially test for whenever making an API request to an integration:
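A minimal sketch of that check (the URL and function names here are illustrative, not from any specific API):

```javascript
// Throws if the response hit the rate limit, otherwise passes it through.
function checkForRateLimit(response) {
  // 429 is the standard "Too Many Requests" status code.
  if (response.status === 429) {
    throw new Error("Rate limit exceeded: 429 Too Many Requests");
  }
  return response;
}

async function getData(url) {
  const response = checkForRateLimit(await fetch(url));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```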
What you do next will depend on whether the API gives any additional information. If an API responds with 429, it may also include a Retry-After header indicating how long (in seconds) the client should wait before making a new request. You can then use this value to wait for the API to resume responding:
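A sketch of honoring Retry-After, assuming a fetch-compatible runtime (Node 18+ or the browser); the function name is illustrative:

```javascript
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithRetryAfter(url) {
  const response = await fetch(url);

  if (response.status === 429) {
    const retryAfter = response.headers.get("Retry-After");
    if (retryAfter) {
      // Retry-After is given in seconds; wait that long, then try again.
      await wait(Number(retryAfter) * 1000);
      return fetchWithRetryAfter(url);
    }
    throw new Error("Rate limited, and no Retry-After header was sent");
  }
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```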
This function first checks whether the server's response includes a "Retry-After" header. If it does, it waits the specified number of seconds before trying the request again. If there's no "Retry-After" header, or the request fails with a status other than 429, it throws an error.
Not all APIs include a "Retry-After" header in their responses, so always check the API's docs to understand how it implements rate limiting.
In addition to the 429 status code, some APIs will also respond with the exact details of their rate-limiting in the headers:
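For example, headers along these lines (the names and the exact set vary by provider):

```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1692000000
```

Here `Limit` is the number of requests allowed per window, `Remaining` is how many you have left, and `Reset` is when the window restarts (often a Unix timestamp).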
You can use this information to adjust how you are calling the API:
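A sketch, assuming the X-RateLimit-* header names shown above; adjust to whatever names the API you're calling actually documents:

```javascript
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// How long until the limit resets, given the reset time in epoch seconds.
const msUntilReset = (resetEpochSeconds, now = Date.now()) =>
  Math.max(0, resetEpochSeconds * 1000 - now);

async function fetchRespectingHeaders(url) {
  const response = await fetch(url);

  if (response.status === 429) {
    const reset = Number(response.headers.get("X-RateLimit-Reset"));
    // Wait until the API says the window resets, then try again.
    await wait(msUntilReset(reset));
    return fetchRespectingHeaders(url);
  }
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```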
Here we’re adjusting our setTimeout delay in accordance with these headers, waiting until the API’s rate-limit window resets before trying again.
What if you don’t have any more details about the rate-limiting and are just getting 429s all the way down?
Then you need to retry. With basic retries, if the server responds with 429 you’re going to just retry a MAX_RETRIES number of times before throwing an error:
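A naive retry sketch; MAX_RETRIES, the delay, and the function names are illustrative values, not anything mandated by a particular API:

```javascript
const MAX_RETRIES = 5;
const RETRY_DELAY_MS = 1000;
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithRetries(url, retries = 0) {
  const response = await fetch(url);

  if (response.status === 429) {
    if (retries >= MAX_RETRIES) {
      throw new Error("Max retries exceeded");
    }
    console.log(`Rate limited. Retry ${retries + 1} of ${MAX_RETRIES}...`);
    // Wait a fixed second, then try the same request again.
    await wait(RETRY_DELAY_MS);
    return fetchWithRetries(url, retries + 1);
  }
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```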
Here, we use the built-in setTimeout function to create a delay. If the status is 429, we increment our retry counter, wait for a certain amount of time, and then retry the request. If we exceed the maximum number of retries or get a different kind of error, we throw an error. Here’s what you’d see in the console:
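With illustrative values (five retries, a one-second delay), the output might look like:

```
Rate limited. Retry 1 of 5...
Rate limited. Retry 2 of 5...
Rate limited. Retry 3 of 5...
Rate limited. Retry 4 of 5...
Rate limited. Retry 5 of 5...
Error: Max retries exceeded
```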
This is fine, but naive. If you were making Too Many Requests a second ago, you probably still are now. Constantly battering an API like this is bad form, and exactly the behavior that will lead to throttling or suspension.
A better option is a technique known as exponential backoff: a strategy for spacing out retries after API failures. The principle is to progressively lengthen the wait time between retries, multiplying the delay by a set factor each time, to reduce the load on the system and increase the chances that a subsequent attempt succeeds.
Here’s how you can implement exponential backoff:
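A sketch, doubling the delay on each retry; the base delay and retry cap are illustrative:

```javascript
const MAX_RETRIES = 5;
const BASE_DELAY_MS = 1000;
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay doubles on each retry: 1s, 2s, 4s, 8s, 16s.
const backoffDelay = (retries) => BASE_DELAY_MS * 2 ** retries;

async function fetchWithBackoff(url, retries = 0) {
  const response = await fetch(url);

  if (response.status === 429) {
    if (retries >= MAX_RETRIES) {
      throw new Error("Max retries exceeded");
    }
    await wait(backoffDelay(retries));
    return fetchWithBackoff(url, retries + 1);
  }
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```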
On every retry we are increasing our delay by a power of two, so the delay gets progressively longer each time:
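With a one-second base delay (an illustrative value), the waits look like:

```
retry 1 → wait 1s
retry 2 → wait 2s
retry 3 → wait 4s
retry 4 → wait 8s
retry 5 → wait 16s
```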
With this technique we aren’t hammering the server every second. Instead, we’re calling it less and less frequently until we’re back under our rate limit.
Exponential backoff is a good approach to gracefully dealing with rate limits. But the implementation above has a huge flaw–the thundering herd problem.
Imagine you’re an API that goes down briefly–just a few seconds. During those few seconds, all your consumers start getting errors. Being good API consumers, they all have exponential backoff implemented. The problem is that because their errors all started at the same time, their backoff delays are synced: every consumer retries 2, 4, 8, and so on seconds after the initial error. These synchronized retries from multiple clients can overload the system and stop it from coming back online. Hence, a “thundering herd.”
The answer here is to combine exponential backoff with a random "jitter" factor to avoid syncing across clients:
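A sketch of backoff with jitter; the constants and function names are illustrative:

```javascript
const MAX_RETRIES = 5;
const BASE_DELAY_MS = 1000;
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// A random delay between delay/2 and delay, so clients don't sync up.
function jitteredDelay(retries) {
  const delay = BASE_DELAY_MS * 2 ** retries;
  return delay / 2 + Math.random() * (delay / 2);
}

async function fetchWithJitter(url, retries = 0) {
  const response = await fetch(url);

  if (response.status === 429) {
    if (retries >= MAX_RETRIES) {
      throw new Error("Max retries exceeded");
    }
    const randomDelay = jitteredDelay(retries);
    await wait(randomDelay);
    return fetchWithJitter(url, retries + 1);
  }
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```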
In this code, randomDelay is a random value between delay/2 and delay. The random factor helps spread out the retry attempts from different clients:
Some version of the code above implementing exponential backoff with jitter should be boilerplate for any API you are integrating with.
As soon as you call one API, you’ll want to call two. Modern programming is API-based so you are often fetching data from multiple sources. If you are building out integrations for your software, you might have to integrate with dozens of endpoints.
For instance, if you need to integrate with a wide range of go-to-market tools, you’ll have a HubSpot integration, an Outreach integration, a Pipedrive integration, a Zoho integration, and so on. A fundamental aspect of programming is DRY–don’t repeat yourself. You don’t want to have to write the above retry logic for each of these integrations. You want to write one that can handle everything.
This is challenging (it’s the challenge we set up Vessel to solve). Though it would be nice if every API returned the same errors and headers, they don’t. For instance:
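A few real-world variations (as documented by each provider, though always double-check their current docs):

```
GitHub:     X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset
Twitter:    x-rate-limit-limit, x-rate-limit-remaining, x-rate-limit-reset
IETF draft: RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset
Others:     only a Retry-After header on a 429, or no headers at all
```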
How do we handle all of these together? To deal with varying rate-limit headers across different APIs, you can define a set of known rate-limit header patterns and then search for those patterns in the response headers. Here’s how you could structure this:
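A sketch of normalizing headers into one shape. The header names here are real ones you’ll see in the wild, but the set of parsers is deliberately small; `getRateLimitInfo` and the `{ remaining, reset }` shape are our own conventions:

```javascript
// Each parser maps one provider's header names onto a common shape:
// { remaining: requests left, reset: epoch seconds when the window resets }.
const rateLimitParsers = {
  "x-ratelimit-remaining": (headers) => ({
    remaining: Number(headers.get("x-ratelimit-remaining")),
    reset: Number(headers.get("x-ratelimit-reset")),
  }),
  "x-rate-limit-remaining": (headers) => ({
    remaining: Number(headers.get("x-rate-limit-remaining")),
    reset: Number(headers.get("x-rate-limit-reset")),
  }),
  "retry-after": (headers) => ({
    remaining: 0,
    reset: Date.now() / 1000 + Number(headers.get("retry-after")),
  }),
};

// Look for a known header pattern and parse it into the common shape.
function getRateLimitInfo(headers) {
  for (const [pattern, parse] of Object.entries(rateLimitParsers)) {
    if (headers.has(pattern)) {
      return parse(headers);
    }
  }
  return null; // no recognizable rate-limit headers
}
```

Fetch’s `Headers` object lower-cases names for you, so `getRateLimitInfo(response.headers)` works directly, and the result can feed straight into the backoff logic above.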
The getRateLimitInfo function looks for known rate limit header patterns in the response headers and, if it finds a match, uses the corresponding function to extract the rate limit information.
This approach provides a flexible way to handle varying rate limit headers. You can easily add more patterns to the rateLimitParsers dictionary as you encounter new ones.
If you are building integrations with an API, it’s because that API and the data it returns are important to you. Use it responsibly and efficiently: know the provider’s limits before you hit them, check every response for 429s, back off (with jitter) when you’re told to slow down, and share one well-tested retry layer across all your integrations.
This does require more work on your end. Above, we’ve gone from basically a single-line fetch to dozens of lines with error handling, backoff algorithms, and other logic. But this is also better for you. You’ll get better, more efficient, more consistent responses from the API, making it easier to build the rest of your application and, ultimately, making your application better for your users.