Retry Pattern 🔁

· 3 min read

Introduction

APIs or services can sometimes fail due to transient errors. These errors are temporary and often self-correcting, such as temporary service unavailability (HTTP 503) or rate limits (HTTP 429). Retrying the action after a brief delay can often resolve these issues.

In a microservices architecture or when client applications call RESTful APIs, it's essential to build for failure. Designing your system with robust failure-handling mechanisms ensures greater resilience and reliability.

When to use

  • Operations should be idempotent when you use the retry pattern.

Idempotent: no matter how many times you call the operation, the result is the same. In REST APIs, GET, HEAD, and OPTIONS requests generally do not change resource state on the server and are therefore safe to retry. PUT and DELETE are also idempotent: calling them repeatedly produces the same outcome as calling them once.

  • Temporary network issues.
  • The service is temporarily unavailable on the server side (e.g., HTTP 503).
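To make the idempotency rule concrete, a client can gate retries on the HTTP method. A minimal sketch (the method list follows the HTTP specification's defaults; your own APIs may differ, and `isSafeToRetry` is an illustrative helper, not part of any library):

```javascript
// Methods the HTTP spec defines as idempotent, and hence generally
// safe to retry. POST and PATCH are excluded by default.
const IDEMPOTENT_METHODS = new Set(['GET', 'HEAD', 'OPTIONS', 'PUT', 'DELETE']);

function isSafeToRetry(method) {
  return IDEMPOTENT_METHODS.has(method.toUpperCase());
}
```

A retry layer would consult this check before scheduling any retry, so that a failed POST is surfaced to the caller instead of being replayed.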

Different Types of Retry Mechanisms

1. Manual retries -

Manual retries occur when the user manually triggers a retry after a failure. This is common in user interfaces where a user clicks a "Retry" button after an operation fails.

2. Retries with fixed delay -

Retries with a fixed delay retry the operation after a set interval.

  • If many clients are retrying at the same interval, it can lead to a thundering herd problem, where many clients send requests simultaneously, overwhelming the server.
  • It does not adapt to the error context, potentially leading to inefficient retry intervals.
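A fixed-delay retry loop can be sketched as follows (`operation`, the retry count, and the 1-second delay are illustrative assumptions, not part of any library API):

```javascript
// Retry `operation` up to `retries` times, waiting a constant
// `delayMs` between attempts. Every client using the same delay
// is exactly what causes the thundering herd problem described above.
async function retryWithFixedDelay(operation, retries = 3, delayMs = 1000) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (attempt === retries) throw err; // retries exhausted
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```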

3. Retries with Exponential Backoff

Retries with exponential backoff address some of the limitations of fixed delay retries:

  • By increasing the delay exponentially, it reduces the likelihood of overwhelming the server with repeated requests.
  • It adapts to persistent failures by progressively increasing the delay between retries, allowing more time for transient issues to resolve.
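The growing delay is typically computed as base × 2^attempt, usually with an upper cap so it cannot grow unbounded. A minimal sketch (the 100 ms base and 10 s cap are assumed values, not from the original article):

```javascript
// Exponential backoff: the delay doubles on every attempt,
// capped at `capMs`.
function expoBackoffDelay(attempt, baseMs = 100, capMs = 10000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// With base 100 ms, attempts 0..4 yield 100, 200, 400, 800, 1600 ms.
```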

4. Retries with Exponential Backoff and Jitter

Retries with exponential backoff and jitter address additional limitations:

  • Even with exponential backoff, synchronized retries can still occur, potentially leading to a thundering herd problem.
  • Adding jitter introduces randomness to the delay intervals, further reducing the likelihood of synchronized retries and evenly spreading out the retry attempts over time.

By using exponential backoff with jitter, the retry mechanism is more robust and less likely to cause server overload due to synchronized retry attempts, making it more suitable for handling large-scale systems with many clients.
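The "full jitter" variant popularized by the AWS Architecture Blog picks a uniformly random delay between 0 and the exponential backoff value. A minimal sketch (the base and cap defaults are assumptions); the `getTimeout` helper in the implementation below follows the same idea:

```javascript
// Full jitter: a uniformly random delay in [0, min(cap, base * 2^attempt)).
// The randomness de-synchronizes clients that failed at the same moment.
function fullJitterDelay(attempt, baseMs = 100, capMs = 10000) {
  const expo = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * expo;
}
```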

Implementation of Exponential Backoff and Jitter

Total Retries: the maximum number of retries before returning a failed response to the client.

Retry Status Codes: the HTTP status codes to retry on. By default, all status codes >= 500 are retried.

Backoff: the minimum time to wait before sending a subsequent retry request.

axios.interceptors.response.use(
  // Pass successful responses through untouched.
  (response) => response,
  async (error) => {

    const statusCode = error.response.status
    const currentRetryCount = error.config.currentRetryCount ?? 0
    const totalRetry = error.config.retryCount ?? 0
    const retryStatusCodes = error.config.retryStatusCodes ?? []
    const backoff = error.config.backoff ?? 100

    if(isRetryRequired({
      statusCode,
      retryStatusCodes,
      currentRetryCount,
      totalRetry})
    ){

      error.config.currentRetryCount = currentRetryCount + 1;

      // Wait for the exponential backoff (with jitter) before retrying
      const backOffWithJitterTime = getTimeout(currentRetryCount, backoff);
      await new Promise(function(resolve) {
        setTimeout(resolve, backOffWithJitterTime);
      });

      // Recall Axios to retry the request with the updated config
      return axios(error.config);
    }

    // No retry required (or retries exhausted): propagate the error
    return Promise.reject(error);
  }
);
 
function isRetryRequired({
  statusCode, 
  retryStatusCodes, 
  currentRetryCount, 
  totalRetry}
 ){
 
  return (statusCode >= 500 || retryStatusCodes.includes(statusCode))
          && currentRetryCount < totalRetry;
}
 
function getTimeout(numRetries, backoff) {
  // Cap the wait time so the delay cannot grow unbounded
  // (the 10-second cap here is an illustrative choice).
  const waitTime = Math.min(10000, backoff * (2 ** numRetries));

  // Full jitter: multiply waitTime by a random number between 0 and 1.
  return Math.random() * waitTime;
}

When making an Axios request, make sure to add these variables to the request configuration:

const axios = require('axios');

const sendRequest = async (payload) => {
  const requestConfig = {
    method: 'post',
    url: 'https://api.example.com',
    headers: {
      'authorization': 'xxx',
    },
    data: payload,
    retryCount: 3,
    retryStatusCodes: [408, 429], // numbers, to match error.response.status
    backoff: 200,
    timeout: 5000
  };
  const response = await axios(requestConfig);
  return response.data;
}

Code Reference

Who uses exponential backoff and jitter

  • Amazon Web Services (any AWS API you use)
  • Redis has a built-in retry mechanism.

Thank you

I hope you found it helpful. Comments or corrections are always welcome. Happy Coding 🙂

References:

  1. Amazon Exponential Backoff And Jitter
  2. Building Resilient Systems: Retry Pattern in Microservices