When a CloudFront Origin Must Fail for Testing High Availability

Amazon's CloudFront service speeds up content delivery by caching frequently accessed files, and its extensive list of customers suggests it does a superb job of this. But what happens when you turn on the prescribed failover solution for redundancy and need to test its seemingly simple implementation? Get ready for a deeper dive into CloudFront's Origin Groups.

Application Set-Up

To understand the problem at hand, it is best to first set up the use case as it was originally intended, before any mention of failover. The Ippon Podcast exploratory project is built on a combination of CloudFront and S3. The application's index.html, styles, scripts, and media files are all located in us-east-1 S3 buckets with the "Static website hosting" property enabled. The site is served via a CloudFront distribution that uses these S3 buckets as Origins. So what happens to the site if a bucket is deleted or the files are removed? What if the entire us-east-1 region goes down?

Failure to retrieve these files will result in 4xx error codes, since they will be considered either not found or forbidden. Enabling the "Versioning" property or "Cross-Region Replication" on these buckets can help ensure the buckets and their files survive, but the CloudFront distribution still needs a way to retrieve those files in the event of a region shutdown.

Prescription Failover

In the event of a cache miss at an edge location, CloudFront retrieves the file from the origin and replenishes the cache. So what happens if the origin is unavailable when CloudFront goes looking for it? Fortunately, support for Origin Failover was introduced in November 2018, another important advancement in making AWS-based web applications highly available. Origin Failover works by combining a primary and a secondary origin into what is called an Origin Group. When a request to the primary origin fails with one of the configured error status codes, the failover action is triggered and CloudFront attempts the same request against the secondary origin. The criteria are customizable: any combination of the status codes 500, 502, 503, 504, 403, and 404 can be selected.
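
The failover decision described above can be sketched in a few lines of Python. This is only a simplified model for reasoning about the behavior, not CloudFront's implementation: the function name and the idea of passing each origin as a callable that returns a status code are assumptions made for illustration.

```python
from typing import Callable, Iterable, Tuple

# Status codes that can be selected as failover criteria (from the list above).
ALLOWED_FAILOVER_CODES = frozenset({403, 404, 500, 502, 503, 504})


def serve_with_failover(
    path: str,
    primary: Callable[[str], int],
    secondary: Callable[[str], int],
    failover_codes: Iterable[int],
) -> Tuple[str, int]:
    """Model one cache-miss request; return (origin_used, status_code)."""
    codes = set(failover_codes)
    unsupported = codes - ALLOWED_FAILOVER_CODES
    if unsupported:
        raise ValueError(f"unsupported failover codes: {sorted(unsupported)}")

    status = primary(path)
    if status in codes:
        # The primary answered with a configured failure code,
        # so the same request is retried against the secondary.
        return "secondary", secondary(path)
    # Success, or an error that was not selected as a failover criterion.
    return "primary", status
```

With a primary that always answers 404 and a secondary that answers 200, `serve_with_failover("/index.html", lambda p: 404, lambda p: 200, [403, 404])` returns `("secondary", 200)`. If 404 is left out of the criteria, the 404 from the primary is passed through unchanged.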

Administer the Prescription

In its current state, the Ippon Podcast site relies on S3 buckets located in the us-east-1 region. Proper failover cannot be implemented without redundant S3 buckets in another region, such as us-west-2. Set up a cross-region bucket in either of the following ways:

  • Enable "Cross-Region Replication" on the us-east-1 bucket and have it create a new S3 bucket. This forces the "Versioning" property to be enabled on both buckets. The existing objects still need to be copied or synced from the original us-east-1 bucket to the new us-west-2 bucket, but from then on the latest files are replicated automatically.
  • Make or repurpose an S3 bucket in us-west-2 and copy or sync the objects over from the original us-east-1 bucket. Then enable "Cross-Region Replication" on the us-east-1 bucket and specify the us-west-2 bucket as the destination. This also forces the "Versioning" property to be enabled on both buckets, and from then on the latest files are replicated automatically.

Note that "Cross-Region Replication" is not required, but without it, keeping objects consistent between the two buckets requires manual intervention. Using the AWS CLI for better control over S3 operations, together with an infrequently accessed storage class to keep the new bucket cost-effective, is recommended. Possible commands might include:

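# Create the secondary bucket (S3 bucket names cannot contain underscores)
aws s3 mb s3://mybucket-west --region us-west-2
# One-time copy of the existing objects into an infrequent-access storage class
aws s3 cp s3://mybucket s3://mybucket-west --recursive --storage-class STANDARD_IA
# Or copy only the objects that are new or have changed
aws s3 sync s3://mybucket s3://mybucket-west --storage-class STANDARD_IA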

Returning to the CloudFront distribution, set up the behavior with a new Origin Group through the AWS console with the following steps:

  1. Under the Origins and Origin Groups tab, enter the details of the new us-west-2 S3 bucket through the Create Origin interface. Make sure to set "Restrict Bucket Access" to "Yes" and grant read permission on the bucket to the desired Origin Access Identity. There should now be at least two origins listed.
  2. Moving on to the Create Origin Group interface, add the two S3 bucket origins from the "Origins *" drop-down in order of request priority. The 4xx status codes (403 and 404) are necessary to catch missing or forbidden S3 objects, but feel free to select any of the other errors as "Failover criteria *" as well. Hit Create and confirm the Origin Group is now listed.
  3. The final step, under the Behaviors tab, is to edit the intended behavior and replace its origin, previously the single S3 bucket in us-east-1, with the new origin group.

Or, set up the behavior with a new Origin Group programmatically:

  1. Retrieve the current "DistributionConfig" member using the get-distribution-config command. Save the "DistributionConfig" object to its own file and note the value of the "ETag" attribute, as both will be needed later to update the distribution.
aws cloudfront get-distribution-config --region us-east-1 --id EDFDVBD632BHDS5
  2. Open the saved DistributionConfig file and add a new "Origin" to its "Origins" object using the details of the new S3 bucket in us-west-2.
"Origins": {
    "Quantity": 2,
    "Items": [
        {
            "Id": "originalHostingS3Bucket",
            "DomainName": "mybucket.s3.amazonaws.com",
            "OriginPath": "",
            "CustomHeaders": {
                "Quantity": 0
            },
            "S3OriginConfig": {
                "OriginAccessIdentity": "origin-access-identity/cloudfront/_ID-of-origin-access-identity_"
            }
        },
        {
            "Id": "newHostingS3Bucket_west",
            "DomainName": "mybucket-west.s3.amazonaws.com",
            "OriginPath": "",
            "CustomHeaders": {
                "Quantity": 0
            },
            "S3OriginConfig": {
                "OriginAccessIdentity": "origin-access-identity/cloudfront/_ID-of-origin-access-identity_"
            }
        }
    ]
}
  3. Now modify the file to include an "OriginGroups" attribute if there is not one already. Add an "OriginGroup" whose members are the two origins from the previous step. Under "StatusCodes", specify any combination of 403, 404, 500, 502, 503, or 504; these are the responses that will cause CloudFront to retry the request against the secondary origin.
"OriginGroups": {
    "Quantity": 1,
    "Items": [
        {
            "Id": "OriginGroup-hostingS3Bucket",
            "FailoverCriteria": {
                "StatusCodes": {
                    "Quantity": 6,
                    "Items": [
                        403,
                        404,
                        500,
                        502,
                        503,
                        504
                    ]
                }
            },
            "Members": {
                "Quantity": 2,
                "Items": [
                    {
                        "OriginId": "originalHostingS3Bucket"
                    },
                    {
                        "OriginId": "newHostingS3Bucket_west"
                    }
                ]
            }
        }
    ]
}
  4. The last part to modify is the behavior, which must point at the new Origin Group. Set the "TargetOriginId" of the "DefaultCacheBehavior" and/or "CacheBehaviors" members in the file to the Origin Group's "Id". When a behavior uses an Origin Group, its "AllowedMethods" are limited to "HEAD", "GET", and "OPTIONS".
"DefaultCacheBehavior": {
    "TargetOriginId": "OriginGroup-hostingS3Bucket",
    "ForwardedValues": {
        "QueryString": false,
        "Cookies": {
            "Forward": "none"
        },
        "Headers": {
            "Quantity": 0
        },
        "QueryStringCacheKeys": {
            "Quantity": 0
        }
    },
    "TrustedSigners": {
        "Enabled": false,
        "Quantity": 0
    },
    "ViewerProtocolPolicy": "redirect-to-https",
    "MinTTL": 60,
    "AllowedMethods": {
        "Quantity": 3,
        "Items": [
            "HEAD",
            "GET",
            "OPTIONS"
        ],
        "CachedMethods": {
            "Quantity": 2,
            "Items": [
                "HEAD",
                "GET"
            ]
        }
    },
    "SmoothStreaming": false,
    "DefaultTTL": 86400,
    "MaxTTL": 31536000,
    "Compress": true,
    "LambdaFunctionAssociations": {
        "Quantity": 0
    },
    "FieldLevelEncryptionId": ""
}
  5. Confirm the DistributionConfig JSON file is saved with the changes just made. Then run the update-distribution command, pointing it at that file's location. Make sure to pass the --if-match option with the "ETag" value returned by the command in the first step. If the update succeeds, the updated DistributionConfig JSON is echoed back.
aws cloudfront update-distribution --if-match E2QWRUHEXAMPLE --id EDFDVBD632BHDS5 --distribution-config file://origin-failover.json --region us-east-1
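
For anyone who would rather not hand-edit the JSON, the three file modifications above can be sketched as a small Python helper. This is a minimal sketch under stated assumptions: the function names are hypothetical, only the default behavior is retargeted, and the input is assumed to be the parsed output of get-distribution-config (a dictionary with "ETag" and "DistributionConfig" keys).

```python
import copy
from typing import List, Tuple


def split_config_response(response: dict) -> Tuple[str, dict]:
    """Split parsed get-distribution-config output into (ETag, DistributionConfig)."""
    return response["ETag"], response["DistributionConfig"]


def add_failover_group(
    config: dict,
    group_id: str,
    primary_origin_id: str,
    secondary_origin: dict,
    status_codes: List[int],
) -> dict:
    """Return a copy of config with the secondary origin, an origin group,
    and the default behavior pointed at that group."""
    updated = copy.deepcopy(config)  # leave the caller's dictionary untouched

    # Add the new origin and keep "Quantity" in step with "Items".
    origins = updated["Origins"]
    origins["Items"].append(secondary_origin)
    origins["Quantity"] = len(origins["Items"])

    # Build the origin group: member order is request priority.
    codes = sorted(status_codes)
    group = {
        "Id": group_id,
        "FailoverCriteria": {
            "StatusCodes": {"Quantity": len(codes), "Items": codes}
        },
        "Members": {
            "Quantity": 2,
            "Items": [
                {"OriginId": primary_origin_id},
                {"OriginId": secondary_origin["Id"]},
            ],
        },
    }
    # "OriginGroups" (or its "Items" list) may be missing when no group exists yet.
    groups = updated.setdefault("OriginGroups", {"Quantity": 0, "Items": []})
    groups.setdefault("Items", []).append(group)
    groups["Quantity"] = len(groups["Items"])

    # Retarget the default behavior at the origin group.
    updated["DefaultCacheBehavior"]["TargetOriginId"] = group_id
    return updated
```

A typical use would be to load the saved command output with `json.load`, pass the result through both functions, and write the returned dictionary to the file given to --distribution-config. The helper does not touch "AllowedMethods", so a behavior that allows more than HEAD, GET, and OPTIONS would still need to be adjusted by hand.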

And with that, the application should now be better equipped for handling failures! So how can that be tested?

Testing... Testing... Redundancy

At least in the case of the Ippon Podcast, two buckets are involved. One holds the index.html, styles, and scripts of the application. The other contains the podcast audio recordings. This allows for two different opportunities to test the origin group's failover capability. One note: if any of the changes described below are not reflected immediately, it is because they take time to propagate. Invalidating the cache using /* will allow differences to take effect more quickly, but may incur a cost if more than 1,000 invalidation paths are submitted in a month.
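
To put the invalidation cost in perspective, here is a back-of-the-envelope sketch. The free allowance of 1,000 paths comes from the note above; the price of $0.005 per additional path is an assumption based on published CloudFront pricing at the time of writing and should be checked against the current pricing page.

```python
FREE_PATHS_PER_MONTH = 1000        # free allowance mentioned above
PRICE_PER_EXTRA_PATH_USD = 0.005   # assumed price, verify before relying on it


def invalidation_cost(paths_this_month: int) -> float:
    """Estimated monthly invalidation charge in USD."""
    billable = max(0, paths_this_month - FREE_PATHS_PER_MONTH)
    return round(billable * PRICE_PER_EXTRA_PATH_USD, 2)
```

Under these assumptions, `invalidation_cost(1200)` comes to 1.0, so a handful of /* invalidations during a test session stays well inside the free allowance.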

Removing the index.html is an easy one to test with because the site will break completely. To see this, perform the following steps:

  1. Revert the associated behavior back to the single origin
  2. Delete the index.html file from this primary bucket (have no fear about losing the file since it will still be versioned within the bucket and also located in the alternate)
  3. Hard refresh the page and the site should come to a screeching halt
  4. Open the browser's developer tools to confirm the expected 4xx status code through the Network tab
  5. Back in CloudFront, change the behavior to use the origin group again so that failure will be redirected to the alternate S3 bucket that contains the backup index.html file
  6. Hard refresh the page once more and the site should spring back to life!
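
The status check from the steps above can also be scripted rather than read from the browser's Network tab. This is a small sketch using only the Python standard library; the function names are hypothetical, and the optional `opener` parameter exists only so the function can be exercised without a live site.

```python
import urllib.error
import urllib.request


def fetch_status(url, opener=urllib.request.urlopen):
    """Return the HTTP status code for url, including 4xx and 5xx responses."""
    try:
        with opener(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the code is what the Network tab would show.
        return err.code


def describe(status):
    """Map a status code onto the outcomes expected during the failover test."""
    if 200 <= status < 300:
        return "served"
    if status in (403, 404):
        return "missing or forbidden at the origin"
    return "unexpected"
```

Calling `fetch_status` on the distribution's index.html URL should report a 4xx code while the behavior points at the single origin, and a 2xx code once the behavior uses the origin group again, subject to the propagation delay mentioned earlier.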

A similar test can be performed on the media files. While this will not break the site completely, reverting the associated behavior to the single origin and then deleting one or more of the files should cause those files to fail to load or play. Changing the behavior back to the origin group allows the backup file to be loaded and played as intended.

Remember to restore the removed primary files, either by deleting the "Delete marker" under show versions or by re-uploading the file. Also ensure that any changed behaviors use the origin groups again.
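
Restoring a file by removing its delete marker can be scripted as well. The sketch below only shows how to find the marker; it assumes a response shaped like the one returned by the S3 ListObjectVersions API (a "DeleteMarkers" list whose entries carry "Key", "VersionId", and "IsLatest"), and the function name is hypothetical.

```python
from typing import Optional


def latest_delete_marker(versions_response: dict, key: str) -> Optional[str]:
    """Return the VersionId of the current delete marker for key, or None."""
    for marker in versions_response.get("DeleteMarkers", []):
        if marker["Key"] == key and marker.get("IsLatest"):
            return marker["VersionId"]
    return None


# Assumed usage with boto3 (not executed here):
#   s3 = boto3.client("s3")
#   response = s3.list_object_versions(Bucket="mybucket", Prefix="index.html")
#   version_id = latest_delete_marker(response, "index.html")
#   if version_id:
#       s3.delete_object(Bucket="mybucket", Key="index.html", VersionId=version_id)
```

Deleting the marker by its version ID makes the previous version of the object current again, which is the scripted equivalent of removing the "Delete marker" in the console.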

Debrief

The Ippon Podcast site is still very much a work-in-progress. Since it is more exploratory, high availability is not as critical. This makes it a good application to test proper failover through CloudFront's origin groups. There are more subtle details to utilizing origin groups which require some exploration. Yet, hopefully, this provided an easy first step towards building and testing redundancy in a CloudFront/S3-hosted application.

Sources: