I am starting a series of self-reference materials on staples of web application delivery: a place to collect and document what I have learned about the technologies I often reach for when designing systems. The first in this series was about Ruby and Redis, while this one focuses on Content Delivery Networks. These reference pages will be updated over time and evolve as my usage or focus changes.
A Content Delivery Network can help serve high-traffic or high-performance websites, and most CDNs offer a number of additional features.
A content delivery network, or content distribution network (CDN), is a geographically distributed network of proxy servers and their data centers. The goal is to provide high availability and performance by distributing the service spatially relative to end users. CDNs came into existence in the late 1990s as a means for alleviating the performance bottlenecks of the Internet as the Internet was starting to become a mission-critical medium for people and enterprises. Since then, CDNs have grown to serve a large portion of the Internet content today… – wikipedia
Often you will want different CDN configurations and settings for different purposes. For example, it is fairly common to have an asset CDN with long-lived caches, while you might want applications to be able to specify fine-grained caching headers for HTML and API content.
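To make the asset versus HTML/API split concrete, here is a minimal sketch of how an application might pick a `Cache-Control` header per request path. The path prefixes, the max-age values, and the `cache_control_for` helper are all assumptions for illustration, and the asset policy assumes fingerprinted filenames that never change once published.

```ruby
# Hypothetical path prefixes for fingerprinted static assets.
ASSET_PREFIXES = ["/assets/", "/packs/"].freeze

# Returns an illustrative Cache-Control value for a request path.
def cache_control_for(path)
  if ASSET_PREFIXES.any? { |prefix| path.start_with?(prefix) }
    # Fingerprinted assets: safe to cache everywhere for a year.
    "public, max-age=31536000, immutable"
  elsif path.start_with?("/api/")
    # API responses: default to no caching unless the app opts in.
    "private, no-store"
  else
    # HTML: browsers revalidate, the CDN may hold a copy for 60 seconds.
    "public, max-age=0, s-maxage=60"
  end
end

puts cache_control_for("/assets/application-abc123.js")
puts cache_control_for("/api/users")
puts cache_control_for("/pricing")
```

The `s-maxage` directive only applies to shared caches such as a CDN, which is what lets the HTML policy differ between the browser and the edge.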
A CDN can help even if you aren’t quite ready to fully leverage it. CDNs have a lot of hidden advantages, and with some early setup they can quickly be put to use handling enormous amounts of traffic. For example, a CDN can serve the initial experience for the load from even the largest advertising campaigns, even if all those new signups have to be pushed into a queue and handled when you can. As this tweet points out, even after folks think of all the obvious reasons to have a CDN, there are often many other clever ways to use one.
There are a bunch of benefits to putting a CDN on top of your web application, such as reduced TTFB, caching, and DDoS protection. But there's also a huge benefit that people don't talk about: No user-side DNS caching. You can switch origins within minutes, not days/weeks/months.— Jack Ellis (@JackEllis) April 3, 2022
My first startup was taken down by a viral link, and by the time we got page caching working on our Rails server a few hours had passed and we had lost the bump… For this first bit of traction, a CDN could have made all the difference. The content would have been easy to protect with the single viral page behind the CDN and appropriate cache headers. That startup never succeeded, but early on much time was wasted trying to improve app layer caching and performance where a CDN could have been a big help.
While CDNs are great, nothing is free and every abstraction adds some complexity to your system. Understanding the value a CDN can provide, along with some of the gotchas, can help your team decide if it is the right decision for the system.
I have written previously about request depth and availability. A CDN is an additional layer in the request depth of your application stack… As the entry point to your systems, it also only has a 99.9% AWS SLA, which means that is the upper limit on your overall SLA, pretending for a moment everything else was 100% reliable. Now, AWS and most CDNs actually have far better real-world uptime and success rates during normal operations, but all of the major CDNs have had notable major outages, including CloudFront having a major outage the day before Thanksgiving. There are a couple of protective measures a team can take if it really needs to ensure higher availability of its site; neither is cheap.
* A team can implement and support High Availability Origin Failover as a protective layer within CloudFront, to protect against origin level failures.
* A team can implement multi-CDN DNS failover. This can be done in an automated or manual fashion depending on complexity and cost concerns.
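The availability reasoning above can be sketched as some quick arithmetic. Layers in series multiply, so the entry point's SLA caps the whole stack, while a redundant entry point raises that cap. The 99.9% figure is from the text; the ALB and application numbers are hypothetical, and the redundancy formula assumes independent failures and flawless failover, which is optimistic.

```ruby
# Layers in series (CDN -> ALB -> app): every layer must be up,
# so availabilities multiply.
def serial_availability(*layers)
  layers.reduce(1.0) { |total, availability| total * availability }
end

# Redundant alternatives (e.g. two CDNs behind DNS failover): the entry
# point is down only when all alternatives are down at the same time.
# Assumes independent failures and instant, flawless failover.
def redundant_availability(*alternatives)
  1.0 - alternatives.reduce(1.0) { |down, availability| down * (1.0 - availability) }
end

# Downtime budget in hours over a 365-day year.
def downtime_hours_per_year(availability)
  (1.0 - availability) * 365 * 24
end

cdn = 0.999   # the 99.9% SLA figure from the text
alb = 0.9999  # hypothetical
app = 0.9995  # hypothetical

single_cdn = serial_availability(cdn, alb, app)
multi_cdn  = serial_availability(redundant_availability(cdn, cdn), alb, app)

puts format("single CDN: %.4f%% (%.2f h/year down)", single_cdn * 100, downtime_hours_per_year(single_cdn))
puts format("multi CDN:  %.4f%% (%.2f h/year down)", multi_cdn * 100, downtime_hours_per_year(multi_cdn))
```

Real outages are rarely independent and failover is never instant, so treat the multi-CDN number as a ceiling rather than a prediction.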
These days CDNs can do a lot of what load balancers do by routing different types of requests to different back-end “origin” servers, including geo-targeted routing to support locales and reach the nearest servers… While this overlaps with load balancers, I generally end up with a configuration where I have both a CDN and load balancers in place. My setup often looks like so:
web browser -> CDN -> ALB -> Application Servers.
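As a rough illustration of the request routing described above, here is a sketch of path-pattern matching that picks an origin for each request. The patterns, the origin names, and the `origin_for` helper are hypothetical; a real CDN does this in its own configuration rather than in application code.

```ruby
# Hypothetical routing rules, checked in order; the first match wins
# and the final "*" pattern acts as the default.
BEHAVIORS = [
  { pattern: "/assets/*", origin: "assets-bucket" },
  { pattern: "/api/*",    origin: "api-alb" },
  { pattern: "*",         origin: "web-alb" }
].freeze

# Returns the origin name for a request path.
def origin_for(path)
  behavior = BEHAVIORS.find { |b| File.fnmatch(b[:pattern], path) }
  behavior[:origin]
end

puts origin_for("/assets/css/site.css") # served from the asset origin
puts origin_for("/api/v1/users")        # sent to the API load balancer
puts origin_for("/pricing")             # falls through to the default origin
```

Ordering matters here: putting the catch-all pattern first would send every request to the default origin.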
As part of the ability to route different requests, CDNs now often handle basic IP address geolocation, allowing one to route requests to different origins based on country, language, city, postal code, and more. Even if your application doesn’t use different origins, the geolocation information is often extremely useful. You can avoid additional network calls or 3rd party integrations by leveraging the CDN’s built-in geolocation support. For example, AWS CloudFront can provide geolocation info on all requests to the CDN, which can influence caching logic and be passed as header hints to your application servers. It is helpful to use this data to detect if a user might be accessing your site on the wrong domain given the country they are in. Example of the additional header data:
CloudFront-Viewer-Country-Name: United States
CloudFront-Viewer-Country-Region: MI
CloudFront-Viewer-Country-Region-Name: Michigan
CloudFront-Viewer-City: Ann Arbor
CloudFront-Viewer-Postal-Code: 48105
CloudFront-Viewer-Time-Zone: America/Detroit
CloudFront-Viewer-Latitude: 42.30680
CloudFront-Viewer-Longitude: -83.70590
CloudFront-Viewer-Metro-Code: 505
Note: inside a Ruby app, Rails/Rack will format the headers by adding an HTTP_ prefix, upcasing, and converting dashes to underscores, so you could access this data in a typical Rails app like so:
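A minimal sketch of that header mapping, written as plain Ruby against a Rack-style `env` hash so it runs without Rails. The `rack_env_key` and `suggested_domain` helpers, the domain table, and the sample `env` are assumptions for illustration. It also assumes the CDN forwards the two-letter `CloudFront-Viewer-Country` header alongside the headers listed above.

```ruby
# Rack exposes a request header like "Foo-Bar" under the env key "HTTP_FOO_BAR".
def rack_env_key(header_name)
  "HTTP_" + header_name.upcase.tr("-", "_")
end

# Hypothetical table: which domain we expect visitors from each country to use.
EXPECTED_DOMAINS = {
  "US" => "example.com",
  "GB" => "example.co.uk"
}.freeze

# Returns the domain to suggest, or nil when the visitor is already on the
# expected domain or the country is unknown or missing.
def suggested_domain(env, current_host)
  country  = env[rack_env_key("CloudFront-Viewer-Country")]
  expected = EXPECTED_DOMAINS[country]
  return nil if expected.nil? || expected == current_host

  expected
end

# Sample env, shaped like what Rack would build from the forwarded headers.
env = {
  "HTTP_CLOUDFRONT_VIEWER_COUNTRY" => "GB",
  "HTTP_CLOUDFRONT_VIEWER_CITY"    => "London"
}

puts rack_env_key("CloudFront-Viewer-City")
puts suggested_domain(env, "example.com").inspect
```

In a Rails controller the same value should be reachable as `request.env["HTTP_CLOUDFRONT_VIEWER_COUNTRY"]` or `request.headers["CloudFront-Viewer-Country"]`.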