Article summary
Recently, I wrote about using Varnish Cache to speed up websites. However, not all websites appear identically on all devices. For example, many web applications deliver different content to devices such as phones, tablets, and screen readers. What happens when Varnish receives a request for a resource from one of these devices?
Without additional configuration, Varnish will return the only version of a resource that it has cached for a particular URL — regardless of its appropriateness for the device performing the request.
This can be problematic. For example, if a mobile phone performs the first request for a resource, the request may return specific mobile content, and Varnish will likely cache it. However, if a desktop browser subsequently performs a request for the same resource, it may receive the mobile content that Varnish has cached. This could cause mobile-specific content to appear in the desktop browser.
Why Varnish Has Mobile Trouble
The situation arises from the mechanism Varnish utilizes to manage cached resources. When Varnish caches a resource, it creates a unique hash from the parameters of the request used to fetch the resource. The hash identifies the resource in the Varnish cache.
When Varnish receives subsequent requests, it hashes select request parameters and attempts to use that hash to match up the request with a resource in the cache. If the hash corresponds with a valid resource, Varnish returns it (a cache hit); if it does not, Varnish fetches it from the back end (a cache miss).
Many web applications use the User-Agent header of HTTP requests to determine what content to return: mobile-specific content or normal desktop content. However, Varnish does not, by default, include the User-Agent header as a parameter when hashing requests to identify resources. This explains why Varnish, without additional configuration, returns the same content to different devices, even if the backend web application does not.
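For reference, the built-in hashing logic looks roughly like the following. This is a sketch of the default vcl_hash as it ships with Varnish 3.x (the version the syntax in this article assumes); other versions may differ in detail. Only the URL and the Host header (or the server IP, when no Host header is present) contribute to the hash, which is why two devices requesting the same URL are matched to the same cached object.

```vcl
# Built-in vcl_hash in Varnish 3.x (shown for illustration only;
# there is no need to copy it into your own configuration).
sub vcl_hash {
    # The request URL is always part of the hash.
    hash_data(req.url);

    # The Host header, or the server IP if no Host header was sent.
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }

    # Note: User-Agent does not appear anywhere in the hash input.
    return (hash);
}
```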
Adding User-Agent to Varnish Cache
To fix the problem, Varnish needs to be configured to take the User-Agent header into account when hashing the parameters of a request. However, it would be inappropriate to simply add the entire User-Agent string as a parameter to the hash function. Because there are so many distinct User-Agent strings, Varnish would cache identical resources separately for each one. This would cause the cache to grow dramatically, reducing the hit rate and hurting performance.
Instead, it would be more appropriate to classify User-Agents, and then cache resources based on this classification.
The following VCL code snippet demonstrates how to classify a request by device type based on User-Agent. The X-Device header stores the device classification for later use.
# Routine to identify and classify a device based on User-Agent
sub identify_device {
    # Default to classification as a PC
    set req.http.X-Device = "pc";

    if (req.http.User-Agent ~ "iPad") {
        # The User-Agent indicates an iPad, so classify as a tablet
        set req.http.X-Device = "mobile-tablet";
    }
    elsif (req.http.User-Agent ~ "iP(hone|od)" || req.http.User-Agent ~ "Android") {
        # The User-Agent indicates an iPhone, iPod or Android device, so classify as a touch/smart phone
        set req.http.X-Device = "mobile-smart";
    }
    elsif (req.http.User-Agent ~ "SymbianOS" ||
           req.http.User-Agent ~ "^BlackBerry" ||
           req.http.User-Agent ~ "^SonyEricsson" ||
           req.http.User-Agent ~ "^Nokia" ||
           req.http.User-Agent ~ "^SAMSUNG" ||
           req.http.User-Agent ~ "^LG") {
        # The User-Agent indicates some other mobile device, so classify it as such
        set req.http.X-Device = "mobile-other";
    }
}
Now that the request has been classified based on device, the classification needs to be added to the specialized vcl_hash function that Varnish executes to create the hash used to identify resources in the cache. Recall that VCL operates by modifying default behavior: when defining the vcl_hash function, only the new behavior needs to be specified, because Varnish runs its built-in logic afterwards whenever the custom function does not return.
sub vcl_hash {
    # Include the device classification (X-Device) in the hash, so that each
    # device class gets its own cached copy of a resource.
    # However, do not do this for static assets, as our web application
    # returns the same ones for every device.
    if (!(req.url ~ "\.(gif|jpg|jpeg|swf|flv|mp3|mp4|pdf|ico|png|gz|tgz|bz2)(\?.*|)$")) {
        hash_data(req.http.X-Device);
    }
}
Now that requests from devices have been classified, and this classification has been added to the hash function, the identify_device sub-routine must be called when Varnish receives a potentially cacheable request:
sub vcl_recv {
    # Be sure to actually call our sub-routine to classify devices!
    call identify_device;

    # Normalize Accept-Encoding so that equivalent requests share a cache entry
    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(gif|jpg|jpeg|swf|flv|mp3|mp4|pdf|ico|png|gz|tgz|bz2)(\?.*|)$") {
            # Already-compressed formats gain nothing from further compression
            unset req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            unset req.http.Accept-Encoding;
        }
    }

    # Static assets: drop cookies and strip the query string
    if (req.url ~ "\.(gif|jpg|jpeg|swf|css|js|flv|mp3|mp4|pdf|ico|png)(\?.*|)$") {
        unset req.http.cookie;
        set req.url = regsub(req.url, "\?.*$", "");
    }

    # Pass requests carrying application cookies; drop all other cookies
    if (req.http.cookie) {
        if (req.http.cookie ~ "(mycookie_|web-app-1-|special-identifier)") {
            return(pass);
        } else {
            unset req.http.cookie;
        }
    }

    set req.grace = 120s;
}
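To check that the classification behaves as expected, it can help to expose it in the response while testing. The following is a minimal sketch, assuming Varnish 3.x; the vcl_deliver addition is not part of the configuration above and is only meant as a debugging aid.

```vcl
# Debugging aid (an assumption, not part of the original configuration):
# copy the device classification onto the response so it can be inspected
# from the client side. Remove this once the setup has been verified.
sub vcl_deliver {
    set resp.http.X-Device = req.http.X-Device;
}
```

Requesting the same page with different User-Agent strings (for example with curl's -A option) should then show a different X-Device response header for each device class.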
Now Varnish can serve different resources from its cache appropriately to various devices, just as the backend web application intends. Similar configuration changes can be made to cache resources differently for any number of request parameters.
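As an illustration of the same pattern applied to another request parameter, the sketch below classifies requests by preferred language. Everything in it is hypothetical: the identify_language sub-routine, the X-Language header, and the choice of "en" and "de" are assumptions made for the example, and it presumes a backend that actually varies its content on Accept-Language. It uses Varnish 3.x syntax, where multiple definitions of vcl_recv and vcl_hash are concatenated in the order they appear.

```vcl
# Hypothetical example: classify requests by language instead of device.
sub identify_language {
    # Default classification (assumption: the site's main language is English)
    set req.http.X-Language = "en";

    if (req.http.Accept-Language ~ "^de") {
        # The client lists German first
        set req.http.X-Language = "de";
    }
}

sub vcl_recv {
    # Classify the request before any cache lookup happens
    call identify_language;
}

sub vcl_hash {
    # Cache one copy of each resource per language class
    hash_data(req.http.X-Language);
}
```

As with devices, hashing on a small set of classes rather than the raw header value keeps the number of cached variants per URL low.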
Nice summary on how to easily do Device Detection in Varnish. Keep posting on your Varnish experience as you go along :-)
A couple of remarks. Please have a look at the docs:
https://www.varnish-cache.org/docs/trunk/users-guide/devicedetection.html
As for a VCL set that recognizes far more devices than your snippet (and is community maintained):
https://github.com/varnish/varnish-devicedetect/
Hi Ruben,
Thanks very much for the resources. The device detection snippet will be very helpful, and I’ll definitely be incorporating it in our configurations.
– Justin
Just can’t seem to get this to work. When I add the code above it fails on restart. Maybe you can see the problem. Here’s my VCL
# This is a basic VCL configuration file for varnish. See the vcl(7)
# man page for details on VCL syntax and semantics.
#
# Default backend definition. Set this to point to your content
# server.
#
backend default {
.host = “127.0.0.1”;
.port = “8080”;
.connect_timeout = 600s;
.first_byte_timeout = 1200s;
.between_bytes_timeout = 600s;
.max_connections = 2800;
}
acl purge {
“localhost”;
}
sub vcl_recv {
set req.grace = 10m;
# Set X-Forwarded-For header for logging in nginx
remove req.http.X-Forwarded-For;
set req.http.X-Forwarded-For = client.ip;
# Remove has_js and CloudFlare/Google Analytics __* cookies.
set req.http.Cookie = regsuball(req.http.Cookie, “(^|;\s*)(_[_a-z]+|has_js)=[^;]*”, “”);
# Remove a “;” prefix, if present.
set req.http.Cookie = regsub(req.http.Cookie, “^;\s*”, “”);
# Either the admin pages or the login
if (req.url ~ “/wp-(login|admin|cron)”) {
# Don’t cache, pass to backend
return (pass);
}
# bypass export CSV , you can replace your url pattern with /exportlimit/all/
if ( req.url ~ “.*/exportlimit/all/.*” ) {
set req.http.connection = “close”;
return(pipe);
}
# Remove the wp-settings-1 cookie
set req.http.Cookie = regsuball(req.http.Cookie, “wp-settings-1=[^;]+(; )?”, “”);
# Remove the wp-settings-time-1 cookie
set req.http.Cookie = regsuball(req.http.Cookie, “wp-settings-time-1=[^;]+(; )?”, “”);
# Remove the wp test cookie
set req.http.Cookie = regsuball(req.http.Cookie, “wordpress_test_cookie=[^;]+(; )?”, “”);
# Static content unique to the theme can be cached (so no user uploaded images)
# The reason I don’t take the wp-content/uploads is because of cache size on bigger blogs
# that would fill up with all those files getting pushed into cache
if (req.url ~ “wp-content/themes/” && req.url ~ “\.(css|js|png|gif|jp(e)?g)”) {
unset req.http.cookie;
}
# Even if no cookies are present, I don’t want my “uploads” to be cached due to their potential size
if (req.url ~ “/wp-content/uploads/”) {
return (pass);
}
# Check the cookies for wordpress-specific items
if (req.http.Cookie ~ “wordpress_” || req.http.Cookie ~ “comment_”) {
# A wordpress specific cookie has been set
return (pass);
}
# allow PURGE from localhost
if (req.request == “PURGE”) {
if (!client.ip ~ purge) {
error 405 “Not allowed.”;
}
return (lookup);
}
# Force lookup if the request is a no-cache request from the client
if (req.http.Cache-Control ~ “no-cache”) {
return (pass);
}
if (req.http.user-agent ~ “(iphone|android|240×320|400×240|avantgo|blackberry|blazer|cellphone|danger|docomo|elaine/3.0|eudoraweb|googlebot-mobile|hiptop|iemobile|kyocera/wx310k|lg/u990|midp-2.|mmef20|mot-v|netfront|newt|nintendo wii|nitro|nokia|opera mini|palm|playstation portable|portalmmm|proxinet|proxinet|sharp-tq-gx10|shg-i900|small|sonyericsson|symbian os|symbianos|ts21i-10|up.browser|up.link|webos|windows ce|winwap|yahooseeker/m1a1-r2d2|palmsource)”) { set req.http.host = “theimproper.com”;}
# Try a cache-lookup
return (lookup);
}
sub vcl_fetch {
#set obj.grace = 5m;
set beresp.grace = 2m;
}
sub vcl_hit {
if (req.request == “PURGE”) {
purge;
error 200 “Purged.”;
}
}
sub vcl_miss {
if (req.request == “PURGE”) {
purge;
error 200 “Purged.”;
}
}
# Drop any cookies sent to WordPress.
sub vcl_recv {
if (!(req.url ~ “wp-(login|admin)”)) {
unset req.http.cookie;
}
}
# Drop any cookies WordPress tries to send back to the client.
sub vcl_fetch {
if (!(req.url ~ “wp-(login|admin)”)) {
unset beresp.http.set-cookie;
}
}
I tried adding your solution to my default.vcl, but I can’t seem to get it to work. Here is my default.vcl. Can you pinpoint any errors? Many thanks for your help.
backend default {
.host = “127.0.0.1”;
.port = “8080”;
.connect_timeout = 600s;
.first_byte_timeout = 1200s;
.between_bytes_timeout = 600s;
.max_connections = 2800;
}
acl purge {
“localhost”;
}
# Routine to identify and classify a device based on User-Agent
sub identify_device {
# Default to classification as a PC
set req.http.X-Device = “pc”;
if (req.http.User-Agent ~ “iPad” ) {
# The User-Agent indicates it’s a iPad – so classify as a tablet
set req.http.X-Device = “mobile-tablet”;
}
elsif (req.http.User-Agent ~ “iP(hone|od)” || req.http.User-Agent ~ “Android” ) {
# The User-Agent indicates it’s a iPhone, iPod or Android – so let’s classify as a touch/smart phone
set req.http.X-Device = “mobile-smart”;
}
elsif (req.http.User-Agent ~ “SymbianOS” || req.http.User-Agent ~ “^BlackBerry” || req.http.User-Agent ~ “^SonyEricsson” || req.http.User-Agent ~ “^Nokia” || req.http.User-Agent ~ “^SAMSUNG” || req.http.User-Agent ~ “^LG”) {
# The User-Agent indicates that it is some other mobile devices, so let’s classify it as such.
set req.http.X-Device = “mobile-other”;
}
sub vcl_hash {
# If the device has been classified as any sort of mobile device, include the User-Agent in the hash
# However, do not do this for any static assets as our web application returns the same ones for every device.
if (!(req.url ~ “.(gif|jpg|jpeg|swf|flv|mp3|mp4|pdf|ico|png|gz|tgz|bz2)(?.*|)$”)) {
hash_data(req.http.X-Device);
}
sub vcl_recv {
# Be sure to actually call our sub-routine to classify devices!
call identify_device;
if (req.http.Accept-Encoding) {
if (req.url ~ “.(gif|jpg|jpeg|swf|flv|mp3|mp4|pdf|ico|png|gz|tgz|bz2)(?.*|)$”) {
remove req.http.Accept-Encoding;
} elsif (req.http.Accept-Encoding ~ “gzip”) {
set req.http.Accept-Encoding = “gzip”;
} elsif (req.http.Accept-Encoding ~ “deflate”) {
set req.http.Accept-Encoding = “deflate”;
} else {
remove req.http.Accept-Encoding;
}
}
if (req.url ~ “.(gif|jpg|jpeg|swf|css|js|flv|mp3|mp4|pdf|ico|png)(?.*|)$”) {
unset req.http.cookie;
set req.url = regsub(req.url, “?.*$”, “”);
}
if (req.http.cookie) {
if (req.http.cookie ~ “(mycookie_|web-app-1-|special-identifier)”) {
return(pass);
} else {
unset req.http.cookie;
}
}
set req.grace = 120s;
}
}
# Set X-Forwarded-For header for logging in nginx
remove req.http.X-Forwarded-For;
set req.http.X-Forwarded-For = client.ip;
# Remove has_js and CloudFlare/Google Analytics __* cookies.
set req.http.Cookie = regsuball(req.http.Cookie, “(^|;\s*)(_[_a-z]+|has_js)=[^;]*”, “”);
# Remove a “;” prefix, if present.
set req.http.Cookie = regsub(req.http.Cookie, “^;\s*”, “”);
# Either the admin pages or the login
if (req.url ~ “/wp-(login|admin|cron)”) {
# Don’t cache, pass to backend
return (pass);
}
# bypass export CSV , you can replace your url pattern with /exportlimit/all/
if ( req.url ~ “.*/exportlimit/all/.*” ) {
set req.http.connection = “close”;
return(pipe);
}
# Remove the wp-settings-1 cookie
set req.http.Cookie = regsuball(req.http.Cookie, “wp-settings-1=[^;]+(; )?”, “”);
# Remove the wp-settings-time-1 cookie
set req.http.Cookie = regsuball(req.http.Cookie, “wp-settings-time-1=[^;]+(; )?”, “”);
# Remove the wp test cookie
set req.http.Cookie = regsuball(req.http.Cookie, “wordpress_test_cookie=[^;]+(; )?”, “”);
# Static content unique to the theme can be cached (so no user uploaded images)
# The reason I don’t take the wp-content/uploads is because of cache size on bigger blogs
# that would fill up with all those files getting pushed into cache
if (req.url ~ “wp-content/themes/” && req.url ~ “\.(css|js|png|gif|jp(e)?g)”) {
unset req.http.cookie;
}
# Even if no cookies are present, I don’t want my “uploads” to be cached due to their potential size
if (req.url ~ “/wp-content/uploads/”) {
return (pass);
}
# Check the cookies for wordpress-specific items
if (req.http.Cookie ~ “wordpress_” || req.http.Cookie ~ “comment_”) {
# A wordpress specific cookie has been set
return (pass);
}
# allow PURGE from localhost
if (req.request == “PURGE”) {
if (!client.ip ~ purge) {
error 405 “Not allowed.”;
}
return (lookup);
}
sub vcl_fetch {
#set obj.grace = 5m;
set beresp.grace = 2m;
}
sub vcl_hit {
if (req.request == “PURGE”) {
purge;
error 200 “Purged.”;
}
}
sub vcl_miss {
if (req.request == “PURGE”) {
purge;
error 200 “Purged.”;
}
}
# Drop any cookies sent to WordPress.
sub vcl_recv {
if (!(req.url ~ “wp-(login|admin)”)) {
unset req.http.cookie;
}
}
# Drop any cookies WordPress tries to send back to the client.
sub vcl_fetch {
if (!(req.url ~ “wp-(login|admin)”)) {
unset beresp.http.set-cookie;
}
}
Hi Justin, thanks for the nice article. I want to use Varnish as a reverse proxy and need to enforce business logic per device. Since a single device can send multiple User-Agents, how can I fingerprint at the device level rather than the browser level? Thanks in advance.