Inspecting HTTP Cache-Control
I was following the ACloudGuru tutorial to set up a web server with httpd on an EC2 instance and an AWS Cognito Identity Pool.
I wanted to update the index.html file, but the browser still showed the old one. I blamed httpd for not picking up the latest HTML file and restarted the service several times. However, running curl localhost on the web server machine returned the updated page, which meant that httpd was serving the latest file.
I refreshed the page many times, holding the SHIFT key to bypass the cache, but I still got the old page.
Was AWS keeping a cache of my requests and preventing me from retrieving the new page? That would be awkward, because I have never encountered such behaviour. Moreover, I have never heard of cache invalidation for an EC2 instance or a VPC network.
Was it a timezone mismatch, since the EC2 instance is hosted in North Virginia? I changed the timezone with sudo timedatectl set-timezone UTC, but nothing changed.
Inspecting HTTP requests
Having gotten nowhere, I decided to inspect the HTTP requests and to send custom requests using the Firefox Developer Tools.
The current state of /var/www/html/index.html is (timezone is UTC):
ls -l
total 584
-rw-r--r-- 1 root root 590557 Jun 19 10:54 fact.jpg
-rw-r--r-- 1 root root 1533 Jun 19 12:30 index.html
I open a new private tab in Firefox and enter http://<ip address of ec2 instance>. The request is sent at 12:50 GMT:
GET / HTTP/1.1
Host: 18.208.106.244
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:101.0) Gecko/20100101 Firefox/101.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
Accept-Language: en-GB,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Pragma: no-cache
Cache-Control: no-cache
The response is:
HTTP/1.1 200 OK
Date: Sun, 19 Jun 2022 12:50:11 GMT
Server: Apache/2.4.53 ()
Last-Modified: Sun, 19 Jun 2022 12:30:51 GMT
ETag: "5fd-5e1cc286fa77b"
Accept-Ranges: bytes
Content-Type: text/html; charset=UTF-8
Content-Encoding: gzip
Content-Length: 879
Connection: keep-alive
The Date header holds the date and time at which the response was generated, which here matches the time of the request.
The Last-Modified header holds the last modification date of index.html.
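To make the relationship between these two headers concrete, here is a minimal Python sketch that parses both values from the response above and computes the gap between them. The function name seconds_since_modification is made up for illustration; parsedate_to_datetime comes from the standard library.

```python
from email.utils import parsedate_to_datetime


def seconds_since_modification(date_header, last_modified_header):
    """Return how many seconds passed between the file's last change
    (Last-Modified) and the moment the response was generated (Date)."""
    generated = parsedate_to_datetime(date_header)
    modified = parsedate_to_datetime(last_modified_header)
    return (generated - modified).total_seconds()


# Header values copied from the first response above.
gap = seconds_since_modification(
    "Sun, 19 Jun 2022 12:50:11 GMT",
    "Sun, 19 Jun 2022 12:30:51 GMT",
)
print(gap)  # 1160.0, i.e. 19 minutes and 20 seconds
```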
Now I update the /var/www/html/index.html file, so the current state of the folder is:
ls -l
total 584
-rw-r--r-- 1 root root 590557 Jun 19 10:54 fact.jpg
-rw-r--r-- 1 root root 1533 Jun 19 12:55 index.html
I open a private tab in Firefox and request the web server IP address again, but this time I hold the SHIFT key while clicking the browser's refresh button.
The request is sent at 12:57 GMT:
GET / HTTP/1.1
Host: 18.208.106.244
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:101.0) Gecko/20100101 Firefox/101.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
Accept-Language: en-GB,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Pragma: no-cache
Cache-Control: no-cache
and the response is:
HTTP/1.1 200 OK
Date: Sun, 19 Jun 2022 12:50:11 GMT
Server: Apache/2.4.53 ()
Last-Modified: Sun, 19 Jun 2022 12:30:51 GMT
ETag: "5fd-5e1cc286fa77b"
Accept-Ranges: bytes
Content-Type: text/html; charset=UTF-8
Content-Encoding: gzip
Content-Length: 879
Connection: keep-alive
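Note that Date and Last-Modified didn't change from the previous request, even though the Cache-Control header is no-cache. The request was sent at 12:57 GMT, yet the Date header still reads 12:50:11 GMT, which suggests that a cached copy of the earlier response was returned.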
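The check I did by eye can be sketched in a few lines of Python: compare the Date header of a response with the time the request was sent, and flag the response when it is noticeably older. The function name looks_cached and the 60-second tolerance are my own assumptions, not part of any library.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def looks_cached(date_header, request_sent, tolerance_s=60.0):
    """Return True when the response's Date header is older than the
    request time by more than tolerance_s seconds (a heuristic only)."""
    generated = parsedate_to_datetime(date_header)
    return (request_sent - generated).total_seconds() > tolerance_s


# Second request: sent at 12:57 GMT, but the response is dated 12:50:11 GMT.
second = looks_cached(
    "Sun, 19 Jun 2022 12:50:11 GMT",
    datetime(2022, 6, 19, 12, 57, tzinfo=timezone.utc),
)
print(second)  # True
```

The tolerance only absorbs small clock differences between the two machines; it does not identify which cache answered.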
Now I edit the request headers to force Cache-Control: max-age=0 (13:03 GMT):
GET / HTTP/1.1
Host: 18.208.106.244
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:101.0) Gecko/20100101 Firefox/101.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
Accept-Language: en-GB,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Cache-Control: max-age=0, no-cache
Pragma: no-cache
But Firefox still adds the no-cache directive and the Pragma: no-cache header.
However, the response is:
HTTP/1.1 200 OK
Date: Sun, 19 Jun 2022 13:03:02 GMT
Server: Apache/2.4.53 ()
Last-Modified: Sun, 19 Jun 2022 12:55:35 GMT
ETag: "5fd-5e1cc80debb24"
Accept-Ranges: bytes
Content-Type: text/html; charset=UTF-8
Content-Encoding: gzip
Content-Length: 878
Connection: keep-alive
Forcing Cache-Control: max-age=0 returns the updated file: the Last-Modified header now matches the modification time of index.html on the Linux server, and the ETag has changed as well.
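Comparing the validators of two responses can also be done programmatically. The sketch below parses two raw header blocks and reports which of ETag and Last-Modified differ; parse_headers and changed_validators are hypothetical helper names, and the header text is trimmed from the responses above.

```python
def parse_headers(raw):
    """Turn a raw response header block into a dict with lowercase names.
    Lines without a colon (such as the status line) are skipped."""
    headers = {}
    for line in raw.strip().splitlines():
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def changed_validators(old_raw, new_raw):
    """List the validator headers whose values differ between two responses."""
    old, new = parse_headers(old_raw), parse_headers(new_raw)
    return [n for n in ("etag", "last-modified") if old.get(n) != new.get(n)]


STALE = '''HTTP/1.1 200 OK
Date: Sun, 19 Jun 2022 12:50:11 GMT
Last-Modified: Sun, 19 Jun 2022 12:30:51 GMT
ETag: "5fd-5e1cc286fa77b"'''

FRESH = '''HTTP/1.1 200 OK
Date: Sun, 19 Jun 2022 13:03:02 GMT
Last-Modified: Sun, 19 Jun 2022 12:55:35 GMT
ETag: "5fd-5e1cc80debb24"'''

print(changed_validators(STALE, FRESH))  # ['etag', 'last-modified']
```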
But how does Cache-Control work?
no-cache
According to MDN (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control#no-cache_2) for no-cache:
The no-cache request directive asks caches to validate the response with the origin server before reuse.
Cache-Control: no-cache
no-cache allows clients to request the most up-to-date response even if the cache has a fresh response.
Browsers usually add no-cache to requests when users are force reloading a page.
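As a rough illustration of how a cache might read this directive, the following Python sketch splits a Cache-Control request header into its directives and checks for no-cache. Both function names are hypothetical, and the parsing is deliberately simplified (it does not handle commas inside quoted strings).

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a dict of directives.
    Directives without an argument map to None."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def asks_for_validation(value):
    """True when the request asks caches to validate with the origin server."""
    return "no-cache" in parse_cache_control(value)


# The header Firefox actually sent in my third request.
print(parse_cache_control("max-age=0, no-cache"))
# {'max-age': '0', 'no-cache': None}
```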
max-age
According to MDN (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control#max-age_2) for max-age:
The max-age=N request directive indicates that the client allows a stored response that is generated on the origin server within N seconds — where N may be any non-negative integer (including 0).
Cache-Control: max-age=3600
In the case above, if the response with Cache-Control: max-age=604800 was generated more than 3 hours ago (calculated from max-age and the Age header), the cache couldn't reuse that response.
Many browsers use this directive for reloading, as explained below.
Cache-Control: max-age=0
max-age=0 is a workaround for no-cache, because many old (HTTP/1.0) cache implementations don't support no-cache. Recently browsers are still using max-age=0 in "reloading" — for backward compatibility — and alternatively using no-cache to cause a "force reloading".
If the max-age value isn't non-negative (for example, -1) or isn't an integer (for example, 3599.99), then the caching behavior is undefined. However, the Calculating Freshness Lifetime section of the HTTP specification states:
Caches are encouraged to consider responses that have invalid freshness information to be stale.
In other words, for any max-age value that isn't an integer or isn't non-negative, the caching behavior that's encouraged is to treat the value as if it were 0.
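The rule in the last paragraph of the quote can be sketched as follows. This is a simplified model, not how any particular cache is implemented: effective_max_age and may_reuse are hypothetical names, and the age of the stored response is passed in directly instead of being derived from the Age and Date headers.

```python
def effective_max_age(raw_value):
    """Return max-age as an int, treating any value that is not a
    non-negative integer as 0 (the behaviour the quote encourages)."""
    text = raw_value.strip()
    if text.isascii() and text.isdigit():
        return int(text)
    return 0


def may_reuse(stored_age_s, raw_max_age):
    """True when a stored response of the given age satisfies the
    request's max-age directive."""
    return stored_age_s <= effective_max_age(raw_max_age)


print(effective_max_age("3600"))     # 3600
print(effective_max_age("-1"))       # 0
print(effective_max_age("3599.99"))  # 0
print(may_reuse(409, "0"))           # False
```

With max-age=0, any stored response that is even one second old fails the check, which is why the directive works as a reload request.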
Conclusion
It seems that httpd returns the cached result when the request carries Cache-Control: no-cache. To bypass the cache instead, Cache-Control: max-age=0 is needed.
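One way to try this outside the browser, where Firefox keeps adding no-cache, is a small Python script. This is a sketch under assumptions: build_request and fetch_validators are hypothetical helper names, the address is the one from my requests above, and urllib adds no Cache-Control or Pragma header of its own.

```python
import urllib.request


def build_request(url, cache_control="max-age=0"):
    """Create a GET request that carries only the given Cache-Control value."""
    return urllib.request.Request(url, headers={"Cache-Control": cache_control})


def fetch_validators(url, cache_control="max-age=0"):
    """Send the request and return the headers that reveal freshness."""
    with urllib.request.urlopen(build_request(url, cache_control), timeout=10) as resp:
        return {name: resp.headers.get(name) for name in ("Date", "Last-Modified", "ETag")}


# Example call (needs network access to the instance):
# print(fetch_validators("http://18.208.106.244/"))
```

Calling fetch_validators once with max-age=0 and once with no-cache, then comparing the Date values, would show whether the two directives really behave differently for this server.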