How to use Nginx as a reverse proxy


Nginx can be used as a web server, load balancer, or reverse proxy server. Large infrastructures that need internal caching often run Nginx in reverse proxy mode. In this article, we outline the process of setting up Nginx as a reverse proxy server.

Prerequisite: Make sure you have Nginx installed.

Specification:

OS Version: Ubuntu 18.04.3 LTS x64
Nginx version: nginx/1.14.0

Typically, only two directives are needed to turn a regular Nginx web server into a reverse proxy server. One is “proxy_pass”, which forwards requests to the origin (backend) server. The other is “proxy_cache”, which defines the physical location where static files will be cached. Of course, even though these two will turn Nginx into a reverse proxy, there is much more to it, so let’s check it out!
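As a rough sketch, the skeleton of a caching reverse proxy built from those two directives looks like this (the zone name, cache path, and backend are placeholders taken from this article; the full working vhost is assembled step by step further down):

```nginx
# Minimal shape of a caching reverse proxy (placeholder names/paths):
proxy_cache_path /var/cache/nginx keys_zone=my_zone:10m;

server {
    listen 80;

    location / {
        proxy_cache my_zone;              # cache responses in this zone
        proxy_pass  http://bluegrid.io;   # forward requests to the backend
    }
}
```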

Below is the Nginx vhost file (configuration file) that we’ll edit in order to get it working as a reverse proxy:

server {
	listen 80 default_server;
	listen [::]:80 default_server;
	root /var/www/html;
	index index.html index.htm index.nginx-debian.html;
 
	server_name _;
 
	location / {
		try_files $uri $uri/ =404;
	}
}

Before we continue, let’s define our backend server: the server that will respond to us through this Nginx instance. For the purpose of this exercise that can be any publicly accessible server, so let’s go with http://bluegrid.io as the backend.

At the moment, Nginx has no custom settings in the vhost file, so it returns the default response, which we can see with the following command:

bluegrid-edu:~# curl -I localhost
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Sun, 26 Jul 2020 11:12:24 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Sat, 25 Jul 2020 20:25:23 GMT
Connection: keep-alive
ETag: "5f1c9533-264"
Accept-Ranges: bytes

And this is the response we get from http://bluegrid.io:

bluegrid-edu:~# curl -I http://bluegrid.io
HTTP/1.1 200 OK
Date: Sun, 26 Jul 2020 11:13:13 GMT
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips PHP/5.6.38
Last-Modified: Sun, 26 Jul 2020 11:04:46 GMT
ETag: "d7b7-5ab562c93193f"
Accept-Ranges: bytes
Content-Length: 55223
Vary: Accept-Encoding
Referrer-Policy: no-referrer-when-downgrade
Content-Type: text/html; charset=UTF-8

What we expect is to see the http://bluegrid.io response when we query our local Nginx instance.

  • To begin, create a physical location where the static assets will be cached:
bluegrid-edu:~# mkdir /var/cache/nginx
  • Then tell Nginx where this location is:
proxy_cache_path /var/cache/nginx keys_zone=my_zone:10m inactive=10h;
  • Remove the “try_files” directive, because we want Nginx to pass requests to the backend server:
#try_files $uri $uri/ =404;
  • Define the backend server:
proxy_pass http://bluegrid.io;
  • Define the status codes we want to cache:
proxy_cache_valid 200 1d;
  • Load the caching zone into the vhost:
proxy_cache my_zone;
  • Add a response header that tells us whether the response came from the cache or from the backend server (Cache: HIT if cached, Cache: MISS if not):
add_header Cache $upstream_cache_status;
  • Define the number of requests after which Nginx will cache a response:
proxy_cache_min_uses 2;
  • Define a cache key. The cache key is the list of request attributes based on which Nginx differentiates requests when deciding whether a request is new or whether a previously cached response can be served:
proxy_cache_key $scheme$http_host$uri$is_args$args;
  • Restart nginx service:
bluegrid-edu:~# systemctl restart nginx
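Once requests start flowing, Nginx stores each cached response in /var/cache/nginx under a file named after the MD5 hash of the cache key. As a sketch, we can reproduce such a hash by hand (the key string below is an assumed example for the root URL under a $scheme$http_host$uri-style key; the exact string depends on the proxy_cache_key format in use):

```shell
# Nginx derives the cache file name from the MD5 of the cache key.
# Assumed example key for "/" on bluegrid.io over plain HTTP:
key='httpbluegrid.io/'
hash="$(printf '%s' "$key" | md5sum | cut -d' ' -f1)"
echo "$hash"   # 32-character hex digest, matching the cache file name
```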

We now have the reverse proxy up and running, and we can review the entire vhost file with the above directives added:

proxy_cache_path /var/cache/nginx keys_zone=my_zone:10m inactive=10h;
server {
	listen 80 default_server;
	listen [::]:80 default_server;
	root /var/www/html;
	index index.html index.htm index.nginx-debian.html;
 
	server_name _;
	proxy_cache_key $scheme$http_host$uri$is_args$args;
	location / {
        proxy_cache my_zone;
        proxy_pass http://bluegrid.io;
        proxy_cache_valid 200 1d;
        add_header Cache $upstream_cache_status;
        proxy_cache_min_uses 2;
	}
}

Of course, we can confirm the reverse proxy is working by testing the HTTP response through our Nginx instance. Note the Server: nginx header in the responses below, while the content comes from http://bluegrid.io:

  • First and second request (expected to see Cache: MISS because the response is not cached yet):
bluegrid-edu:~# curl -I http://bluegrid.io/
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Sun, 26 Jul 2020 13:24:05 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 55254
Connection: keep-alive
Last-Modified: Sun, 26 Jul 2020 13:11:15 GMT
ETag: "d7d6-5ab57f0e9e35f"
Vary: Accept-Encoding
Referrer-Policy: no-referrer-when-downgrade
Cache: MISS
Accept-Ranges: bytes
 
bluegrid-edu:~# curl -I http://bluegrid.io/
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Sun, 26 Jul 2020 13:24:06 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 55254
Connection: keep-alive
Last-Modified: Sun, 26 Jul 2020 13:11:15 GMT
ETag: "d7d6-5ab57f0e9e35f"
Vary: Accept-Encoding
Referrer-Policy: no-referrer-when-downgrade
Cache: MISS
Accept-Ranges: bytes
  • Third request (expecting to see Cache: HIT because proxy_cache_min_uses was set to 2: Nginx cached the response on the second request and now serves it from the cache):
bluegrid-edu:~# curl -I http://bluegrid.io/
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Sun, 26 Jul 2020 13:24:07 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 55254
Connection: keep-alive
Last-Modified: Sun, 26 Jul 2020 13:11:15 GMT
ETag: "d7d6-5ab57f0e9e35f"
Vary: Accept-Encoding
Referrer-Policy: no-referrer-when-downgrade
Cache: HIT
Accept-Ranges: bytes

What is the cache key?

Nginx uses the “proxy_cache_key” directive to define which parts of an incoming request identify that request. Based on this key, Nginx decides whether the request is new (something that should yet be cached) or has been made before, in which case the response is served from the cache (if it has already been cached, of course).

Anatomy of a request

Every request sent to a server consists of the URL it targets, the request headers, and a body. To understand the cache key, we need to break down the URL, which consists of the following segments:

  1. Protocol (ex: http, without “://”)
  2. Domain/Hostname (ex: bluegrid.io)
  3. URI (ex: /about-us/)
  4. Query string (ex: ?v=1)
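To make these segments concrete, here is a small shell sketch that splits an example URL (using the sample domain from this article) into the four parts:

```shell
# Decompose an example URL into the segments Nginx's variables expose.
url="http://bluegrid.io/about-us/?v=1"

scheme="${url%%://*}"          # protocol        -> http
rest="${url#*://}"             # strip "http://"
host="${rest%%/*}"             # hostname        -> bluegrid.io
path_qs="/${rest#*/}"          # path + query    -> /about-us/?v=1
uri="${path_qs%%\?*}"          # URI             -> /about-us/

# Query string: "?" marker plus arguments, empty when absent.
case "$path_qs" in
    *\?*) is_args="?"; args="${path_qs#*\?}" ;;
    *)    is_args="";  args="" ;;
esac

echo "$scheme $host $uri $is_args$args"   # http bluegrid.io /about-us/ ?v=1
```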

Depending on the use case, we can choose to cache responses based on some or all of these segments. There are pros and cons to every choice, so it is good to know what you need before the cache key goes into production.

Let’s examine the cache key in our example:

proxy_cache_key $scheme$http_host$uri$is_args$args;
  1. proxy_cache_key is the directive that defines the attributes of the key
  2. $scheme is the variable storing the protocol the request was made over (ex: http or https)
  3. $http_host is the variable storing the domain the request was made for (ex: bluegrid.io)
  4. $uri is the variable storing the URI value (ex: /about-us/)
  5. $is_args is the variable storing “?” or “” (empty) depending on whether the request contains a query string (“?” if yes, “” if no)
  6. $args is the variable containing the actual query string (ex: a=1&b=2&c=3)
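Concatenated in order, those example values yield the actual key string. A quick shell sketch using the sample values above:

```shell
# Build the cache key exactly as $scheme$http_host$uri$is_args$args would.
scheme="http"; http_host="bluegrid.io"; uri="/about-us/"
is_args="?"; args="a=1&b=2&c=3"

key="${scheme}${http_host}${uri}${is_args}${args}"
echo "$key"   # httpbluegrid.io/about-us/?a=1&b=2&c=3
```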

Now that we know what the cache key is made of, we can decide how to use and format it. For example, if you expect a large number of requests whose query strings differ but whose returned content is the same (marketing campaigns), you probably don’t want to cache responses per query string, because that would use up your cache storage faster. In that case you might form the cache key like this:

proxy_cache_key $http_host$uri;

This means that whether or not there is a query string, a request for the targeted URL will always be treated the same. If we kept “$args” and “$is_args” in the cache key, Nginx would treat every query-string change for the same URL differently and cache each variant separately.
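A quick sketch of that difference, using two hypothetical campaign URLs (the /landing/ path and utm values are made up for illustration) that would return the same page:

```shell
host="bluegrid.io"; uri="/landing/"
args_a="utm=fb"; args_b="utm=tw"

# Full key ($http_host$uri$is_args$args): each query string gets its own entry.
full_a="${host}${uri}?${args_a}"
full_b="${host}${uri}?${args_b}"

# Short key ($http_host$uri): both requests collapse into one cache entry.
short_a="${host}${uri}"
short_b="${host}${uri}"

[ "$full_a" != "$full_b" ] && echo "full keys differ: two cache entries"
[ "$short_a" = "$short_b" ] && echo "short keys match: one cache entry"
```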
