Two important subroutines in Varnish
Author: 曲文庆  Date: 2012-10-26 17:06
Varnish has reached 3.0.3 by now. Today, while troubleshooting a problem, I read through some 2.x material and found that much of it still applies, and that it was worth revisiting.
https://www.varnish-cache.org/trac/wiki/Introduction
How VCL works
There are actually 8 subroutines that control how Varnish behaves, and that you can change in your Varnish config. They are:
vcl_recv()
Called after a request is received from the browser, but before it is processed.
vcl_pipe()
Called when a request must be forwarded directly to the backend with minimal handling by Varnish (think HTTP CONNECT).
vcl_hash()
Called to determine the hash key used to look up a request in the cache.
vcl_hit()
Called after a cache lookup when the object requested has been found in the cache.
vcl_miss()
Called after a cache lookup when the object requested was not found in the cache.
vcl_pass()
Called when the request is to be passed to the backend without looking it up in the cache.
vcl_fetch()
Called when the request has been sent to the backend and a response has been received from the backend.
vcl_deliver()
Called before a response object (from the cache or the web server) is sent to the requesting client.
That seems like a lot, but as I mentioned before Varnish has pretty reasonable defaults, so you only need to override a few of these.
The inbound request: vcl_recv
The vcl_recv subroutine handles the incoming request from the client. This is where you set basic config options and do any request setup and tweaking before Varnish looks up the item or passes the request on to the backend. You can see the complete routine in the config file. Here, we will break it down and explore each piece of functionality one at a time.
The incoming request is available within vcl_recv as 'req'; you can review the Varnish docs to see all the properties that are available. We'll cover the ones we use as we encounter them.
set req.http.host = "mycatalystsite.com";
The req.http variable contains the headers supplied with the request. This sets the host header to pass on to the backend. This is not strictly necessary and can cause trouble if you are fronting several sites with a single Varnish installation. However, without this header, requests to mycatalystsite.com and www.mycatalystsite.com would be treated as completely separate by Varnish, potentially doubling every entry in your cache. If you are fronting only a single site, normalizing the hostname is a good idea.
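A more defensive variant, if Varnish fronts more than one site, is to normalize only the known aliases rather than overwriting the host unconditionally. This is a sketch; the hostnames are the example names used above:

```vcl
sub vcl_recv {
    # Collapse the www and bare variants onto one hostname so the
    # cache holds a single copy of each URL. Hostnames are examples.
    if (req.http.host ~ "^(www\.)?mycatalystsite\.com$") {
        set req.http.host = "mycatalystsite.com";
    }
}
```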
set req.backend = catalystsite;
As you can probably guess, this simply tells Varnish which backend should process this request. If you had multiple backends, this is where you would do your backend selection.
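If you did have several backends, the selection could look roughly like this. Only catalystsite appears in the article's config; the second backend, its addresses, and ports are hypothetical:

```vcl
backend catalystsite {
    .host = "127.0.0.1";
    .port = "8080";
}
backend staticfarm {
    # Hypothetical second backend for static assets.
    .host = "127.0.0.1";
    .port = "8081";
}

sub vcl_recv {
    # Route static asset requests to the static backend,
    # everything else to the application backend.
    if (req.url ~ "^/static/") {
        set req.backend = staticfarm;
    } else {
        set req.backend = catalystsite;
    }
}
```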
if (req.http.Cache-Control ~ "no-cache") {
    purge_url(req.url);
}
What this block does is tell Varnish to purge the cache for a particular URL when the no-cache cache-control header is sent from the client. That header is provided whenever you hold shift while pressing reload in the Firefox browser. In other words, this forces Varnish to refresh the cache for the page you are looking at when you hit shift-reload. Note that this will happen if anyone does a shift-reload, so you may not want this in a production environment, but it is definitely useful in debugging your initial deployment.
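For production, one common way to keep the shift-reload purge while limiting who can trigger it is an ACL on the client address. A sketch, with made-up addresses:

```vcl
acl purgers {
    "127.0.0.1";        # localhost
    "192.168.1.0"/24;   # example admin network, adjust to taste
}

sub vcl_recv {
    # Only honor the no-cache purge from trusted addresses.
    if (req.http.Cache-Control ~ "no-cache" && client.ip ~ purgers) {
        purge_url(req.url);
    }
}
```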
if (req.request == "GET" && req.url ~ "^/static/") {
    unset req.http.cookie;
    unset req.http.Authorization;
    lookup;
}
This tells Varnish if the request is for anything that starts with /static that it should remove any cookie header or authorization information, and then look up the request in the cache. The lookup line is important. This tells Varnish that it should stop executing the vcl_recv routine and look up the item in the cache. This is an example of a keyword.
Keywords in Varnish can be thought of as a 'return' for the subroutine combined with the value it is returning. If you do not use a keyword somewhere to terminate your subroutine, control will fall through to the default Varnish subs. There is a knack to figuring out when to return and when to fall through, and you will get the hang of it after working with Varnish for a short while. In the meantime, you can rely on the fact that the config presented here does the right thing. You can also read up on the default behavior in the Varnish docs and wiki. For now, let's move on to the next part of our vcl file.
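To see the fall-through behavior concretely: a vcl_recv that ends without a keyword hands control to Varnish's built-in vcl_recv afterwards. A minimal sketch:

```vcl
sub vcl_recv {
    # No lookup/pass/pipe here, so after this statement runs,
    # control falls through to the built-in default vcl_recv,
    # which then makes the lookup-vs-pass decision for us.
    set req.http.X-Forwarded-For = client.ip;
}
```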
if (req.request == "POST") { pass; }
This tells Varnish that if the request method is POST, that it should NOT look it up in the cache and should instead pass it directly to the backend. This is most likely what you want, as POST data is generally form submission and the result will vary from user to user and request to request.
This snippet also introduces our next keyword, pass. Pass tells Varnish that it should pass the request through to the backend without looking it up in the cache. This is a subtle but critical detail because even if you put something into the cache in vcl_fetch, if you pass when receiving a request for that item, it will never actually be served from the cache.
if (req.request != "GET" && req.request != "HEAD" &&
    req.request != "PUT" && req.request != "POST" &&
    req.request != "TRACE" && req.request != "OPTIONS" &&
    req.request != "DELETE") {
    # Non-RFC2616 or CONNECT which is weird.
    pass;
}
Again we have an exclusion rule. This says that if the request method isn't one of the 'normal' web site and web service methods, Varnish should send it right through to the backend.
if (req.http.Authorization) {
    # Not cacheable by default
    pass;
}
And our last check: if the client is providing an Authorization header, some sort of access control is in place, and we want to pass the request directly through to the backend. Chances are, if an Authorization header is being provided, the data coming back is going to be tailored to the user in question, so we don't bother looking it up.
Finally... our last line of vcl_recv:
lookup;
If we haven't explicitly handled it already somewhere along the way, we look it up in the cache. Note that since our vcl_recv ends with a keyword, Varnish's builtin vcl_recv never gets a chance to execute. That's OK in this case, because we have handled the different scenarios that we are interested in.
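Putting the pieces above together, the whole vcl_recv reads roughly as follows (same snippets, same order):

```vcl
sub vcl_recv {
    set req.http.host = "mycatalystsite.com";
    set req.backend = catalystsite;

    # Shift-reload from the client purges this URL.
    if (req.http.Cache-Control ~ "no-cache") {
        purge_url(req.url);
    }
    # Static content: strip per-user headers and look it up.
    if (req.request == "GET" && req.url ~ "^/static/") {
        unset req.http.cookie;
        unset req.http.Authorization;
        lookup;
    }
    # Form submissions always go to the backend.
    if (req.request == "POST") {
        pass;
    }
    if (req.request != "GET" && req.request != "HEAD" &&
        req.request != "PUT" && req.request != "POST" &&
        req.request != "TRACE" && req.request != "OPTIONS" &&
        req.request != "DELETE") {
        # Non-RFC2616 or CONNECT which is weird.
        pass;
    }
    if (req.http.Authorization) {
        # Not cacheable by default
        pass;
    }
    lookup;
}
```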
Handling the web-server response: vcl_fetch
When Varnish enters the vcl_fetch subroutine, it has already requested data from the backend web server and has received a response. It has not, however, inserted anything into the cache yet. In fact, most of what you are likely to be doing in vcl_fetch is determining whether the response should be cached or not.
In many ways the rules you place in vcl_fetch will directly correlate to the rules you placed in vcl_recv earlier. The difference is that in vcl_recv you were deciding whether to look up the item in the cache, whereas in vcl_fetch you are deciding whether you should insert the response into the cache in the first place.
The difference between the lookup check and the insert check is a subtle one, as cache control can really be done in either place. It is, however, important to use vcl_fetch effectively, because limiting what goes into the cache is the only real control you have over the size of your cache. It's also all too easy to accidentally let a request pass into lookup that you didn't want to. The best way to avoid serving up invalid cached data is to make sure it's never placed in the cache in the first place. Ultimately, it's best to use vcl_recv and vcl_fetch in tandem to make sure that what goes into and comes out of the cache is exactly what you want.
All that said, let's begin exploring our vcl_fetch routine.
if (req.request == "GET" && req.url ~ "^/static") {
    unset obj.http.Set-Cookie;
    set obj.ttl = 30m;
    deliver;
}
We start out with a very important piece of our cache control. This block tells Varnish that if the request is GETing something from /static/* it should be placed in the cache. The line that begins with unset ensures that if, for some reason, a cookie is being set by the server, the cookie header is removed before it is placed in the cache. We also set obj.ttl to 30 minutes, forcing static content to expire in 30 minutes regardless of what we get from the server. This makes sense as we may not have direct control over the cache control headers on static files.
Notice that we are working with two objects now: the req object, which is the original request, and the 'obj' object, which is the response received from the backend server. We also encounter our first vcl_fetch keyword here, deliver. When deliver is encountered, it tells Varnish that the object it is currently working with should be inserted into the cache.
if (obj.status >= 300) { pass; }
This is another important piece of cache control. This tells Varnish that if the object it got from the web server has a status of 300 or greater, it should not cache it. This is important because if you have an error or other exceptional condition on the server, you do not want to be serving that error over and over to your site visitors. Sometimes 30x responses, such as redirects, could be cached in order to relieve the backend server of work, but in this case, we play it safe and assume that if it's not an 'OK' response it's not cacheable. Notice we have re-encountered the 'pass' keyword. In vcl_fetch, pass tells Varnish to send the response to the client, but not save the response in the cache.
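If you did decide to cache redirects, a hedged variant of this check might look like the following; the one-hour TTL is an arbitrary illustrative value, not part of the config above:

```vcl
if (obj.status == 301 || obj.status == 302) {
    # Cache redirects briefly to spare the backend. 1h is arbitrary.
    set obj.ttl = 1h;
    deliver;
}
if (obj.status >= 300) {
    pass;
}
```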
if (!obj.cacheable) { pass; }
Glancing at this block it's pretty obvious what is going on. The cacheable property on the response is set by Varnish, and it is basically Varnish's opinion as to whether the object could be cached or not. Varnish is pretty smart about this, so if it thinks the object is not cacheable, we should trust its judgement and avoid placing it in the cache. Note that we don't automatically assume that if Varnish thinks it is cacheable that we should cache it. We make that decision ourselves.
if (obj.http.Set-Cookie) { pass; }
By now you are probably getting the hang of it. If the response is setting a cookie, it's a safe bet that the cookie is intended for someone in particular. We should return the response to the client, but not cache it.
if (obj.http.Pragma ~ "no-cache" ||
    obj.http.Cache-Control ~ "no-cache" ||
    obj.http.Cache-Control ~ "private") {
    pass;
}
Now we are getting into the meat of our cache control. This is the first point where what happens on the server directly controls what happens within Varnish. This block says that if the Pragma or Cache-Control header contains 'no-cache', or the Cache-Control header contains 'private', obey them. This allows you, within your web app, to directly tell Varnish: do NOT cache this. Without this block Varnish would attempt to guess whether the response was cacheable, and it would likely be wrong, as Varnish does not obey no-cache instructions in its default configuration.
if (obj.http.Cache-Control ~ "max-age") {
    unset obj.http.Set-Cookie;
    deliver;
}
pass;
And here is perhaps the most important rule in our cache control policy and where we depart significantly from Varnish's default behavior. This looks at the Cache-Control header in the response object to see whether the string 'max-age' appears within it. If it does, then it clears out any cookie related headers and places the response object in the cache. If it doesn't, we tell Varnish not to cache it, no matter what it thinks about the response's cacheability.
As we touched on before, anything containing any cookie related headers is immediately deemed uncacheable by Varnish by default. Varnish's behavior is exactly right in most cases. What we are doing with this block is explicitly saying 'Trust the backend server, if it says it's cacheable, cache it.' Since we are running the backend server and in an app like Catalyst we have complete and reliable control over the headers provided, this is relatively safe to do.
By default, Varnish also interprets the Expires header to determine cacheability. We eschew Varnish's logic here and say 'if we didn't set max-age on the web-server, then you are not to cache it.' This gives us one clear method for controlling the cacheability of data in our application... and ensures that if we don't explicitly set the 'max-age' property in our app, the item will not be cached.
A quick side note: Is it a hit?
When you are first working with the cache in place, you will at some point want to know if a piece of content you are looking at came from the cache or from the backend server. Yes, you could go to the backend server, make the request and watch the access logs, but there is an easier way.
If you look at the headers returned on the item in question, you will see a header called 'X-Varnish'. That header will contain either one or two numbers (separated by a space). If the 'X-Varnish' header contains two numbers, the data came from the cache. If it contains only one, it did not come out of the cache. Knowing this piece of information can make your debugging much simpler.
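If counting the numbers in X-Varnish feels error-prone, a common technique (not part of the article's config) is to add an explicit marker header in vcl_deliver; obj.hits is available there in recent 2.x versions:

```vcl
sub vcl_deliver {
    # obj.hits counts how often this object has been served from
    # the cache; zero means this response came from the backend.
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
    deliver;
}
```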
Basically, what we have done with this configuration is require an explicit cache 'allow': nothing in our application will be cached until we say it should be cached. This is, in my opinion, the only safe way to operate a cache in front of a web application.
In anything but the smallest web apps, there are always pieces that you don't think of or that interact in a way you don't recognize until you are presented with a trouble ticket. The best thing to do in my experience is to turn caching on as you work through specific paths in your web app. The configuration presented here allows you to do that and be sure that only those pieces you have explored are cached.
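For reference, here are the vcl_fetch pieces walked through above, assembled in order:

```vcl
sub vcl_fetch {
    # Static content: strip cookies and cache for 30 minutes.
    if (req.request == "GET" && req.url ~ "^/static") {
        unset obj.http.Set-Cookie;
        set obj.ttl = 30m;
        deliver;
    }
    # Never cache errors or redirects.
    if (obj.status >= 300) {
        pass;
    }
    # Trust Varnish when it says an object is NOT cacheable.
    if (!obj.cacheable) {
        pass;
    }
    # Responses that set cookies are per-user; don't cache them.
    if (obj.http.Set-Cookie) {
        pass;
    }
    # Obey explicit no-cache / private instructions from the app.
    if (obj.http.Pragma ~ "no-cache" ||
        obj.http.Cache-Control ~ "no-cache" ||
        obj.http.Cache-Control ~ "private") {
        pass;
    }
    # The explicit 'allow': cache only when the app set max-age.
    if (obj.http.Cache-Control ~ "max-age") {
        unset obj.http.Set-Cookie;
        deliver;
    }
    pass;
}
```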
