A few days ago I received a bug report from a FUDforum user whose forum members were having trouble staying logged in when using the AOL 9 browser, herein referred to as POS. According to him, after logging in and browsing a few pages, a user would suddenly find themselves logged out of FUDforum. This happened on seemingly random pages with no common element between them, making tracking down the problem ever so enjoyable. What follows is the nightmarish tale of my attempts to resolve this problem, which will hopefully serve as a clue to developers who encounter the same issue.
At first I thought the issue might be related to the fact that POS is a hacked-up IE that always goes through AOL proxies when fetching content. These proxies change between requests (load balancer?), so during the same session a user may come through any number of different IP addresses, of which AOL has a fair number:
64.12.96.0/19,149.174.160.0/20
152.163.240.0/21,152.163.248.0/22
152.163.252.0/23,152.163.96.0/22
152.163.100.0/23,195.93.32.0/22
195.93.48.0/22,195.93.64.0/19
195.93.96.0/19,195.93.16.0/20
198.81.0.0/22,198.81.16.0/20
198.81.8.0/23,202.67.64.128/25
205.188.192.0/20,205.188.208.0/23
205.188.112.0/20,205.188.146.144/30
207.200.112.0/21
The proxy IPs change in a totally random order and the proxies themselves are anonymous, effectively hiding the user's IP address from the web server. As you can imagine, this makes IP-based validation impossible for AOL users on POS. This caused a problem for older versions of FUD, where IP validation (a toggleable option) did not account for AOL's peculiarities. The fix in an earlier version was simply to skip the IP check for AOL users, which seemed to solve the problem adequately until AOL 9.0 came out with new and exciting features intended to make web developers’ lives even more difficult.
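As a rough sketch of what "skip the IP check for AOL users" could look like, the snippet below tests whether an address falls inside one of the proxy blocks listed above. This is not FUDforum's actual code: the function names `ip_in_cidr()` and `is_aol_proxy()` are made up for illustration, only a subset of the ranges is shown, and the bit arithmetic assumes 64-bit PHP.

```php
<?php
// Hypothetical helper: true when the IPv4 address $ip lies inside $cidr ("a.b.c.d/n").
// Assumes 64-bit PHP, where ip2long() returns a non-negative integer.
function ip_in_cidr($ip, $cidr)
{
    $parts   = explode('/', $cidr, 2);
    $bits    = isset($parts[1]) ? (int) $parts[1] : 32; // bare address = /32
    $ipLong  = ip2long($ip);
    $netLong = ip2long($parts[0]);
    if ($ipLong === false || $netLong === false || $bits < 0 || $bits > 32) {
        return false; // not a usable IPv4 address or prefix length
    }
    // Build the netmask; a /0 prefix matches every address.
    $mask = $bits === 0 ? 0 : ((~0 << (32 - $bits)) & 0xFFFFFFFF);
    return ($ipLong & $mask) === ($netLong & $mask);
}

// Hypothetical helper: true when $ip is inside any of the given proxy blocks.
function is_aol_proxy($ip, array $ranges)
{
    foreach ($ranges as $cidr) {
        if (ip_in_cidr($ip, $cidr)) {
            return true;
        }
    }
    return false;
}

// A subset of the ranges listed above; extend with the rest as needed.
$aolRanges = array(
    '64.12.96.0/19',
    '152.163.240.0/21',
    '205.188.192.0/20',
    '207.200.112.0/21',
);
```

A session check could then call `is_aol_proxy($_SERVER['REMOTE_ADDR'], $aolRanges)` and bypass the IP comparison when it returns true.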
After getting the user to upgrade to the new FUDforum and finding they could still replicate the issue, I was left with little choice but to install AOL MAX (which uses an existing broadband connection) on my test win32 box to try to resolve the problem. This turned into a bit of an adventure, as the box rebelled against the AOL install and promptly trashed the drive, so the next few hours involved getting a new drive and re-installing Win XP. Interestingly enough, installing AOL MAX from a CD I picked up at the store took nearly as long as installing Windows XP itself (not including security patches). You really have to wonder just what in the hell AOL puts on there that requires nearly an hour to install on a dual 533 MHz Celeron with 512 megs of RAM and a new 7200 rpm drive. Oh well, the things I do for FUDforum QA…
With AOL finally running, I began trying to replicate the problem, which is where the “fun” began. One of the first things I did was modify the forum to log the entirety of the request headers provided by AOL, so that I could see the exact nature of each request. My initial suspicion was that the proxies mangle the browser identification in some way, causing the session to become invalidated. Unfortunately, I had been running PHP 5.1 with the latest pdo_mysql driver, which Andrey had recently modified, causing buffered queries to crash. This was an annoying distraction that took another hour to track down and fix; we really need some tests for buffered queries in the PDO MySQL driver.
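For anyone wanting to do the same kind of header logging, here is a minimal sketch. It is not the code I used; `format_request_headers()` is a hypothetical helper that rebuilds header lines from the `HTTP_*` entries PHP places in `$_SERVER`, and the log path in the usage comment is only an example.

```php
<?php
// Hypothetical helper: rebuild "Name: value" lines from a $_SERVER-style array.
function format_request_headers(array $server)
{
    $lines = array();
    foreach ($server as $key => $value) {
        if (strpos($key, 'HTTP_') !== 0) {
            continue; // only HTTP_* entries carry request headers
        }
        // HTTP_USER_AGENT -> "User Agent" -> "User-Agent"
        $name = ucwords(strtolower(str_replace('_', ' ', substr($key, 5))));
        $lines[] = str_replace(' ', '-', $name) . ': ' . $value;
    }
    return $lines;
}

// Usage inside a page script (the log path is an assumption):
// error_log(implode("\n", format_request_headers($_SERVER)) . "\n\n", 3, '/tmp/aol_headers.log');
```

Note that the original capitalization of header names is lost in `$_SERVER`, so something like `X-AOL-Auth` would come out as `X-Aol-Auth`.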
The side effect of this problem was that the forum's login page generated a blank page on the first request from AOL, and that blank page was promptly cached. This meant that any time I accessed the page it would show up blank unless I explicitly refreshed it by hand. Clearing the AOL browser’s cache, which POS refers to as “footprints”, did absolutely nothing; neither did repeated restarts of the browser and even the entire computer. This meant the cache was located on the proxy side. OK, not a problem. I added a quick hack involving:
PHP:
<?php
header("Expires: Mon, 21 Jan 1980 06:01:01 GMT");
header("Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0");
header("Pragma: no-cache");
?>
This is a generally accepted way of telling browsers and proxies to stop caching the page. To my great surprise, it did absolutely nothing to alleviate the problem. At the same time, looking at my request log on the server, I could clearly see the AOL proxy requesting the page each time. But somehow the data was not getting through to POS and was “dying” somewhere in between.
To get a better understanding of the situation, I decided to analyze the data being transferred between POS and the AOL proxy, using Ethereal to capture the communication. After making a few requests I turned to my Ethereal log, only to find that it didn’t contain a single HTTP connection. Instead, it seems POS talks to the proxy, whose name appears to be “AOL TurboWeb Cache”, over a proprietary, partially compressed protocol with binary headers. Fortunately the initial part of each request and response is more or less readable, so I was able to gather some data from it. First, it seems AOL added a custom response code to the HTTP protocol, namely 236, which seems to correspond to the page being cached and leads to a 302 redirect, presumably a cached-content placeholder. It also uses digest authentication, found inside the X-AOL-Auth header, presumably to confirm that the user has access to the AOL cache, and a SessionID to keep track of the user. The proxy request also contains a very interesting header, “X-z”, containing something that looks like a winning entry in a Perl obfuscation contest. Here is a short excerpt:
CODE:
# there are a few KB of this garbage
Gkk:hjpeGY9acP<&;KMCGjV1$mB/o+]6K@;S#iZ"KFgQo5JL@/ck7Mc]ZW&lbk&L=YAt^;B\j`tqG
As fun as analyzing the communication between POS and the proxy was, it still didn’t solve my caching problem, although I now knew for certain the pages were being cached (the 236 HTTP code). I hit Google trying to find out why the no-cache headers I was sending were being ignored by AOL. My search took me to the AOL Webmaster FAQ, which talks a bit about the nature of AOL proxies but ultimately was of no help. A bit of further searching took me to another page on the same site, http://webmaster.info.aol.com/caching.html, which specifically talks about proxies and AOL. This page gave me the clues on how to solve the problem.
First, it seems the proxy is not particularly keen on Expires headers: even if the value is in the past (Jan 1980), its mere presence sometimes causes the proxy to cache the page. If you want the page’s content to remain uncached, you are probably better off not sending the Expires header at all (AOL specific). Another interesting tidbit concerns the “no-cache” string in the Cache-Control header. According to the RFC it means something along the lines of “forces caches to submit the request to the origin server for validation before releasing a cached copy, every time.”, which AOL translates to “This object may be held in any cache but it must be revalidated every time it is requested.”. However, it seems the doc writers at AOL are not quite aware of what the code does, because in reality if you specify “no-cache” the proxy will almost always cache the page.
Another clue was the following bit of information:
Cache-Control: private
This object may be held in any cache but it must be revalidated every time it is requested.
This means that unless you explicitly indicate that the page cannot be cached by a proxy, it will be cached, even if all other headers seem to suggest otherwise.
In the end, by altering the code a bit, I was able to come up with the following header line, which seems very adept at disabling caching for AOL users.
PHP:
<?php
header("Cache-Control: no-store, private, must-revalidate, proxy-revalidate, post-check=0,".
"pre-check=0, max-age=0, s-maxage=0");
?>
Once this header was in place and the Pragma and Expires headers were removed (for AOL users only), the caching problem and the random logouts, not surprisingly, went away.
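Since the change applies to AOL users only, the header choice can be wrapped in a small function. The following is just a sketch of that idea, not the exact FUDforum code: `no_cache_headers()` is a hypothetical name, and `$isAol` is assumed to come from some separate detection step, such as a check of the proxy IP ranges.

```php
<?php
// Hypothetical helper: pick the anti-caching header lines for a request.
// $isAol is assumed to be decided elsewhere (for example by proxy IP range).
function no_cache_headers($isAol)
{
    if ($isAol) {
        // AOL proxies: send no Expires or Pragma header and avoid the
        // "no-cache" token, relying on "private" and "no-store" instead.
        return array(
            'Cache-Control: no-store, private, must-revalidate, proxy-revalidate, '
            . 'post-check=0, pre-check=0, max-age=0, s-maxage=0',
        );
    }
    // Everyone else keeps the conventional set of headers.
    return array(
        'Expires: Mon, 21 Jan 1980 06:01:01 GMT',
        'Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0',
        'Pragma: no-cache',
    );
}

// Usage inside a page script, before any output is sent:
// foreach (no_cache_headers($isAol) as $line) { header($line); }
```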