Rant — Cookies

See also my earlier rant on images, animations, plug-ins, Flash, Javascript, and frames.

Cookies

The biggest problem with cookies is that they are abused by the advertising giants of the Internet.  The original purpose of cookies — and still their best use — is to turn HTTP from a "stateless" protocol into one with some stateful behaviour.

Cookies have several properties, but one of the most important is their "expiry" date.  In this respect, cookies fall into two categories: "session" cookies (ones which are discarded when you quit your web browser), and "stored" cookies (ones that are stored on your hard disk, and will be used again and again until some time in the future).
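To make the distinction concrete, here's a rough sketch in Python of the two kinds of Set-Cookie header a server might send, using the standard library's http.cookies module.  The cookie names, values and date are made up for illustration; the only difference that matters is whether an expiry date is present.

```python
from http.cookies import SimpleCookie

def set_cookie_header(name, value, expires=None):
    """Return a Set-Cookie header value; with no expiry it is a session cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    if expires is not None:
        # An "expires" attribute turns this into a stored (persistent) cookie.
        cookie[name]["expires"] = expires
    return cookie[name].OutputString()

print(set_cookie_header("session", "abc123"))
print(set_cookie_header("prefs", "lang-en", expires="Thu, 31 Dec 2037 23:59:59 GMT"))
```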

About Cookies In General

Here's what cookies are for, in a nutshell: suppose you're using some web site which involves the use of a "shopping basket" facility.  You can browse the site for products, and when you find one you want, you click the "add to basket" icon.  So far, so good.  Now at some point, you go to the page which shows you what's in your basket.  How does it know what to show?

Most pages on the web are "static".  This means that if two people on two different computers go and look at that page, they will see exactly the same page, word for word.  But what about our shopping basket page?  It's got to show a different thing to each user — specifically, it has to know which shopping basket to show.  Not only that, but if I look at the basket page, and then return a few minutes later (having added more items to my shopping basket), it's got to show me something different again!  So how does it do it?

The answer is (usually) a cookie.  Typically when you first visit a web site, your web browser won't send a cookie (because it doesn't have any yet).  The web server, seeing that you don't yet have a cookie, concludes that you've just arrived at the site, and starts a "session" for you.  That session will have a unique, usually cryptic ID (for example, 038307803-86-3057687), and it's that ID which is then sent to your computer in the form of a cookie.  From then on, your browser sends that cookie back every time you visit a page on that web site, so the web server knows that you're using "session ID 038307803-86-3057687", and can associate certain information with that session: for example, the contents of your shopping basket.
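Here's a minimal sketch of the server side of that exchange, in Python, assuming an in-memory dictionary of baskets and a hypothetical cookie called session_id.  A real site would store its sessions more durably and add attributes such as a path, but the shape of the logic is the same: no cookie means a new session, and a known cookie means an existing basket.

```python
import secrets
from http.cookies import SimpleCookie

SESSION_COOKIE = "session_id"  # hypothetical cookie name
baskets = {}                   # session ID -> items in that visitor's basket

def handle_request(cookie_header, add_item=None):
    """Return (basket contents, Set-Cookie value or None) for one request."""
    jar = SimpleCookie()
    if cookie_header:
        jar.load(cookie_header)
    morsel = jar.get(SESSION_COOKIE)
    if morsel is not None and morsel.value in baskets:
        # A cookie we recognise: carry on with the existing session.
        session_id, set_cookie = morsel.value, None
    else:
        # No (valid) cookie: the visitor has just arrived, so start a session.
        session_id = secrets.token_hex(16)
        baskets[session_id] = []
        out = SimpleCookie()
        out[SESSION_COOKIE] = session_id
        set_cookie = out[SESSION_COOKIE].OutputString()
    if add_item is not None:
        baskets[session_id].append(add_item)
    return list(baskets[session_id]), set_cookie

basket, set_cookie = handle_request(None, add_item="teapot")
# The browser stores the cookie and sends it back with the next request.
basket, again = handle_request(set_cookie, add_item="cups")
print(basket)  # ['teapot', 'cups']
print(again)   # None: the existing session was recognised
```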

So How Can Cookies Be Bad?

Okay, so session cookies — ones that disappear when you turn off your computer or close your web browser — are basically good, useful, and harmless.  What about other types of cookies?

The other type of cookie is the "stored" cookie.  What happens is that when you visit some web page, the web server sends your computer a cookie, and that cookie contains an expiry date.  If it doesn't have an expiry date, it's a session cookie — see above.  Otherwise, that cookie will be saved to your hard disk, and returned again and again to the web server, every time you visit their web site, day after day, week after week.  Until one day, the cookie's expiry date arrives, and at that point the cookie is deleted.
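Seen from the browser's side, the two kinds of cookie are handled roughly like this.  It's a toy model, not how any real browser stores cookies: the StoredCookie and CookieStore names are invented here, and cookies are keyed by name alone, whereas real browsers also take the domain and path into account.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class StoredCookie:
    name: str
    value: str
    expires: Optional[datetime]  # None means a session cookie

class CookieStore:
    """A toy browser cookie store (hypothetical, for illustration only)."""

    def __init__(self):
        self.cookies = {}

    def receive(self, cookie):
        self.cookies[cookie.name] = cookie

    def to_send(self, now):
        # Expired cookies are deleted rather than sent back to the server.
        self.cookies = {n: c for n, c in self.cookies.items()
                        if c.expires is None or c.expires > now}
        return sorted(self.cookies)

    def quit_browser(self):
        # Session cookies (no expiry date) are discarded when the browser quits.
        self.cookies = {n: c for n, c in self.cookies.items()
                        if c.expires is not None}

store = CookieStore()
store.receive(StoredCookie("basket", "42", None))
store.receive(StoredCookie("prefs", "en", datetime(2030, 1, 1, tzinfo=timezone.utc)))
store.quit_browser()
before = store.to_send(datetime(2029, 6, 1, tzinfo=timezone.utc))
after = store.to_send(datetime(2030, 6, 1, tzinfo=timezone.utc))
print(before)  # ['prefs']
print(after)   # []
```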

The expiry date can actually be any length of time in the future (at least, up until about the year 2037) — it might be just a minute or two, a week, a month, a year, or many years in the future.  Many web servers would like to set cookies that never expire, so in practice they often use an expiry date around the year 2037.
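The 2037 ceiling is very likely the well-known 32-bit time limit: many systems store dates as a signed 32-bit count of seconds since 1 January 1970, and that count runs out early in 2038.  A couple of lines of Python show exactly when:

```python
from datetime import datetime, timezone

# Largest value a signed 32-bit integer can hold.
MAX_INT32 = 2**31 - 1
print(datetime.fromtimestamp(MAX_INT32, tz=timezone.utc))
# 2038-01-19 03:14:07+00:00
```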

Third-Party Cookies and Advertising

So far I've talked about how a web site can send you a cookie, then you send it back.  There is another way cookies can be used — or, should I say, abused — however: "third party" cookies.

What happens here is that while you're visiting web site A, you're sent a cookie that belongs to web site B.  Now (assuming you don't have a faulty web browser) your computer will never return that cookie to site A, but it will return it to site B.  This in itself should probably be a concern, since one web site should not be able to arrange for you to be handed another site's cookies.

Now here's where it gets really interesting: it's not just web pages that can send and be sent cookies, but everything delivered over HTTP, including images, scripts, and animations.  What happens is this:

  1. You visit some web site which includes some advertising, for example The New York Times.
  2. One of the images on that page doesn't come from the web site you're visiting at all; instead, it comes from an advertising company's web server, for example DoubleClick's.
  3. Your web browser sends an HTTP request for the image to the advertising company's web server.  That request (usually) includes "referrer" information (the HTTP Referer header, misspelled that way in the specification), so the ad company knows which web site you're actually visiting.
  4. Unless you've already got one of their cookies, the advertiser's web server sends you a cookie (set to expire a long time in the future).  The cookie's domain is that of the ad company's web server.
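The ad server's side of steps 3 and 4 might look something like this rough Python sketch.  Everything in it is hypothetical: the uid cookie name, the ten-year lifetime (set with the Max-Age attribute, a lifetime in seconds, rather than an explicit expiry date), and the in-memory log.  The point is simply that the cookie identifies the browser, and the Referer header identifies the page.

```python
import secrets
from http.cookies import SimpleCookie

TRACKING_COOKIE = "uid"           # hypothetical cookie name
TEN_YEARS = 10 * 365 * 24 * 3600  # "a long time in the future", in seconds
log = []                          # (tracking ID, referring page) pairs

def serve_ad_image(headers):
    """Handle one ad-image request; return the Set-Cookie value, if any."""
    jar = SimpleCookie()
    if "Cookie" in headers:
        jar.load(headers["Cookie"])
    if TRACKING_COOKIE in jar:
        # Returning browser: reuse the ID it already carries.
        uid, set_cookie = jar[TRACKING_COOKIE].value, None
    else:
        # New browser: hand it a long-lived ID.
        uid = secrets.token_hex(8)
        out = SimpleCookie()
        out[TRACKING_COOKIE] = uid
        out[TRACKING_COOKIE]["max-age"] = TEN_YEARS
        set_cookie = out[TRACKING_COOKIE].OutputString()
    # The Referer header says which page embedded the image.
    log.append((uid, headers.get("Referer")))
    return set_cookie

first = serve_ad_image({"Referer": "https://www.nytimes.com/"})
cookie = first.split(";")[0]  # the "uid=..." part the browser sends back
second = serve_ad_image({"Referer": "https://news.example/", "Cookie": cookie})
print(log)  # the same uid appears against both sites
```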

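Now, whenever you visit any web site which includes advertising from that company (e.g. any site which includes DoubleClick's adverts, and that's a lot of sites), your browser sends that cookie back, so the ad company learns all sorts of things: for example, that it's the same browser each time (from the cookie), and which site that browser is on (from the referrer information).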

In doing this, the advertisers build up a history of the web sites you've visited, or at least the ones on which they advertise.  They don't necessarily know who you are (in terms of your name, age, gender, address etc.), but even that information can come their way eventually: for example, if you provide it to a web site with a rather suspect privacy policy.  In this way, cookies can be used to monitor your web browsing habits, much as store loyalty cards are used to monitor your shopping habits (especially ones that are shared between many participating stores).
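Turning such records into per-browser histories takes very little code.  A sketch, assuming the ad server has logged a (tracking ID, referring page) pair for every ad request; the IDs and URLs below are made up.

```python
from collections import defaultdict

def build_histories(log):
    """Group (tracking ID, referring page) pairs into per-browser histories."""
    histories = defaultdict(list)
    for uid, referrer in log:
        if referrer is not None:  # requests without a Referer reveal no site
            histories[uid].append(referrer)
    return dict(histories)

log = [
    ("a1", "https://news.example/politics"),
    ("b2", "https://shop.example/teapots"),
    ("a1", "https://travel.example/flights"),
]
print(build_histories(log))
```

Once any one of those IDs is linked to a name, for instance through a sign-up form on a participating site, the whole history attached to it becomes personal.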

Web Browser Control

So how much control does your web browser give you over cookies?  In the document which defines the use of cookies, the authors say that "Privacy considerations dictate that the user have considerable control over cookie management". They explain (this is quite a big quote, but worth reading):

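[The cookie] specification … requires that a user agent give the user control … the control mechanisms provided shall at least allow the user:

  * to completely disable the sending and saving of cookies.
  * to determine whether a stateful session is in progress.
  * to control the saving of a cookie on the basis of the cookie's Domain attribute.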

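Such control could be provided by, for example, mechanisms

  * to notify the user when the user agent is about to send a cookie to the origin server, offering the option not to begin a session.
  * to display a visual indication that a stateful session is in progress.
  * to let the user decide which cookies, if any, should be saved when the user concludes a window or user agent session.
  * to let the user examine the contents of a cookie at any time.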

A user agent usually begins execution with no remembered state information. It should be possible to configure a user agent never to send Cookie headers, in which case it can never sustain state with an origin server. (The user agent would then behave like one that is unaware of how to handle Set-Cookie response headers.)

When the user agent terminates execution, it should let the user discard all state information. Alternatively, the user agent may ask the user whether state information should be retained; the default should be "no". If the user chooses to retain state information, it would be restored the next time the user agent runs.
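For what it's worth, Python's standard http.cookiejar module offers one of the controls the specification asks for, refusing cookies on the basis of their domain, through the blocked_domains setting of DefaultCookiePolicy.  The sketch below is only an illustration of per-domain control, not how any browser implements it; FakeResponse is a stand-in for a real HTTP response, and nothing goes over the network.

```python
import urllib.request
from email.message import Message
from http.cookiejar import CookieJar, DefaultCookiePolicy

class FakeResponse:
    """Just enough of a response object for CookieJar.extract_cookies."""

    def __init__(self, set_cookie):
        self._headers = Message()
        self._headers["Set-Cookie"] = set_cookie

    def info(self):
        return self._headers

# Refuse cookies from the ad server's domain and all of its subdomains.
policy = DefaultCookiePolicy(blocked_domains=["doubleclick.net", ".doubleclick.net"])
jar = CookieJar(policy)

def receive(url, set_cookie):
    """Offer one Set-Cookie header to the jar, as if it came from url."""
    jar.extract_cookies(FakeResponse(set_cookie), urllib.request.Request(url))

receive("http://www.nytimes.com/", "basket=42")
receive("http://ad.doubleclick.net/banner.gif", "id=7")
print(sorted(c.domain for c in jar))  # ['www.nytimes.com']
```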

Microsoft Internet Explorer, prior to version 6, was woefully inadequate in this area.  Through the use of "zones", cookies could be globally enabled, globally disabled, or the program could be set to ask you each time you got a cookie (the default setting).  Although it was very easy to find the option to enable all cookies (i.e. without asking each time), the option to disable them (without asking) was much harder to find.  The other controls mentioned in the above document (for example, the ability to inspect cookies at any time, or the ability to control the saving of a cookie based on its domain) were missing entirely.

Internet Explorer 6 might be better in this area; I haven't tried it yet.  However, most other browsers fare much better: for example, Netscape 6 (and Mozilla, on which it is based) sports good cookie control, as do Opera and (if I remember correctly) Konqueror.

Cookies and Javascript

One of the most important aspects of cookies is that they are only ever sent to web servers in the correct "domain": for example, when you visit web site A, it can't access any cookies you've got for web site B.  This is important, because if (for example) your "shopping basket" cookie were made available to just any old web site (other than the one where you're doing the shopping), malicious types could potentially gain access to your data: they could see what's in your shopping basket, or (if you're using a "web mail" service such as Hotmail) they could read your mail.
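The rule that keeps site A away from site B's cookies boils down to a host/domain comparison.  Here's a simplified sketch in Python, loosely following the domain-matching rule later written up in RFC 6265; it ignores IP addresses and the "public suffix" checks real browsers perform, and shop.example is a made-up domain.

```python
def domain_match(host, cookie_domain):
    """Simplified check of whether a cookie for cookie_domain may go to host.

    Ignores IP addresses and the public-suffix rules real browsers apply.
    """
    host = host.lower()
    # A leading dot on the cookie's domain is ignored.
    domain = cookie_domain.lower().lstrip(".")
    # Exact match, or host is a subdomain (the match must end on a dot boundary).
    return host == domain or host.endswith("." + domain)

for host in ["shop.example", "www.shop.example", "evil.example", "notshop.example"]:
    print(host, domain_match(host, ".shop.example"))
```

The dot-boundary check is what stops "notshop.example" from matching "shop.example" merely because one name ends with the other.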

Finally, it's worth noting that bugs have repeatedly been discovered in web browsers that let one web site use Javascript to read another web site's cookies.  I suppose this is really an argument primarily against bad web browsers (Internet Explorer is the worst offender in this area), and in a way against Javascript (because if you can't trust your web browser, you should certainly disable Javascript).