Monday, July 16, 2007

A second and deeper look at the Google Analytics Urchin Module


I first took A quick look at the Google Analytics Urchin Module, and then looked at what it means cookie-wise for a user in Party cookie - GA a virtual first party. In this post I'll go deeper by monitoring the HTTP headers (including cookies) with ngrep.

For this example I'll use the minimal post The Firefox is a Panda!. First I request the page, which sends the following HTTP request headers:

T <my.local.ip.number>:44675 -> 72.14.207.121:80 [AP]
GET /2007/03/firefox-is-panda.html HTTP/1.1.
Host: blog.sweetxml.org.
User-Agent: Mozilla/5.0 (X11; U; Linux i686; de-DE; rv:1.8.1.2) Gecko/20070312 Firefox/2.0.0.2.
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5.
Accept-Language: da,en;q=0.7,en-us;q=0.3.
Accept-Encoding: gzip,deflate.
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7.
Keep-Alive: 300.
Connection: keep-alive.
If-Modified-Since: Mon, 16 Jul 2007 08:05:35 GMT.
If-None-Match: "30e8cd45-16a1-4d71-9221-725b96401ca5".
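
The dumps look like ngrep's `-W byline` output. That is an assumption, because the exact ngrep command isn't shown. Each dump has a header line with the protocol (T for TCP), source and destination, and the TCP flags ([AP] = ACK, PSH). The payload follows, and ngrep prints non-printable bytes, such as the CR of each CRLF line ending, as '.'. That is why every header line ends in a dot. Here is a minimal Python sketch that turns such a dump back into a request line and a header dict. `parse_ngrep_http` is a hypothetical helper name.

```python
import re

# ngrep packet header, e.g. "T <my.local.ip.number>:44675 -> 72.14.207.121:80 [AP]"
HEADER_RE = re.compile(
    r"^(?P<proto>[A-Z]+) (?P<src>\S+) -> (?P<dst>\S+)(?: \[(?P<flags>[A-Z]+)\])?$"
)


def parse_ngrep_http(dump):
    """Split one ngrep-dumped HTTP message into (packet info, start line, headers)."""
    lines = dump.strip().splitlines()
    packet = HEADER_RE.match(lines[0])
    if packet is None:
        raise ValueError("first line is not an ngrep packet header")
    # Assumption: every payload line ends in the CR of a CRLF, which ngrep shows as '.'.
    payload = [line[:-1] if line.endswith(".") else line for line in lines[1:]]
    headers = {}
    for line in payload[1:]:
        if not line:  # blank line = end of the header block
            break
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return packet.groupdict(), payload[0], headers


dump = """T <my.local.ip.number>:44675 -> 72.14.207.121:80 [AP]
GET /2007/03/firefox-is-panda.html HTTP/1.1.
Host: blog.sweetxml.org.
If-None-Match: "30e8cd45-16a1-4d71-9221-725b96401ca5"."""

packet, start_line, headers = parse_ngrep_http(dump)
print(packet["flags"], start_line, headers["Host"])
```

Blindly dropping one trailing dot only works because every line in these captures ends with a CR. A real dot at the end of a line would need the raw bytes instead.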

And the response, which shows that my browser has a valid copy in cache:

T 72.14.207.121:80 -> <my.local.ip.number>:44675 [AP]
HTTP/1.1 304 Not Modified.
Last-Modified: Mon, 16 Jul 2007 08:05:35 GMT.
Cache-Control: max-age=0 private.
ETag: "30e8cd45-16a1-4d71-9221-725b96401ca5".
Content-Length: 0.
Date: Mon, 16 Jul 2007 21:20:18 GMT.
Server: GFE/1.3.
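
The 304 is the result of a conditional GET. The browser sends back the validators it cached: If-None-Match carries the ETag and If-Modified-Since carries the Last-Modified date. When they still match, the server answers 304 Not Modified with an empty body. Below is a simplified sketch of that server-side decision in Python. `conditional_status` is a hypothetical name and weak ETags are ignored. If-None-Match takes precedence over If-Modified-Since, as in RFC 7232.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers, current_etag, last_modified):
    """Return 304 if the client's cached copy is still valid, else 200 (simplified)."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When If-None-Match is present, If-Modified-Since is not consulted.
        tags = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if "*" in tags or current_etag in tags else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        # Not modified if the resource is no newer than the client's date.
        if parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since):
            return 304
    return 200


# Validators from the capture above.
request = {
    "If-Modified-Since": "Mon, 16 Jul 2007 08:05:35 GMT",
    "If-None-Match": '"30e8cd45-16a1-4d71-9221-725b96401ca5"',
}
print(conditional_status(request, '"30e8cd45-16a1-4d71-9221-725b96401ca5"',
                         "Mon, 16 Jul 2007 08:05:35 GMT"))  # 304
```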

Then comes a request for a dynamic CSS file, which for some reason carries all the cookies for both GA and blogger.com, maybe because of the CNAME record.

T <my.local.ip.number>:37235 -> 72.14.207.191:80 [AP]
GET /dyn-css/authorization.css?blogID=591744930960839717 HTTP/1.1.
Host: www2.blogger.com.
User-Agent: Mozilla/5.0 (X11; U; Linux i686; de-DE; rv:1.8.1.2) Gecko/20070312 Firefox/2.0.0.2.
Accept: text/css,*/*;q=0.1.
Accept-Language: da,en;q=0.7,en-us;q=0.3.
Accept-Encoding: gzip,deflate.
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7.
Keep-Alive: 300.
Connection: keep-alive.
Referer: http://blog.sweetxml.org/2007/03/firefox-is-panda.html.
Cookie: __utma=150635877.801410967.1184268448.1184496074.1184570589.11; __utmz=150635877.1184570589.11.5.utmccn=(referral)|utmcsr=www2.blogger.com|utmcct=/navbar.g|utmcmd=referral; __utmz=238348806.1184442198.162.32.utmccn=(referral)|utmcsr=blog.sweetxml.org|utmcct=/2007/07/fdim-and-third-party-cookies.html|utmcmd=referral; __utma=238348806.2055646945.1171363119.1184356396.1184442198.162.
If-Modified-Since: Mon, 16 Jul 2007 21:12:21 GMT.
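
The browser decides which stored cookies to attach to each request. It does this by matching the request's host name (here www2.blogger.com, from the URL and Host header) against each cookie's domain. Below is a simplified Python sketch of the domain-match rule as later written down in RFC 6265. Path matching, Secure and expiry are left out, and `domain_match` is a hypothetical helper.

```python
import ipaddress


def _is_ip(host):
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def domain_match(request_host, cookie_domain):
    """Simplified RFC 6265 domain-match: exact host, or a subdomain of the cookie domain."""
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")  # a leading dot is ignored
    if host == domain:
        return True
    # Suffix matches must fall on a label boundary and never apply to IP addresses.
    return host.endswith("." + domain) and not _is_ip(host)


for host in ("www2.blogger.com", "blog.sweetxml.org", "www.google-analytics.com"):
    print(host, domain_match(host, ".blogger.com"))
```

Under this rule, a cookie stored for .blogger.com goes along with every request to www2.blogger.com. It never goes to blog.sweetxml.org or www.google-analytics.com.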

The response comes back gzip'ed and sets a new cookie for .blogger.com:

T 72.14.207.191:80 -> <my.local.ip.number>:37235 [AP]
HTTP/1.1 200 OK.
Content-Type: text/css; charset=UTF-8.
Cache-Control: max-age=1800 private.
Pragma: no-cache.
Last-Modified: Mon, 16 Jul 2007 21:20:19 GMT.
Transfer-Encoding: chunked.
Set-Cookie: S=blogger=uttzz-o7dCskolJlCyOmeQ; Domain=.blogger.com; Path=/.
Content-Encoding: gzip.
Date: Mon, 16 Jul 2007 21:20:19 GMT.
Server: GFE/1.3.
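
The new cookie is named S and its value is blogger=uttzz-o7dCskolJlCyOmeQ. Only the first '=' separates the name from the value, and everything after the first ';' is attributes. The navbar request that follows sends it back as S=blogger=uttzz-o7dCskolJlCyOmeQ. Here is a minimal Python sketch of both directions. The helper names are hypothetical and quoted values are not handled. The Cookie parser returns a list rather than a dict, because the Cookie headers in this capture carry two __utma and two __utmz cookies with different domain hashes.

```python
def parse_set_cookie(header):
    """Split a Set-Cookie header into (name, value, attributes)."""
    first, *attrs = [part.strip() for part in header.split(";")]
    name, _, value = first.partition("=")  # split on the FIRST '=' only
    attributes = {}
    for attr in attrs:
        key, sep, val = attr.partition("=")
        attributes[key.lower()] = val if sep else True  # flags like Secure have no value
    return name, value, attributes


def parse_cookie_header(header):
    """Split a Cookie request header into a list of (name, value) pairs, keeping duplicates."""
    pairs = []
    for part in header.split(";"):
        part = part.strip()
        if part:
            name, _, value = part.partition("=")
            pairs.append((name, value))
    return pairs


print(parse_set_cookie("S=blogger=uttzz-o7dCskolJlCyOmeQ; Domain=.blogger.com; Path=/"))
print(parse_cookie_header(
    "__utma=150635877.801410967.1184268448.1184496074.1184570589.11; "
    "__utma=238348806.2055646945.1171363119.1184356396.1184442198.162; "
    "S=blogger=uttzz-o7dCskolJlCyOmeQ"))
```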

Next is the request for the navbar (classically in an iframe, but in my template in an object). Once more all the cookies come along, now including the new S cookie:

T <my.local.ip.number>:37235 -> 72.14.207.191:80 [AP]
GET /navbar.g?targetBlogID=591744930960839717&blogName=Sweetxml&publishMode=PUBLISH_MODE_HOSTED&navbarType=SILVER&layoutType=LAYOUTS&homepageUrl=http%3A%2F%2Fblog.sweetxml.org%2Findex.html&searchRoot=http%3A%2F%2Fblog.sweetxml.org%2Fsearch HTTP/1.1.
Host: www2.blogger.com.
User-Agent: Mozilla/5.0 (X11; U; Linux i686; de-DE; rv:1.8.1.2) Gecko/20070312 Firefox/2.0.0.2.
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5.
Accept-Language: da,en;q=0.7,en-us;q=0.3.
Accept-Encoding: gzip,deflate.
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7.
Keep-Alive: 300.
Connection: keep-alive.
Referer: http://blog.sweetxml.org/2007/03/firefox-is-panda.html.
Cookie: __utma=150635877.801410967.1184268448.1184496074.1184570589.11; __utmz=150635877.1184570589.11.5.utmccn=(referral)|utmcsr=www2.blogger.com|utmcct=/navbar.g|utmcmd=referral; __utmz=238348806.1184442198.162.32.utmccn=(referral)|utmcsr=blog.sweetxml.org|utmcct=/2007/07/fdim-and-third-party-cookies.html|utmcmd=referral; __utma=238348806.2055646945.1171363119.1184356396.1184442198.162; S=blogger=uttzz-o7dCskolJlCyOmeQ.

The navbar response:

T 72.14.207.191:80 -> <my.local.ip.number>:37235 [AP]
HTTP/1.1 200 OK.
Content-Type: text/html; charset=UTF-8.
Cache-Control: private, no-cache, proxy-revalidate.
Pragma: no-cache.
Transfer-Encoding: chunked.
Content-Encoding: gzip.
Date: Mon, 16 Jul 2007 21:20:22 GMT.
Server: GFE/1.3.
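
Both blogger.com responses arrive with Transfer-Encoding: chunked and Content-Encoding: gzip. The bodies on the wire (and in ngrep) are therefore unreadable binary. To read them, the chunked framing has to be removed first, then the gzip content coding. Here is a minimal Python sketch under those assumptions. The helper names are hypothetical, chunk extensions are skipped and trailers are ignored.

```python
import gzip


def dechunk(body):
    """Remove HTTP/1.1 chunked framing: <hex size>CRLF<data>CRLF ... 0CRLF CRLF."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size = int(body[pos:line_end].split(b";")[0], 16)  # drop chunk extensions
        pos = line_end + 2
        if size == 0:  # last chunk; any trailer headers are ignored
            return bytes(out)
        out += body[pos:pos + size]
        pos += size + 2  # skip the data and its trailing CRLF


def decode_body(body, headers):
    """Undo the transfer coding first, then the content coding."""
    if headers.get("Transfer-Encoding", "").lower() == "chunked":
        body = dechunk(body)
    if headers.get("Content-Encoding", "").lower() == "gzip":
        body = gzip.decompress(body)
    return body


# Build a small chunked, gzip'ed body to try it on (made-up content).
payload = gzip.compress(b"<div>navbar</div>")
wire = b"".join(b"%x\r\n%s\r\n" % (len(payload[i:i + 8]), payload[i:i + 8])
                for i in range(0, len(payload), 8)) + b"0\r\n\r\n"
print(decode_body(wire, {"Transfer-Encoding": "chunked", "Content-Encoding": "gzip"}))
```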

Next comes a POST that was unknown to me (perhaps sent by the default JavaScript?):

T <my.local.ip.number>:44675 -> 72.14.207.121:80 [AP]
POST /2007/03/firefox-is-panda.html HTTP/1.1.
Host: blog.sweetxml.org.
User-Agent: Mozilla/5.0 (X11; U; Linux i686; de-DE; rv:1.8.1.2) Gecko/20070312 Firefox/2.0.0.2.
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5.
Accept-Language: da,en;q=0.7,en-us;q=0.3.
Accept-Encoding: gzip,deflate.
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7.
Keep-Alive: 300.
Connection: keep-alive.
Content-Type: application/x-www-form-urlencoded.
Referer: http://blog.sweetxml.org/2007/03/firefox-is-panda.html.
Content-Length: 90.
Cookie: __utma=114544340.2000377505.1184620823.1184620823.1184620823.1; __utmb=114544340; __utmc=114544340; __utmz=114544340.1184620823.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none).
Pragma: no-cache.
Cache-Control: no-cache.

Now, finally, the GA backend call that syncs the cookie values:

T <my.local.ip.number>:44208 -> 64.233.183.104:80 [AP]
GET /__utm.gif?utmwv=1&utmn=2000377505&utmcs=UTF-8&utmsr=1280x1024&utmsc=24-bit&utmul=de-de&utmje=1&utmfl=9.0%20r48&utmcn=1&utmdt=Sweetxml%3A%20The%20Firefox%20is%20a%20Panda!&utmhn=blog.sweetxml.org&utmr=-&utmp=/2007/03/firefox-is-panda.html&utmac=UA-1555577-1&utmcc=__utma%3D114544340.2000377505.1184620823.1184620823.1184620823.1%3B%2B__utmb%3D114544340%3B%2B__utmc%3D114544340%3B%2B__utmz%3D114544340.1184620823.1.1.utmccn%3D(direct)%7Cutmcsr%3D(direct)%7Cutmcmd%3D(none)%3B%2B HTTP/1.1.
Host: www.google-analytics.com.
User-Agent: Mozilla/5.0 (X11; U; Linux i686; de-DE; rv:1.8.1.2) Gecko/20070312 Firefox/2.0.0.2.
Accept: image/png,*/*;q=0.5.
Accept-Language: da,en;q=0.7,en-us;q=0.3.
Accept-Encoding: gzip,deflate.
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7.
Keep-Alive: 300.
Connection: keep-alive.
Referer: http://blog.sweetxml.org/2007/03/firefox-is-panda.html.

This is a so-called web bug. The parameters are:

  • utmwv=1
  • utmn=2000377505
  • utmcs=UTF-8
  • utmsr=1280x1024
  • utmsc=24-bit
  • utmul=de-de
  • utmje=1
  • utmfl=9.0%20r48
  • utmcn=1
  • utmdt=Sweetxml%3A%20The%20Firefox%20is%20a%20Panda!
  • utmhn=blog.sweetxml.org
  • utmr=-
  • utmp=/2007/03/firefox-is-panda.html
  • utmac=UA-1555577-1
  • utmcc=__utma%3D114544340.2000377505.1184620823.1184620823.1184620823.1%3B%2B__utmb%3D114544340%3B%2B__utmc%3D114544340%3B%2B__utmz%3D114544340.1184620823.1.1.utmccn%3D(direct)%7Cutmcsr%3D(direct)%7Cutmcmd%3D(none)%3B%2B
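
Python's standard library can take the query string of the __utm.gif request apart and percent-decode it in one go. Below is a minimal sketch using a shortened copy of the captured URL. The meanings in the comments are my reading of the values and of Google's old tracking-parameter documentation; the capture itself does not state them.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed meanings (not stated in the capture).
MEANINGS = {
    "utmwv": "tracking code version",
    "utmn": "unique number per request, to defeat caching",
    "utmcs": "document character set",
    "utmsr": "screen resolution",
    "utmsc": "screen color depth",
    "utmul": "browser language",
    "utmje": "Java enabled (1/0)",
    "utmfl": "Flash version",
    "utmdt": "page title",
    "utmhn": "host name",
    "utmr": "referrer ('-' if none)",
    "utmp": "page path",
    "utmac": "GA account ID",
    "utmcc": "the GA cookies",
}


def utm_params(url):
    """Return the web bug's query parameters, percent-decoded (first value of each)."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return {name: values[0] for name, values in query.items()}


url = ("/__utm.gif?utmwv=1&utmfl=9.0%20r48&utmcn=1"
       "&utmdt=Sweetxml%3A%20The%20Firefox%20is%20a%20Panda!&utmr=-"
       "&utmp=/2007/03/firefox-is-panda.html&utmac=UA-1555577-1")
for name, value in utm_params(url).items():
    print(f"{name:6} {value!r:40} {MEANINGS.get(name, '?')}")
```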

The URL encodings used are:

  • %20 -> ' '
  • %2B -> '+'
  • %3A -> ':'
  • %3B -> ';'
  • %3D -> '='
  • %7C -> '|'

So the last parameter, utmcc, decodes into these cookie values:

  • __utma=114544340.2000377505.1184620823.1184620823.1184620823.1;+
  • __utmb=114544340;+
  • __utmc=114544340;+
  • __utmz=114544340.1184620823.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none);+
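
The __utma value itself is a dot-separated record. The commonly documented layout (an assumption here, since the post doesn't spell it out) has six fields:

  • the domain hash
  • a random visitor ID
  • the times of the first, previous and current visit, as Unix timestamps
  • a session counter

Here is a small Python sketch, with hypothetical helper names, that splits utmcc on ';+' and decodes __utma.

```python
from datetime import datetime, timezone


def split_utmcc(decoded):
    """Split a decoded utmcc value ('name=value;+name=value;+...') into pairs."""
    return [c.partition("=")[::2] for c in decoded.split(";+") if c]


def to_utc(seconds):
    return datetime.fromtimestamp(int(seconds), tz=timezone.utc)


def decode_utma(value):
    """Decode __utma, assuming the six-field layout described above."""
    domain_hash, visitor_id, first, previous, current, sessions = value.split(".")
    return {
        "domain_hash": domain_hash,
        "visitor_id": visitor_id,
        "first_visit": to_utc(first),
        "previous_visit": to_utc(previous),
        "current_visit": to_utc(current),
        "sessions": int(sessions),
    }


utmcc = ("__utma=114544340.2000377505.1184620823.1184620823.1184620823.1;+"
         "__utmb=114544340;+__utmc=114544340;+"
         "__utmz=114544340.1184620823.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none);+")
cookies = dict(split_utmcc(utmcc))
print(decode_utma(cookies["__utma"]))
```

All three timestamps decode to 2007-07-16 21:20:23 UTC, a few seconds before the Date: of the GIF response, and the session counter is 1. That is consistent with the cookies having just been created by the JavaScript on this page view.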

The GA response (a GIF image):

T 64.233.183.104:80 -> <my.local.ip.number>:44208 [AP]
HTTP/1.1 200 OK.
Pragma: no-cache.
Cache-Control: private, no-cache, no-cache="Set-Cookie", proxy-revalidate.
Expires: Fri, 04 Aug 1978 12:00:00 GMT.
Content-Type: image/gif.
Last-Modified: Mon, 18 Jun 2007 23:10:08 GMT.
Server: ucfe.
Content-Length: 35.
Date: Mon, 16 Jul 2007 21:20:28 GMT.

The next part also showed up in ngrep, but I can't figure out why it looks different and has no GET or other headers:

T <my.local.ip.number>:44675 -> 72.14.207.121:80 [AP]
action=backlinks&widgetId=Blog1&widgetType=Blog&responseType=js&postID=1676954542643567733
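
One plausible reading, checked only against the capture: this line is the body of the POST shown earlier. Three things point that way:

  • It travels on the same connection (local port 44675 to 72.14.207.121:80).
  • It is form-urlencoded, as that request's Content-Type announced.
  • It is exactly 90 bytes, matching Content-Length: 90.

ngrep prints every TCP segment separately, so a body sent in its own segment shows up without any headers. A quick Python check:

```python
from urllib.parse import parse_qsl

# Payload of the headerless packet, copied from the capture.
body = ("action=backlinks&widgetId=Blog1&widgetType=Blog"
        "&responseType=js&postID=1676954542643567733")
declared_length = 90  # Content-Length of the earlier POST

print(len(body.encode("ascii")) == declared_length)  # True
fields = dict(parse_qsl(body))
print(fields["action"], fields["responseType"])  # backlinks js
```

responseType=js would also fit the JavaScript response that follows.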

And the response is some JavaScript:

T 72.14.207.121:80 -> <my.local.ip.number>:44675 [AP]
HTTP/1.1 200 OK.
Content-Type: text/javascript; charset=UTF-8.
Cache-control: private.
Transfer-Encoding: chunked.
Content-Encoding: gzip.
Date: Mon, 16 Jul 2007 21:20:28 GMT.
Server: GFE/1.3.

That was it. So the cookies that are set by the JavaScript are synced by means of a GET for a GIF image, with the values in the query string. This bears a certain resemblance to the fallback method of the Gallup system, which makes an image request in the <noscript> section.
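
The sync step can be sketched from the client side in three steps. First take the GA cookie values and join them as name=value;+, the way utmcc shows them. Then percent-encode the result. Finally request the GIF with everything in the query string. This is a Python illustration, not urchin.js itself (which is JavaScript). `build_web_bug_url` and its parameter order are hypothetical. The code leaves '(' and ')' unencoded in utmcc, and also '/' and '!' in the other parameters, only to mirror what the captured URL shows.

```python
from urllib.parse import quote


def build_utmcc(cookies):
    """Join (name, value) pairs as 'name=value;+' and percent-encode them."""
    joined = "".join(f"{name}={value};+" for name, value in cookies)
    return quote(joined, safe="()")


def build_web_bug_url(host, params, cookies):
    """Put page data and cookie values into the query string of a GIF request."""
    query = "&".join(f"{name}={quote(str(value), safe='/!()')}" for name, value in params)
    return f"http://{host}/__utm.gif?{query}&utmcc={build_utmcc(cookies)}"


cookies = [
    ("__utma", "114544340.2000377505.1184620823.1184620823.1184620823.1"),
    ("__utmb", "114544340"),
    ("__utmc", "114544340"),
    ("__utmz", "114544340.1184620823.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none)"),
]
params = [
    ("utmwv", 1),
    ("utmdt", "Sweetxml: The Firefox is a Panda!"),
    ("utmp", "/2007/03/firefox-is-panda.html"),
    ("utmac", "UA-1555577-1"),
]
print(build_web_bug_url("www.google-analytics.com", params, cookies))
```

With these cookie values, the utmcc part comes out identical to the one in the captured __utm.gif request.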
