Saturday, August 25, 2007

Wrong 'Content-Type' from web servers - an example from the OASIS 'docs' site


It's often the little things that are the most annoying. For me, one of those is documents sent with the wrong Content-Type from web servers. I just encountered it once more on the OASIS docs website, and since so many OASIS standards use xsd and wsdl files, this shouldn't happen on one of their sites.

I wanted to look at the XML Schema for the WSRP V2.0 types, and it clearly showed up as 'text/plain' in my Firefox browser. Here are the headers (note the Content-Type line):

wget -S
           => `wsrp-2.0-types.xsd'
Connecting to||:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Sat, 25 Aug 2007 12:05:20 GMT
  Server: Apache/2.2.3 (Debian) mod_python/3.2.10 Python/2.4.4 mod_ssl/2.2.3 OpenSSL/0.9.8c mod_perl/2.0.2 Perl/v5.8.8
  Last-Modified: Tue, 10 Jul 2007 00:13:00 GMT
  ETag: "a5000f-10875-434dd9fc8db00"
  Accept-Ranges: bytes
  Content-Length: 67701
  Keep-Alive: timeout=10, max=200
  Connection: Keep-Alive
  Content-Type: text/plain
Length: 67,701 (66K) [text/plain]

100%[====================================>] 67,701        55.75K/s

13:57:27 (55.60 KB/s) - `wsrp-2.0-types.xsd' saved [67701/67701]
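
The same check can be scripted. Here's a minimal Python sketch, assuming Python 3 and only the standard library; the function names are my own, and the URL you pass in is up to you. It sends a HEAD request and reports the declared media type:

```python
import urllib.request


def media_type(content_type_header):
    """Return the bare media type, dropping parameters such as charset."""
    return content_type_header.split(";", 1)[0].strip().lower()


def is_xml_type(mtype):
    """True for text/xml, application/xml and any type with a +xml suffix."""
    return mtype in ("text/xml", "application/xml") or mtype.endswith("+xml")


def fetch_media_type(url):
    """Send a HEAD request and return the media type the server declares."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return media_type(response.headers.get("Content-Type", ""))
```

Calling fetch_media_type() on the schema's URL and passing the result to is_xml_type() would flag a response like the one above, since 'text/plain' is not an XML type.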

The web server is a recent version of Apache (2.2.3), so either this type hasn't been added to the magic files, or it has been deliberately changed by the IT folks running the web server.

I downloaded the most recent tarball (version 2.2.4). The file docs/conf/mime.types, used by mod_mime, lists the most common file extensions. To my surprise xsd isn't there, and the same goes for wsdl. Only the most basic XML-related ones can be found:

application/xhtml+xml           xhtml xht
application/xslt+xml            xslt
application/xml                 xml xsl
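
To illustrate what such a table does with an .xsd file, here's a small Python sketch of an extension lookup over these three lines. It is not mod_mime's code; the function names are my own, and the 'text/plain' fallback is an assumption meant to mirror the server's default type:

```python
def parse_mime_types(text):
    """Map each file extension to its media type, mime.types style."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        mtype, *extensions = line.split()
        for ext in extensions:
            mapping[ext.lower()] = mtype
    return mapping


def lookup(mapping, filename, default="text/plain"):
    """Return the type for the filename's last extension, else the default.

    The default value is an assumption standing in for the server's fallback.
    """
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return mapping.get(ext, default)


STOCK = """\
application/xhtml+xml           xhtml xht
application/xslt+xml            xslt
application/xml                 xml xsl
"""

types = parse_mime_types(STOCK)
print(lookup(types, "wsrp-2.0-types.xsd"))  # prints text/plain
```

With no xsd entry in the table, the lookup falls through to the default, which matches what wget reported.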

Hmm, nothing here, but what about docs/conf/magic, used by mod_mime_magic? Here's a rule for detecting XML files in general:

# XML eXtensible Markup Language, from Linus Walleij <>
0   string      \<?xml      text/xml
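
To make the rule concrete, here's a minimal Python sketch of the check it describes: compare the bytes at offset 0 with the string '<?xml'. The function name is my own, and this only illustrates the rule; it is not how mod_mime_magic is implemented:

```python
def sniff_xml(data: bytes):
    """Return 'text/xml' if the data starts with '<?xml' at offset 0, else None."""
    return "text/xml" if data[:5] == b"<?xml" else None
```

In this sketch, anything in front of the XML declaration (a byte-order mark, leading whitespace) makes the check fail, and so does an XML file that omits the declaration altogether.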

and the xsd for the WSRP 2.0 types does have the XML declaration, so this rule should have caught it. For a second I started to wonder if there's a specific Content-Type (MIME type) for XML Schema, but a search gave me nothing.

I conclude that the Content-Type for XML Schema files should be text/xml and certainly not text/plain. It looks like it's deliberate, for some reason unknown to me. The documentation for mod_mime_magic has a note about performance: "This module is not for every system. If your system is barely keeping up with its load or if you're performing a web server benchmark, you may not want to enable this because the processing is not free." But OASIS should, for one, have added entries for both xsd and wsdl in mime.types, and should have a scalable web farm that doesn't need this kind of performance hack. Also, the content is relatively controlled, so it's not like they need to support all kinds of file extensions.
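
As a rough illustration of the fix, Python's standard mimetypes module can read lines in the same 'type ext ext' format as mime.types. The sketch below feeds it one extra line and checks the result. The extra line is my own suggestion, following the text/xml conclusion above, and 'service.wsdl' is a made-up filename:

```python
import io
import mimetypes

# Hypothetical extra line in mime.types format (not from the Apache tarball).
EXTRA_ENTRIES = "text/xml    xsd wsdl\n"


def build_type_table(extra=EXTRA_ENTRIES):
    """Start from Python's built-in table and add the extra entries."""
    table = mimetypes.MimeTypes()
    table.readfp(io.StringIO(extra))  # parses "type ext ext ..." lines
    return table


table = build_type_table()
for name in ("wsrp-2.0-types.xsd", "service.wsdl"):
    print(name, "->", table.guess_type(name)[0])
```

Both filenames should now resolve to text/xml, purely from the extension and without inspecting the file contents.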

Had I been using IE, the file would have been presented as 'text/xml', since IE almost ignores the Content-Type and relies on an internal algorithm that also looks at the filename suffix of the document, in this case 'xsd'. This behavior isn't right in my eyes, and problems arise when, for example, XML documents are located at URLs that don't end with '.xml'.