The OASIS standard for federated single sign-on, SAML V2.0, has a format for exchanging metadata about identity and service providers. The syntax is defined in the XML Schema saml-schema-metadata-2.0.xsd, and the semantics are further described in saml-metadata-2.0-os.pdf. In this blog post I'll focus on the less important use of xml:lang in this format and look at its general use.
Reusing xml:lang in XML Schema
It is the content of the Organization element that is parametrized with the xml:lang attribute. This involves importing the XML namespace into the XML Schema (it is already defined and we just want to use it).
<import namespace="http://www.w3.org/XML/1998/namespace"
        schemaLocation="http://www.w3.org/2001/xml.xsd"/>
This is then used in the definitions of two complex types:
The localizedNameType complex type extends a string-valued element with a standard XML language attribute.
<complexType name="localizedNameType">
    <simpleContent>
        <extension base="string">
            <attribute ref="xml:lang" use="required"/>
        </extension>
    </simpleContent>
</complexType>
The localizedURIType complex type extends a URI-valued element with a standard XML language attribute.
<complexType name="localizedURIType">
    <simpleContent>
        <extension base="anyURI">
            <attribute ref="xml:lang" use="required"/>
        </extension>
    </simpleContent>
</complexType>
This is then put to use in the definition of Organization:
The <Organization> element specifies basic information about an organization responsible for a SAML entity or role. The use of this element is always optional. Its content is informative in nature and does not directly map to any core SAML elements or attributes.
<element name="Organization" type="md:OrganizationType"/>
<complexType name="OrganizationType">
    <sequence>
        <element ref="md:Extensions" minOccurs="0"/>
        <element ref="md:OrganizationName" maxOccurs="unbounded"/>
        <element ref="md:OrganizationDisplayName" maxOccurs="unbounded"/>
        <element ref="md:OrganizationURL" maxOccurs="unbounded"/>
    </sequence>
    <anyAttribute namespace="##other" processContents="lax"/>
</complexType>
<element name="OrganizationName" type="md:localizedNameType"/>
<element name="OrganizationDisplayName" type="md:localizedNameType"/>
<element name="OrganizationURL" type="md:localizedURIType"/>
The documentation contains a metadata example for a Service Provider where the Organization is defined as:
<Organization>
    <OrganizationName xml:lang="en">
        Academic Journals R US
    </OrganizationName>
    <OrganizationDisplayName xml:lang="en">
        Academic Journals R US, a Division of Dirk Corp.
    </OrganizationDisplayName>
    <OrganizationURL xml:lang="en">
        https://ServiceProvider.com
    </OrganizationURL>
</Organization>
Note that the xml prefix is predefined: it does not need to be declared in the instance document, and it would be an error to bind it to any other namespace.
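As a rough illustration of how a consumer might read these localized values, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The SAMPLE string, the localized_values helper, and the default namespace declaration placed directly on Organization are assumptions made for this example (in real metadata the namespace is normally declared on an enclosing element). The xml prefix is used without any declaration, which the parser accepts because it is predefined.

```python
import xml.etree.ElementTree as ET

# The namespace that the predefined "xml" prefix is always bound to.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Sample based on the Organization example above; the xmlns declaration
# on <Organization> is added here only so the snippet is self-contained.
SAMPLE = """\
<Organization xmlns="urn:oasis:names:tc:SAML:2.0:metadata">
  <OrganizationName xml:lang="en">
    Academic Journals R US
  </OrganizationName>
  <OrganizationDisplayName xml:lang="en">
    Academic Journals R US, a Division of Dirk Corp.
  </OrganizationDisplayName>
  <OrganizationURL xml:lang="en">
    https://ServiceProvider.com
  </OrganizationURL>
</Organization>
"""


def localized_values(org_xml):
    """Return (element name, xml:lang value, text) for each child element."""
    root = ET.fromstring(org_xml)
    values = []
    for child in root:
        # ElementTree stores tags as "{namespace}local"; keep the local part.
        local_name = child.tag.split("}", 1)[-1]
        # xml:lang is exposed under the XML namespace in Clark notation.
        lang = child.get(f"{{{XML_NS}}}lang")
        values.append((local_name, lang, (child.text or "").strip()))
    return values


if __name__ == "__main__":
    for name, lang, text in localized_values(SAMPLE):
        print(name, lang, text)
```

The helper does no schema validation; it only shows where the language tag ends up once the document is parsed.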
xml:lang and its use in XML document schemas
In section 2.12 Language Identification of the XML Specification, it says:
In document processing, it is often useful to identify the natural or formal language in which the content is written. A special attribute named xml:lang may be inserted in documents to specify the language used in the contents and attribute values of any element in an XML document. In valid documents, this attribute, like any other, must be declared if it is used. The values of the attribute are language identifiers as defined by [IETF RFC 3066], Tags for the Identification of Languages, or its successor; in addition, the empty string may be specified.
XML Schema requires that the XML namespace be imported before xml:lang (and the other attributes in that namespace) can be referenced; the xml prefix itself needs no declaration.
For a small discussion on the reuse of attributes from the XML Specification, you can look at this mail thread from the xml-schema list.
There is also a section on When to use your own element or attribute:
When the language value is really an attribute of or metadata about some external content, then xml:lang is not an appropriate choice. In these cases you want to store language information, but the language doesn't refer to the content of the XML document (or included content, such as images, which are processed as part of the document) directly. In this case you should define an element or attribute of using a different name and not use the xml:lang attribute. The value of the element or attribute should use RFC 3066 (or its successor), just like xml:lang.
Oops, this disqualifies its use on OrganizationURL, since that refers to another document. In real life this is not a problem, if for no other reason than that the Organization element is optional.
The value of reuse
A more interesting discussion is the value of reuse, which varies greatly depending on where it is applied and used.
The XML Schema itself reuses the type localizedNameType several times as a syntactic component, since it is very generic: it only constrains the content to be a name for something, with an attribute describing the language. The URL variant is used only once. In general, since an empty value is allowed, I would have liked the attribute to be optional instead of required.
The value of reusing xml:lang could be argued to be minimal, since it would be easy to redefine without too much struggle, and the level of generic support for this attribute is, in my opinion, limited and tied to the domain/application use.
Update! Since making this post I've had some experience with it in XMLBeans which actually has some built in checks that surpass the definition in the xml.xsd:
<xs:attribute name="lang">
    <xs:annotation>
        <xs:documentation>Attempting to install the relevant ISO 2- and 3-letter
            codes as the enumerated possible values is probably never
            going to be a realistic possibility. See
            RFC 3066 at http://www.ietf.org/rfc/rfc3066.txt and the IANA registry
            at http://www.iana.org/assignments/lang-tag-apps.htm for
            further information.

            The union allows for the 'un-declaration' of xml:lang with
            the empty string.</xs:documentation>
    </xs:annotation>
    <xs:simpleType>
        <xs:union memberTypes="xs:language">
            <xs:simpleType>
                <xs:restriction base="xs:string">
                    <xs:enumeration value=""/>
                </xs:restriction>
            </xs:simpleType>
        </xs:union>
    </xs:simpleType>
</xs:attribute>
For one, it checks that the overall syntax is correct: for example, it reports an error if I use da_DK instead of da-DK. It also doesn't allow the value to be empty, even though xml.xsd permits that (which is fine by me, since I personally dislike empty elements and attributes in data-centric scenarios).
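A syntax check of this kind can be approximated in a few lines of Python. This is only a sketch under stated assumptions: it applies the lexical pattern that XML Schema defines for xs:language and is not XMLBeans' actual implementation. The function name is_valid_lang and the allow_empty flag are made up for this example.

```python
import re

# Lexical pattern for xs:language as given in XML Schema Part 2:
# a 1-8 letter primary subtag, then zero or more "-" separated
# subtags of 1-8 letters or digits.
XS_LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def is_valid_lang(value, allow_empty=False):
    """Check a language tag against the xs:language pattern.

    allow_empty=True mimics the union in xml.xsd, which also accepts
    the empty string; the default mimics the stricter behaviour
    described in the post.
    """
    if value == "":
        return allow_empty
    return XS_LANGUAGE.fullmatch(value) is not None


if __name__ == "__main__":
    for tag in ("da-DK", "da_DK", ""):
        print(repr(tag), is_valid_lang(tag))
```

The pattern only checks the shape of the tag; it does not verify that the subtags are registered language or country codes.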
Language Identifiers (RFC 3066)
The page Using Language Identifiers (RFC 3066) is great for a quick brush-up. Being a Dane, my interest is in the three examples for Denmark:
da-DK (Danish) de-DK (German)
That's for the majority speaking Danish in Denmark, the minority speaking Danish in Germany and the minority speaking German in Denmark. Since both Denmark and the use of Danish are very limited, in geography and in number of speakers, it doesn't really make sense to me to go for anything other than
A great source for a quick trip around Web Internationalization Standards and Practice is the presentation/tutorial [PDF] that can be found on that page.