Converting diacritics to ASCII in Shibboleth IdP attributes

27 February 2014

Some languages use diacritical marks to change the sound value of the letter to which they are added. For example, the following "lorem ipsum"-like sentence in Czech uses all of the language's diacritical marks:

Příliš žluťoučký kůň úpěl ďábelské ódy.

Sometimes there is a requirement that the value of a user's attribute released by a Shibboleth IdP must be in ASCII. If the source attribute (as stored in the user database) is in UTF-8 and contains diacritics, one possible solution is to convert it using a scripted attribute definition in the Shibboleth IdP configuration.
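The conversion used in this post relies on Unicode canonical decomposition (NFD): a precomposed letter such as "ř" (U+0159) is split into its base letter "r" and a separate combining mark (U+030C COMBINING CARON), which can then be removed. The following standalone Node.js sketch, which is not part of the IdP configuration, prints the code points before and after decomposition; the helper name `codePoints` is purely illustrative.

```javascript
// Hypothetical helper: list the code points of a string as "U+XXXX" strings.
function codePoints(s) {
  return Array.from(s, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
}

const composed = "\u0159";                    // "ř" as one precomposed code point
const decomposed = composed.normalize("NFD"); // base letter + combining caron

console.log(codePoints(composed));   // [ 'U+0159' ]
console.log(codePoints(decomposed)); // [ 'U+0072', 'U+030C' ]
```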

For example, if we need the user's common name attribute in ASCII, we can add the following attribute definition to the attribute-resolver.xml configuration file:

<resolver:AttributeDefinition xsi:type="ad:Script" id="commonNameASCII">
    <resolver:Dependency ref="commonName" />

    <resolver:AttributeEncoder xsi:type="enc:SAML2String" name="http://example.org/attributes/commonName#ASCII" friendlyName="commonNameASCII" />

    <ad:ScriptFile>/opt/idp/script/commonNameASCII.js</ad:ScriptFile>

</resolver:AttributeDefinition>

The file /opt/idp/script/commonNameASCII.js contains the script that performs the actual conversion (it uses the Shibboleth IdP 2.x scripting API):

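// Scripted attribute definition for Shibboleth IdP 2.x.
// The dependency "commonName" is exposed to the script as a variable of the same name.
importPackage(Packages.edu.internet2.middleware.shibboleth.common.attribute.provider);
importPackage(Packages.java.lang);
importPackage(Packages.java.text);

// The attribute ID must match the id of the AttributeDefinition.
commonNameASCII = new BasicAttribute("commonNameASCII");

if (!commonName.getValues().isEmpty()) {
    // Only the first value of commonName is converted.
    var originalValue = commonName.getValues().get(0);
    // NFD splits each accented letter into a base letter plus combining marks,
    // and the regular expression then removes those marks.
    var asciiValue = Normalizer.normalize(originalValue, Normalizer.Form.NFD).replaceAll("\\p{InCombiningDiacriticalMarks}+", "");

    commonNameASCII.getValues().add(asciiValue);
}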
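Because the script calls Java's Normalizer from inside the IdP, it can be handy to check the expected output outside the IdP first. The following Node.js sketch mirrors the same two steps with the standard String.prototype.normalize. JavaScript regular expressions do not support Java's \p{InCombiningDiacriticalMarks} block syntax, so the sketch uses the equivalent code-point range U+0300 to U+036F (the Combining Diacritical Marks block). The function name `toAscii` is hypothetical.

```javascript
// Hypothetical helper mirroring the IdP script: decompose with NFD, then drop
// every code point in the Combining Diacritical Marks block (U+0300 to U+036F).
function toAscii(value) {
  return value.normalize("NFD").replace(/[\u0300-\u036f]+/g, "");
}

const original = "Příliš žluťoučký kůň úpěl ďábelské ódy.";
console.log(toAscii(original)); // Prilis zlutoucky kun upel dabelske ody.
```

Note that letters such as "ł" or "ø" have no canonical decomposition, so both this sketch and the script above leave them unchanged; if strictly ASCII output is required, such characters need an additional explicit mapping.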

Shibboleth 4