public abstract class TextNormalizer extends Object
Unicode normalization is important when comparing hashed passwords because plaintext that looks identical may be encoded as different character sequences without the user being aware of it. For example, `é` (the letter `e` with an acute accent) may be represented as a single Unicode character (U+00E9) or as a sequence of two characters (U+0065 + U+0301); both representations are canonically equivalent.
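The equivalence described above can be illustrated with the JDK's own `java.text.Normalizer`. This is only a minimal sketch, not part of OACC: the class name `CanonicalEquivalenceDemo` and the helper `equalAfterNfc` are made up for illustration, and it works on `String` values for brevity rather than on `char[]`.

```java
import java.text.Normalizer;

public class CanonicalEquivalenceDemo {
    // Hypothetical helper: true if both strings are identical after NFC normalization.
    static boolean equalAfterNfc(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9";   // single code point U+00E9
        String decomposed = "e\u0301";   // U+0065 followed by combining acute accent U+0301

        // The raw sequences differ, so a hash of each would differ as well.
        System.out.println(precomposed.equals(decomposed));

        // After NFC normalization both collapse to the same sequence.
        System.out.println(equalAfterNfc(precomposed, decomposed));
    }
}
```

Without normalization, the two spellings of the same visible password would produce different hashes, which is the situation this class is meant to prevent.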
This class first tries to use the ICU4J library for normalization because it normalizes character arrays without converting to `String`. If ICU4J is not available, it falls back to the text normalizer provided by the JDK, which produces an **intermediate `String` representation** of the text.
In other words, if you need to prevent a cleanable `char[]` password from being turned into a temporary `String` during Unicode character normalization, you need to include a dependency on ICU4J.
Constructor and Description |
---|
TextNormalizer() |
Modifier and Type | Method and Description |
---|---|
`static TextNormalizer` | `getInstance()` Get an instance of a text normalizer. |
`abstract char[]` | `normalizeToNfc(char[] source)` Returns the canonically equivalent normalized (NFC) version of a Unicode character array. |
public static TextNormalizer getInstance()
If the ICU4J library is available, the returned instance will use an ICU4J normalizer, which handles character arrays without converting to `String`. Otherwise (if ICU4J is not available), the returned fallback instance uses the normalizer provided by the JDK, which produces an **intermediate `String` representation** of the normalized text.
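One common way to implement this kind of optional-dependency fallback is to probe the classpath for a class from the library. The sketch below is an assumption about how such detection could look, not OACC's actual implementation: `NormalizerSelectionDemo`, `isClassAvailable`, and `chooseBackend` are hypothetical names, and `com.ibm.icu.text.Normalizer2` is used only as a representative ICU4J class to probe for.

```java
public class NormalizerSelectionDemo {
    // Hypothetical helper: reports whether a class with the given name can be loaded.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The library is absent or unusable on this classpath.
            return false;
        }
    }

    // Hypothetical selection logic: prefer ICU4J, otherwise fall back to the JDK.
    static String chooseBackend() {
        return isClassAvailable("com.ibm.icu.text.Normalizer2") ? "ICU4J" : "JDK";
    }

    public static void main(String[] args) {
        System.out.println("Normalizer backend: " + chooseBackend());
    }
}
```

A probe like this lets callers depend only on the abstract type while the concrete backend is decided once at runtime.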
public abstract char[] normalizeToNfc(char[] source)
Note: If the ICU4J library for normalization is not available, the fallback normalizer provided by the JDK will produce an intermediate `String` representation of the normalized text!
`source` - any Unicode text
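To make the note above concrete, here is a rough sketch of what a JDK-only NFC normalization of a `char[]` might look like. It is an illustration under assumptions, not OACC's fallback code: the class `JdkFallbackSketch` is hypothetical. The point is that `java.text.Normalizer.normalize` returns a `String`, which is immutable and cannot be wiped the way a `char[]` can.

```java
import java.nio.CharBuffer;
import java.text.Normalizer;
import java.util.Arrays;

public class JdkFallbackSketch {
    // Hypothetical JDK-only normalization of a character array to NFC.
    static char[] normalizeToNfc(char[] source) {
        // Normalizer.normalize returns a String; this intermediate value
        // stays in memory until it is garbage collected.
        String normalized = Normalizer.normalize(CharBuffer.wrap(source), Normalizer.Form.NFC);
        return normalized.toCharArray();
    }

    public static void main(String[] args) {
        char[] password = {'e', '\u0301'};           // decomposed form of U+00E9
        char[] normalized = normalizeToNfc(password);

        System.out.println(normalized.length);        // number of chars after composition

        // The arrays can be cleared after use; the intermediate String cannot.
        Arrays.fill(password, '\0');
        Arrays.fill(normalized, '\0');
    }
}
```

Clearing the arrays afterwards removes only the copies the caller controls, which is why the class description recommends adding ICU4J when a temporary `String` is unacceptable.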
OACC is a Java Application Security Framework developed by Acciente, LLC., released under Apache License 2.0.
Copyright 2009-2017, Acciente, LLC.