Coverage Summary for Class: TextNormalizer (com.acciente.oacc.normalizer)

Class            Class, %      Method, %     Line, %
TextNormalizer   100% (1/1)    100% (2/2)    33.3% (2/6)


/*
 * Copyright 2009-2018, Acciente LLC
 *
 * Acciente LLC licenses this file to you under the
 * Apache License, Version 2.0 (the "License"); you
 * may not use this file except in compliance with the
 * License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in
 * writing, software distributed under the License is
 * distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
 * OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing
 * permissions and limitations under the License.
 */

package com.acciente.oacc.normalizer;

import com.acciente.oacc.normalizer.icu4j.ICU4Jv26TextNormalizer;
import com.acciente.oacc.normalizer.icu4j.ICU4Jv46TextNormalizer;
import com.acciente.oacc.normalizer.jdk.JDKTextNormalizer;

/**
 * Normalizes Unicode text to handle characters that have more than one canonically equivalent representation.
 * <p>
 * This is important when comparing hashed passwords because plaintext that visually looks the same might actually
 * have a different binary representation, without the user being aware. For example, `é` (the letter `e` with accent acute)
 * may be represented as a single Unicode character (U+00E9) or composed of two characters (U+0065 + U+0301), but both
 * representations are canonically equivalent.
 * <p>
 * This class first tries to use the ICU4J library for normalization because it normalizes character arrays
 * without converting to <code>String</code>. If ICU4J is not available, then it falls back to the text normalizer
 * provided by the JDK, which produces an **intermediate <code>String</code> representation** of the text.
 * <p>
 * In other words, if you need to prevent a cleanable <code>char[]</code> password being turned into a temporary
 * <code>String</code> during Unicode character normalization, you need to include a dependency on ICU4J.
 */
public abstract class TextNormalizer {
   /**
    * Get an instance of a text normalizer.
    * <p>
    * If the ICU4J library is available, the returned instance will use an ICU4J normalizer, which handles character
    * arrays without converting to <code>String</code>. Otherwise (if ICU4J is not available), the fallback instance
    * returned uses the normalizer provided by the JDK, which produces an **intermediate <code>String</code>
    * representation** of the normalized text.
    *
    * @return a text normalizer instance
    */
   public static TextNormalizer getInstance() {
      try {
         // first see if a newer version of ICU4J is available
         return ICU4Jv46TextNormalizer.getInstance();
      }
      catch (NoClassDefFoundError e1) {
         try {
            // next see if an older version of ICU4J is available
            return ICU4Jv26TextNormalizer.getInstance();
         }
         catch (NoClassDefFoundError e2) {
            // otherwise fall back to the non-cleanable JDK based implementation
            return JDKTextNormalizer.getInstance();
         }
      }
   }

   /**
    * Returns the canonically equivalent normalized (NFC) version of a Unicode character array.
    * <p>
    * Note:
    * If the ICU4J library for normalization is not available, the fallback Normalizer provided by the JDK
    * will produce an intermediate <code>String</code> representation of the normalized text!
    *
    * @param source any Unicode text
    * @return a character array containing the normalized representation of the source text
    */
   public abstract char[] normalizeToNfc(char[] source);
}
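The Javadoc above describes two canonically equivalent spellings of `é` and notes that the JDK normalizer goes through an intermediate String. The following is a minimal sketch of that behavior using only the JDK's `java.text.Normalizer`. The class name `NfcSketch` and its method are hypothetical, chosen for illustration. This is not the actual `JDKTextNormalizer` implementation, which is not shown in this report.

```java
import java.text.Normalizer;
import java.util.Arrays;

// Hypothetical illustration class; not part of OACC.
final class NfcSketch {
   /**
    * Normalizes a character array to NFC with the JDK normalizer.
    * The JDK API returns a String, so the text necessarily exists as
    * String objects that cannot be wiped the way a char[] can.
    */
   static char[] normalizeToNfc(char[] source) {
      // Intermediate String copies of the text are created here.
      String normalized = Normalizer.normalize(new String(source), Normalizer.Form.NFC);
      return normalized.toCharArray();
   }

   public static void main(String[] args) {
      char[] composed = {'\u00E9'};        // single code point U+00E9
      char[] decomposed = {'e', '\u0301'}; // U+0065 followed by combining acute U+0301

      // The raw arrays differ, so hashing them would give different results.
      System.out.println(Arrays.equals(composed, decomposed)); // prints "false"

      // After NFC normalization both spellings compare equal.
      System.out.println(Arrays.equals(normalizeToNfc(composed),
                                       normalizeToNfc(decomposed))); // prints "true"
   }
}
```

Because the intermediate String cannot be zeroed out after use, a caller handling passwords would, as the Javadoc says, need ICU4J on the classpath to keep the text in cleanable character arrays.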