Class MetadataExtractor.Iso2022Converter

java.lang.Object
gov.nih.mipav.model.file.MetadataExtractor.Iso2022Converter
Enclosing class:
MetadataExtractor

public final class MetadataExtractor.Iso2022Converter extends Object
  • Field Details

  • Constructor Details

    • Iso2022Converter

      public Iso2022Converter()
  • Method Details

    • convertISO2022CharsetToJavaCharset

      public String convertISO2022CharsetToJavaCharset(byte[] bytes)
      Converts the given ISO 2022 character set designation to a Java charset name. A reference of valid charsets is available at: http://nozer0.github.io/en/technology/system/character-encoding/#ISO/IEC%202022
      Parameters:
      bytes - string data encoded using ISO2022
      Returns:
      the Java charset name as a string, or null if the conversion was not possible
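
      As an illustration of the conversion described above, here is a minimal sketch and not this class's actual implementation. The class name Iso2022Sketch and the method toJavaCharsetName are hypothetical. It assumes the input starts with an ISO 2022 escape sequence and recognises only two of them: ESC % G (UTF-8) and ESC . A (assumed here to stand for ISO-8859-1). The real method may accept more sequences or match them differently.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the MIPAV / MetadataExtractor implementation.
final class Iso2022Sketch {
    private static final byte ESC = 0x1B;

    /** Returns a Java charset name for a recognised escape sequence, or null. */
    static String toJavaCharsetName(byte[] bytes) {
        if (bytes == null || bytes.length < 3 || bytes[0] != ESC) {
            return null;
        }
        // ESC % G switches to UTF-8.
        if (bytes[1] == '%' && bytes[2] == 'G') {
            return StandardCharsets.UTF_8.name();
        }
        // Assumption: ESC . A is treated as ISO-8859-1 (Latin-1).
        if (bytes[1] == '.' && bytes[2] == 'A') {
            return StandardCharsets.ISO_8859_1.name();
        }
        // Anything else is not handled by this sketch.
        return null;
    }

    public static void main(String[] args) {
        System.out.println(toJavaCharsetName(new byte[] {0x1B, 0x25, 0x47}));
        System.out.println(toJavaCharsetName(new byte[] {0x41, 0x42, 0x43}));
    }
}
```

      Returning null for unrecognised input mirrors the documented contract, so a caller can fall back to guessing the charset.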
    • guessCharSet

      Charset guessCharSet(byte[] bytes)
      Attempts to guess the Charset of a string provided as a byte array.

      Charsets trialled are, in order:

      • UTF-8
      • System.getProperty("file.encoding")
      • ISO-8859-1

      Its only purpose is to guess the Charset when the IPTC "coded character set" tag is not set. If the encoding is anything other than UTF-8, that tag should be set; omitting it is bad practice. This method tries to work around the problem, since some metadata manipulation tools do not prevent it.

      About the reliability of this method: the check for whether the bytes are valid UTF-8 is highly reliable. The other two checks are less reliable.

      Parameters:
      bytes - some text as bytes
      Returns:
      the guessed Charset, or null if none could be guessed
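
      The trial order described above can be sketched with strict decoders, as follows. This is a minimal illustration under stated assumptions, not this class's actual code: the class name CharsetGuessSketch and the method guess are hypothetical, and Charset.defaultCharset() stands in for the charset named by System.getProperty("file.encoding").

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the MIPAV / MetadataExtractor implementation.
final class CharsetGuessSketch {

    /** Returns the first candidate that decodes the bytes without error, or null. */
    static Charset guess(byte[] bytes) {
        Charset[] candidates = {
            StandardCharsets.UTF_8,
            Charset.defaultCharset(),   // assumption: stand-in for file.encoding
            StandardCharsets.ISO_8859_1
        };
        for (Charset candidate : candidates) {
            // REPORT makes the decoder throw instead of silently substituting characters.
            CharsetDecoder decoder = candidate.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                decoder.decode(ByteBuffer.wrap(bytes));
                return candidate;
            } catch (CharacterCodingException e) {
                // Not valid in this charset; try the next candidate.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(guess("héllo".getBytes(StandardCharsets.UTF_8)));
        System.out.println(guess(new byte[] {(byte) 0xE9}));
    }
}
```

      Strict UTF-8 decoding rejects most non-UTF-8 byte sequences, which is why that first check is the dependable one. ISO-8859-1 maps every byte value to a character, so in this sketch the last candidate always succeeds and is only a weak signal.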