Class MetadataExtractor.Iso2022Converter

  • Enclosing class:
    MetadataExtractor

    public final class MetadataExtractor.Iso2022Converter
    extends java.lang.Object
    • Method Summary

      Modifier and Type Method Description
      java.lang.String convertISO2022CharsetToJavaCharset(byte[] bytes)
      Converts the given ISO 2022 character set to a Java charset name.
      (package private) java.nio.charset.Charset guessCharSet(byte[] bytes)
      Attempts to guess the Charset of a string provided as a byte array.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • Iso2022Converter

        public Iso2022Converter()
    • Method Detail

      • convertISO2022CharsetToJavaCharset

        public java.lang.String convertISO2022CharsetToJavaCharset(byte[] bytes)
        Converts the given ISO 2022 character set to a Java charset name. A reference list of valid charsets is available at http://nozer0.github.io/en/technology/system/character-encoding/#ISO/IEC%202022
        Parameters:
        bytes - string data encoded using ISO 2022
        Returns:
        the Java charset name as a string, or null if the conversion was not possible
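A minimal sketch of this conversion in Java. It assumes that `bytes` holds the raw value of the IPTC Coded Character Set tag, i.e. an ISO 2022 escape sequence, and it recognizes only ESC % G (0x1B 0x25 0x47), the ISO 2022 sequence that designates UTF-8; every other input yields null. The class name `Iso2022Sketch` is hypothetical, the method is static here for brevity, and this is not the library's actual implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch class; not the library's implementation.
public final class Iso2022Sketch {
    private static final byte ESC = 0x1B;
    private static final byte PERCENT_SIGN = 0x25;
    private static final byte LATIN_CAPITAL_G = 0x47;

    /**
     * Maps an ISO 2022 escape sequence to a Java charset name.
     * Assumption: only ESC % G (designating UTF-8) is recognized.
     */
    public static String convertISO2022CharsetToJavaCharset(byte[] bytes) {
        if (bytes != null && bytes.length >= 3
                && bytes[0] == ESC
                && bytes[1] == PERCENT_SIGN
                && bytes[2] == LATIN_CAPITAL_G) {
            return StandardCharsets.UTF_8.name(); // "UTF-8"
        }
        // Unknown or unsupported escape sequence: conversion not possible.
        return null;
    }

    public static void main(String[] args) {
        byte[] utf8Marker = {0x1B, 0x25, 0x47};
        System.out.println(convertISO2022CharsetToJavaCharset(utf8Marker));         // UTF-8
        System.out.println(convertISO2022CharsetToJavaCharset(new byte[] {0x41}));  // null
    }
}
```

Matching exact byte prefixes keeps the sketch free of a general ISO 2022 parser; supporting another charset means adding another known escape sequence to the comparison.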
      • guessCharSet

        java.nio.charset.Charset guessCharSet(byte[] bytes)
        Attempts to guess the Charset of a string provided as a byte array.

        Charsets are tried in this order:

        • UTF-8
        • System.getProperty("file.encoding")
        • ISO-8859-1

        Its only purpose is to guess the Charset when the IPTC Coded Character Set tag is not set. If the encoding is not UTF-8, that tag should be set, and omitting it is bad practice. This method tries to work around the problem because some metadata-editing tools do not prevent it.

        Reliability: the check for whether the bytes are valid UTF-8 is highly reliable. The other two checks are less reliable; in particular, any byte sequence decodes as ISO-8859-1, so that check never fails.

        Parameters:
        bytes - the text to examine, as raw bytes
        Returns:
        the guessed Charset, or null if none could be guessed
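The trial order above can be sketched in Java as follows: each candidate gets a fresh CharsetDecoder that reports, rather than replaces, malformed input, and the first charset that decodes the whole array without error wins. Assumptions: the class name `CharsetGuessSketch` is hypothetical, `Charset.defaultCharset()` stands in for the `file.encoding` lookup (since JDK 18 the default charset is UTF-8 unless overridden), and this is not the library's actual implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch class; not the library's implementation.
public final class CharsetGuessSketch {

    /** Returns the first candidate charset that decodes all bytes without error, or null. */
    static Charset guessCharSet(byte[] bytes) {
        // Candidates in the documented trial order.
        List<Charset> candidates = List.of(
                StandardCharsets.UTF_8,
                Charset.defaultCharset(), // assumption: stands in for file.encoding
                StandardCharsets.ISO_8859_1);

        for (Charset charset : candidates) {
            // Report malformed or unmappable input instead of silently replacing it.
            CharsetDecoder decoder = charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                decoder.decode(ByteBuffer.wrap(bytes));
                return charset;
            } catch (CharacterCodingException e) {
                // Not valid in this charset; try the next candidate.
            }
        }
        return null; // no candidate accepted the bytes
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(guessCharSet(utf8).name()); // UTF-8
        // Not valid UTF-8; the result is the default charset or ISO-8859-1.
        System.out.println(guessCharSet(latin1).name());
    }
}
```

Because every byte sequence is valid ISO-8859-1, the last candidate always succeeds. That is why a match there says little about whether the text really is Latin-1, and why the null return is effectively unreachable in this sketch.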