Package gov.nih.mipav.model.file
Class MetadataExtractor.Iso2022Converter
java.lang.Object
gov.nih.mipav.model.file.MetadataExtractor.Iso2022Converter
- Enclosing class:
MetadataExtractor
-
Field Summary
Fields -
Constructor Summary
Constructors -
Method Summary
Modifier and Type / Method / Description
convertISO2022CharsetToJavaCharset(byte[] bytes): Converts the given ISO2022 char set to a Java charset name.
(package private) Charset guessCharSet(byte[] bytes): Attempts to guess the Charset of a string provided as a byte array.
-
Field Details
-
ISO_8859_1
-
UTF_8
-
LATIN_CAPITAL_A
private static final byte LATIN_CAPITAL_A
-
DOT
private static final int DOT
-
LATIN_CAPITAL_G
private static final byte LATIN_CAPITAL_G
-
PERCENT_SIGN
private static final byte PERCENT_SIGN
-
DOT_SIGN
private static final byte DOT_SIGN
-
ESC
private static final byte ESC
-
Constructor Details
-
Iso2022Converter
public Iso2022Converter()
-
-
Method Details
-
convertISO2022CharsetToJavaCharset
Converts the given ISO2022 char set to a Java charset name. A reference of valid charsets can be found here: http://nozer0.github.io/en/technology/system/character-encoding/#ISO/IEC%202022
- Parameters:
bytes - string data encoded using ISO2022
- Returns:
the Java charset name as a string, or null if the conversion was not possible
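The mapping described above can be illustrated with a minimal sketch. It is not the MIPAV implementation: the class name Iso2022Sketch and the method name convert are hypothetical, the constant values are assumed to be the plain ASCII codes suggested by the field names, and the two escape sequences checked (ESC % G for UTF-8, ESC . A for ISO-8859-1) are assumptions taken from ISO 2022 conventions rather than from this page.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration; not the MIPAV source.
class Iso2022Sketch {
    // Assumed values: the ASCII codes of the characters the field names describe.
    private static final byte ESC = 0x1B;
    private static final byte PERCENT_SIGN = 0x25;    // '%'
    private static final byte DOT_SIGN = 0x2E;        // '.'
    private static final byte LATIN_CAPITAL_A = 0x41; // 'A'
    private static final byte LATIN_CAPITAL_G = 0x47; // 'G'

    /** Returns a Java charset name for a recognized escape sequence, or null otherwise. */
    static String convert(byte[] bytes) {
        // Every sequence checked here is three bytes long and starts with ESC.
        if (bytes == null || bytes.length < 3 || bytes[0] != ESC) {
            return null;
        }
        // ESC % G: assumed to announce UTF-8.
        if (bytes[1] == PERCENT_SIGN && bytes[2] == LATIN_CAPITAL_G) {
            return StandardCharsets.UTF_8.name();
        }
        // ESC . A: assumed to designate the ISO-8859-1 upper half.
        if (bytes[1] == DOT_SIGN && bytes[2] == LATIN_CAPITAL_A) {
            return StandardCharsets.ISO_8859_1.name();
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(convert(new byte[] {0x1B, 0x25, 0x47}));
    }
}
```

Returning null for anything unrecognized mirrors the documented contract, so a caller can fall back to guessing when no usable escape sequence is present.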
-
guessCharSet
Attempts to guess the Charset of a string provided as a byte array. Charsets trialled are, in order:
- UTF-8
- System.getProperty("file.encoding")
- ISO-8859-1
Its only purpose is to guess the Charset when the IPTC coded character set tag is not set. If the encoding is not UTF-8, the tag should be set; leaving it unset is bad practice. This method tries to work around the issue, since some metadata manipulation tools do not prevent such bad practice.
About the reliability of this method: the check for whether the bytes are UTF-8 is highly reliable. The other two checks are less reliable.
- Parameters:
bytes - some text as bytes
- Returns:
the name of the encoding or null if none could be guessed
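The trial order described for guessCharSet can be sketched with a strict CharsetDecoder. This is a sketch under stated assumptions, not the MIPAV implementation: the class name CharsetGuessSketch and the method name guess are hypothetical, and the use of a decoder that reports malformed input is one plausible way to perform each check.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;

// Hypothetical stand-in for illustration; not the MIPAV source.
class CharsetGuessSketch {

    /** Tries UTF-8, then file.encoding, then ISO-8859-1; returns the first that decodes cleanly. */
    static Charset guess(byte[] bytes) {
        String[] candidates = {"UTF-8", System.getProperty("file.encoding"), "ISO-8859-1"};
        for (String name : candidates) {
            if (name == null) {
                continue; // the system property may be unset
            }
            try {
                Charset charset = Charset.forName(name);
                // A fresh decoder reports malformed or unmappable input by throwing,
                // instead of silently substituting replacement characters.
                charset.newDecoder().decode(ByteBuffer.wrap(bytes));
                return charset;
            } catch (CharacterCodingException e) {
                // The bytes are not valid in this charset; try the next candidate.
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name; try the next candidate.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9}; // "é" encoded as UTF-8
        System.out.println(guess(utf8));
    }
}
```

The sketch also shows why the later checks are weaker: a multi-byte UTF-8 sequence has a strict structure that random bytes rarely satisfy, whereas ISO-8859-1 assigns a character to every byte value, so a successful ISO-8859-1 decode says little about the true encoding.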
-