Package gov.nih.mipav.model.file
Class MetadataExtractor.Iso2022Converter
- java.lang.Object
  - gov.nih.mipav.model.file.MetadataExtractor.Iso2022Converter
- Enclosing class: MetadataExtractor
public final class MetadataExtractor.Iso2022Converter extends java.lang.Object
-
Field Summary
Fields (Modifier and Type, Field):
private static int DOT
private static byte DOT_SIGN
private static byte ESC
private static java.lang.String ISO_8859_1
private static byte LATIN_CAPITAL_A
private static byte LATIN_CAPITAL_G
private static byte PERCENT_SIGN
private static java.lang.String UTF_8
-
Constructor Summary
Constructors:
Iso2022Converter()
-
Method Summary
Methods (Modifier and Type, Method, Description):
java.lang.String convertISO2022CharsetToJavaCharset(byte[] bytes)
Converts the given ISO2022 char set to a Java charset name.
(package private) java.nio.charset.Charset guessCharSet(byte[] bytes)
Attempts to guess the Charset of a string provided as a byte array.
-
Field Detail
-
ISO_8859_1
private static final java.lang.String ISO_8859_1
- See Also:
- Constant Field Values
-
UTF_8
private static final java.lang.String UTF_8
- See Also:
- Constant Field Values
-
LATIN_CAPITAL_A
private static final byte LATIN_CAPITAL_A
- See Also:
- Constant Field Values
-
DOT
private static final int DOT
- See Also:
- Constant Field Values
-
LATIN_CAPITAL_G
private static final byte LATIN_CAPITAL_G
- See Also:
- Constant Field Values
-
PERCENT_SIGN
private static final byte PERCENT_SIGN
- See Also:
- Constant Field Values
-
DOT_SIGN
private static final byte DOT_SIGN
- See Also:
- Constant Field Values
-
ESC
private static final byte ESC
- See Also:
- Constant Field Values
-
Method Detail
-
convertISO2022CharsetToJavaCharset
public java.lang.String convertISO2022CharsetToJavaCharset(byte[] bytes)
Converts the given ISO2022 char set to a Java charset name. A reference of valid charsets can be found here: http://nozer0.github.io/en/technology/system/character-encoding/#ISO/IEC%202022
- Parameters:
bytes - string data encoded using ISO2022
- Returns:
the Java charset name as a string, or null if the conversion was not possible
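As an illustration of the kind of mapping described above, here is a minimal sketch, not the class's actual implementation. It assumes, based only on the constant names in the field summary (ESC, PERCENT_SIGN, LATIN_CAPITAL_G, DOT_SIGN, LATIN_CAPITAL_A), that two ISO 2022 escape sequences are recognized: ESC % G for UTF-8 and ESC . A for ISO-8859-1. The class name `Iso2022Sketch`, the method name `toJavaCharsetName`, and the constant values (plain ASCII codes) are hypothetical.

```java
// Hypothetical sketch: maps an ISO 2022 escape sequence to a Java charset name.
final class Iso2022Sketch {
    // Assumed values: the ASCII codes of the characters the constant names suggest.
    private static final byte ESC = 0x1B;
    private static final byte PERCENT_SIGN = 0x25;    // '%'
    private static final byte LATIN_CAPITAL_G = 0x47; // 'G'
    private static final byte DOT_SIGN = 0x2E;        // '.'
    private static final byte LATIN_CAPITAL_A = 0x41; // 'A'

    /** Returns a Java charset name, or null if the sequence is not recognized. */
    static String toJavaCharsetName(byte[] bytes) {
        if (bytes == null || bytes.length < 3 || bytes[0] != ESC) {
            return null;
        }
        // ESC % G selects UTF-8.
        if (bytes[1] == PERCENT_SIGN && bytes[2] == LATIN_CAPITAL_G) {
            return "UTF-8";
        }
        // ESC . A is assumed here to stand for ISO-8859-1.
        if (bytes[1] == DOT_SIGN && bytes[2] == LATIN_CAPITAL_A) {
            return "ISO-8859-1";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(toJavaCharsetName(new byte[] {0x1B, 0x25, 0x47}));
        System.out.println(toJavaCharsetName(new byte[] {0x1B, 0x2E, 0x41}));
        System.out.println(toJavaCharsetName(new byte[] {0x41}));
    }
}
```

Returning null for anything unrecognized mirrors the documented contract, so a caller can fall back to guessing the charset.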
-
guessCharSet
java.nio.charset.Charset guessCharSet(byte[] bytes)
Attempts to guess the Charset of a string provided as a byte array.
Charsets trialled are, in order:
- UTF-8
- System.getProperty("file.encoding")
- ISO-8859-1
Its only purpose is to guess the Charset if and only if the IPTC tag "coded character set" is not set. If the encoding is not UTF-8, the tag should be set; leaving it unset is bad practice. This method tries to work around this issue, since some metadata-manipulating tools do not prevent such bad practice.
About the reliability of this method: the check for whether the bytes are UTF-8 is highly reliable. The two other checks are less reliable.
- Parameters:
bytes - some text as bytes
- Returns:
the guessed encoding, or null if none could be guessed
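The trial order above can be sketched as follows. This is a hedged illustration rather than the method's actual code: it assumes that "trialling" a charset means attempting a strict decode that reports malformed or unmappable input. The names `CharsetGuessSketch`, `decodesCleanly`, and `guess` are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the documented trial order.
final class CharsetGuessSketch {

    /** True if the bytes decode in the given charset without any coding error. */
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Tries UTF-8, then the platform file.encoding, then ISO-8859-1. */
    static Charset guess(byte[] bytes) {
        List<Charset> candidates = new ArrayList<>();
        candidates.add(StandardCharsets.UTF_8);
        String fileEncoding = System.getProperty("file.encoding");
        try {
            if (fileEncoding != null && Charset.isSupported(fileEncoding)) {
                candidates.add(Charset.forName(fileEncoding));
            }
        } catch (IllegalArgumentException e) {
            // Illegal charset name in the system property: skip this candidate.
        }
        candidates.add(StandardCharsets.ISO_8859_1);

        for (Charset candidate : candidates) {
            if (decodesCleanly(bytes, candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(guess(utf8));
        // A lone 0xE9 byte is not valid UTF-8; the result depends on file.encoding.
        System.out.println(guess(new byte[] {(byte) 0xE9}));
    }
}
```

In this sketch the final ISO-8859-1 trial accepts any byte sequence, because every byte value maps to a character in that charset. This is consistent with the note that only the UTF-8 check is highly reliable: a strict UTF-8 decode rejects most non-UTF-8 data, while a single-byte charset rejects little or nothing.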
-