unicode - Java Scanner class fails on the character "®"
I have a Scanner reading a file into a string. The character "®" in the file makes it fail. I'm new to Java; is there a better way to read the file so that this character is accepted?
import java.io.File;
import java.util.Scanner;

// fileText is a String field of the enclosing class
public void readFile(String fileName) {
    fileText = "";
    try {
        Scanner file = new Scanner(new File(fileName));
        while (file.hasNextLine()) {
            String line = file.nextLine();
            fileText += line + "\r" + "\n";
        }
        file.close();
    } catch (Exception e) {
        System.out.println(e);
    }
}
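A detail that explains the symptom: in the OpenJDK implementation, a Scanner built from a File or a ReadableByteChannel decodes strictly, but it does not throw when it meets bytes that are invalid in its charset. It records the error, which Scanner.ioException() returns, and treats it as the end of input, so a loop like the one above can simply stop early with no exception reaching the catch block. The sketch below checks for this. The class and method names are hypothetical, and it reads an in-memory channel instead of a file so it runs without setup.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerErrorCheck {

    // Collects every line the Scanner returns into 'lines', then returns the
    // IOException that made the Scanner stop, or null if it reached a real end of input.
    static IOException readLinesAndReportError(ReadableByteChannel source, String charsetName,
                                               List<String> lines) {
        try (Scanner scanner = new Scanner(source, charsetName)) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
            // Evaluated before try-with-resources closes the Scanner.
            return scanner.ioException();
        }
    }

    public static void main(String[] args) {
        // 'a', then 0xAE (how ISO-8859-1 and Windows-1252 store the registered sign), then a newline.
        byte[] latin1Bytes = {'a', (byte) 0xAE, '\n'};
        List<String> lines = new ArrayList<>();
        IOException error = readLinesAndReportError(
                Channels.newChannel(new ByteArrayInputStream(latin1Bytes)), "UTF-8", lines);
        System.out.println("lines read: " + lines + ", error: " + error);
    }
}
```

Decoding those bytes as "UTF-8" should report a CharacterCodingException instead of failing silently, while decoding the same bytes as "ISO-8859-1" yields the single line "a®" and no error.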
By default, Scanner uses the platform's default character encoding, which might not match the character encoding of the file. The Javadoc states:
Constructs a new Scanner that produces values scanned from the specified file. Bytes from the file are converted into characters using the underlying platform's default charset.
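To see why a mismatch matters, it helps to look at the bytes. The registered sign (U+00AE) is stored as the two bytes C2 AE in UTF-8 but as the single byte AE in ISO-8859-1 and Windows-1252, so text written in one encoding and decoded with another is misread. A small sketch (the class and helper names are hypothetical):

```java
import java.nio.charset.StandardCharsets;

public class RegisteredSignBytes {

    // Formats bytes as space-separated upper-case hex, e.g. "C2 AE".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String registered = "\u00AE";
        System.out.println("UTF-8:      " + hex(registered.getBytes(StandardCharsets.UTF_8)));      // C2 AE
        System.out.println("ISO-8859-1: " + hex(registered.getBytes(StandardCharsets.ISO_8859_1))); // AE
        // Decoding the lone ISO-8859-1 byte as UTF-8 is invalid input; this String
        // constructor substitutes the replacement character U+FFFD instead of throwing.
        String misread = new String(new byte[] {(byte) 0xAE}, StandardCharsets.UTF_8);
        System.out.println(misread.equals("\uFFFD")); // true
    }
}
```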
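First, determine which character encoding the file is in. On Linux this can be done with the command-line utility file -i. Then pass the correct encoding to the Scanner constructor. Java 7 added java.nio.charset.StandardCharsets, which contains predefined constants for the standard character sets (for example UTF_8 and ISO_8859_1).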
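If the check has to happen at run time instead of on the command line, a rough Java-side alternative is to decode the bytes strictly as UTF-8 and, if that fails, assume a single-byte encoding. This is a heuristic swapped in for illustration, not what file -i does internally; the ISO-8859-1 fallback and the class and method names are assumptions.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CharsetGuess {

    // True if the bytes are valid UTF-8. A decoder from newDecoder() reports
    // malformed input by default, so decode(...) throws instead of substituting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Hypothetical heuristic: prefer UTF-8; otherwise fall back to ISO-8859-1,
    // which maps every byte to some character and therefore never fails to decode.
    static Charset guessCharset(byte[] bytes) {
        return isValidUtf8(bytes) ? StandardCharsets.UTF_8 : StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(guessCharset(bytes));
    }
}
```

Plain ASCII is also valid UTF-8, so such files are reported as UTF-8, which is harmless for reading them.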
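// Use the encoding reported by file -i; UTF-8 is shown here.
// Java 7-9 only have Scanner(File, String charsetName); Java 10+ also accept StandardCharsets.UTF_8 directly.
Scanner file = new Scanner(new File(fileName), StandardCharsets.UTF_8.name());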
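Putting it together, this is one way the method from the question could look with an explicit encoding. It is a sketch under a few assumptions: the file is UTF-8 (substitute whatever file -i reported), the class name and the getFileText() accessor are hypothetical, and, as a design choice, decoding errors are rethrown instead of printed so that partial text is never stored.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class FileTextReader {

    private String fileText = "";

    // Reads the whole file as UTF-8, joining lines with CRLF like the original code.
    public void readFile(String fileName) throws IOException {
        StringBuilder sb = new StringBuilder();
        // The String charset-name constructor works on Java 7 and later.
        try (Scanner file = new Scanner(new File(fileName), StandardCharsets.UTF_8.name())) {
            while (file.hasNextLine()) {
                sb.append(file.nextLine()).append("\r\n");
            }
            // Scanner swallows read and decoding errors; surface them instead.
            IOException error = file.ioException();
            if (error != null) {
                throw error;
            }
        }
        fileText = sb.toString();
    }

    public String getFileText() {
        return fileText;
    }
}
```

A StringBuilder replaces repeated string concatenation, which avoids copying the accumulated text on every line, and try-with-resources closes the file even when an exception is thrown.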