Reading a single UTF-8 character from a stream in C#
I am looking to read the next UTF-8 character from a Stream or BinaryReader. Things that don't work:
BinaryReader::ReadChar -- this will throw on a 3 or 4 byte character. Since it returns a 2-byte structure, it has no choice.
BinaryReader::ReadChars -- this will throw if you ask it to read 1 character and it encounters a 3 or 4 byte character. It will read multiple characters if you ask it to read more than 1 character.
StreamReader::Read -- this needs to know how many bytes to read, but the number of bytes in a UTF-8 character is variable.
The code I have that seems to work:
    private char[] ReadUTF8Char(Stream s)
    {
        byte[] bytes = new byte[4];
        var enc = new UTF8Encoding(false, true);
        if (1 != s.Read(bytes, 0, 1))
            return null;
        if (bytes[0] <= 0x7F) // single-byte character
        {
            return enc.GetChars(bytes, 0, 1);
        }
        else
        {
            // The high bits of the lead byte give the number of continuation bytes.
            var remainingBytes =
                ((bytes[0] & 240) == 240) ? 3 : (
                ((bytes[0] & 224) == 224) ? 2 : (
                ((bytes[0] & 192) == 192) ? 1 : -1
            ));
            if (remainingBytes == -1)
                return null;
            // Note: Stream.Read may return fewer bytes than requested; that is not checked here.
            s.Read(bytes, 1, remainingBytes);
            return enc.GetChars(bytes, 0, remainingBytes + 1);
        }
    }
Obviously, this is a bit of a mess, and it is specific to UTF-8. Is there a more elegant, less custom, easier-to-read solution to this problem?
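One direction that might be less custom is to let a stateful System.Text.Decoder do the byte counting: feed it one byte at a time and stop as soon as it produces output. The following is only a minimal sketch under some assumptions. The class name Utf8StreamExtensions and the method shape are hypothetical, the caller is assumed to reuse the same Decoder for a given stream, and the decoder is assumed to emit both halves of a surrogate pair together once the fourth byte of a 4-byte sequence arrives.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class, not part of the framework.
static class Utf8StreamExtensions
{
    // Returns one char for a BMP code point, two chars (a surrogate pair)
    // for a 4-byte sequence, or null if the stream ends first.
    public static char[] ReadUtf8Char(Stream s, Decoder decoder)
    {
        var oneByte = new byte[1];
        var chars = new char[2]; // room for a surrogate pair

        while (true)
        {
            int b = s.ReadByte();
            if (b < 0)
                return null; // end of stream, possibly in the middle of a sequence

            oneByte[0] = (byte)b;

            // The decoder buffers incomplete sequences internally and
            // returns 0 until a full character has been seen.
            int count = decoder.GetChars(oneByte, 0, 1, chars, 0);
            if (count > 0)
            {
                var result = new char[count];
                Array.Copy(chars, result, count);
                return result;
            }
        }
    }
}
```

A decoder obtained from new UTF8Encoding(false, true).GetDecoder() should throw on invalid bytes, matching the encoding used in the question. Because nothing in the loop is UTF-8 specific, the same sketch should in principle work with a decoder from another Encoding, though that has not been verified here.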