regex - How to remove \u200B (Zero Length Whitespace Unicode Character) from String in Java?
My application uses Spring Integration email to poll an Outlook mailbox.
Because the string (the email body) is received from an external system (Outlook), I have no control over its content.
For example:
String emailbodystr = "rejected sundar14-\u200b.";
Now I am trying to remove the Unicode character \u200b from this string.
What I have tried already:
Try #1:
emailbodystr = emailbodystr.replaceAll("\u200b", "");
Try #2:
emailbodystr = emailbodystr.replaceAll("\u200b", "").trim();
Try #3 (using Apache Commons):
StringEscapeUtils.unescapeJava(emailbodystr);
Try #4:
StringEscapeUtils.unescapeJava(emailbodystr).trim();
None of these has worked so far.
When I print the string using the code below:
logger.info("comment before:{}", emailbodystr);
logger.info("comment after :{}", emailbodystr);
the Eclipse console does not show the Unicode character:
comment before:rejected sundar14-.
But the same code prints the Unicode character in the Linux console, as below:
comment before:rejected sundar14-\u200b.
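Since the two consoles render the string differently, one way to see what it really contains is to escape everything outside printable ASCII before logging. The following is only a minimal sketch; the class name `CodePointDump` and the method `escapeNonAscii` are made up for illustration and are not part of the application.

```java
final class CodePointDump {
    // Returns a copy of s in which every code point outside printable ASCII
    // is replaced by a visible hexadecimal escape, so invisible characters show up in logs.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp >= 0x20 && cp < 0x7F) {
                sb.appendCodePoint(cp);                   // printable ASCII: keep as-is
            } else if (cp <= 0xFFFF) {
                sb.append(String.format("\\u%04X", cp));  // BMP code point: 4 hex digits
            } else {
                sb.append(String.format("\\U%08X", cp));  // supplementary code point: 8 hex digits
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String emailbodystr = "rejected sundar14-\u200b.";
        System.out.println(escapeNonAscii(emailbodystr));
    }
}
```

Logging the escaped form before and after each attempt makes it possible to tell, independently of the console, whether the character was actually removed.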
I have read examples where str.replace() is recommended, but please note that those examples use JavaScript or PHP, not Java.
Finally, I was able to remove the 'zero width space' character using a Unicode regex:
String plainemailbody = emailbodystr.replaceAll("\\p{Cf}", "");
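The approach above can be wrapped in a small reusable helper. This is a sketch under a few assumptions: the class name `FormatCharStripper` and the method `stripFormatChars` are hypothetical, and the pattern is precompiled only to avoid recompiling it for every email.

```java
import java.util.regex.Pattern;

final class FormatCharStripper {
    // \p{Cf} matches every code point in the Unicode "Format" general category,
    // which includes U+200B (zero width space). The category name is case-sensitive.
    private static final Pattern FORMAT_CHARS = Pattern.compile("\\p{Cf}");

    // Removes all format characters from the input and leaves everything else untouched.
    static String stripFormatChars(String input) {
        return FORMAT_CHARS.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String emailbodystr = "rejected sundar14-\u200b.";
        String plainemailbody = stripFormatChars(emailbodystr);
        // Compare lengths, because the removed character is invisible in most consoles.
        System.out.println(emailbodystr.length() + " -> " + plainemailbody.length());
    }
}
```

Note that the Cf category is broader than the zero width space alone: it also covers characters such as the soft hyphen (U+00AD), the byte order mark (U+FEFF), and the zero width joiner (U+200D). If only U+200B should go, a narrower pattern may be the safer choice.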
References for finding the category of a Unicode character:
- The Character class in Java.
The Character class in Java lists these Unicode categories.
- Website: http://www.fileformat.info/
- Website: http://www.regular-expressions.info/ => Unicode Regular Expressions
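The category of a character can also be checked directly in code with `Character.getType`, which returns one of the category constants defined on the `Character` class. A minimal sketch follows; the class name `CategoryCheck` and the helper `isFormatChar` are invented for this example.

```java
final class CategoryCheck {
    // True if the code point belongs to the Unicode "Format" (Cf) general category.
    static boolean isFormatChar(int codePoint) {
        return Character.getType(codePoint) == Character.FORMAT;
    }

    public static void main(String[] args) {
        System.out.println(isFormatChar(0x200B));            // zero width space
        System.out.println(isFormatChar(' '));               // ordinary space (a space separator)
        System.out.println(Character.isWhitespace(0x200B));  // Java's whitespace test
    }
}
```

This also hints at why trim() made no difference: trim() only strips leading and trailing characters at or below U+0020, and in the sample string the character sits in the middle anyway.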
Note 1: The string is received from an Outlook email body, and none of the approaches listed in the question worked for it.
My application receives the string from an external system (Outlook), so I have no control over it.
Note 2: This answer helped me learn about Unicode regular expressions.