regex - How to remove \u200B (Zero Width Space Unicode Character) from a String in Java?


My application uses Spring Integration email polling to read an Outlook mailbox.

I receive the string (the email body) from an external system (Outlook), so I have no control over it.

For example:

String emailBodyStr = "rejected sundar14-\u200B.";

Now I am trying to remove the Unicode character \u200B from the string.

What I have tried already:

Try #1:

emailBodyStr = emailBodyStr.replaceAll("\u200B", "");

Try #2:

emailBodyStr = emailBodyStr.replaceAll("\u200B", "").trim();
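Adding `.trim()` in Try #2 cannot help on its own: `String.trim()` only strips leading and trailing characters at or below U+0020, and U+200B is far above that. Java 11's `strip()` relies on `Character.isWhitespace`, which is also false for U+200B because the character is classified as a format character (category Cf), not a space separator. A minimal sketch illustrating this, assuming Java 11+ for `strip()`; the class and method names are hypothetical:

```java
public class TrimDemo {

    // Returns true if the given string still contains U+200B (ZERO WIDTH SPACE).
    static boolean containsZwsp(String s) {
        return s.indexOf('\u200B') >= 0;
    }

    public static void main(String[] args) {
        // The character is placed at the very end, where trim() would act if it treated it as whitespace.
        String s = "rejected sundar14-.\u200B";

        // trim() removes only chars <= U+0020 from both ends, so U+200B survives.
        System.out.println("still present after trim(): " + containsZwsp(s.trim()));

        // isWhitespace() is false for U+200B (category Cf), so strip() keeps it as well.
        System.out.println("isWhitespace: " + Character.isWhitespace('\u200B'));
        System.out.println("still present after strip(): " + containsZwsp(s.strip()));
    }
}
```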

Try #3 (using Apache Commons):

emailBodyStr = StringEscapeUtils.unescapeJava(emailBodyStr);

Try #4:

emailBodyStr = StringEscapeUtils.unescapeJava(emailBodyStr).trim();

Nothing has worked so far.

When I tried to print the string using the code below:

logger.info("comment before:{}", emailBodyStr);
logger.info("comment after :{}", emailBodyStr);

In the Eclipse console, the Unicode character is not printed:

comment before:rejected sundar14-​.

but the same code prints the Unicode character in the Linux console, as below:

comment before:rejected sundar14-\u200b.
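Because each console renders (or escapes) the invisible character differently, printing the code points instead of the string shows what the email body really contains. For instance, it would reveal whether the character is U+200B or a different format character such as U+FEFF. A minimal diagnostic sketch using `Character.getName` and `Character.getType` (Java 7+); the class and method names are hypothetical:

```java
public class CodePointDump {

    // Builds one line per non-ASCII code point: its char index, U+XXXX value,
    // Unicode name, and a "(Cf)" marker if Java classifies it as a FORMAT character.
    static String describeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp > 0x7F) {
                sb.append(String.format("index %d: U+%04X %s%s%n",
                        i, cp, Character.getName(cp),
                        Character.getType(cp) == Character.FORMAT ? " (Cf)" : ""));
            }
            // Advance by one or two chars depending on whether cp is a supplementary character.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String emailBodyStr = "rejected sundar14-\u200B.";
        System.out.print(describeNonAscii(emailBodyStr));
    }
}
```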

I read some examples where str.replace() is recommended, but please note that those examples use JavaScript and PHP, not Java.
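Java's `String` has a comparable method: `replace(CharSequence, CharSequence)` replaces every literal occurrence without involving regex. It only removes the exact code point you name, so if the email body actually contained some other invisible character, this would have no effect, just like Try #1 (a plausible but unconfirmed explanation for why Try #1 failed). A minimal sketch; the class and method names are hypothetical:

```java
public class LiteralReplaceDemo {

    // String.replace(CharSequence, CharSequence) replaces every literal occurrence;
    // the target is not a regex, so nothing needs escaping.
    static String removeZwsp(String s) {
        return s.replace("\u200B", "");
    }

    public static void main(String[] args) {
        String emailBodyStr = "rejected sundar14-\u200B.";
        String cleaned = removeZwsp(emailBodyStr);
        // Prints the cleaned text plus the lengths before and after removal.
        System.out.println(cleaned + " (length " + emailBodyStr.length() + " -> " + cleaned.length() + ")");
    }
}
```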

Finally, I was able to remove the 'zero width space' character using a 'Unicode regex':

String plainEmailBody = emailBodyStr.replaceAll("[\\p{Cf}]", "");
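For reference, a self-contained version of the working fix. `\p{Cf}` matches the whole Unicode "format" category, which includes U+200B and also U+200C/U+200D (zero width non-joiner/joiner), U+2060 (word joiner), U+FEFF (byte order mark) and U+00AD (soft hyphen). The square brackets in `[\\p{Cf}]` are optional; a bare `\\p{Cf}` behaves the same. This sketch precompiles the pattern once, an assumption that the cleanup runs on many emails; the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

public class FormatCharStripper {

    // Matches any character in Unicode general category Cf (format characters).
    // Compiled once, whereas String.replaceAll recompiles its regex on every call.
    private static final Pattern FORMAT_CHARS = Pattern.compile("\\p{Cf}");

    static String stripFormatChars(String s) {
        return FORMAT_CHARS.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String emailBodyStr = "rejected sundar14-\u200B.";
        String plainEmailBody = stripFormatChars(emailBodyStr);
        System.out.println(plainEmailBody);
    }
}
```

One design caveat: because the category is broad, this also removes U+200D, which emoji sequences and some scripts rely on; if that matters, a narrower class listing only the unwanted code points is safer.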

References for finding the category of Unicode characters:

  1. Character class (Java)

The Java Character class lists these Unicode categories.


  2. Website: http://www.fileformat.info/ => character category

  3. Website: http://www.regular-expressions.info/ => Unicode Regular Expressions (Unicode regex for the \u200B character)
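The Character class exposes the same categories programmatically: `Character.getType(int)` returns `Character.FORMAT` for Cf characters, so the cleanup can also be done without a regex. A minimal sketch, assuming Java 8+ for `String.codePoints()`; the class and method names are hypothetical:

```java
public class FormatCharFilter {

    // Keeps every code point whose general category is not FORMAT (Cf).
    // Iterating by code point keeps supplementary characters (e.g. emoji) intact.
    static String removeFormatChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
         .filter(cp -> Character.getType(cp) != Character.FORMAT)
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Confirms that U+200B is categorized as FORMAT.
        System.out.println(Character.getType('\u200B') == Character.FORMAT);
        System.out.println(removeFormatChars("rejected sundar14-\u200B."));
    }
}
```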

Note 1: I receive the string from an Outlook email body; none of the approaches listed in the question worked.

My application receives the string from an external system (Outlook), so I have no control over it.

Note 2: This answer helped me learn about Unicode regular expressions.

