Spark conditional replacement but keep field values
I want to fill null values in Spark conditionally, to make sure each corner case of the data is considered instead of blindly filling a single replacement value.
A sample:

```scala
case class FooBar(foo: String, bar: String)

val myDF = Seq(
    ("a", "first"),
    ("b", "second"),
    ("c", null),
    ("third", "foobar"),
    ("somemore", "null")
  ).toDF("foo", "bar")
  .as[FooBar]
```

```
+--------+------+
|     foo|   bar|
+--------+------+
|       a| first|
|       b|second|
|       c|  null|
|   third|foobar|
|somemore|  null|
+--------+------+
```
Unfortunately,

```scala
myDF.withColumn(
  "bar",
  when(($"foo" === "c") and ($"bar".isNull), "someReplacement")
).show
```

resets all the other values in the column to null:

```
+--------+---------------+
|     foo|            bar|
+--------+---------------+
|       a|           null|
|       b|           null|
|       c|someReplacement|
|   third|           null|
|somemore|           null|
+--------+---------------+
```
And

```scala
myDF.withColumn(
  "bar",
  when(
    (($"foo" === "c") and ($"bar".isNull)) or
      (($"foo" === "somemore") and ($"bar".isNull)),
    "someReplacement"
  )
).show
```

which I want to use to fill in values for different classes/categories of foo, does not work well either. I am curious how to fix this.
Use `otherwise`:

```scala
when(
  (($"foo" === "c") and ($"bar".isNull)) or
    (($"foo" === "somemore") and ($"bar".isNull)),
  "someReplacement"
).otherwise($"bar")
```
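Putting the `otherwise` fix together with the sample data, a runnable sketch (assuming Spark on the classpath; the local session setup and app name here are illustrative, not from the question):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.when

// Illustrative local session, just to make the sketch self-contained.
val spark = SparkSession.builder().master("local[*]").appName("fillDemo").getOrCreate()
import spark.implicits._

val myDF = Seq(
  ("a", "first"),
  ("b", "second"),
  ("c", null),
  ("third", "foobar"),
  ("somemore", "null") // the literal string "null", as in the question's sample
).toDF("foo", "bar")

// Replace only where the condition holds; otherwise keep the existing value.
val fixed = myDF.withColumn(
  "bar",
  when(
    (($"foo" === "c") and ($"bar".isNull)) or
      (($"foo" === "somemore") and ($"bar".isNull)),
    "someReplacement"
  ).otherwise($"bar")
)

fixed.show()
```

Note one corner case this surfaces: if the `somemore` row really holds the literal string `"null"` (as the quotes in the sample suggest) rather than a true null, `$"bar".isNull` will not match it and the value is kept as-is.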
or `coalesce`:

```scala
coalesce(
  $"bar",
  when(($"foo" === "c") or ($"foo" === "somemore"), "someReplacement")
)
```
The reason to prefer `coalesce` is less typing: you don't have to repeat `$"bar".isNull` for each category.
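The semantics are easy to mirror outside Spark: `coalesce` returns its first non-null argument. A plain-Scala sketch of the same idea (using a hypothetical helper named `coalesceFirst`, not part of any API):

```scala
// Hypothetical helper mirroring SQL COALESCE: return the first defined value.
def coalesceFirst[A](xs: Option[A]*): Option[A] =
  xs.find(_.isDefined).flatten

// An existing value wins over the replacement...
coalesceFirst(Some("first"), Some("someReplacement")) // Some(first)
// ...while a missing value falls through to it.
coalesceFirst(None, Some("someReplacement"))          // Some(someReplacement)
```

This is exactly why `coalesce($"bar", when(...))` keeps the existing `bar` values: the `when` branch is only consulted where `bar` is null.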