pyspark - How to Reduce Nested Dictionaries in Spark


In PySpark, I have an RDD that contains multiple dictionaries. Each of these dictionaries, in turn, contains multiple dictionaries. It looks like this:

label1 : {tag1 : count = 2, tag2 : count = 3}, {tag2 : count = 3}, {tag3 : count = 1}, ...
label2 : {tag1 : count = 2, tag3 : count = 2}, {tag2 : count = 5}, {tag4 : count = 3}, ...
.
.

Given this structure, I'd like to be able to "reduce" the dictionaries so that the result has the following form:

label1 : {tag1 : count = 2}, {tag2 : count = 6}, {tag3 : count = 1}, ...
label2 : {tag1 : count = 2}, {tag2 : count = 5}, {tag3 : count = 2}, {tag4 : count = 3}, ...
.
.
.

I have a feeling this resembles a 'reduce', 'combine', or 'groupBy' operation, but I'm having difficulty finding the right function. Can someone please point me to a function in Spark that might accomplish this task? Thanks!
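For concreteness, here is a minimal sketch of how this input might be represented in PySpark; treating each RDD element as a (label, dict) pair is an assumption, not something stated in the question:

from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# Assumed representation: one (label, {tag: count}) pair per element,
# so the same label can appear several times with different tag dictionaries.
rdd = sc.parallelize([
    ("label1", {"tag1": 2, "tag2": 3}),
    ("label1", {"tag2": 3}),
    ("label1", {"tag3": 1}),
    ("label2", {"tag1": 2, "tag3": 2}),
    ("label2", {"tag2": 5}),
    ("label2", {"tag4": 3}),
])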

This should flatten an iterator of dictionaries into one big dictionary:

def combine(iter):
    # Flatten an iterable of dictionaries into one big dictionary.
    # If a key appears more than once, the last value seen wins.
    bigdict = dict()
    for littledict in iter:
        for key, value in littledict.items():  # .iteritems() on Python 2
            bigdict[key] = value
    return bigdict

rdd.map(combine)  # assumes each RDD element is an iterable of dictionaries
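Note that combine keeps only the last value seen when the same key occurs in more than one dictionary. If repeated tags should have their counts summed (so that label1 ends up with tag2 : count = 6, as in the desired output), one sketch, assuming the (label, dict) representation from the example above, is to use reduceByKey with a merge function that adds counts:

from collections import Counter

def merge_counts(d1, d2):
    # Add together the counts of tags that appear in both dictionaries.
    merged = Counter(d1)
    merged.update(d2)  # Counter.update adds values instead of replacing them
    return dict(merged)

# One (label, dict) pair per label, with duplicate tag counts summed.
summed = rdd.reduceByKey(merge_counts)
print(summed.collect())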

