scala - Spark saveAsTextFile goes into an endless loop
I'm running Spark in standalone mode on a single machine. I have an RDD named productuservector that looks like this:
[("11342",Map(..)),("21435",Map(..)),...]
The number of rows in normalisedvectors is 8164. I want all possible pair combinations between the rows of this RDD, and I want to compute a score for each pair based on the maps in the two rows. I used cartesian to get all possible pairs, and I'm filtering them as shown below:
scala> val normalisedvectors = productuservector.map(line => utilinst.normalisevector(line)).sortBy(_._1.toInt)
scala> val combinedrdd = normalisedvectors.cartesian(normalisedvectors).filter(line => line._1._1.toInt > line._2._1.toInt && utilinst.filterstyleatp(line._1._1, line._2._1))
scala> val scoresrdd = combinedrdd.map(line => utilinst.getscore(line)).filter(line => line._3 > 0)
scala> val finalrdd = scoresrdd.map(line => (line._1, List((line._2, line._3)))).reduceByKey(_ ++ _)
scala> finalrdd.saveAsTextFile(outputpath)
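To make the pairing step above concrete, here is a minimal sketch on plain Scala collections rather than an RDD. The object name PairFilterSketch and the last two sample ids are made up for illustration; only the ordering condition is taken from the filter, and utilinst.filterstyleatp is left out because its logic is not shown.

```scala
object PairFilterSketch {
  // Sample ids standing in for the keys of the RDD rows.
  // The first two appear in the post; the last two are hypothetical.
  val ids: Seq[String] = Seq("11342", "21435", "30001", "40002")

  // Local stand-in for rdd.cartesian(rdd): every (a, b) combination, n * n pairs.
  val allPairs: Seq[(String, String)] = for (a <- ids; b <- ids) yield (a, b)

  // Same ordering condition as the filter in the post: it drops self-pairs
  // and keeps each unordered pair exactly once.
  val kept: Seq[(String, String)] =
    allPairs.filter { case (a, b) => a.toInt > b.toInt }

  def main(args: Array[String]): Unit = {
    println(s"all pairs: ${allPairs.size}, kept after ordering filter: ${kept.size}")
  }
}
```

With n distinct ids, the cross product has n * n pairs and the ordering condition keeps n * (n - 1) / 2 of them, so the four ids here give 16 and 6.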
I have set the driver memory to 8 GB and the executor memory to 2 GB. Here, the functions of utilinst are used to filter the pairs that result from the cartesian of the original RDD. However, the output shows that the job goes into an endless loop, as shown in the logs below:
16/11/17 18:50:14 info configuration.deprecation: mapred.tip.id deprecated. instead, use mapreduce.task.id
16/11/17 18:50:14 info configuration.deprecation: mapred.task.id deprecated. instead, use mapreduce.task.attempt.id
16/11/17 18:50:14 info configuration.deprecation: mapred.task.is.map deprecated. instead, use mapreduce.task.ismap
16/11/17 18:50:14 info configuration.deprecation: mapred.task.partition deprecated. instead, use mapreduce.task.partition
16/11/17 18:50:14 info configuration.deprecation: mapred.job.id deprecated. instead, use mapreduce.job.id
16/11/17 18:50:31 info executor.executor: finished task 3.0 in stage 0.0 (tid 3). 1491 bytes result sent driver
16/11/17 18:50:31 info executor.executor: finished task 5.0 in stage 0.0 (tid 5). 1491 bytes result sent driver
16/11/17 18:50:31 info scheduler.tasksetmanager: finished task 5.0 in stage 0.0 (tid 5) in 17339 ms on localhost (1/6)
16/11/17 18:50:31 info scheduler.tasksetmanager: finished task 3.0 in stage 0.0 (tid 3) in 17346 ms on localhost (2/6)
16/11/17 18:50:31 info executor.executor: finished task 1.0 in stage 0.0 (tid 1). 1491 bytes result sent driver
16/11/17 18:50:31 info scheduler.tasksetmanager: finished task 1.0 in stage 0.0 (tid 1) in 17423 ms on localhost (3/6)
16/11/17 18:50:32 info executor.executor: finished task 0.0 in stage 0.0 (tid 0). 1491 bytes result sent driver
16/11/17 18:50:32 info executor.executor: finished task 2.0 in stage 0.0 (tid 2). 1491 bytes result sent driver
16/11/17 18:50:32 info scheduler.tasksetmanager: finished task 0.0 in stage 0.0 (tid 0) in 18092 ms on localhost (4/6)
16/11/17 18:50:32 info scheduler.tasksetmanager: finished task 2.0 in stage 0.0 (tid 2) in 18063 ms on localhost (5/6)
16/11/17 18:50:32 info executor.executor: finished task 4.0 in stage 0.0 (tid 4). 1491 bytes result sent driver
16/11/17 18:50:32 info scheduler.tasksetmanager: finished task 4.0 in stage 0.0 (tid 4) in 18073 ms on localhost (6/6)
16/11/17 18:50:32 info scheduler.taskschedulerimpl: removed taskset 0.0, tasks have completed, pool
16/11/17 18:50:32 info scheduler.dagscheduler: shufflemapstage 0 (union @ iterateusers.scala:84) finished in 18.125 s
16/11/17 18:50:32 info scheduler.dagscheduler: looking newly runnable stages
16/11/17 18:50:32 info scheduler.dagscheduler: running: set()
16/11/17 18:50:32 info scheduler.dagscheduler: waiting: set(resultstage 1)
16/11/17 18:50:32 info scheduler.dagscheduler: failed: set()
16/11/17 18:50:32 info scheduler.dagscheduler: submitting resultstage 1 (shuffledrdd[11] @ reducebykey @ iterateusers.scala:87), has no missing parents
16/11/17 18:50:32 info memory.memorystore: block broadcast_2 stored values in memory (estimated size 2.9 kb, free 4.1 gb)
16/11/17 18:50:32 info memory.memorystore: block broadcast_2_piece0 stored bytes in memory (estimated size 1819.0 b, free 4.1 gb)
16/11/17 18:50:32 info storage.blockmanagerinfo: added broadcast_2_piece0 in memory on 127.0.0.1:60497 (size: 1819.0 b, free: 4.1 gb)
16/11/17 18:50:32 info spark.sparkcontext: created broadcast 2 broadcast @ dagscheduler.scala:1012
16/11/17 18:50:32 info scheduler.dagscheduler: submitting 6 missing tasks resultstage 1 (shuffledrdd[11] @ reducebykey @ iterateusers.scala:87)
16/11/17 18:50:32 info scheduler.taskschedulerimpl: adding task set 1.0 6 tasks
16/11/17 18:50:32 info scheduler.tasksetmanager: starting task 0.0 in stage 1.0 (tid 6, localhost, partition 0, any, 5126 bytes)
16/11/17 18:50:32 info scheduler.tasksetmanager: starting task 1.0 in stage 1.0 (tid 7, localhost, partition 1, any, 5126 bytes)
16/11/17 18:50:32 info scheduler.tasksetmanager: starting task 2.0 in stage 1.0 (tid 8, localhost, partition 2, any, 5126 bytes)
16/11/17 18:50:32 info scheduler.tasksetmanager: starting task 3.0 in stage 1.0 (tid 9, localhost, partition 3, any, 5126 bytes)
16/11/17 18:50:32 info scheduler.tasksetmanager: starting task 4.0 in stage 1.0 (tid 10, localhost, partition 4, any, 5126 bytes)
16/11/17 18:50:32 info scheduler.tasksetmanager: starting task 5.0 in stage 1.0 (tid 11, localhost, partition 5, any, 5126 bytes)
16/11/17 18:50:32 info executor.executor: running task 0.0 in stage 1.0 (tid 6)
16/11/17 18:50:32 info executor.executor: running task 5.0 in stage 1.0 (tid 11)
16/11/17 18:50:32 info executor.executor: running task 1.0 in stage 1.0 (tid 7)
16/11/17 18:50:32 info executor.executor: running task 3.0 in stage 1.0 (tid 9)
16/11/17 18:50:32 info executor.executor: running task 2.0 in stage 1.0 (tid 8)
16/11/17 18:50:32 info executor.executor: running task 4.0 in stage 1.0 (tid 10)
16/11/17 18:50:32 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
16/11/17 18:50:32 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
16/11/17 18:50:32 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
16/11/17 18:50:32 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
16/11/17 18:50:32 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
16/11/17 18:50:32 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
16/11/17 18:50:32 info storage.shuffleblockfetcheriterator: started 0 remote fetches in 6 ms
16/11/17 18:50:32 info storage.shuffleblockfetcheriterator: started 0 remote fetches in 5 ms
16/11/17 18:50:32 info storage.shuffleblockfetcheriterator: started 0 remote fetches in 5 ms
16/11/17 18:50:32 info storage.shuffleblockfetcheriterator: started 0 remote fetches in 5 ms
16/11/17 18:50:32 info storage.shuffleblockfetcheriterator: started 0 remote fetches in 6 ms
16/11/17 18:50:32 info storage.shuffleblockfetcheriterator: started 0 remote fetches in 5 ms
16/11/17 18:50:32 info executor.executor: finished task 3.0 in stage 1.0 (tid 9). 1512 bytes result sent driver
16/11/17 18:50:32 info executor.executor: finished task 1.0 in stage 1.0 (tid 7). 1512 bytes result sent driver
16/11/17 18:50:32 info executor.executor: finished task 4.0 in stage 1.0 (tid 10). 1512 bytes result sent driver
16/11/17 18:50:32 info scheduler.tasksetmanager: finished task 3.0 in stage 1.0 (tid 9) in 277 ms on localhost (1/6)
16/11/17 18:50:32 info scheduler.tasksetmanager: finished task 1.0 in stage 1.0 (tid 7) in 283 ms on localhost (2/6)
16/11/17 18:50:32 info scheduler.tasksetmanager: finished task 4.0 in stage 1.0 (tid 10) in 279 ms on localhost (3/6)
16/11/17 18:50:37 info executor.executor: finished task 2.0 in stage 1.0 (tid 8). 1512 bytes result sent driver
16/11/17 18:50:37 info executor.executor: finished task 0.0 in stage 1.0 (tid 6). 1512 bytes result sent driver
16/11/17 18:50:37 info scheduler.tasksetmanager: finished task 0.0 in stage 1.0 (tid 6) in 5120 ms on localhost (4/6)
16/11/17 18:50:37 info scheduler.tasksetmanager: finished task 2.0 in stage 1.0 (tid 8) in 5114 ms on localhost (5/6)
16/11/17 18:50:37 info executor.executor: finished task 5.0 in stage 1.0 (tid 11). 1512 bytes result sent driver
16/11/17 18:50:37 info scheduler.tasksetmanager: finished task 5.0 in stage 1.0 (tid 11) in 5241 ms on localhost (6/6)
16/11/17 18:50:37 info scheduler.taskschedulerimpl: removed taskset 1.0, tasks have completed, pool
16/11/17 18:50:37 info scheduler.dagscheduler: resultstage 1 (count @ iterateusers.scala:88) finished in 5.254 s
16/11/17 18:50:37 info scheduler.dagscheduler: job 0 finished: count @ iterateusers.scala:88, took 23.534860 s
8164
16/11/17 18:50:37 info rdd.unionrdd: removing rdd 10 persistence list
16/11/17 18:50:37 info storage.blockmanager: removing rdd 10
16/11/17 18:50:37 info spark.sparkcontext: starting job: sortby @ iterateusers.scala:91
16/11/17 18:50:37 info spark.mapoutputtrackermaster: size of output statuses shuffle 0 191 bytes
16/11/17 18:50:37 info scheduler.dagscheduler: got job 1 (sortby @ iterateusers.scala:91) 6 output partitions
16/11/17 18:50:37 info scheduler.dagscheduler: final stage: resultstage 3 (sortby @ iterateusers.scala:91)
16/11/17 18:50:37 info scheduler.dagscheduler: parents of final stage: list(shufflemapstage 2)
16/11/17 18:50:37 info scheduler.dagscheduler: missing parents: list()
16/11/17 18:50:37 info scheduler.dagscheduler: submitting resultstage 3 (mappartitionsrdd[15] @ sortby @ iterateusers.scala:91), has no missing parents
16/11/17 18:50:37 info memory.memorystore: block broadcast_3 stored values in memory (estimated size 4.4 kb, free 4.1 gb)
16/11/17 18:50:37 info memory.memorystore: block broadcast_3_piece0 stored bytes in memory (estimated size 2.5 kb, free 4.1 gb)
16/11/17 18:50:37 info storage.blockmanagerinfo: added broadcast_3_piece0 in memory on 127.0.0.1:60497 (size: 2.5 kb, free: 4.1 gb)
16/11/17 18:50:37 info spark.sparkcontext: created broadcast 3 broadcast @ dagscheduler.scala:1012
16/11/17 18:50:37 info scheduler.dagscheduler: submitting 6 missing tasks resultstage 3 (mappartitionsrdd[15] @ sortby @ iterateusers.scala:91)
16/11/17 18:50:37 info scheduler.taskschedulerimpl: adding task set 3.0 6 tasks
16/11/17 18:50:37 info scheduler.tasksetmanager: starting task 0.0 in stage 3.0 (tid 12, localhost, partition 0, any, 5210 bytes)
16/11/17 18:50:37 info scheduler.tasksetmanager: starting task 1.0 in stage 3.0 (tid 13, localhost, partition 1, any, 5210 bytes)
16/11/17 18:50:37 info scheduler.tasksetmanager: starting task 2.0 in stage 3.0 (tid 14, localhost, partition 2, any, 5210 bytes)
16/11/17 18:50:37 info scheduler.tasksetmanager: starting task 3.0 in stage 3.0 (tid 15, localhost, partition 3, any, 5210 bytes)
16/11/17 18:50:37 info scheduler.tasksetmanager: starting task 4.0 in stage 3.0 (tid 16, localhost, partition 4, any, 5210 bytes)
16/11/17 18:50:37 info scheduler.tasksetmanager: starting task 5.0 in stage 3.0 (tid 17, localhost, partition 5, any, 5210 bytes)
16/11/17 18:50:37 info executor.executor: running task 0.0 in stage 3.0 (tid 12)
16/11/17 18:50:37 info executor.executor: running task 4.0 in stage 3.0 (tid 16)
16/11/17 18:50:37 info executor.executor: running task 3.0 in stage 3.0 (tid 15)
16/11/17 18:50:37 info executor.executor: running task 1.0 in stage 3.0 (tid 13)
16/11/17 18:50:37 info executor.executor: running task 2.0 in stage 3.0 (tid 14)
16/11/17 18:50:37 info executor.executor: running task 5.0 in stage 3.0 (tid 17)
16/11/17 18:50:37 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
16/11/17 18:50:37 info storage.shuffleblockfetcheriterator: started 0 remote fetches in 0 ms
16/11/17 18:50:37 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
16/11/17 18:50:37 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
16/11/17 18:50:37 info storage.shuffleblockfetcheriterator: started 0 remote fetches in 0 ms
16/11/17 18:50:37 info storage.shuffleblockfetcheriterator: started 0 remote fetches in 1 ms
16/11/17 18:50:37 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
16/11/17 18:50:37 info storage.shuffleblockfetcheriterator: started 0 remote fetches in 1 ms
16/11/17 18:50:37 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
16/11/17 18:50:37 info storage.shuffleblockfetcheriterator: started 0 remote fetches in 0 ms
16/11/17 18:50:37 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
16/11/17 18:50:37 info storage.shuffleblockfetcheriterator: started 0 remote fetches in 0 ms
16/11/17 18:50:38 info executor.executor: finished task 5.0 in stage 3.0 (tid 17). 1818 bytes result sent driver
16/11/17 18:50:38 info executor.executor: finished task 4.0 in stage 3.0 (tid 16). 1818 bytes result sent driver
16/11/17 18:50:38 info executor.executor: finished task 3.0 in stage 3.0 (tid 15). 1728 bytes result sent driver
16/11/17 18:50:38 info executor.executor: finished task 0.0 in stage 3.0 (tid 12). 1724 bytes result sent driver
16/11/17 18:50:38 info executor.executor: finished task 2.0 in stage 3.0 (tid 14). 1727 bytes result sent driver
16/11/17 18:50:38 info executor.executor: finished task 1.0 in stage 3.0 (tid 13). 1734 bytes result sent driver
16/11/17 18:50:38 info scheduler.tasksetmanager: finished task 5.0 in stage 3.0 (tid 17) in 117 ms on localhost (1/6)
16/11/17 18:50:38 info scheduler.tasksetmanager: finished task 4.0 in stage 3.0 (tid 16) in 120 ms on localhost (2/6)
16/11/17 18:50:38 info scheduler.tasksetmanager: finished task 3.0 in stage 3.0 (tid 15) in 123 ms on localhost (3/6)
16/11/17 18:50:38 info scheduler.tasksetmanager: finished task 0.0 in stage 3.0 (tid 12) in 130 ms on localhost (4/6)
16/11/17 18:50:38 info scheduler.tasksetmanager: finished task 2.0 in stage 3.0 (tid 14) in 128 ms on localhost (5/6)
16/11/17 18:50:38 info scheduler.tasksetmanager: finished task 1.0 in stage 3.0 (tid 13) in 130 ms on localhost (6/6)
16/11/17 18:50:38 info scheduler.taskschedulerimpl: removed taskset 3.0, tasks have completed, pool
16/11/17 18:50:38 info scheduler.dagscheduler: resultstage 3 (sortby @ iterateusers.scala:91) finished in 0.133 s
16/11/17 18:50:38 info scheduler.dagscheduler: job 1 finished: sortby @ iterateusers.scala:91, took 0.154474 s
16/11/17 18:50:38 info rdd.shuffledrdd: removing rdd 11 persistence list
16/11/17 18:50:38 info storage.blockmanager: removing rdd 11
16/11/17 18:50:44 info storage.blockmanagerinfo: removed broadcast_3_piece0 on 127.0.0.1:60497 in memory (size: 2.5 kb, free: 4.1 gb)
16/11/17 18:50:44 info storage.blockmanagerinfo: removed broadcast_2_piece0 on 127.0.0.1:60497 in memory (size: 1819.0 b, free: 4.1 gb)
16/11/17 18:51:37 info storage.blockmanagerinfo: removed broadcast_1_piece0 on 127.0.0.1:60497 in memory (size: 3.1 kb, free: 4.1 gb)
16/11/17 18:52:48 info output.fileoutputcommitter: file output committer algorithm version 1
16/11/17 18:52:48 info spark.sparkcontext: starting job: saveastextfile @ iterateusers.scala:99
16/11/17 18:52:48 info scheduler.dagscheduler: registering rdd 13 (sortby @ iterateusers.scala:91)
16/11/17 18:52:48 info scheduler.dagscheduler: registering rdd 22 (map @ iterateusers.scala:98)
16/11/17 18:52:48 info scheduler.dagscheduler: got job 2 (saveastextfile @ iterateusers.scala:99) 36 output partitions
16/11/17 18:52:48 info scheduler.dagscheduler: final stage: resultstage 7 (saveastextfile @ iterateusers.scala:99)
16/11/17 18:52:48 info scheduler.dagscheduler: parents of final stage: list(shufflemapstage 6)
16/11/17 18:52:48 info scheduler.dagscheduler: missing parents: list(shufflemapstage 6)
16/11/17 18:52:48 info scheduler.dagscheduler: submitting shufflemapstage 5 (mappartitionsrdd[13] @ sortby @ iterateusers.scala:91), has no missing parents
16/11/17 18:52:50 info memory.memorystore: block broadcast_4 stored values in memory (estimated size 33.5 mb, free 4.1 gb)
16/11/17 18:52:50 info memory.memorystore: block broadcast_4_piece0 stored bytes in memory (estimated size 4.0 mb, free 4.1 gb)
16/11/17 18:52:50 info storage.blockmanagerinfo: added broadcast_4_piece0 in memory on 127.0.0.1:60497 (size: 4.0 mb, free: 4.1 gb)
16/11/17 18:52:50 info memory.memorystore: block broadcast_4_piece1 stored bytes in memory (estimated size 4.0 mb, free 4.1 gb)
16/11/17 18:52:50 info storage.blockmanagerinfo: added broadcast_4_piece1 in memory on 127.0.0.1:60497 (size: 4.0 mb, free: 4.1 gb)
16/11/17 18:52:50 info memory.memorystore: block broadcast_4_piece2 stored bytes in memory (estimated size 4.0 mb, free 4.0 gb)
16/11/17 18:52:50 info storage.blockmanagerinfo: added broadcast_4_piece2 in memory on 127.0.0.1:60497 (size: 4.0 mb, free: 4.1 gb)
16/11/17 18:52:50 info memory.memorystore: block broadcast_4_piece3 stored bytes in memory (estimated size 2.9 mb, free 4.0 gb)
16/11/17 18:52:50 info storage.blockmanagerinfo: added broadcast_4_piece3 in memory on 127.0.0.1:60497 (size: 2.9 mb, free: 4.1 gb)
16/11/17 18:52:50 info spark.sparkcontext: created broadcast 4 broadcast @ dagscheduler.scala:1012
16/11/17 18:52:50 info scheduler.dagscheduler: submitting 6 missing tasks shufflemapstage 5 (mappartitionsrdd[13] @ sortby @ iterateusers.scala:91)
16/11/17 18:52:50 info scheduler.taskschedulerimpl: adding task set 5.0 6 tasks
16/11/17 18:52:50 info scheduler.tasksetmanager: starting task 0.0 in stage 5.0 (tid 18, localhost, partition 0, any, 5207 bytes)
16/11/17 18:52:50 info scheduler.tasksetmanager: starting task 1.0 in stage 5.0 (tid 19, localhost, partition 1, any, 5207 bytes)
16/11/17 18:52:50 info scheduler.tasksetmanager: starting task 2.0 in stage 5.0 (tid 20, localhost, partition 2, any, 5207 bytes)
16/11/17 18:52:50 info scheduler.tasksetmanager: starting task 3.0 in stage 5.0 (tid 21, localhost, partition 3, any, 5207 bytes)
16/11/17 18:52:50 info scheduler.tasksetmanager: starting task 4.0 in stage 5.0 (tid 22, localhost, partition 4, any, 5207 bytes)
16/11/17 18:52:50 info scheduler.tasksetmanager: starting task 5.0 in stage 5.0 (tid 23, localhost, partition 5, any, 5207 bytes)
16/11/17 18:52:50 info executor.executor: running task 0.0 in stage 5.0 (tid 18)
16/11/17 18:52:50 info executor.executor: running task 1.0 in stage 5.0 (tid 19)
16/11/17 18:52:50 info executor.executor: running task 2.0 in stage 5.0 (tid 20)
16/11/17 18:52:50 info executor.executor: running task 3.0 in stage 5.0 (tid 21)
16/11/17 18:52:50 info executor.executor: running task 4.0 in stage 5.0 (tid 22)
16/11/17 18:52:50 info executor.executor: running task 5.0 in stage 5.0 (tid 23)
16/11/17 18:53:02 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
16/11/17 18:53:02 info storage.shuffleblockfetcheriterator: started 0 remote fetches in 0 ms
16/11/17 18:53:02 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
16/11/17 18:53:02 info storage.shuffleblockfetcheriterator: started 0 remote fetches in 0 ms
16/11/17 18:53:02 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
16/11/17 18:53:02 info storage.shuffleblockfetcheriterator: started 0 remote fetches in 1 ms
16/11/17 18:53:02 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
16/11/17 18:53:02 info storage.shuffleblockfetcheriterator: started 0 remote fetches in 2 ms
16/11/17 18:53:02 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
16/11/17 18:53:02 info storage.shuffleblockfetcheriterator: started 0 remote fetches in 0 ms
16/11/17 18:53:02 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
16/11/17 18:53:02 info storage.shuffleblockfetcheriterator: started 0 remote fetches in 1 ms
16/11/17 18:53:02 info executor.executor: finished task 2.0 in stage 5.0 (tid 20). 1883 bytes result sent driver
16/11/17 18:53:02 info executor.executor: finished task 0.0 in stage 5.0 (tid 18). 1883 bytes result sent driver
16/11/17 18:53:02 info scheduler.tasksetmanager: finished task 2.0 in stage 5.0 (tid 20) in 12006 ms on localhost (1/6)
16/11/17 18:53:02 info scheduler.tasksetmanager: finished task 0.0 in stage 5.0 (tid 18) in 12011 ms on localhost (2/6)
16/11/17 18:53:02 info executor.executor: finished task 5.0 in stage 5.0 (tid 23). 1883 bytes result sent driver
16/11/17 18:53:02 info scheduler.tasksetmanager: finished task 5.0 in stage 5.0 (tid 23) in 12019 ms on localhost (3/6)
16/11/17 18:53:02 info executor.executor: finished task 4.0 in stage 5.0 (tid 22). 1883 bytes result sent driver
16/11/17 18:53:02 info scheduler.tasksetmanager: finished task 4.0 in stage 5.0 (tid 22) in 12027 ms on localhost (4/6)
16/11/17 18:53:02 info executor.executor: finished task 3.0 in stage 5.0 (tid 21). 1883 bytes result sent driver
16/11/17 18:53:02 info scheduler.tasksetmanager: finished task 3.0 in stage 5.0 (tid 21) in 12044 ms on localhost (5/6)
16/11/17 18:53:02 info executor.executor: finished task 1.0 in stage 5.0 (tid 19). 1883 bytes result sent driver
16/11/17 18:53:02 info scheduler.tasksetmanager: finished task 1.0 in stage 5.0 (tid 19) in 12059 ms on localhost (6/6)
16/11/17 18:53:02 info scheduler.taskschedulerimpl: removed taskset 5.0, tasks have completed, pool
16/11/17 18:53:02 info scheduler.dagscheduler: shufflemapstage 5 (sortby @ iterateusers.scala:91) finished in 12.061 s
16/11/17 18:53:02 info scheduler.dagscheduler: looking newly runnable stages
16/11/17 18:53:02 info scheduler.dagscheduler: running: set()
16/11/17 18:53:02 info scheduler.dagscheduler: waiting: set(shufflemapstage 6, resultstage 7)
16/11/17 18:53:02 info scheduler.dagscheduler: failed: set()
16/11/17 18:53:02 info scheduler.dagscheduler: submitting shufflemapstage 6 (mappartitionsrdd[22] @ map @ iterateusers.scala:98), has no missing parents
16/11/17 18:53:05 info memory.memorystore: block broadcast_5 stored values in memory (estimated size 33.5 mb, free 4.0 gb)
16/11/17 18:53:05 info memory.memorystore: block broadcast_5_piece0 stored bytes in memory (estimated size 4.0 mb, free 4.0 gb)
16/11/17 18:53:05 info storage.blockmanagerinfo: added broadcast_5_piece0 in memory on 127.0.0.1:60497 (size: 4.0 mb, free: 4.1 gb)
16/11/17 18:53:05 info memory.memorystore: block broadcast_5_piece1 stored bytes in memory (estimated size 4.0 mb, free 4.0 gb)
16/11/17 18:53:05 info storage.blockmanagerinfo: added broadcast_5_piece1 in memory on 127.0.0.1:60497 (size: 4.0 mb, free: 4.1 gb)
16/11/17 18:53:05 info memory.memorystore: block broadcast_5_piece2 stored bytes in memory (estimated size 4.0 mb, free 4.0 gb)
16/11/17 18:53:05 info storage.blockmanagerinfo: added broadcast_5_piece2 in memory on 127.0.0.1:60497 (size: 4.0 mb, free: 4.1 gb)
16/11/17 18:53:05 info memory.memorystore: block broadcast_5_piece3 stored bytes in memory (estimated size 2.9 mb, free 4.0 gb)
16/11/17 18:53:05 info storage.blockmanagerinfo: added broadcast_5_piece3 in memory on 127.0.0.1:60497 (size: 2.9 mb, free: 4.1 gb)
16/11/17 18:53:05 info spark.sparkcontext: created broadcast 5 broadcast @ dagscheduler.scala:1012
16/11/17 18:53:05 info scheduler.dagscheduler: submitting 36 missing tasks shufflemapstage 6 (mappartitionsrdd[22] @ map @ iterateusers.scala:98)
16/11/17 18:53:05 info scheduler.taskschedulerimpl: adding task set 6.0 36 tasks
16/11/17 18:53:05 info scheduler.tasksetmanager: starting task 0.0 in stage 6.0 (tid 24, localhost, partition 0, any, 5411 bytes)
16/11/17 18:53:05 info scheduler.tasksetmanager: starting task 1.0 in stage 6.0 (tid 25, localhost, partition 1, any, 5420 bytes)
16/11/17 18:53:05 info scheduler.tasksetmanager: starting task 2.0 in stage 6.0 (tid 26, localhost, partition 2, any, 5420 bytes)
16/11/17 18:53:05 info scheduler.tasksetmanager: starting task 3.0 in stage 6.0 (tid 27, localhost, partition 3, any, 5420 bytes)
16/11/17 18:53:05 info scheduler.tasksetmanager: starting task 4.0 in stage 6.0 (tid 28, localhost, partition 4, any, 5420 bytes)
16/11/17 18:53:05 info scheduler.tasksetmanager: starting task 5.0 in stage 6.0 (tid 29, localhost, partition 5, any, 5420 bytes)
16/11/17 18:53:05 info scheduler.tasksetmanager: starting task 6.0 in stage 6.0 (tid 30, localhost, partition 6, any, 5420 bytes)
16/11/17 18:53:05 info scheduler.tasksetmanager: starting task 7.0 in stage 6.0 (tid 31, localhost, partition 7, any, 5411 bytes)
16/11/17 18:53:05 info executor.executor: running task 1.0 in stage 6.0 (tid 25)
16/11/17 18:53:05 info executor.executor: running task 0.0 in stage 6.0 (tid 24)
16/11/17 18:53:05 info executor.executor: running task 4.0 in stage 6.0 (tid 28)
16/11/17 18:53:05 info executor.executor: running task 2.0 in stage 6.0 (tid 26)
16/11/17 18:53:05 info executor.executor: running task 3.0 in stage 6.0 (tid 27)
16/11/17 18:53:05 info executor.executor: running task 5.0 in stage 6.0 (tid 29)
16/11/17 18:53:05 info executor.executor: running task 6.0 in stage 6.0 (tid 30)
16/11/17 18:53:05 info executor.executor: running task 7.0 in stage 6.0 (tid 31)
16/11/17 18:53:13 info storage.blockmanagerinfo: removed broadcast_4_piece0 on 127.0.0.1:60497 in memory (size: 4.0 mb, free: 4.1 gb)
16/11/17 18:53:13 info storage.blockmanagerinfo: removed broadcast_4_piece3 on 127.0.0.1:60497 in memory (size: 2.9 mb, free: 4.1 gb)
16/11/17 18:53:13 info storage.blockmanagerinfo: removed broadcast_4_piece2 on 127.0.0.1:60497 in memory (size: 4.0 mb, free: 4.1 gb)
16/11/17 18:53:13 info storage.blockmanagerinfo: removed broadcast_4_piece1 on 127.0.0.1:60497 in memory (size: 4.0 mb, free: 4.1 gb)
16/11/17 18:53:30 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
16/11/17 18:53:30 info storage.shuffleblockfetcheriterator: started 0 remote fetches in 1 ms
16/11/17 18:53:30 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
16/11/17 18:53:30 info storage.shuffleblockfetcheriterator: started 0 remote fetches in 1 ms
16/11/17 18:53:30 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
16/11/17 18:53:30 info storage.shuffleblockfetcheriterator: started 0 remote fetches in 1 ms
16/11/17 18:53:30 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
16/11/17 18:53:30 info storage.shuffleblockfetcheriterator: started 0 remote fetches in 0 ms
16/11/17 18:53:30 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
16/11/17 18:53:30 info storage.shuffleblockfetcheriterator: started 0 remote fetches in 0 ms
16/11/17 18:53:30 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
16/11/17 18:53:30 info storage.shuffleblockfetcheriterator: started 0 remote fetches in 1 ms
16/11/17 18:53:30 info storage.shuffleblockfetcheriterator: getting 6 non-empty blocks out of 6 blocks
It gets stuck in the last storage.shuffleblockfetcheriterator phase endlessly while storing finalrdd as a text file. I have no idea why this is happening. Any help in resolving this would be highly appreciated.
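For a sense of scale, the figures in the post and in the log can be worked through as below. This is only back-of-the-envelope arithmetic: the object name PairScale is hypothetical, the even spread of pairs over tasks is an assumption, and the 6 partitions per side are read off the sortby stages in the log.

```scala
object PairScale {
  // Row count reported in the post (and printed by the count job in the log).
  val rows: Long = 8164L

  // Partition count of the sorted RDD, as seen in the sortby stages of the log.
  val partitionsPerSide: Int = 6

  // cartesian pairs every row with every row, so the scoring filter sees n * n candidates.
  val candidatePairs: Long = rows * rows // 66,650,896

  // Pairs that pass the "first id > second id" condition, assuming distinct ids.
  val orderedPairs: Long = rows * (rows - 1) / 2 // 33,321,366

  // A cartesian RDD has one partition per combination of parent partitions.
  val cartesianPartitions: Int = partitionsPerSide * partitionsPerSide // 36

  // Rough candidate pairs per task, ASSUMING rows are spread evenly.
  val pairsPerTask: Long = candidatePairs / cartesianPartitions

  def main(args: Array[String]): Unit = {
    println(s"candidate pairs:      $candidatePairs")
    println(s"pairs after ordering: $orderedPairs")
    println(s"cartesian partitions: $cartesianPartitions")
    println(s"approx. pairs / task: $pairsPerTask")
  }
}
```

The 36 partitions match the "36 output partitions" and "36 missing tasks" lines in the log, where 8 of those tasks (tid 24 to 31) have been started by the time the excerpt ends.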