apache pig - How to read empty field with Regex in Pig?


I'm trying to parse a custom-formatted log file that looks like this:

2016-11-05 20:00:00,007 [some$tr!ng_nowhitespace.here] info sin.my.package.objectname timetotal=73 timefirst=73 dev="iphone"
2016-11-05 20:00:02,010 [some$tr!ng_nowhitespace.here/too] info sin.my.package.objectname timetotal=350 timefirst=105 timesecond=245 dev="android"
2016-11-05 20:00:10,207 [some$tr!ng_nowhitespace.here/anothertime] info sin.my.package.objectname timetotal=420 timefirst=100 timesecond=205 timethird=115 dev="ipad"

Notice that the field timefirst= is present in all log lines, while timesecond= and timethird= may or may not be present.

I am using the following Pig script to parse the log lines with MyRegExLoader():

data = load '/path/to/raw/file.lzo'
    using org.apache.pig.piggybank.storage.MyRegExLoader('([0-9]{4})-([0-9]{2})-([0-9]{2}) ([0-9]{2}):([0-9]{2}):([0-9]{2}),([0-9]+) \\[(\\S+)\\] ([a-z]+) (\\S+) timetotal=([0-9]+) timefirst=([0-9]+) (timesecond=([0-9]+) )?(timethird=([0-9]+) )?dev="(\\w+)"')
    as (year: int, month: int, date: int, hour: int, mins: int, sec: int, bytesize: int, blockstr: chararray, msgflag: chararray, objectstr: chararray, timetotal: int, timefirst: int, timesecond: int, dev: chararray);
store data into '/user/myuser/pigdumps/pigdump1/' using PigStorage(',');
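As a rough check outside Pig, the capture groups in this pattern can be counted with plain java.util.regex. This is only a sketch: the class and method names are made up for illustration, \S+ is assumed for the two non-whitespace fields, and it assumes the loader emits one field per capture group, which is not confirmed here.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig or piggybank.
final class OptionalGroupCheck {
    // Same pattern as in the script; \S+ is an assumption for the bracketed
    // string and the object name, since neither contains whitespace.
    static final Pattern LOG_PATTERN = Pattern.compile(
        "([0-9]{4})-([0-9]{2})-([0-9]{2}) ([0-9]{2}):([0-9]{2}):([0-9]{2}),([0-9]+) "
        + "\\[(\\S+)\\] ([a-z]+) (\\S+) timetotal=([0-9]+) timefirst=([0-9]+) "
        + "(timesecond=([0-9]+) )?(timethird=([0-9]+) )?dev=\"(\\w+)\"");

    // Returns every capture group for the first match, or null if nothing matches.
    static String[] groups(String line) {
        Matcher m = LOG_PATTERN.matcher(line);
        if (!m.find()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            // group(i) is null when an optional group did not take part in the match
            out[i - 1] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "2016-11-05 20:00:00,007 [some$tr!ng_nowhitespace.here] info "
            + "sin.my.package.objectname timetotal=73 timefirst=73 dev=\"iphone\"";
        System.out.println("capture groups: " + LOG_PATTERN.matcher(line).groupCount());
        System.out.println(Arrays.toString(groups(line)));
    }
}
```

The pattern has 17 capture groups, because each optional wrapper such as (timesecond=([0-9]+) )? counts as a group in addition to the number inside it, while the schema declares 14 fields. For the first sample line, the four groups belonging to timesecond= and timethird= come back as null.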

I know I'm going wrong in how I'm parsing timesecond= and timethird=, but that's the best I could do with my limited regex and Pig knowledge. Here are the error logs from the console:

input(s):
failed to read data from "/path/to/raw/file.lzo"

output(s):
failed to produce result in "/user/myuser/pigdumps/pigdump1"

counters:
total records written : 0
total bytes written : 0
spillable memory manager spill count : 0
total bags proactively spilled: 0
total records proactively spilled: 0

job dag:
job_1478169073918_75270

2016-11-17 16:24:35,503 [uber-subtaskrunner] info  org.apache.pig.backend.hadoop.executionengine.mapreducelayer.mapreducelauncher  - failed!
2016-11-17 16:24:35,516 [uber-subtaskrunner] error org.apache.pig.tools.grunt.gruntparser  - error 2999: unexpected internal error. null

Any workaround, or a pointer in the right direction, would be appreciated.

Thanks!

P.S.: I'm working with private data and cannot provide the original samples. This synthetic data was made to replicate the problem as closely as possible. Please pardon any mistakes in synthesizing it and let me know; I shall correct them.
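One direction that might be worth trying is to wrap the optional parts in non-capturing groups, (?:...)?, so that only the numbers are captured. The Java sketch below only illustrates the regex behaviour. The class and method names are hypothetical, \S+ is assumed for the two non-whitespace fields, and whether MyRegExLoader turns an unmatched group into a null field is an assumption that has not been verified.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig or piggybank.
final class NonCapturingSketch {
    // (?:...)? makes the "timesecond=... " and "timethird=... " wrappers optional
    // without adding capture groups of their own.
    static final Pattern LOG_PATTERN = Pattern.compile(
        "([0-9]{4})-([0-9]{2})-([0-9]{2}) ([0-9]{2}):([0-9]{2}):([0-9]{2}),([0-9]+) "
        + "\\[(\\S+)\\] ([a-z]+) (\\S+) timetotal=([0-9]+) timefirst=([0-9]+) "
        + "(?:timesecond=([0-9]+) )?(?:timethird=([0-9]+) )?dev=\"(\\w+)\"");

    // Returns one entry per capture group, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = LOG_PATTERN.matcher(line);
        if (!m.find()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            fields[i - 1] = m.group(i); // stays null for an absent optional field
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "2016-11-05 20:00:02,010 [some$tr!ng_nowhitespace.here/too] info "
            + "sin.my.package.objectname timetotal=350 timefirst=105 timesecond=245 dev=\"android\"";
        System.out.println(Arrays.toString(parse(line)));
    }
}
```

With this variant the pattern has 15 capture groups, so a matching schema would need a timethird field between timesecond and dev. The absent timethird= value in the second sample line shows up as null in this sketch.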

