Text processing - Linear table to matrix format
I want to convert a linear table to matrix format.
My input table, called "linear_table.tab", looks like this:
transcript ortho
transcript_1 ortho_1
transcript_2 ortho_2
transcript_3 ortho_3
transcript_4 ortho_4
transcript_5 ortho_5
transcript_6 ortho_6
transcript_7 ortho_5
transcript_8 ortho_1
transcript_9 ortho_4
transcript_10 ortho_5
transcript_11 ortho_2
transcript_12 ortho_7
transcript_13 ortho_8
transcript_14 ortho_5
transcript_15 ortho_2
transcript_16 ortho_9
What I want is a matrix table like this:
transcript_1 transcript_2 transcript_3 transcript_4 transcript_5 transcript_6 transcript_7 transcript_8 transcript_9 transcript_10 transcript_11 transcript_12 transcript_13 transcript_14 transcript_15 transcript_16
transcript_1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
transcript_2 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0
transcript_3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
transcript_4 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
transcript_5 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0
transcript_6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
transcript_7 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
transcript_8 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
transcript_9 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
transcript_10 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
transcript_11 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
transcript_12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
transcript_13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
transcript_14 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
transcript_15 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
transcript_16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Here is my code using R:
linear.table <- read.table("linear_table.tab", header=TRUE, sep="\t")
library(reshape2)
dcast(linear.table, transcript~ortho, fill=0)
I get the following output in R:
transcript ortho_1 ortho_2 ortho_3 ortho_4 ortho_5 ortho_6 ortho_7 ortho_8 ortho_9
transcript_1 ortho_1 0 0 0 0 0 0 0 0
transcript_10 0 0 0 0 ortho_5 0 0 0 0
transcript_11 0 ortho_2 0 0 0 0 0 0 0
transcript_12 0 0 0 0 0 0 ortho_7 0 0
transcript_13 0 0 0 0 0 0 0 ortho_8 0
transcript_14 0 0 0 0 ortho_5 0 0 0 0
transcript_15 0 ortho_2 0 0 0 0 0 0 0
transcript_16 0 0 0 0 0 0 0 0 ortho_9
transcript_2 0 ortho_2 0 0 0 0 0 0 0
transcript_3 0 0 ortho_3 0 0 0 0 0 0
transcript_4 0 0 0 ortho_4 0 0 0 0 0
transcript_5 0 0 0 0 ortho_5 0 0 0 0
transcript_6 0 0 0 0 0 ortho_6 0 0 0
transcript_7 0 0 0 0 ortho_5 0 0 0 0
transcript_8 ortho_1 0 0 0 0 0 0 0 0
transcript_9 0 0 0 ortho_4 0 0 0 0 0
I am not sure how to proceed from here using R.
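For illustration only: the dcast output is a transcript-by-ortho table whose cells hold the ortho label itself, presumably because dcast fell back to ortho as the value column. A 0/1 version of that same table makes it easier to see which transcripts share an ortho group. The following is a minimal Python sketch of that intermediate step, not an R solution. The function name `indicator_matrix` and the four-row `sample` (taken from the input table) are assumptions made for the example.

```python
def suffix_number(name):
    # "transcript_10" -> 10, so names sort numerically instead of alphabetically
    return int(name.rsplit("_", 1)[1])


def indicator_matrix(pairs):
    """Return (row_names, col_names, rows), with rows[r][c] == 1 when the
    (transcript, ortho) pair occurs in the input and 0 otherwise."""
    seen = set(pairs)
    row_names = sorted({t for t, _ in seen}, key=suffix_number)
    col_names = sorted({o for _, o in seen}, key=suffix_number)
    rows = [[1 if (t, o) in seen else 0 for o in col_names] for t in row_names]
    return row_names, col_names, rows


# Four rows taken from linear_table.tab (illustrative subset)
sample = [
    ("transcript_1", "ortho_1"),
    ("transcript_2", "ortho_2"),
    ("transcript_8", "ortho_1"),
    ("transcript_11", "ortho_2"),
]

row_names, col_names, rows = indicator_matrix(sample)
print("\t" + "\t".join(col_names))
for name, row in zip(row_names, rows):
    print(name + "\t" + "\t".join(map(str, row)))
```

Transcripts with a 1 in the same column (here transcript_1 and transcript_8 under ortho_1) are the ones the desired transcript-by-transcript matrix links together.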
Using awk:
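$ cat ortho.awk
# Needs GNU awk (gawk 4+) for arrays of arrays.
BEGIN { PROCINFO["sorted_in"] = "@ind_num_asc" }  # visit indices in numeric order
NR > 1 {
    transcript = $1;
    ortho = $2;
    i = transcript;
    j = ortho;
    sub("transcript_", "", i);    # keep only the numeric suffix
    sub("ortho_", "", j);
    imx[i][j] = 1;                # mark transcript i against ortho j
}
END {
    # Fill unset cells with 0. The columns loop over transcript numbers too,
    # so an ortho number j is read here as transcript number j.
    for (i in imx) {
        for (j in imx) {
            omx[i][j] = imx[i][j] == "" ? 0 : 1;
        }
    }
    # Header row: empty corner cell, then one column per transcript
    for (i in omx) {
        printf "\ttranscript_%d", i;
    }
    print "";
    for (i in omx) {
        printf "transcript_%d", i;
        for (j in omx) {
            printf "\t%d", omx[i][j];
        }
        print "";
    }
}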
The idea is to populate a sparse matrix of 1's and, at the end, fill in 0's in the missing spots. Then print it out.
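The same populate-then-fill idea can be sketched in Python for comparison. The linking rule below is an assumption inferred from the desired matrix in the question, not something the question states: each transcript is linked to the first transcript in the file that carries the same ortho, in both directions. The names `link_matrix` and `natural_key` are hypothetical, and `pairs` simply repeats the rows of linear_table.tab.

```python
from collections import defaultdict


def natural_key(name):
    # "transcript_10" -> 10, so rows sort numerically rather than alphabetically
    return int(name.rsplit("_", 1)[1])


def link_matrix(pairs):
    """pairs: iterable of (transcript, ortho). Returns (names, rows of 0/1)."""
    first = {}                 # ortho -> first transcript seen with that ortho
    sparse = defaultdict(set)  # transcript -> linked transcripts (only the 1's)
    names = []
    for transcript, ortho in pairs:
        names.append(transcript)
        if ortho in first:
            # ASSUMPTION: link to the first transcript of the group, both ways
            hub = first[ortho]
            sparse[hub].add(transcript)
            sparse[transcript].add(hub)
        else:
            first[ortho] = transcript
    names.sort(key=natural_key)
    # The 0's are filled in only here, when the dense rows are materialised
    rows = [[1 if col in sparse.get(row, ()) else 0 for col in names]
            for row in names]
    return names, rows


pairs = [
    ("transcript_1", "ortho_1"), ("transcript_2", "ortho_2"),
    ("transcript_3", "ortho_3"), ("transcript_4", "ortho_4"),
    ("transcript_5", "ortho_5"), ("transcript_6", "ortho_6"),
    ("transcript_7", "ortho_5"), ("transcript_8", "ortho_1"),
    ("transcript_9", "ortho_4"), ("transcript_10", "ortho_5"),
    ("transcript_11", "ortho_2"), ("transcript_12", "ortho_7"),
    ("transcript_13", "ortho_8"), ("transcript_14", "ortho_5"),
    ("transcript_15", "ortho_2"), ("transcript_16", "ortho_9"),
]

names, rows = link_matrix(pairs)
print("\t" + "\t".join(names))
for name, row in zip(names, rows):
    print(name + "\t" + "\t".join(map(str, row)))
```

Keeping only the 1's in a dictionary of sets means memory grows with the number of links rather than with the square of the number of transcripts; the dense rows exist only while printing.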