C# - Faster alternative to looping through a list to identify non-unique values and their positions
I have a data set in which I need to isolate, and give feedback on, non-unique values across multiple columns (think of a multi-column primary key violation in a database table). I am doing this by concatenating the columns on each row into a List<string>. I then count each item in the list and, if the count is greater than 1, add an error message to another column on the same row. That bit is important: I need to be able to provide feedback on the position of the duplicate(s), not just the fact that there is a duplicate.
The problem is speed. Although this technically works, it is not a practical solution, because I am potentially working with data sets of several hundred thousand rows.
Code:
    List<string> mylist = new List<string>();
    string thisconcat = "";
    for (int i = 0; i < dtlogdata.Rows.Count - 1; i++)
    {
        // Build the composite key for this row; colnum - 1 converts the
        // column number to a 0-based column index.
        foreach (int colnum in colnumlist)
        {
            thisconcat += dtlogdata.Rows[i].Field<string>(colnum - 1);
        }
        mylist.Add(thisconcat);
        thisconcat = "";
    }
Then:
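    for (int i = 0; i < dtlogdata.Rows.Count - 1; i++)
    {
        // Count() scans the whole list for every row, so the total
        // work grows quadratically with the number of rows.
        int count = mylist.Count(j => j == mylist[i]);
        if (count > 1)
        {
            dtlogdata.Rows[i][colcnt] = myerrorstring;
        }
    }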
I'll give you a different solution. It assumes you are willing to add a column to the DataTable:
    dtlogdata.Columns.Add("hash");
Then cast the table:
    var t = dtlogdata.AsEnumerable();
First, compute the concatenated string and get hold of the rows. You can do this in two ways. If you want to limit the columns by index (like your original code):
    var rows = t.Select(row =>
    {
        StringBuilder builder = new StringBuilder();
        // The question's code treats colnumlist as 1-based, hence i - 1.
        colnumlist.ForEach(i => builder.Append(row[i - 1]));
        row["hash"] = builder.ToString();
        return row;
    });
Or, if you want to use all columns:
    var rows = t.Select(row =>
    {
        row["hash"] = string.Join("", row.ItemArray.Select(i => i.ToString()));
        return row;
    });
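One caveat with plain concatenation: two different rows can produce the same string (for example "ab" + "c" and "a" + "bc"), which would be reported as a false duplicate. A minimal sketch of one way around this is to join the values with a separator. The class name `RowKeySketch`, the helper `BuildKey`, and the choice of the ASCII unit separator are all illustrative assumptions, not part of the original code, and the separator is assumed never to occur in the data.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class RowKeySketch
{
    // Hypothetical helper (not from the original post): builds a composite key
    // from the selected columns. Column numbers are assumed to be 1-based, as in
    // the question. The separator '\u001F' is assumed never to appear in the data.
    public static string BuildKey(DataRow row, IEnumerable<int> oneBasedColumns)
    {
        // Convert.ToString turns DBNull into an empty string instead of throwing.
        return string.Join("\u001F", oneBasedColumns.Select(c => Convert.ToString(row[c - 1])));
    }
}
```

With this helper, the assignment inside the lambda could become `row["hash"] = RowKeySketch.BuildKey(row, colnumlist);`.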
Then you grab the rows that have duplicates and mark them accordingly:
    foreach (var datarow in rows.GroupBy(r => r["hash"]).Where(g => g.Count() > 1).SelectMany(g => g))
    {
        datarow[colcnt] = myerrorstring;
    }
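If adding a column to the table is not an option, the same grouping idea can be sketched with a dictionary that buckets rows by key in one pass and then marks every bucket holding more than one row. This is only an illustration under stated assumptions: the class `DuplicateMarkerSketch`, the method `MarkDuplicates`, its parameters, and the separator character are hypothetical names and choices, and the key columns are assumed to be 1-based as in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DuplicateMarkerSketch
{
    // Hypothetical helper (not from the original post).
    // keyColumns: 1-based column numbers forming the composite key (assumption).
    // errorColumn: 0-based index of the column that receives the error text.
    // Returns the number of rows that were marked.
    public static int MarkDuplicates(DataTable table, IList<int> keyColumns,
                                     int errorColumn, string errorText)
    {
        // Pass 1: bucket the rows by composite key (one dictionary lookup per row).
        var rowsByKey = new Dictionary<string, List<DataRow>>();
        foreach (DataRow row in table.Rows)
        {
            // '\u001F' is assumed never to occur in the data.
            string key = string.Join("\u001F",
                keyColumns.Select(c => Convert.ToString(row[c - 1])));
            if (!rowsByKey.TryGetValue(key, out List<DataRow> bucket))
            {
                bucket = new List<DataRow>();
                rowsByKey[key] = bucket;
            }
            bucket.Add(row);
        }

        // Pass 2: every row in a bucket of size > 1 is a duplicate, so mark it.
        int marked = 0;
        foreach (List<DataRow> bucket in rowsByKey.Values.Where(b => b.Count > 1))
        {
            foreach (DataRow row in bucket)
            {
                row[errorColumn] = errorText;
                marked++;
            }
        }
        return marked;
    }
}
```

A call such as `DuplicateMarkerSketch.MarkDuplicates(dtlogdata, colnumlist, colcnt, myerrorstring)` would then replace both loops from the question, assuming `colnumlist` is a `List<int>` and `colcnt` is a column index.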