Dear Martin, thank you for the feedback. I checked OpenRefine, but as it works as part of Google Spreadsheets (if I understand it correctly), it will hardly work with my data set, which is too big. So the other direction should be R. Actually, I was thinking about it as an option, but I have never worked with it, so I will need to find some help with it.
Feel free to post a link to the paper we published about our research. There are some quite interesting findings so far.