You Might Be a Data Scientist...

Posted on May 25, 2011

As I understand it, 80% of a “data scientist’s” job is massaging a data set into something useful. Many large data sets come in plain text and are almost never very uniform. They require parsing and fixing and tuning to get the format just right so that a computer program can use the data efficiently. And accurately!

For the slightly neurotic programmers amongst us, this can be a very pleasing experience. I don’t know what part of my brain enjoys this intellectual masochism, but I absolutely get a buzz from finding patterns in data and writing programs that can see, verify, and act on them. There have been numerous occasions on the job where I had to noodle around with a large data set to fix some application or save some corrupted data from certain doom. While it is tedious and lacking in the comforts of determinism, I find it to be a rather… invigorating art (I tend to get quite animated during these sessions, much to the chagrin of those who witness them).

So I think I might make a good data scientist. There’s still some math I’m working out (I’m working through my copy of Concrete Mathematics and following a few courses on MIT OpenCourseWare). Yet if that’s only a small fraction of the job, I don’t think there’s much to worry about. The grungy, tortuous, hard bits… I’m really good at that stuff. And worse, I actually kind of like it in an insane sort of way.