Such data, much despair

It appears that running scikit-learn’s SVM classifier might actually be easier than obtaining the Twitter data that it’s supposed to classify.

Then again, mathematical models have at least some kind of answer, even if the answer is arrived at iteratively. The same can’t be said of the batch downloading, cleaning, rate-limit handling and other antics required of anyone who wants to use real-world data from a third party.
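
For what it’s worth, the batch-downloading and rate-limiting part can be roughly sketched as below. This is only a minimal sketch under my own assumptions, not Twitter’s actual client API: `fetch_page`, `RateLimited` and `download_in_batches` are hypothetical names, and the real thing would wrap whatever HTTP call and rate-limit response the API really uses. The idea is simply to back off exponentially whenever the service says to slow down, and to stop when a page comes back empty.

```python
import time
from typing import Callable, Iterator, List


class RateLimited(Exception):
    """Hypothetical error: raised by fetch_page when the API asks us to slow down."""


def download_in_batches(
    fetch_page: Callable[[int], List[str]],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield items page by page, backing off exponentially when rate-limited.

    fetch_page is a caller-supplied (hypothetical) function that returns the
    items on a given page, or an empty list once there is nothing left.
    """
    page = 0
    while True:
        for attempt in range(max_retries):
            try:
                items = fetch_page(page)
                break  # this page succeeded, stop retrying
            except RateLimited:
                # Wait 1x, 2x, 4x, ... the base delay before trying again.
                sleep(base_delay * 2 ** attempt)
        else:
            # Every attempt for this page was rate-limited.
            raise RuntimeError(
                f"gave up on page {page} after {max_retries} rate-limited attempts"
            )
        if not items:
            return  # an empty page means we have reached the end
        yield from items
        page += 1
```

Passing `sleep` in as a parameter is just a convenience so the waiting can be swapped out when trying the function without a real API behind it.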

To be fair, Twitter’s API is really nicely documented. The day that I wasted trying to figure out why the authentication wasn’t working was entirely my fault, and due to a typo, of all things.

But now I really do feel like I’m beginning to learn the pains of machine learning. Rule #1: make sure you have enough data, and that enough of it is good enough.


