Such data, much despair

It appears that running scikit-learn’s SVM classifier might actually be easier than obtaining the Twitter data it’s supposed to classify.

Then again, a mathematical model at least arrives at some kind of answer, even if it gets there iteratively. Not so the batch downloading, cleaning, rate-limiting and other antics required of anyone who wants to use real-world data from a third party.

To be fair, Twitter’s API is really nicely documented. The day I wasted trying to figure out why authentication wasn’t working was entirely my fault, and all because of a typo, of all things.

But now I really do feel like I’m beginning to learn the pains of machine learning. Rule #1: make sure you have enough data, and that enough of it is good enough.

