
Showing posts from September, 2019

Jumping (into) the Turnstile (Data)

This blog post marks the completion of my first week at data science bootcamp. In our first week, we were assigned a group project: use NYC MTA turnstile data to approximate the best times of day and locations at which to place an engagement street team. The approach to the project was very open-ended, which meant I made a lot of useful mistakes.

EDUCATIONAL MISTAKE #1: I tried to transform data before really looking at it

When I first started the project, I downloaded the data and immediately started trying to do operations on it. I knew that exploring and cleaning data beforehand is important, but surely the MTA had cleaned their data and made it ready for easy public use before publishing it? I spent four hours figuring out how to filter the data, make subsets, aggregate, and convert datatypes before I ever ran a single .describe(). The second I did, I saw that column names had whitespace, counts were negative every once in a while, and there were a t...
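A minimal pandas sketch of the "look first, then clean" order described above. The sample rows are invented for illustration. The column names (STATION, SCP, DATE, TIME, ENTRIES, and a space-padded EXITS header) are assumed to mirror the MTA turnstile layout, and treating ENTRIES as a cumulative counter that is differenced per turnstile is an assumption about where the negative counts come from, not necessarily the method used in the project. The helper name `clean_turnstile` is hypothetical.

```python
import pandas as pd


def clean_turnstile(raw: pd.DataFrame) -> pd.DataFrame:
    """Strip header whitespace and derive per-interval entry counts.

    Assumes a hypothetical MTA-like layout: one row per turnstile reading,
    with a cumulative ENTRIES counter and turnstiles keyed by STATION + SCP.
    """
    df = raw.copy()
    # Stray whitespace in headers (e.g. "EXITS     ") breaks df["EXITS"] lookups.
    df.columns = df.columns.str.strip()
    # Build a real timestamp so readings sort chronologically, not as strings.
    df["timestamp"] = pd.to_datetime(
        df["DATE"] + " " + df["TIME"], format="%m/%d/%Y %H:%M:%S"
    )
    df = df.sort_values(["STATION", "SCP", "timestamp"])
    # Difference the cumulative counter within each turnstile; the first
    # reading of each turnstile has no predecessor and becomes NaN.
    df["entries_delta"] = df.groupby(["STATION", "SCP"])["ENTRIES"].diff()
    return df


# Invented sample rows (not real MTA data); the last reading simulates a counter reset.
raw = pd.DataFrame({
    "STATION": ["59 ST"] * 4,
    "SCP": ["02-00-00"] * 4,
    "DATE": ["09/07/2019"] * 4,
    "TIME": ["00:00:00", "04:00:00", "08:00:00", "12:00:00"],
    "ENTRIES": [7100000, 7100050, 7100300, 12],
    "EXITS     ": [2400000, 2400010, 2400200, 2400500],
})

# Look before transforming: the raw header and summary stats expose problems.
print(raw.columns.tolist())
print(raw.describe())

df = clean_turnstile(raw)
# Negative deltas flag readings that need a rule (drop, cap, or re-base).
negative = df[df["entries_delta"] < 0]
print(negative[["STATION", "SCP", "timestamp", "ENTRIES", "entries_delta"]])
```

The sketch only flags the suspicious rows; whether to drop, cap, or re-base them is a separate judgment call that depends on how the counters actually misbehave in the real files.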