I was reading an article one day on Hacker News about how someone figured out how to mine data from Facebook's "Active Now" function. They used it to track their friends' sleep habits, which is a pretty cool, albeit mildly creepy, use of public data to infer private things. I thought the concept of mining Active Now was pretty cool, so I modified the script and set it up on my server to collect data for a week. Instead of using the inbuilt scripts to track sleep, though, I imported the raw database into Excel, played around a bit and made some sweet graphs.
Now the thing is, I'm not a mathematician or statistician, and I barely passed math in high school, so I'm not amazing when it comes to arranging and displaying data. I'm just a fan of cool graphs. The way I chose to add up the data was to basically just count how many "hits" the script registered within each time period, then average them across the week. For example, 2PM had around 1000 "hits" whereas 3AM had like 300. So the graphs do pretty well at showing the relative times people were online. I thought that was good enough, since I never really expected many people to actually look at something I made. Honestly, I'm surprised you're reading this.
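For anyone curious what that counting looks like outside of Excel, here's a rough Python sketch. It is not the original workflow, just one way to do the same thing, and it assumes each "hit" is available as a Python `datetime`. The function name `average_hits_per_hour` and the sample timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def average_hits_per_hour(timestamps):
    """Return {hour: average hits per day} for a list of datetime objects.

    Assumption: every element of `timestamps` is one "hit" logged by the
    tracking script. This only gives relative activity, not a head count.
    """
    # Count how many hits landed in each hour of the day (0-23).
    hits = Counter(ts.hour for ts in timestamps)
    # Number of distinct calendar days covered by the data.
    n_days = len({ts.date() for ts in timestamps}) or 1
    # Average each hour's total over the number of days observed.
    return {hour: hits.get(hour, 0) / n_days for hour in range(24)}


# Hypothetical sample data: three hits around 2PM and one at 3AM over two days.
sample = [
    datetime(2016, 3, 1, 14, 0),
    datetime(2016, 3, 1, 14, 5),
    datetime(2016, 3, 2, 14, 10),
    datetime(2016, 3, 2, 3, 0),
]
averages = average_hits_per_hour(sample)
print(averages[14], averages[3])
```

Like the graphs, this only shows which hours are busier relative to others. It says nothing about how many people were actually online.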
Anyway, I posted it on r/DataIsBeautiful at night and woke up to about 300 upboats. I thought, cool, people liked a thing I did. Then I looked at the comments... For lack of a better word, I got absolutely shat on. Commenters really weren't happy about the bullshit Y-axis. Even though the pictures on Imgur ended up getting viewed over 151,000 times, the Reddit post was only 66% upvoted. Ow. For reference, the most viewed thing I'd made before that was the Shia wallpaper, at about 1000 views. I learned the hard way that r/DataIsBeautiful is a super critical subreddit. Which is fair enough, honestly.
I was determined to set things right and upload a fixed version of the post with new and corrected stats. It turns out, though, that there's a fundamental flaw in the script that stops it from getting an accurate reading of how many people are online at any one time. Again, I could show the numbers relatively (or just go with the bullshit inaccurate data), but it didn't feel right. Maybe one day I'll find a way to do this better/properly, but that might not be for a while.