
Justin Bozonier is the Product Optimization Specialist at GrubHub, formerly a Sr. Developer/Analyst at Cheezburger. He has engineered a large, scalable analytics system and worked on actuarial modeling software. As Product Optimization Specialist, he currently leads split test design, implementation, and analysis. The opinions expressed here are his own and not those of his employer. Justin is a DZone MVB and is not an employee of DZone; he has posted 27 posts at DZone. You can read more from him at his website.

Getting Started with Data

10.01.2013

A question I'm regularly asked is, “What materials would you recommend for someone just getting started in a more data-oriented job?” In this blog post I’ll offer a set of options, both books and websites, that I hope will answer that question.

Where Am I At?

I currently work as a conversion optimization specialist. That means I design, run, and analyze feature experiments on websites. The goal is usually to drive more, or larger, purchases. Working with other analysts, I’ve noticed a set of core skills that, when all are present, make an analyst one of my go-to people. They are also the skills that have led me to some success at what I do.

Without Further Ado

  • Website: GitHub

    • You’re gonna need to learn to code.
    • Search for data, statistics, or just about anything else, and I bet you’ll find sample code.
  • Book: Think Stats

    • Learn a little Python, learn a little stats. A great primer on using the two together.
    • You might want to cover the statistics reading outlined further down this list first.
  • Book: Head First Data Analysis

    • Descriptive statistics
    • Basic linear regression
    • Establishing a “gut” for data
  • Book: Statistics in Plain English

    • Descriptive statistics
    • Statistical tests (binomial tests and t-tests, at least)
    • Confidence intervals
    • Linear regression
  • Web Article: How Not to Run an A/B Test

    • Experiment design
    • Dipping your toes into power analysis
  • Book: The Flaw of Averages

    • Why the most prevalent descriptive statistic, the average, can be a terribly misleading golden hammer in search of a nail.
  • Free Online Class: Probability & Statistics, Carnegie Mellon

    • Probability (including Bayes’ theorem)
    • Statistics
    • Exploratory data analysis
  • Textbook: A Second Course in Statistics: Regression Analysis (7th Edition)

    • In-depth treatment of linear regression.
    • Tons of theory, but focused on learning to use statistical software to do the analysis.
    • Best read after either taking a stats 101 class or learning more about classic statistical tests and how to use them correctly.
  • Technology: RStudio

    • I have read several books on R, but none of them really helped me much. The best thing has been this program, as it makes it simple to get data into R and view it, so I can focus on analyzing it.
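
Several of the topics above (statistical tests, confidence intervals, experiment design, and a first taste of power analysis) meet in the analysis of a simple conversion-rate split test. Here is a minimal Python sketch of that workflow; it is not drawn from any of the books listed. It assumes two variants, independent visitors, and a yes/no conversion outcome, and it uses a pooled two-proportion z-test, a normal-approximation (Wald) confidence interval, and the textbook sample-size approximation for comparing two proportions. The function names and the visitor counts are hypothetical, and only the standard library is used (`statistics.NormalDist` needs Python 3.8 or later).

```python
from math import ceil, sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for a difference in conversion rates.

    Returns (z, p_value). Assumes samples large enough for the
    normal approximation to hold.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under the null hypothesis both variants share one conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def diff_confidence_interval(conv_a, n_a, conv_b, n_b, level=0.95):
    """Wald (normal-approximation) interval for the lift p_b - p_a."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Unpooled standard error: no assumption that the rates are equal.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p_b - p_a
    return diff - z_crit * se, diff + z_crit * se


def sample_size_per_arm(p_base, lift, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant to detect an absolute lift."""
    p_new = p_base + lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return ceil((z_alpha + z_beta) ** 2 * variance / lift ** 2)


if __name__ == "__main__":
    # Hypothetical counts: 10,000 visitors in each arm.
    z, p = two_proportion_z_test(500, 10_000, 570, 10_000)
    low, high = diff_confidence_interval(500, 10_000, 570, 10_000)
    print(f"z = {z:.2f}, p = {p:.4f}")
    print(f"95% CI for the lift: [{low:.4f}, {high:.4f}]")
    # Hypothetical plan: 5% baseline rate, hoping to detect a 0.5-point lift.
    print("visitors per arm:", sample_size_per_arm(0.05, 0.005))
```

With small counts or very low conversion rates the normal approximation gets shaky, and an exact binomial test is a safer choice. Fixing the sample size before the test starts, rather than stopping the first time the p-value dips below 0.05, is the kind of pitfall covered by the A/B testing article above.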

Currently Reading

These are books I’m currently reading and learning from, though I may not have gotten any results from them yet.

  • Textbook: Statistics: A Bayesian Perspective

    • A very approachable introduction to Bayesian statistics. It is far less dense than some of the other material on this list.
    • Also starts to get into multiple probability distributions, and is very helpful in visualizing them.
  • Textbook: Doing Bayesian Data Analysis

    • Once you’re done with the previous book, this one builds on it quite nicely.
    • Gets down to actually computing more advanced Bayesian statistics problems, including hypothesis tests.
    • Also fairly approachable, but I wouldn’t recommend it to an absolute novice.
  • Textbook: Introduction to Bayesian Statistics

    • Very dense. Currently making my way through this bit by bit.
    • Nice to use after finishing the first book, and great in parallel with “Doing Bayesian Data Analysis.”
    • More mathematically focused, but still starts from the basics, so it’s definitely a book you can use to slowly build up to a more rigorous exploration of Bayesian statistics.
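
For a taste of the Bayesian approach these books build toward, here is a minimal Python sketch of one common technique, not taken from any of them: the Beta-Binomial model for comparing two conversion rates. It assumes a uniform Beta(1, 1) prior on each rate; after k conversions out of n visitors the posterior is Beta(1 + k, 1 + n - k), and drawing from both posteriors estimates the probability that variant B's rate is higher than A's. The function name, the prior, and the counts are hypothetical choices for illustration.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000, seed=0,
                   prior_alpha=1.0, prior_beta=1.0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta-Binomial posteriors.

    With a Beta(prior_alpha, prior_beta) prior and k conversions out of
    n visitors, the posterior is Beta(prior_alpha + k, prior_beta + n - k).
    The default Beta(1, 1) prior is a hypothetical, uniform choice.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same answer
    a_alpha, a_beta = prior_alpha + conv_a, prior_beta + n_a - conv_a
    b_alpha, b_beta = prior_alpha + conv_b, prior_beta + n_b - conv_b
    wins = 0
    for _ in range(draws):
        # One plausible pair of true rates, drawn from each posterior.
        rate_a = rng.betavariate(a_alpha, a_beta)
        rate_b = rng.betavariate(b_alpha, b_beta)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


if __name__ == "__main__":
    # Hypothetical counts: 10,000 visitors in each arm.
    print(prob_b_beats_a(500, 10_000, 570, 10_000))
```

The output reads directly as a probability about the two rates rather than as a p-value. Picking a prior, and deciding how high that probability must be before acting, are still judgment calls.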

(Note: This article and the opinions expressed are solely my own and do not represent those of my employer.)

Published at DZone with permission of Justin Bozonier, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)