A Data Science Tutorial

Welcome! My goal is to provide a data science primer loosely based on an incredible “Metro” chart of data science learning paths.

  • “Metro” Lines

  1. Fundamentals
  2. Statistics
  3. Programming
  4. Machine Learning
  5. Text Mining / Natural Language Processing
  6. Data Visualization
  7. Big Data
  8. Data Ingestion
  9. Data Munging
  10. Toolbox

Perhaps this is a bit ambitious, but I want to follow the golden example of Michael Hartl and his excellent Ruby on Rails tutorial and produce a comprehensive data science tutorial. The idea is that we will mostly work with a single large data set, producing useful analyses and results based on the lessons in each of the ten chapters outlined above.

We will assume a basic working knowledge of at least one programming or scripting language (for example, C/C++, Java, or MATLAB) and will stick to conventional paradigms as much as possible. Following the Metro guide’s example, we will also rely on free tools wherever we can.

Please feel free to leave any feedback or suggestions.