Data Spring

A path towards fluency in the world of data.

1. Fundamentals – (B) Hash Functions, Binary Tree, O(n)

If you haven’t taken a CS data structures class but are familiar with programming using languages like C/C++/Java, or even Python/Ruby, you might be able to catch up by going through these tutorials:


1. Fundamentals – (A) Matrices and Linear Algebra

The Khan Academy has a basic, high-school-level, interactive matrices tutorial, but I would recommend their college-level linear algebra tutorials:

  1. Vectors and Spaces
  2. Matrix Transformations
  3. Alternate Bases

It is especially useful to be familiar with eigenvalues and eigenvectors.
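As a quick refresher, an eigenvector of a matrix A is a nonzero vector v that A only stretches, so that A v = λ v, and the scaling factor λ is the matching eigenvalue. The sketch below is only an illustration and is not taken from the Khan Academy material: it handles a 2x2 matrix with real eigenvalues using the trace and determinant, and the names eigen_2x2 and matvec are made up for this example. In practice you would normally reach for a numerical library instead.

```python
import math


def matvec(a, b, c, d, v):
    """Multiply the 2x2 matrix [[a, b], [c, d]] by the vector v."""
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])


def eigen_2x2(a, b, c, d):
    """Return [(eigenvalue, eigenvector), ...] for [[a, b], [c, d]].

    Hypothetical helper for illustration; assumes real eigenvalues.
    """
    if b == 0 and c == 0:
        # Diagonal matrix: the diagonal entries are the eigenvalues
        # and the standard basis vectors are eigenvectors.
        return [(a, (1.0, 0.0)), (d, (0.0, 1.0))]

    # Eigenvalues are the roots of lam^2 - trace*lam + det = 0.
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    root = math.sqrt(disc)

    pairs = []
    for lam in ((trace + root) / 2, (trace - root) / 2):
        # Pick a nonzero solution of (A - lam*I) v = 0.
        if b != 0:
            v = (b, lam - a)
        else:
            v = (lam - d, c)
        pairs.append((lam, v))
    return pairs


if __name__ == "__main__":
    for lam, v in eigen_2x2(2, 1, 1, 2):
        print(lam, v, matvec(2, 1, 1, 2, v))
```

For the matrix [[2, 1], [1, 2]] the eigenvalues are 3 and 1, with eigenvectors along (1, 1) and (1, -1); multiplying the matrix by either vector returns the same vector scaled by its eigenvalue.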

A Data Science Tutorial

Welcome! My goal is to provide a data science primer loosely based on an incredible “Metro” chart of data science learning paths.

  • “Metro” Lines

  1. Fundamentals
  2. Statistics
  3. Programming
  4. Machine Learning
  5. Text Mining / Natural Language Processing
  6. Data Visualization
  7. Big Data
  8. Data Ingestion
  9. Data Munging
  10. Toolbox

Perhaps this is a bit ambitious, but I want to follow the golden example of Michael Hartl and his excellent Ruby on Rails tutorial and produce a comprehensive data science tutorial. The general idea is that we will work with a single large data set throughout, producing useful analyses and results based on the lessons in each of the ten chapters outlined above.

We will assume a basic working knowledge of at least one programming or scripting language (for example, C/C++, Java, or MATLAB) and will try to stick to conventional paradigms as much as possible. Also, following the Metro chart’s example, we will rely on free tools as much as possible.

Please feel free to leave any feedback or suggestions.