Awesome Material

📚 You can learn a lot for free on the Internet. This page puts together resources on data, data science and related fields which I find absolutely brilliant. I also list some awesome books (I will specify if they are freely available).

This list is continuously updated.

📊 Statistics, Probability and the science of Data

  • C Bergstrom, J West, Calling Bullshit in the age of Big Data, course about the manipulative use of data and the wrong use of statistics. The authors have also published a book.
  • [book] D Huff, How to lie with Statistics (1954), nice little book on the common mistakes and misunderstandings around data. Old but very valuable and entertaining. Note: some of the examples can come across as sexist and out of place today, so read them in the context of the 1950s.
  • T Vigen, Spurious Correlations, visually displays correlations between completely unrelated variables, to illustrate the old adage that correlation is not causation. A favourite within the data community.
  • [book] A B Downey, Think Stats (O’Reilly 2011), freely available online.
  • Seeing Theory, a visual introduction to probability and statistics, a site built by students at Brown University.
  • W Chen, Probability Cheatsheet.
  • The Scipy Lecture Notes, brilliant and obviously focussed on Python, but useful for general concepts too.
  • [book] W McKinney, Python for Data Analysis (O’Reilly 2022), freely available online.

🤖 Machine Learning - general material

  • S Yee, T Chu, R2D3, visual intro to Machine Learning.
  • V Powell, L Lehe, Explained Visually, another visual site.
  • [book] C Molnar, Interpretable Machine Learning (Leanpub 2019), freely available.
  • [book] T Hastie, R Tibshirani, J Friedman, The Elements of Statistical Learning (Springer 2001), freely available.
  • [book] G James, D Witten, T Hastie, R Tibshirani, Introduction to Statistical Learning (Springer 2013), a more high-level book by some of the same authors as the above. Again, freely available; it exists in versions with R and Python code examples (the latter adds J Taylor as author).
  • The scikit-learn docs have tutorials and extensive explanations for every supported algorithm as well as general notes on Machine Learning concepts.
  • MLU-Explain, a website by Amazon that also presents ML concepts in a visual way.

🧠 Neural Networks & Deep Learning

👀 Computer Vision

💻 Coding and Computer Science

  • Sorting Algorithms, Toptal.
  • [book] Gayle Laakmann McDowell, Cracking the Coding Interview (CareerCup 2008), a general resource, useful not just for interview preparation but for coding challenges in general.

🐍 Python

These resources are general to Python, regardless of its use for data science.