Probability and Statistics for Machine Learning and Data Science

Published:

This is a collection of my projects and lecture notes, written to reinforce my understanding of the probability and statistics concepts that underpin Machine Learning and Data Science.


Probability Theory

In this series of lecture notes, I will explain the fundamental concepts of the following topics:

  • Overview of probability and counting.
  • Conditional probability.
  • Random variables.
  • Joint probability distribution.

The following textbooks complement this series of lecture notes:

  • Introduction to Probability (2nd Edition) by Joseph K. Blitzstein and Jessica Hwang.
  • Applied Statistics and Probability for Engineers (7th Edition) by Douglas C. Montgomery and George C. Runger.

Statistical Inference

In this series of lecture notes, I will explain the fundamental concepts of the following topics:

  • Confidence intervals.
  • Hypothesis tests.
  • Non-parametric tests.

The following textbooks complement this series of lecture notes:

  • Applied Statistics and Probability for Engineers (7th Edition) by Douglas C. Montgomery and George C. Runger.
  • Statistics: The Art and Science of Learning from Data (5th Edition) by Alan Agresti.

Regression Analysis

In this series of lecture notes, I will explain the concepts of the following topics:

  • Simple and multiple linear regression models.
  • Logistic regression models.

The following textbooks complement this series of lecture notes and projects:

  • Introduction to Linear Regression Analysis (5th Edition) by Douglas C. Montgomery, Elizabeth A. Peck, and G. Geoffrey Vining.
  • A Second Course in Statistics: Regression Analysis (8th Edition) by William Mendenhall and Terry Sincich.

Linear Regression

Linear regression is a fundamental algorithm in data science and machine learning. It predicts a continuous response variable from one or more predictor variables.

In this project, I will:

  • Explain the fundamental concepts behind linear regression.
  • Build a simple linear regression (SLR) model from scratch in Python, and implement functions to evaluate it.
  • Build a multiple linear regression (MLR) model from scratch in Python, and implement functions to evaluate it.
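
To give a flavour of what "from scratch" can mean for the SLR case, here is a minimal sketch using only the Python standard library. It is not the project's actual code: the function names (fit_slr, predict, r_squared) and the toy data are assumptions made up for illustration. The fit uses the usual closed-form least-squares estimates, slope = S_xy / S_xx and intercept = y_bar - slope * x_bar, and the evaluation function computes the coefficient of determination R^2.

```python
from statistics import mean


def fit_slr(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations of x
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must contain at least two distinct values")
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(x, intercept, slope):
    """Return fitted values for each x under the fitted line."""
    return [intercept + slope * xi for xi in x]


def r_squared(y, y_hat):
    """Return R^2 = 1 - SS_res / SS_tot for observed y and fitted y_hat."""
    y_bar = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical toy data lying exactly on the line y = 1 + 2x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

b0, b1 = fit_slr(x, y)
y_hat = predict(x, b0, b1)
r2 = r_squared(y, y_hat)
print(f"intercept={b0:.3f}, slope={b1:.3f}, R^2={r2:.3f}")
```

Because the toy data sit exactly on a line, the fit recovers that line and R^2 equals 1; real data will give a lower value. The MLR case generalizes the same least-squares idea to several predictors, typically by solving the normal equations in matrix form.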

In this project, I will showcase how a thorough explanatory regression analysis is conducted using a sample dataset.