Machine Learning and Data Science for programming beginners using Python with scikit-learn, SciPy, Matplotlib & Pandas.

Hello and welcome to my new course, Machine Learning with Python for Dummies. Let's begin with an overview of the course and its contents.

Artificial Intelligence, Machine Learning and Deep Learning Neural Networks are among the most used terms in the technology world nowadays. They are also among the most misunderstood and confused.

Artificial Intelligence is a broad field of science that tries to make machines intelligent like humans. Machine Learning and Neural Networks are two subsets of this vast Artificial Intelligence field.

Let's look at what machine learning is. Think back to when we were babies: that was our learning phase. We learned how to crawl, stand and walk, then to speak words, then to make simple sentences. We learned from our experiences. We went through many trials and errors before we learned how to walk and talk. The attempts at walking and talking that gave positive results were kept in our memory and used later. This process closely resembles how a machine learning mechanism works.

Then we grew older and started thinking logically about many things, had emotional feelings, and so on. We kept on thinking and found solutions to problems in our daily life. That's what Deep Learning Neural Network scientists are trying to achieve: a thinking machine.

In this course, however, we focus mainly on Machine Learning. Throughout the course, we prepare our machine for a prediction test. It's just like how you prepare for a mathematics test in school or college. We learn and train ourselves by solving as many similar mathematical problems as possible. Let's call this sample data of similar problems and their solutions the 'Training Input' and 'Training Output' respectively. Then the day of the actual test comes. We are given a new set of problems to solve, very similar to the ones we practised, and we have to solve them based on our previous practice and learning experience. We can call those problems the 'Testing Input' and our answers the 'Predicted Output'. Later, our professor evaluates these answers by comparing them with the actual answers, which we call the 'Test Output'. A mark is then given based on the number of correct answers. We call this mark our 'Accuracy'. The life of a machine learning engineer or data scientist is dedicated to making this accuracy as good as possible through different techniques and evaluation measures.
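The exam analogy above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the study-hours data is made up, and the "model" is a hand-written 1-nearest-neighbour rule chosen for brevity, not the approach used in the course lectures, which rely on scikit-learn.

```python
# Toy illustration of the exam vocabulary; plain Python, no libraries.
# All data and the predict() helper are hypothetical.

# Practice problems and their known answers:
# (hours studied, hours slept) -> exam result.
training_input = [(1, 4), (2, 5), (3, 6), (7, 7), (8, 6), (9, 8)]
training_output = ["fail", "fail", "fail", "pass", "pass", "pass"]

# New but similar problems for the real test, plus the answers
# that only the examiner holds.
testing_input = [(2, 6), (8, 7), (5, 5), (9, 6)]
test_output = ["fail", "pass", "pass", "pass"]


def predict(sample):
    """Answer with the label of the closest training example."""
    def squared_distance(other):
        return sum((a - b) ** 2 for a, b in zip(sample, other))

    nearest = min(
        range(len(training_input)),
        key=lambda i: squared_distance(training_input[i]),
    )
    return training_output[nearest]


# The machine sits the test.
predicted_output = [predict(sample) for sample in testing_input]

# The examiner marks it: the share of answers that match is the accuracy.
correct = sum(p == t for p, t in zip(predicted_output, test_output))
accuracy = correct / len(test_output)

print(predicted_output)
print(f"Accuracy: {accuracy:.2f}")
```

The model gets one of the four test problems wrong here, which is the usual situation: the mark is rarely perfect, and improving it is the job described above.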

We use Python as our programming language. Python is a great tool for developing programs that perform data analysis and prediction. It has a huge number of classes and functions that perform complex mathematical analysis and give results in one or two simple lines of code, so we don't have to be a statistics genius or a mathematics nerd to learn data science and machine learning. Python really makes things easy.
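As a small taste of the "one or two lines of code" claim, here is a sketch using only the `statistics` module from the Python standard library. The visit counts are made-up sample data, and the course itself works with NumPy and Pandas rather than this module; it is used here only to keep the example dependency-free.

```python
import statistics

# Hypothetical sample: daily website visits over one week.
visits = [120, 135, 128, 150, 142, 310, 125]

mean_visits = statistics.mean(visits)      # arithmetic average
median_visits = statistics.median(visits)  # middle value, less affected by the 310 spike
spread = statistics.stdev(visits)          # sample standard deviation

print(mean_visits, median_visits, spread)
```

Each summary takes a single call, with no formula to write out by hand.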

These are the main topics that are included in our course:

- System and Environment Preparation
  - Installing Python and Required Libraries (Anaconda)
- Basics of Python and SciPy
  - Python, NumPy, Matplotlib and Pandas Quick Courses
- Load Data Set from CSV / URL
  - Load CSV data with Python, NumPy and Pandas
- Summarize Data with Description
  - Peeking at Data, Data Dimensions, Data Types, Statistics, Class Distribution, Attribute Correlations, Univariate Skew
- Summarize Data with Visualization
  - Univariate and Multivariate Plots
- Prepare Data
  - Data Transforms, Rescaling, Standardizing, Normalizing and Binarization
- Feature Selection - Automatic Selection Techniques
  - Univariate Selection, Recursive Feature Elimination, Principal Component Analysis and Feature Importance
- Machine Learning Algorithm Evaluation
  - Train and Test Sets, K-fold Cross Validation, Leave One Out Cross Validation, Repeated Random Test-Train Splits
- Algorithm Evaluation Metrics
  - Classification Metrics - Classification Accuracy, Logarithmic Loss, Area Under ROC Curve, Confusion Matrix, Classification Report
  - Regression Metrics - Mean Absolute Error, Mean Squared Error, R²
- Spot-Checking Classification Algorithms
  - Linear Algorithms - Logistic Regression, Linear Discriminant Analysis
  - Non-Linear Algorithms - k-Nearest Neighbours, Naive Bayes, Classification and Regression Trees, Support Vector Machines
- Spot-Checking Regression Algorithms
  - Linear Algorithms - Linear Regression, Ridge Regression, LASSO Linear Regression and Elastic Net Regression
  - Non-Linear Algorithms - k-Nearest Neighbours, Classification and Regression Trees, Support Vector Machines
- Choose the Best Machine Learning Model
  - Compare Logistic Regression, Linear Discriminant Analysis, k-Nearest Neighbours, Classification and Regression Trees, Naive Bayes, Support Vector Machines
- Automate and Combine Workflows with Pipeline
  - Data Preparation and Modelling Pipeline
  - Feature Extraction and Modelling Pipeline
- Performance Improvement with Ensembles
  - Voting Ensemble
  - Bagging: Bagged Decision Trees, Random Forest, Extra Trees
  - Boosting: AdaBoost, Gradient Boosting
- Performance Improvement with Algorithm Parameter Tuning
  - Grid Search Parameter Tuning
  - Random Search Parameter Tuning
- Save and Load (Serialize and Deserialize) Machine Learning Models
  - Using pickle
  - Using Joblib
- Finalize a Machine Learning Project
  - Steps for Finalizing Classification Models - Pima Indian Diabetes dataset
  - Dealing with the Imbalanced Class Problem
  - Steps for Finalizing Multi-Class Models - Iris Flower dataset
  - Steps for Finalizing Regression Models - Boston Housing dataset
- Predictions and Case Studies
  - Case study 1: Predictions using the Pima Indian Diabetes Dataset
  - Case study 2: The Iris Flower Multi-Class Dataset
  - Case study 3: The Boston Housing Cost Dataset

Machine Learning and Data Science are among the most lucrative career paths in the technology arena nowadays. This course will equip you to compete in this area.

Best wishes with your learning. See you soon in the classroom.

- A computer with a mid-range configuration and the willingness to dive into the world of Machine Learning


Introduction to Machine Learning - Part 1 - Concepts, Definitions and Types

Introduction to Machine Learning - Part 2 - Classifications and Applications

IMPORTANT: At the time of writing, the latest version of Anaconda ships with Python 3.8, which is not yet supported by TensorFlow and Keras. The TensorFlow community is currently working on this. Until a compatible version is released, let's use the previous version of Anaconda.

I strongly recommend downloading the Python 3.7 Anaconda for this course: https://repo.anaconda.com/archive/Anaconda3-2020.02-Windows-x86_64.exe

Load and Read CSV data file using Python Standard Library

Dataset Summary - Class Distribution and Data Summary

Dataset Summary - Explaining Skewness - Gaussian and Normal Curve

Multivariate Dataset Visualization - Correlation Plots

Multivariate Dataset Visualization - Scatter Plots

Feature Selection - Uni-variate Part 1 - Chi-Squared Test

Feature Selection - Uni-variate Part 2 - Chi-Squared Test

Feature Selection - Principal Component Analysis (PCA)

Refresher Session - The Mechanism of Re-sampling, Training and Testing

Algorithm Evaluation Techniques - Train and Test Set

Algorithm Evaluation Techniques - K-Fold Cross Validation

Algorithm Evaluation Techniques - Leave One Out Cross Validation

Algorithm Evaluation Techniques - Repeated Random Test-Train Splits

Algorithm Evaluation Metrics - Classification Accuracy

Algorithm Evaluation Metrics - Area Under ROC Curve

Algorithm Evaluation Metrics - Classification Report

Algorithm Evaluation Metrics - Mean Absolute Error - Dataset Introduction

Algorithm Evaluation Metrics - Mean Absolute Error

Classification Algorithm Spot Check - Logistic Regression

Classification Algorithm Spot Check - Linear Discriminant Analysis

Classification Algorithm Spot Check - K-Nearest Neighbors

Classification Algorithm Spot Check - Support Vector Machines

Regression Algorithm Spot Check - Linear Regression

Regression Algorithm Spot Check - Ridge Regression

Regression Algorithm Spot Check - LASSO Linear Regression

Regression Algorithm Spot Check - Elastic Net Regression

Regression Algorithm Spot Check - K-Nearest Neighbors

Regression Algorithm Spot Check - Support Vector Machines (SVM)

Compare Algorithms - Part 1 : Choosing the best Machine Learning Model

Compare Algorithms - Part 2 : Choosing the best Machine Learning Model

Performance Improvement: Parameter Tuning using Grid Search

Performance Improvement: Parameter Tuning using Random Search

Export, Save and Load Machine Learning Models : Pickle

Export, Save and Load Machine Learning Models : Joblib

Finalizing a Classification Model - The Pima Indian Diabetes Dataset

Quick Session: Imbalanced Data Set - Issue Overview and Steps

Finalizing a Regression Model - The Boston Housing Price Dataset

Real-time Predictions: Using the Pima Indian Diabetes Classification Model

Real-time Predictions: Using Iris Flowers Multi-Class Classification Dataset

Real-time Predictions: Using the Boston Housing Regression Model

Full source code attached as a zip file.

Please download it from the resources link.