# Making Predictions with Data and Python

How do you predict data in Python? This course will give you an understanding of the most important theoretical concepts that are essential when building predictive models for real-world problems.

- Self-paced with Lifetime Access
- Certificate on Completion
- Access on Android and iOS App

**Build Awesome Predictive Models with Python**

Python has become one of the favorite tools of data scientists for doing Predictive Analytics. In this hands-on course, you will learn how to build predictive models with Python.

During the course, we will talk about the most important theoretical concepts that are essential when building predictive models for real-world problems. The main tool used in this course is scikit-learn, which is recognized as a great tool: it has a great variety of models, many useful routines, and a consistent interface that makes it easy to use. All the topics are taught using practical examples, and throughout the course we build many models using real-world datasets.

By the end of this course, you will have learned various techniques for making predictions about bankruptcy and identifying spam text messages, and you will use that knowledge to build a credit card default classifier using linear models for classification such as logistic regression.

**About the Author**

**Alvaro Fuentes** is a Data Scientist with an M.S. in Quantitative Economics and an M.S. in Applied Mathematics, and more than 10 years' experience in analytical roles. He worked in the Central Bank of Guatemala as an Economic Analyst, building models for economic and financial data. He founded Quant Company to provide consulting and training services in Data Science topics and has been a consultant for many projects in fields such as Business, Education, Psychology, and Mass Media. He has also taught many online and on-site courses to students from around the world on topics such as Data Science, Mathematics, Statistics, R programming, and Python.

Alvaro Fuentes is a big Python fan, has been working with it for about 4 years, and uses it routinely for analyzing data and producing predictions. He has also used it in a couple of software projects. He is also a big R fan and doesn't like the controversy inherent in any attempt to evaluate which is the best, R or Python; he uses them both. He is also very interested in the Spark approach to big data, and likes the way it simplifies complicated things. He is neither a software engineer nor a developer but is generally interested in web technologies. He also has technical skills in R programming, Spark, SQL (PostgreSQL), MS Excel, machine learning, statistical analysis, econometrics, and mathematical modeling.

Predictive Analytics is a topic in which he has both professional and teaching experience. He has solved practical problems in his consulting practice using the Python tools for predictive analytics; the topics of predictive analytics are part of a more general course on Data Science with Python that he teaches online.

- Knowledge of the Python programming language is assumed.
- Basic familiarity with Python's Data Science Stack would be useful, although a brief review is given.
- Familiarity with basic mathematics and statistical concepts is also helpful for getting the most out of this course.

- Understand the main concepts and principles of Predictive Analytics and how to use them when building real-world predictive models.
- Properly use scikit-learn, the main Python library for Predictive Analytics and Machine Learning.
- Learn the types of Predictive Analytics problems and how to apply the main models and algorithms to solve real-world problems.
- Build, evaluate, and interpret classification and regression models on real-world datasets.
- Understand Regression and Classification
- Refresh your visualization skills

Explain what the Anaconda Distribution is and why we are using it in this course. Also show how to get and install the software.

- Explain what the Anaconda Distribution is and the problem it solves
- Go to the website to get Anaconda
- Show where to find the installer and ask the user to install it

Introduce the computing environment in which we will work for the rest of the course.

- Explain what the Jupyter Notebook is
- Show how to start the Jupyter Notebook from the command line
- Take a tour of the Jupyter interface and show how to edit code and markdown cells

Explain what NumPy is, the problem it solves, and why it is important for Python's Data Stack. Also show some of the most common ways to create ndarrays and how to operate with them.

- Explain what NumPy is
- Show what a vectorized operation is using simple examples
- Show some of the most common ways to create ndarrays and how to perform mathematical functions on them
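As a minimal sketch of the ideas in this lecture (the array values are invented for illustration and are not taken from the course), the following creates ndarrays in a few common ways and applies vectorized operations to them:

```python
import numpy as np

# Common ways to create ndarrays
from_list = np.array([1.0, 2.0, 3.0, 4.0])  # from a Python list
zeros = np.zeros(4)                          # four zeros
evens = np.arange(0, 8, 2)                   # 0, 2, 4, 6
grid = np.linspace(0.0, 1.0, 5)              # 5 evenly spaced points in [0, 1]

# Vectorized operations act on every element at once, with no explicit loop
doubled = from_list * 2        # multiply each element by 2
total = from_list + evens      # element-wise addition of two arrays
roots = np.sqrt(from_list)     # mathematical function applied element-wise

print(doubled)
print(total)
print(roots)
```

The same results could be computed with a Python `for` loop, but the vectorized form is shorter and is usually much faster on large arrays.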

Explain what Pandas is and what we can do with it. Talk about the main objects in this library, that is, Series and DataFrames.

- Explain what pandas is and what it is used for
- Explain what a Series and a DataFrame are
- Provide practical examples of creation and manipulation of Pandas objects
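A small, hedged illustration of the two main objects follows. The column names and numbers are made up for this example and do not come from the course material:

```python
import pandas as pd

# A Series is a one-dimensional labeled array
prices = pd.Series([10.5, 20.0, 7.25], index=["a", "b", "c"], name="price")

# A DataFrame is a table whose columns are Series sharing one index
df = pd.DataFrame(
    {"price": [10.5, 20.0, 7.25], "quantity": [3, 1, 4]},
    index=["a", "b", "c"],
)

# Manipulation: derive a column, filter rows, aggregate
df["revenue"] = df["price"] * df["quantity"]
expensive = df[df["price"] > 10]
total_revenue = df["revenue"].sum()

print(df)
print(expensive)
print(total_revenue)
```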

Explain to the viewer what matplotlib is and the main concepts used when working with this library.

- Explain what the matplotlib library is and its use
- Explain the main terms used when working with matplotlib
- Provide simple examples of visualizations produced with matplotlib
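The two terms that come up most often in matplotlib are the Figure (the whole canvas) and the Axes (one plot inside it). The sketch below, with invented data, builds a single line plot using both. The `Agg` backend line is an assumption made so that the script also runs without a display; it is not needed inside a Jupyter Notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for headless runs
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

# fig is the Figure (the canvas); ax is the Axes (the plot area)
fig, ax = plt.subplots(figsize=(5, 3))
ax.plot(x, y, marker="o", label="y = x^2")
ax.set_title("A simple line plot")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.tight_layout()
```

In a notebook the figure is displayed automatically at the end of the cell; in a script, `fig.savefig(...)` or `plt.show()` would be the usual next step.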

Show some of the visualization capabilities included in pandas objects and how we can modify some elements of a pandas plot with matplotlib.

- Show how to produce some common visualizations using the methods from pandas objects
- Show how to modify elements of a pandas plot with matplotlib
- Give a list of the plots that can be produced with pandas

Introduce the Seaborn library and show some of the specialized and complex statistical visualizations that can be produced with this library.

- Explain what Seaborn is
- Show some examples of commonly used plots produced with Seaborn
- Show examples of complex plots produced with Seaborn

Explain to the viewer the definition of the term Predictive Analytics and how it is different from other forms of making predictions.

- Explain what a prediction is in the context of the field of Predictive Analytics
- Explain the role of data in Predictive Analytics
- Give a concise and precise definition of the term Predictive Analytics

Since Predictive Analytics is the use of data combined with quantitative tools, it is possible to distinguish between three approaches for doing Predictive Analytics: mathematical, statistical, and machine learning models. In this video we explain the difference between them.

- Define and explain what a mathematical model is, in the context of Predictive Analytics
- Define and explain what a statistical model is, in the context of Predictive Analytics
- Define and explain what a machine learning model is, in the context of Predictive Analytics

Explain the main categories of Machine Learning: supervised and unsupervised learning. Briefly mention reinforcement learning.

- Define Supervised Learning and provide some examples
- Define Unsupervised Learning and provide some examples
- Define Reinforcement Learning and provide some examples

Explain the distinction between the two types of problems that can be found in Supervised Learning, that is, Regression and Classification.

- Mention the elements of a Supervised Learning problem
- Define what a Regression problem is
- Define what a Classification problem is

Provide a clear definition for the terms model and algorithm and their relation with the term learning model. Also give the 3 conditions we must check before using Machine Learning for doing Predictive Analytics.

- Define the term learning model in the context of Supervised Learning
- Provide simple examples of models, learning algorithms and learning models
- Give the 3 conditions we must check before using Machine Learning for doing Predictive Analytics

Present the scikit-learn library and make a demonstration of how to use it to build a predictive model.

- Present the scikit-learn library as part of the Python data science stack
- Show high-level steps used to build a predictive model in scikit-learn
- Load a dataset and build a predictive model
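The high-level steps can be sketched as follows. This is an illustration only: the choice of the diabetes dataset bundled with scikit-learn and of `LinearRegression` as the model are assumptions for the example, not necessarily what the lecture uses.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# 1. Load the data as a feature matrix X and a target vector y
X, y = load_diabetes(return_X_y=True)

# 2. Hold out part of the data to evaluate the model on unseen examples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 3. Create the model object and train it on the training set
model = LinearRegression()
model.fit(X_train, y_train)

# 4. Predict on new data and evaluate (score returns R^2 for regressors)
predictions = model.predict(X_test)
r2 = model.score(X_test, y_test)
print(f"R^2 on the test set: {r2:.3f}")
```

The same create, `fit`, `predict` pattern applies to nearly every estimator in scikit-learn, which is what the consistent interface mentioned earlier refers to.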

Present to the viewer the Multiple Regression Model and explain at a high level the general formulation of the model and the scikit-learn class that is used to build these types of models.

- Present the general formulation of the Multiple Regression Model
- Present a concrete example of a multiple regression model and explain the interpretation of the coefficients
- Present the LinearRegression class from scikit-learn
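To make the interpretation of coefficients concrete, here is a hedged sketch on synthetic data. The true relationship `y = 5 + 2*x1 - 3*x2` is invented for the example, and because no noise is added, the fitted model should recover those values up to floating-point precision.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data generated from a known linear relationship (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 5.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

model = LinearRegression().fit(X, y)

# Each coefficient is the change in the prediction for a one-unit increase
# in that feature, holding the other features fixed
print("intercept:", model.intercept_)
print("coefficients:", model.coef_)
```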

Explain the principle behind the KNN model for regression; present the general steps of the algorithm using a simplified example. Introduce the class used in scikit-learn to produce these models.

- Explain the underlying principle of the KNN model
- Present the steps of the KNN algorithm
- Introduce the KNeighborsRegressor class from scikit-learn
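The steps of KNN for regression can be sketched in plain Python. This is a simplified illustration with invented data and a hypothetical helper name (`knn_predict`); it is not the scikit-learn implementation.

```python
def knn_predict(train_X, train_y, query, k=3):
    """Predict a numeric target as the mean target of the k nearest neighbors."""
    # Step 1: compute the Euclidean distance from the query to every training point
    distances = []
    for features, target in zip(train_X, train_y):
        dist = sum((a - b) ** 2 for a, b in zip(features, query)) ** 0.5
        distances.append((dist, target))

    # Step 2: sort by distance and keep the k closest points
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]

    # Step 3: average the targets of those k neighbors
    return sum(target for _, target in nearest) / k


train_X = [[1.0], [2.0], [3.0], [10.0]]
train_y = [1.5, 2.5, 3.5, 10.5]

# The three nearest neighbors of 2.2 are 2.0, 3.0 and 1.0,
# so the prediction is the mean of 2.5, 3.5 and 1.5
prediction = knn_predict(train_X, train_y, query=[2.2], k=3)
print(prediction)
```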

Explain the construction of the lasso regression and compare it to the multiple regression model. Show the formulation of the model and the modification to the optimization objective. Introduce the class used in scikit-learn to produce these models.

- Explain what the Lasso model does
- Present the general formulation of the model
- Introduce the Lasso class from scikit-learn
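The modification to the objective is an added penalty on the absolute size of the coefficients. In scikit-learn's `Lasso` the penalty strength is the `alpha` parameter. The sketch below compares it with ordinary least squares on synthetic data; the data, the seed, and `alpha=0.5` are assumptions chosen for illustration, with only the first two of five features actually influencing the target.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data: only features 0 and 1 matter (illustrative only)
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

# The L1 penalty shrinks all coefficients and tends to push the
# coefficients of irrelevant features to exactly zero
print("OLS coefficients:  ", np.round(ols.coef_, 3))
print("Lasso coefficients:", np.round(lasso.coef_, 3))
```

Larger values of `alpha` shrink the coefficients more aggressively, so in practice `alpha` is usually chosen by cross-validation.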

In this video we show how to evaluate regression models, give a short list of the metrics and explain the MSE. Then we explain the intuition behind the concepts of cross-validation, overfitting and regularization.

- Give a list of the different metrics for regression models. Explain the MSE metric
- Explain what cross-validation is and why it is needed
- Explain the concepts of overfitting and regularization
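As a hedged sketch, the MSE is the average of the squared differences between actual and predicted values, and k-fold cross-validation repeats the train/evaluate cycle on k different splits of the data. The tiny `y_true`/`y_pred` vectors and the choice of the bundled diabetes dataset with 5 folds are assumptions for illustration.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score

# MSE by example: squared errors are 0.25, 0.25, 0 and 1, so the mean is 0.375
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
mse = mean_squared_error(y_true, y_pred)
print("MSE:", mse)

# 5-fold cross-validation: each fold is held out once for evaluation
X, y = load_diabetes(return_X_y=True)
scores = cross_val_score(
    LinearRegression(), X, y, cv=5, scoring="neg_mean_squared_error"
)

# scikit-learn reports the negative MSE so that higher is always better;
# flip the sign to get back an ordinary MSE
cv_mse = -scores.mean()
print("Cross-validated MSE:", cv_mse)
```

Averaging over several held-out folds gives a more reliable estimate of performance on unseen data than a single score on the training set, which is how overfitting is usually detected.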

Demonstrate how to build, evaluate and compare different predictive models for predicting diamond prices and use the best model to make predictions.

- Introduce, load and prepare data for modeling
- Show how to build different regression models
- Show how to evaluate models and use the best to make predictions

Demonstrate how to build, evaluate and compare different predictive models for predicting crime in United States communities and use the best model to make predictions.

- Introduce, load and prepare data for modeling
- Show how to build different regression models
- Show how to evaluate models and use the best to make predictions

Demonstrate how to build, evaluate and compare different predictive models for predicting post popularity and use the best model to make predictions. Also talk about some of the common challenges found when building predictive models.

- Introduce, load and prepare data for modeling
- Show how to build different regression models. Show how to evaluate models and use the best to make predictions
- Briefly mention some of the challenges one may find when building models

Mention the types of classification tasks. Then talk intuitively about the Logistic Regression model. Also mention some methods of the LogisticRegression object from scikit-learn.

- Explain the different types of classification problems.
- Explain intuitively the general ideas behind the Logistic Regression model.
- Talk briefly about the LogisticRegression object from scikit-learn.
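A minimal sketch of the main `LogisticRegression` methods follows. The data (hours studied versus pass/fail) is invented for this example and is not from the course.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented binary classification data: one feature, two classes
X = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)                      # learn the coefficients from the data

new_points = [[1.0], [4.0]]
labels = clf.predict(new_points)        # predicted class for each row
probs = clf.predict_proba(new_points)   # one probability per class, per row

print(labels)
print(probs)
```

`predict_proba` is what distinguishes the model from a plain rule: each row contains the estimated probability of every class, and `predict` simply returns the class with the highest one.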

Provide an intuitive understanding of how classification trees work, how to interpret these models and how they come up with the decision rules.

- Explain intuitively the idea of classification trees
- Provide an example of a final classification tree
- Briefly explain how the classification tree produces the rules
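One way to see the decision rules of a fitted tree is to print them as text. The sketch below assumes the iris dataset bundled with scikit-learn and a depth limit of 2, both chosen only to keep the output short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the rules easy to read
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# export_text renders the learned splits as nested if/else style rules
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

Each line of the output is a threshold test on one feature, and each leaf shows the class the tree predicts for observations that reach it.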

Explain at a very high level where the Naïve Bayes models come from and give some of the general characteristics of these models. Talk about the two types of Naïve Bayes that can be used in scikit-learn.

- Explain the general idea upon which these models are built
- Mention the two types of Naïve Bayes found in scikit-learn
- Talk briefly about the objects used in scikit-learn for training these models

Explain the different kinds of evaluation metrics for classification models. Explain the confusion matrix and the main metrics derived from it: accuracy, precision and recall.

- Explain the types of evaluation metrics for classification models
- Define and explain the confusion matrix
- Explain the main metrics for model evaluation: accuracy, precision and recall
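The three metrics can be computed directly from the four cells of the confusion matrix. The following plain-Python sketch uses invented labels and a hypothetical helper name (`confusion_counts`):

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # correctly predicted positive
        elif predicted == positive and actual != positive:
            fp += 1  # predicted positive, actually negative
        elif predicted != positive and actual == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # correctly predicted negative
    return tp, fp, fn, tn


y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted positives that are correct
recall = tp / (tp + fn)                     # share of actual positives that were found

print(tp, fp, fn, tn)
print(accuracy, precision, recall)
```

Precision and recall matter most when classes are imbalanced, where a model can reach high accuracy simply by always predicting the majority class.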

Demonstrate how to build, evaluate and compare different classification models for predicting credit card default and use the best model to make predictions.

- Introduce, load and prepare data for modeling
- Show how to build different classification models
- Show how to evaluate models and use the best to make predictions

Demonstrate how to build, evaluate and compare different classification models for predicting bankruptcy for European companies and use the best model to make predictions.

- Introduce, load and prepare data for modeling
- Show how to build different classification models
- Show how to evaluate models and use the best to make predictions

Demonstrate how to build a spam classifier using the Bag of Words model and the Naïve Bayes model. Use the model to predict the class of actual text messages.

- Introduce, load, and prepare data for modeling
- Explain the Bag of Words model and how to use it to get features from text data
- Show how to build a spam classifier and use it to classify actual text messages

Mention briefly some Predictive Analytics topics that were not addressed in the course, namely, ensemble methods, working with features, hyper-parameter tuning, neural networks, and deep learning.

- Mention some of the subjects closely related to Predictive Analytics
- Talk briefly about ensemble methods, working with features, and hyper-parameter tuning
- Talk briefly about neural networks and deep learning, and conclude the course