Prerequisites: No prerequisites are needed for the SQL commands and DBMS fundamentals. Basic knowledge of programming in Python would be helpful if you want to run the source code in the course-ending project.
Taught by Stanford-educated, ex-Googlers. This team has decades of practical experience in quant trading, analytics and e-commerce.
Your bodyguard for when data gets too big, this course is strong but friendly, funny yet deep, animated yet thoughtful.
Let’s parse that.
Your bodyguard for when data gets too big: Most business folks (and quite a few engineers) use Excel as a basic tool of decision making and modeling, but when you can't fit the data you'd like into an Excel spreadsheet that you can easily open, it's time to move to a database.
The course is strong but friendly: This course will help you move to a database without being intimidated by the new environment. Don't let anyone tell you that any dataset is too large or too complicated for you to understand (and people will try, most likely).
The course is funny yet deep: It goes really deep into the topics that folks often find hard to understand, such as joins, aggregate operators and interfacing with databases from a programming language. But it never takes itself too seriously:-)
The course is very visual: Most of the techniques are explained with the help of animations to help you understand better.
The course is practical as well: Queries are explained in excruciating detail, indices are demystified, and potentially career-limiting traps (Drop, Alter) are marked with bright yellow tape markers so you can steer clear.
The course is also quirky. The examples are irreverent. Lots of little touches: repetition, zooming out so we remember the big picture, active learning with plenty of quizzes. There’s also a peppy soundtrack, and art - all shown by studies to improve cognition and recall.
What's Covered:
- SQL In Great Depth
- Database Fundamentals and Just Enough Theory
- Practical Examples - Queries in MySQL and SQLite, and code in Python
Who is the target audience?
- Yep! Data analysts who would like to really get down and dirty with the data
- Yep! Business folks and executives looking to make their decision making more data-driven, and seeking the technical knowledge to do so.
- Yep! Students of Computer Science and Computer Engineering looking to understand database concepts for the first time
- Yep! Software engineers who need to understand and interface with databases from programming languages in their work
- This course will cover generic (non-system-specific) SQL, but will also conduct exercises using 2 different database technologies: MySQL and SQLite. Installation and use of both these will be explained in-depth
- Explore large datasets and uncover insights - going far beyond Excel, deep into the data
- Model and create a database for day-to-day use
- Interface with databases from a programming language such as Python
- Have the comfort and confidence needed to load data and use both GUI and a command line interface for database operations
- Fully understand and leverage joins, subqueries, aggregates, indices, triggers, stored procedures and other major database concepts
As the scale of your data grows, file systems (the most famous of which is - Excel!) struggle to keep up. Databases are carefully engineered to do the heavy lifting
MySQL is an open-source RDBMS, the most popular in the world by some measures. Acquired by Oracle, it still has a very powerful free Community Edition
Setting up MySQL and the MySQL workbench can be a little daunting - never fear! We'll walk through it. (The Mac OS X version)
Setting up MySQL and the MySQL workbench can be a little daunting - never fear! We'll walk through it. (The Windows version)
Databases are like all computer systems - garbage in, garbage out. To make sure that what goes in makes sense, we need to model real-world entities and the relationships between them.
What's a key? It is a set of defining attributes. Once you have the key, you have captured the essence of an entity, as it were.
We dig deeper into the world of entities and relationships.
Entities could be modeled even with flat files, but relationships can only be modeled properly in a database.
One-to-one, one-to-many or many-to-many? The nature of the relationships between entities determines how the corresponding data will be represented in a database
We are almost ready to make the leap from modeling data to setting up a database. But first, let's delve a bit deeper into modeling relationships.
All of that E-R model stuff we just learnt is really useful! Let's put it to work immediately, by figuring out how we can translate E-R models into database tables.
Remember that columns in a database have types - these types govern how those columns can be used
NULL is a special value - it implies that a value does not exist. NULL is not TRUE or FALSE, it's just NULL. Blank strings and zeroes are not NULL either
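To see this behaviour for yourself, here is a minimal sketch using Python's built-in sqlite3 module and a throwaway in-memory database. The variable names are made up for illustration; other DBMSs follow the same three-valued logic but may display the results differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Comparing NULL with anything, even another NULL, yields NULL (None in Python).
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# IS NULL is the right way to test for NULL; SQLite returns 1 (true) or 0 (false).
null_is_null = conn.execute("SELECT NULL IS NULL").fetchone()[0]

# A blank string and a zero are real values, not NULL.
blank_is_null, zero_is_null = conn.execute(
    "SELECT '' IS NULL, 0 IS NULL"
).fetchone()

print(null_equals_null, null_is_null, blank_is_null, zero_is_null)
```

The takeaway: write `WHERE col IS NULL`, never `WHERE col = NULL`.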
Helpful operators - Between, In and Not In will simplify your queries (and your life!)
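A quick sketch of these three operators, run through Python's sqlite3 module. The `orders` table and its rows are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, city TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Delhi", 50), (2, "Mumbai", 150), (3, "Pune", 250), (4, "Delhi", 400)],
)

def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# BETWEEN is inclusive at both ends: amount >= 100 AND amount <= 250.
mid_sized = ids("SELECT id FROM orders WHERE amount BETWEEN 100 AND 250 ORDER BY id")

# IN replaces a chain of OR conditions.
in_metros = ids("SELECT id FROM orders WHERE city IN ('Delhi', 'Mumbai') ORDER BY id")

# NOT IN is the complement.
elsewhere = ids("SELECT id FROM orders WHERE city NOT IN ('Delhi', 'Mumbai') ORDER BY id")

print(mid_sized, in_metros, elsewhere)
```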
Dates can be tricky because date handling is so different across database systems. Let's take MySQL as an example, and run through some of the common operations we'd perform with and on dates. Keep in mind that the syntax would be very different for a different DBMS though!
The circle of life of data begins with - creating a database, and creating an empty table within it
Let's understand how a table can be created. In particular, NULLs, primary keys and auto-increment columns are commonly used, and really handy, so let's make sure we understand them
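As a rough illustration, here is a sketch of such a table in SQLite, driven from Python. The `students` table is hypothetical. Note that SQLite spells the keyword `AUTOINCREMENT`, while MySQL spells it `AUTO_INCREMENT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,  -- filled in by the database
        name  TEXT NOT NULL,                      -- must always be supplied
        email TEXT                                -- may be NULL
    )
""")

# We never supply id; the database assigns 1, 2, ...
conn.execute("INSERT INTO students (name, email) VALUES ('Asha', 'asha@example.com')")
conn.execute("INSERT INTO students (name) VALUES ('Ravi')")  # email stays NULL
rows = conn.execute("SELECT id, name, email FROM students ORDER BY id").fetchall()

# Leaving out a NOT NULL column is rejected.
try:
    conn.execute("INSERT INTO students (email) VALUES ('nobody@example.com')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rows, rejected)
```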
Referential Integrity (aka Foreign Key Constraints) is a really important concept in DBMS.
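A minimal sketch of a foreign key doing its job, assuming SQLite via Python. The `departments` and `employees` tables are made up for illustration. One SQLite-specific quirk: foreign keys are only enforced after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE employees (
        id      INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES departments(id)
    )
""")

conn.execute("INSERT INTO departments VALUES (1, 'Sales')")
conn.execute("INSERT INTO employees VALUES (1, 'Asha', 1)")  # fine: department 1 exists

# Department 99 does not exist, so this row would be an orphan.
try:
    conn.execute("INSERT INTO employees VALUES (2, 'Ravi', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```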
Let's get our feet wet - create a database, use that database, create a simple table, and bulk load a file into that table
That first table was a bit simplistic - no constraints. Now let's do a more involved example, and harness the full power of the Bulk Uploader.
SUM, MAX, MIN, COUNT and AVG are aggregate operators - by definition they operate over a group of rows, rather than a single row
We discussed how aggregate operators need a range of rows to function on. What can that range be? It could be the entire table, but even more likely it's some group of the rows in a table, defined by the GROUP BY clause
Let's keep going with GROUP BY, and understand how it divvies up the data in a database
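Here is a small sketch of the aggregate operators, first over a whole table and then per group. It uses Python's sqlite3 module, and the `sales` table is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 100), ("North", 300), ("South", 50), ("South", 150), ("South", 100)],
)

# No GROUP BY: the aggregates run over every row and return a single row.
whole_table = conn.execute(
    "SELECT COUNT(*), SUM(amount), MIN(amount), MAX(amount), AVG(amount) FROM sales"
).fetchone()

# GROUP BY: rows are split into one bucket per region,
# and the aggregates run once per bucket.
per_region = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(whole_table)
print(per_region)
```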
We can order the results of a query by one or more columns using ORDER BY. Remember that relations are technically bags (i.e. multisets) which do not possess order - but this is a convenience that DBMSs make available
HAVING is an operator that filters out groups based on a condition. It's like the WHERE clause, but it operates on groups rather than individual rows
Use LIMIT to return only a specific number of rows from a query. Use this to peek into a large table without retrieving a gazillion rows
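ORDER BY, HAVING and LIMIT often show up in the same query, so here is one sketch that uses all three. It assumes SQLite via Python, and the `sales` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 100), ("North", 300), ("South", 50),
     ("South", 150), ("South", 100), ("East", 40)],
)

top_regions = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    HAVING SUM(amount) >= 100   -- filters whole groups, so East (40) is dropped
    ORDER BY total DESC         -- biggest total first
    LIMIT 2                     -- return at most two rows
""").fetchall()

print(top_regions)
```

WHERE could not do the job of HAVING here, because the total only exists after the rows have been grouped.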
Count and count distinct are handy to find the number of rows, and the number of unique rows in a query result
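A short sketch of the different flavours of COUNT, assuming SQLite via Python. The `visits` table is made up for illustration and deliberately includes a NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (user_name TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?)",
    [("asha",), ("ravi",), ("asha",), ("meera",), ("ravi",), (None,)],
)

all_rows, non_null, unique_users = conn.execute("""
    SELECT COUNT(*),                  -- every row, NULLs included
           COUNT(user_name),          -- rows where user_name is not NULL
           COUNT(DISTINCT user_name)  -- distinct non-NULL values
    FROM visits
""").fetchone()

print(all_rows, non_null, unique_users)
```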
The full power of databases emerges when we link tables - and Joins are the way to accomplish this
Cross Joins are conceptually simple, which is great, because they are the underpinning of Inner Joins
Inner Joins are your best friend. Understand them for what they are: cross joins with a filter condition.
Outer Joins are really useful if used right. They are a little tricky though - understand how they work, and why you should not be surprised to see NULLs in the result of an outer join.
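The three joins discussed so far can be compared side by side in a small sketch. It uses Python's sqlite3 module, and the `customers` and `orders` tables are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Asha"), (2, "Ravi"), (3, "Meera")])
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])

# Cross join: every customer paired with every order (3 x 3 rows).
cross_count = conn.execute(
    "SELECT COUNT(*) FROM customers CROSS JOIN orders").fetchone()[0]

# Cross join plus a filter...
cross_filtered = conn.execute("""
    SELECT c.name, o.id FROM customers c CROSS JOIN orders o
    WHERE c.id = o.customer_id ORDER BY o.id""").fetchall()

# ...gives the same rows as an inner join.
inner = conn.execute("""
    SELECT c.name, o.id FROM customers c INNER JOIN orders o
    ON c.id = o.customer_id ORDER BY o.id""").fetchall()

# Left outer join: Meera has no orders, so she appears with a NULL order id.
left_outer = conn.execute("""
    SELECT c.name, o.id FROM customers c LEFT OUTER JOIN orders o
    ON c.id = o.customer_id ORDER BY c.id, o.id""").fetchall()

print(cross_count, inner, left_outer)
```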
Once we've understood Inner and Outer joins, Natural Joins are easy-peasy
What's a subquery? It's a query inside another. Outer and inner queries used together are very powerful.
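One common use, sketched here with SQLite via Python: the inner query computes a single number, and the outer query compares every row against it. The `orders` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 50), (2, 150), (3, 250), (4, 400)])

# Inner query: the average amount (212.5 for this data).
# Outer query: the orders that beat that average.
above_average = [row[0] for row in conn.execute("""
    SELECT id
    FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
    ORDER BY id
""")]

print(above_average)
```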
The Set operations are easily extended to SQL, just remember that a relation is a bag, not a set (what's the difference? bags can contain duplicates, sets can't!)
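The bag-versus-set distinction is easy to see with UNION ALL and UNION. A minimal sketch, assuming SQLite via Python, with two made-up single-column tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (x INTEGER)")
conn.execute("CREATE TABLE b (x INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (2,)])  # a already has a duplicate
conn.executemany("INSERT INTO b VALUES (?)", [(2,), (3,)])

# UNION ALL keeps bag semantics: every row from both sides, duplicates included.
bag_union = [r[0] for r in conn.execute(
    "SELECT x FROM a UNION ALL SELECT x FROM b ORDER BY x")]

# UNION applies set semantics: duplicates are removed.
set_union = [r[0] for r in conn.execute(
    "SELECT x FROM a UNION SELECT x FROM b ORDER BY x")]

print(bag_union, set_union)
```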
Inserting rows into a table 1 row at a time is painfully slow - never fear! You can run a query and directly pipe its output into a table
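A sketch of piping a query straight into a table with INSERT ... SELECT, using SQLite via Python. Both table names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.execute("CREATE TABLE big_orders (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 50), (2, 150), (3, 250), (4, 400)])

# One statement copies every matching row; no row-by-row loop needed.
conn.execute("""
    INSERT INTO big_orders (id, amount)
    SELECT id, amount FROM orders WHERE amount >= 200
""")

copied = conn.execute("SELECT id, amount FROM big_orders ORDER BY id").fetchall()
print(copied)
```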
Let's create a new table, this time using the Inner Join operator to seed it.
Let's create a table twice - once using Inner Join, and once using Outer Join. Guess what the difference is?
Any column can hold a NULL value, unless you specify a NOT NULL constraint. If you do, also use a default value if possible
An Index is a quick way to query specific columns of a database. Indices make lookup very fast, but they slow down inserts, updates and deletes, so be sure to really understand them.
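You can ask the database how it plans to run a query, before and after adding an index. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` from Python. The `users` table and index name are made up, and the exact wording of the plan text varies between SQLite versions, so treat the printed output as indicative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")

query = "EXPLAIN QUERY PLAN SELECT name FROM users WHERE email = 'asha@example.com'"

# Without an index, SQLite has to scan the whole table.
# The human-readable plan is the 4th column of each result row.
plan_before = conn.execute(query).fetchall()[0][3]

conn.execute("CREATE INDEX idx_users_email ON users (email)")

# With the index in place, SQLite can search it instead.
plan_after = conn.execute(query).fetchall()[0][3]

print(plan_before)
print(plan_after)
```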
Primary keys are always indexed for fast lookup.
If for some reason you are unable to assign a primary key, at least have a foreign key
Updates and Deletes have a complicated interplay with the Foreign Key constraint. Understand ON DELETE CASCADE and its cousins
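A minimal sketch of a cascading delete, assuming SQLite via Python (where foreign keys must first be switched on). The `authors` and `books` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed in SQLite for any FK behaviour

conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE books (
        id        INTEGER PRIMARY KEY,
        title     TEXT,
        author_id INTEGER REFERENCES authors(id) ON DELETE CASCADE
    )
""")
conn.execute("INSERT INTO authors VALUES (1, 'Asha')")
conn.executemany("INSERT INTO books VALUES (?, ?, ?)",
                 [(1, "First Book", 1), (2, "Second Book", 1)])

books_before = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]

# Deleting the parent row silently deletes its child rows too.
conn.execute("DELETE FROM authors WHERE id = 1")

books_after = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]
print(books_before, books_after)
```

Without the `ON DELETE CASCADE` clause, the same DELETE would be rejected, because it would leave the two books pointing at a missing author.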
Check constraints are simple, and incredibly useful - and somehow underused. Use them!
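Here is a small sketch of a CHECK constraint stopping bad data at the door, using SQLite via Python. The `products` table is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        name  TEXT NOT NULL,
        price REAL CHECK (price >= 0)  -- negative prices make no sense
    )
""")

conn.execute("INSERT INTO products VALUES ('pen', 1.5)")  # passes the check

try:
    conn.execute("INSERT INTO products VALUES ('typo', -5)")
    bad_row_rejected = False
except sqlite3.IntegrityError:
    bad_row_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
print(bad_row_rejected, row_count)
```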
Stored Procedures are to SQL what functions are to code. Learn how to define and call stored procedures
A transaction is a logical unit of work - the DBMS will ensure that each transaction satisfies 4 ACID properties: Atomicity, Consistency, Isolation and Durability
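Atomicity is the easiest of the four to demonstrate. In the sketch below (SQLite via Python, with a hypothetical `accounts` table), a transfer fails halfway through, and the half that already ran is undone. It relies on sqlite3's connection context manager, which commits on success and rolls back when an exception escapes the block.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        name    TEXT PRIMARY KEY,
        balance INTEGER CHECK (balance >= 0)  -- no overdrafts allowed
    )
""")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

try:
    with conn:  # one transaction: all or nothing
        # Step 1 succeeds: bob is credited.
        conn.execute("UPDATE accounts SET balance = balance + 500 WHERE name = 'bob'")
        # Step 2 fails: alice cannot go below zero.
        conn.execute("UPDATE accounts SET balance = balance - 500 WHERE name = 'alice'")
except sqlite3.IntegrityError:
    pass  # the whole transaction has been rolled back

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)
```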
The circle of life of data in a database does not stop with queries: Everything must change, including data
Be really careful when you use the Alter and Drop commands. Incredibly powerful, and very simple - sometimes too simple, because that makes them so easy to use!
Views can be thought of as virtual tables. Temporary tables are exactly what their name would suggest. Use them often!
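The difference between the two shows up as soon as the underlying data changes. A minimal sketch, assuming SQLite via Python, with made-up table and view names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 50), (2, 250)])

# A view stores the query, not the rows: it is re-evaluated on every use.
conn.execute("CREATE VIEW big_orders_view AS SELECT id FROM orders WHERE amount >= 200")

# A temporary table stores a copy of the rows as they are right now,
# and disappears when the connection closes.
conn.execute("CREATE TEMP TABLE big_orders_snapshot AS "
             "SELECT id FROM orders WHERE amount >= 200")

conn.execute("INSERT INTO orders VALUES (3, 400)")  # the base table changes

view_ids = [r[0] for r in conn.execute("SELECT id FROM big_orders_view ORDER BY id")]
snapshot_ids = [r[0] for r in conn.execute("SELECT id FROM big_orders_snapshot ORDER BY id")]

print(view_ids, snapshot_ids)
```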
Designing good relational schemas starts off by figuring out the real world problem you want to map. Design each of your tables well, consider each column and what constraints it should and should not have. Remember choosing a primary key well is super important!
You know what you want to model, but how do you figure out how many tables to use to store this information? Here are a few rules of thumb.
Normal forms can seem very inaccessible when you read about them in theory, but they are great rules for getting well-designed databases. Let's see what they mean in plain English :-)
How do programming languages interface with databases? Also, a step by step guide to building your own database of stock price movements over the last 2 years
SQLite is available out of the box with Python, and is a handy and quick way to start working with databases with no setup or installation.
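To give a flavour of what "no setup" means, here is a rough sketch of a tiny stock-price store. It is not the course project's actual code: the `prices` table, its columns, the sample rows and the `price_history` helper are all assumptions made for illustration.

```python
import sqlite3

# ":memory:" keeps everything in RAM; pass a file name instead to persist the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (ticker TEXT, trade_date TEXT, close REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [("INFY", "2015-01-01", 100.0),
     ("INFY", "2015-01-02", 102.5),
     ("TCS", "2015-01-01", 200.0)],
)
conn.commit()

def price_history(connection, ticker):
    """Return (date, close) pairs for one ticker, oldest first.

    The ? placeholder lets sqlite3 pass the user-supplied ticker safely,
    instead of pasting it into the SQL string.
    """
    cursor = connection.execute(
        "SELECT trade_date, close FROM prices WHERE ticker = ? ORDER BY trade_date",
        (ticker,),
    )
    return cursor.fetchall()

print(price_history(conn, "INFY"))
```

Using placeholders rather than string formatting matters most when the ticker comes from user input, as it does in the project.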
Code along as we build a database of stock movements. We'll download and unzip files with stock movements from the NSE website, and insert the data into a database. We'll accept a ticker from a user and generate an Excel sheet with a chart of its price movements for the last year.
If you are unfamiliar with software that requires working with a shell/command line environment, this video will be helpful for you. It explains how to update the PATH environment variable, which is needed to set up most Linux/Mac shell-based software.