Databases are the containers/repository of data. They hold the data in the form of a spreadsheet(not the same but for the sake of imagination). There are many databases viz. Relational and Non-Relational. Let us not dwell on the latter part but stick with the relational databases. Watch this video to know more about databases.
This post will focus primarily on writing SQL queries. SQL queries are common across all the databases but there are some flavors and it that may vary according to different databases. There are various vendors like IBM, MySQL, PostgreSQL, Oracle, etc. Every vendor has its style of SQL to communicate with its database. The one I would suggest is PostgreSQL which is open source and has gotten wider adoption.
The primary way to communicate with a relational database is through SQL. Structured Query Language (SQL) is a language to communicate with the database. Without further ado, let us jump into learning SQL.
Image Source: The Data Labs
A table looks like the above spreadsheet with rows and columns. Columns are the headers for the category of the values that we input. A column that is defined as a name will have only names in it and a column defined as a currency would certainly have numbers in the entire column. So, a column defines what kind of values it should comprise. Rows are the list of entries made in different columns. Rows define the number of entries in a spreadsheet.
Now, to get all the columns from the customer table, we use the keyword SELECT and specify the column names that we are interested in. We can use the * symbol to fetch all the columns from the table. To connect to a table, the keyword FROM is used which comes after the SELECT keyword.
SELECT * FROM customer;
To get specific columns from the customer table.
SELECT first_name, last_name FROM customer;
In spreadsheets, we would have used filters similar to that in SQL there are filters to query. For instance, if you want to look only for emails that have the first name as ‘Sai’, the keyword that should be used is WHERE. WHERE should be used after the FROM command.
SELECT email FROM customer WHERE first_name='Sai';
To get all the emails of the customer whose first name is ‘Sai’ and the last name is ‘Kumar’, we use a keyword called AND which is a logical operator similar to OR, and NOT. You can learn more about logical operators in SQL.
SELECT email FROM customer WHERE first_name='Sai' AND last_name='Kumar';
To get all the emails of the customer whose first name is ‘Sai’ or the last name is ‘Kumar’.
SELECT email FROM customer WHERE first_name='Sai' OR last_name='Kumar';
To arrange the data in ascending or descending order use the ORDER BY keyword followed by the column name based on which the entire table would be ordered. To arrange is ascending use ASC and for descending DESC. By default, if you don’t specify the order, it will sort the column in ascending order. ORDER BY always comes at the tail end of the query.
To get the list of customer ids and arrange them in ascending order of their transaction date.
SELECT customer_id FROM customer ORDER BY transaction_date ASC;
To get the list of customer ids and arrange them in descending order of their transaction date.
SELECT customer_id FROM customer ORDER BY transaction_date DESC;
We can also limit the number of records displayed by using LIMIT with a positive integer number to specify the limit.
To get the list of the first 10 customer ids.
SELECT customer_id FROM customer ORDER BY transaction_date ASC LIMIT 10;
COUNT is a type of function that is used in columns and it should come after the SELECT. COUNT returns the number of data in the specified column. In general, passing * as an argument to the COUNT will also return the same result as the column name.
To get the count of some customers whose age is not more than 50.
SELECT COUNT(*) FROM customer WHERE age<=50;
The use of regular expressions is ubiquitous in all programming languages similarly we use regex in SQL to match the strings. LIKE is used in conjunction with WHERE and followed by the regex.
- %String – grabs all the words that end with String
- String% – grabs all the words that start with String
- %String% – grabs all the words that comprise the String in between them
Note: LIKE is case sensitive and looks only for the string that is mentioned. LIKE is an alternative that is case insensitive and can match across various cases of the string. To get all the names of the customer whose first letter in their first name starts with ‘P’
SELECT first_name FROM customer WHERE first_name LIKE 'P%';
To get all the customers whose last name has ‘Siva’ in between and should be case insensitive.
SELECT * FROM customer WHERE last_name ILIKE '%siva%';
At times the entries in the table might have duplicate data. To filter out unique data, DISTINCT is used. DISTINCT applies only to the column and comes next to SELECT and even as an argument to the COUNT function.
To get the count of distinct areas where the customers are from.
SELECT COUNT(DISTINCT area) FROM customer;
To get distinct areas where the customers are from.
SELECT DISTINCT area FROM customer;
IN is used to specify the list of items against which a column is compared. For example, we can check if a column consists of (Orange, Mango, Banana) and the function filters and display only the entries with orange, mango, and banana.
To get the count of customers who are located in the area with Pincode ’82’ between the age group 60 to 70.
SELECT COUNT(*) FROM customer WHERE pincode IN(82) and age BETWEEN 60 AND 70;
Aggregation means a cluster of things that are brought together. Likewise aggregation in SQL clusters the data into a homogeneous category and groups it. For example,
To get the total number of transactions made by all customers.
SELECT customer_id, COUNT(transaction_date) FROM customer GROUP BY customer_id;
Here COUNT is the aggregation function that has been performed on the transaction_date column concerning the customer_id. So a group is formed for every customer_id where you will get a matching count of the number of transactions. A normal COUNT would have resulted in the total number of transactions whereas aggregation with customer_id by using GROUP BY, produces a table with individual customer ids and their corresponding count of transactions.
GROUP BY should always have the column names that are mentioned in the SELECT criteria except for the aggregation function.
To get an average amount that a customer has spent on purchases.
SELECT customer_id, AVG(amount) from customer GROUP BY customer_id;
To get a rounded value of the average calculated.
SELECT customer_id, round(AVG(amount),2) from customer GROUP BY customer_id;
To arrange the SUM(amount) in descending order and display only 10 rows.
SELECT customer_id, SUM(amount) FROM customer GROUP BY customer_id ORDER BY SUM(amount) DESC LIMIT 10;
It is not possible to use the WHERE condition on the aggregate function and that is where we use HAVING after the GROUP BY and specify the condition.
To display the customers who have spent more than 100 INR.
SELECT customer_id, COUNT(amount) FROM customer GROUP BY customer_id HAVING COUNT(amount)>=100;
To display the customers who were attended by the staff with id ‘2’ and their total spending is more than 1000 INR.
SELECT customer_id, staff_id, SUM(amount) FROM customer WHERE staff_id=2 GROUP BY customer_id, staff_id HAVING SUM(amount)>1000;
JOINS play an important role while analyzing data by combining multiple tables. There are four types of JOINS
- INNER JOIN
- LEFT OUTER JOIN
- RIGHT OUTER JOIN
- FULL OUTER JOIN
INNER JOIN – Compares between two tables on the specified columns and returns the data that matches in both the columns.
LEFT OUTER JOIN – Compares between two tables on the specified columns and returns the data that matches in the left table. If no value is present in the right table matching the left, then it returns the data as null.
RIGHT OUTER JOIN – Compares between two tables on the specified columns and returns the data that matches in the right table. If no value is present in the left table matching the right, then it returns the data as null.
FULL OUTER JOIN – Compares between two tables on the specified columns and returns the data that matches in both the columns along with mismatches with a value as null.
You can read more about JOINS here.
INNER JOIN to merge two tables according to matching values of customer_id.
SELECT payment_id, customer.customer_id, first_name, last_name FROM customer INNER JOIN payment ON customer.customer_id = payment.customer_id;
FULL OUTER JOIN to merge two tables according to matching values of customer_id.
SELECT payment_id, customer.customer_id, first_name, last_name FROM customer FULL OUTER JOIN payment ON customer.customer_id = payment.customer_id WHERE payment.customer_id IS null OR customer.customer_id IS null;
LEFT OUTER JOIN to merge two tables according to matching values of customer_id based on the left table.
SELECT payment_id, customer.customer_id FROM payment LEFT OUTER JOIN customer on payment.customer_id = customer.customer_id WHERE payment.customer_id IS NULL;
RIGHT OUTER JOIN to merge two tables according to matching values of customer_id based on the right table.
SELECT payment.customer_id,customer_id FROM customer RIGHT OUTER JOIN payment on customer.customer_id = payment.customer_id WHERE payment.customer_id IS NULL;
You can also use nested joins to query from multiple tables.
SELECT customer_id, first_name, last_name FROM customer INNER JOIN payment ON customer.customer_id = payment.customer_id INNER JOIN billing ON payment.customer_id = billing.customer_id WHERE first_name='Sai' AND last_name='Kumar';
Other SQL Commands
To display the timezone of your current location.
To get today’s timestamp.
To get the current time of the day.
To extract month, day, and year from the timestamp use the function EXTRACT which will return a numeric month, date, or year.
SELECT EXTRACT(MONTH FROM transaction_date) FROM customer;
To convert a timestamp into a specific string format like ‘DD/MM/YYYY’, the TO_CHAR function is used where you specify the timestamp column name as the first argument and followed by the format to be returned.
SELECT DISTINCT (TO_CHAR(payment_date,'DD/MM/YYYY')) FROM payment;
A sub query returns either a single data or a list of data that can be asserted with its parent query.
SELECT first_name || ' ' || last_name AS Full_Name FROM customer WHERE address_id IN (SELECT address_id FROM address WHERE district = 'Chennai');
Similar to conditionals in programming languages, SQL has a CASE statement that runs a check against the selected column and performs specified operations on it.
SELECT SUM(CASE citizen WHEN 'INDIAN' THEN 1 ELSE 0 END)as IN, SUM(CASE rating WHEN 'NRI' THEN 1 ELSE 0 END)as NRI FROM customer;
That will be all and there are a few more concepts like views, procedures, etc. which are much more advanced and I will cover those in my future posts.