Skip to content
Sahithyan's S3
1
Sahithyan's S3 — Database Systems

Introduction to SQL

A declarative DML and DDL. Case insensitive. Became the industry standard in the 1980s.

Query planner decides on the most efficient way to execute a query.

SQL has several implementations with varying features and syntax:

  • MySQL: Popular open-source database system, currently owned by Oracle

    • Major versions: 5.7, 8.0
  • PostgreSQL: Advanced open-source object-relational database

    • Major versions: 11, 12, 13, 14, 15, 16
  • Microsoft SQL Server: Microsoft’s enterprise database system

    • Major versions: 2016, 2019, 2022
  • Oracle Database: Enterprise database system by Oracle Corporation

    • Major versions: 12c, 19c, 21c
  • SQLite: Self-contained, serverless database engine

    • Lightweight, embedded solution
  • MariaDB: Community-developed fork of MySQL

Each SQL flavor implements the ANSI SQL standard but adds proprietary extensions and features, which can affect portability of SQL code between systems. The syntax and capabilities of functions vary between SQL dialects.

SQL supports several built-in data types as well as user-defined data types.

  • Numeric Types:

    • INTEGER / INT: Whole numbers without decimal places
    • DECIMAL(p,s) / NUMERIC(p,s): Exact numeric values with specified precision
    • FLOAT / REAL: Approximate numeric values
    • SMALLINT / BIGINT: Smaller and larger integer values
  • String Types:

    • CHAR(n): Fixed-length character strings
    • VARCHAR(n): Variable-length character strings
    • TEXT: Large variable-length character strings
  • Date and Time Types:

    • DATE: Stores date values (YYYY-MM-DD)
    • TIME: Stores time values (HH:MM:SS)
    • TIMESTAMP: Stores date and time values
    • DATETIME: Similar to TIMESTAMP
  • Boolean Type:

    • BOOLEAN: Stores TRUE, FALSE, or NULL values
  • Binary Types:

    • BLOB: Binary Large Object for storing binary data
    • BINARY / VARBINARY: Fixed and variable-length binary data

Users can define additional data types using CREATE TYPE.

Allow defining custom data types with constraints. They are useful for enforcing business rules and improving code readability.

Most early database systems allowed adding new columns but not dropping existing ones.

Adding a column is a non-destructive change. The DBMS appends metadata for the new column, which can be quickly done. And treats missing values in old rows as NULL or default. Backward compatibility is maintained when a new column is added.

Dropping a column is destructive. It requires rewriting every record to remove stored bytes and update file formats. The table must be locked during this process. Backward compatibility will be lost if another database object depends on the dropped column. This introduces potential data loss, or errors.

However modern database systems support dropping columns. They do this safely by combining metadata indirection, lazy cleanup, and background table rewriting. These mechanisms make column removal efficient and non-destructive.

The column would be marked as “deleted” in metadata. Queries and new writes simply ignore the dropped column.

Instead of removing the data instantly, the process is deferred. Avoids long table locks. Preserves transactional safety.

Modern engines maintain system catalogs that track all dependencies. To ensure a consistent schema, when dropping a column:

  • The DBMS checks for dependent objects.
  • Either rejects the operation or performs a cascade delete of references.

To combine columns from more than 1 relations based on related columns.

  • INNER JOIN: Returns records that have matching values in both tables.
  • LEFT (OUTER) JOIN: Returns all records from the left table and matched records from the right table.
  • RIGHT (OUTER) JOIN: Returns all records from the right table and matched records from the left table.
  • FULL (OUTER) JOIN: Returns all records when there’s a match in either the left or right table.
  • CROSS JOIN: Returns the Cartesian product of two tables.
  • NATURAL JOIN: Returns the intersection of two tables based on common column names. Not recommended due to ambiguity and potential performance issues.

Can be nested, but have performance implications.

A nested subquery is a query inside another query. Allows one SQL statement to use the result of another as input, enabling complex filtering and comparison logic. Cannot use outer row values.

A subquery (inner query) is enclosed in parentheses and used within:

  • WHERE
  • FROM
  • HAVING
  • SELECT

Subquery executes first. Its result is passed to the outer query. The outer query then applies filters or joins based on that result.

Returns one value. Used with comparison operators.

SELECT name
FROM student
WHERE total_credits = (
SELECT MAX(total_credits)
FROM student
);

Returns multiple values. Used with operators like IN, ANY, ALL.

SELECT name
FROM student
WHERE dept_name IN (
SELECT dept_name
FROM department
WHERE building = 'Watson'
);

Returns multiple columns per row. Compared using tuple notation.

SELECT name
FROM instructor
WHERE (dept_name, salary) IN (
SELECT dept_name, salary
FROM instructor
WHERE salary > 90000
);

Subquery acts as a temporary table (derived relation).

SELECT dept_name, avg_sal
FROM (
SELECT dept_name, AVG(salary) AS avg_sal
FROM instructor
GROUP BY dept_name
) AS dept_avg
WHERE avg_sal > 80000;

A subquery which is dependent on outer row values. Runs once for each row. Has performance pitfalls. Can often be replaced by JOIN for efficiency.