Skip to content
Sahithyan's S3
1
Sahithyan's S3 — Database Systems

Normalization

A systematic approach to decide whether a particular relation is in good form. Helps ensure correctness and maintainability. But might degrade query performance.

Normalization theory was introduced by Edgar F. Codd, the father of relational databases, in 1970.

A relation is in good form when it has these properties:

  • easier to understand and to extend
  • clear keys that uniquely identify tuples
  • appropriate atomic domains for attributes
  • functional dependencies that do not introduce partial or transitive redundancy
  • constraints captured by a suitable normal form (1NF, 2NF, 3NF, BCNF, etc.) given the application needs.

A database is in a good form when all of its relations are in good form.

Several normal forms are defined, each builds on the previous one and defines a set of rules. Consecutive normal forms ensure the relations are in a better form.

Each normal form defines a set of conditions. If one of the conditions are violated, the relation is said to be violate the normal form.

If a relation violates nnth normal form, it is violating iith normal forms where i>ni \gt n.

Unnecessary attributes in a functional dependency (FD) that can be removed without affecting the closure. Can either be in LHS or RHS.

Removing extraneous attributes:

  • Simplifies functional dependency sets
  • Helps compute minimal covers
  • Avoids redundant checks during normalization

Consider a set of functional dependencies FF, and αβF\alpha \rightarrow \beta \in F.

The attribute AαA \in \alpha is extraneous in α\alpha if (αA)+(\alpha - A)^+ contains β\beta.

The attribute AβA \in \beta is extraneous in β\beta if:

  • Compute α+\alpha^+ using only the dependencies in F=(F{αβ})(α(βA))F' = (F - \set{\alpha - \beta}) \cup (\alpha \rightarrow (\beta - A))
  • Check if α+\alpha^+ contains AA; if it does, AA is extraneous in β\beta

A situation where a change in data causes inconsistency or lose information.

Occurs when certain data cannot be inserted into a database without the presence of other, unrelated data.

Example:
Cannot add a new student unless they are enrolled in a course, because student and course details are stored in the same table.

Happens when data redundancy causes inconsistent updates across records.

Example:
If an instructor’s department name changes, it must be updated in multiple rows; missing one row leads to inconsistent data.

Occurs when deleting a record unintentionally removes valuable related information.

Example:
Deleting the last student enrolled in a course also deletes the course information itself.

If all the below conditions are met, the relation is in 1NF:

  • Row order is not used to convey information.
  • Domain of any attribute contains values of a single type, not multiple types.
    Not possible in relational DBMS.
  • It has a primary key.
  • It has no repeating groups.

If all the below conditions are met, the relation is in 2NF:

  • It’s in 1NF
  • All non-key attributes must depend on the entire primary key.

If not, the relation must be decomposed into smaller relations.

If all the below conditions are met, the relation is in 3NF:

  • It’s in 2NF
  • All non-key attributes must dependent only on the entire primary key.

Allows key attributes to depend on non-key attributes.

Aims to remove data redundancy.

Slightly stronger than 3NF. Can be thought of as 3.5NF.

If all the below conditions are met, the relation is in BCNF:

  • It’s in 3NF
  • All key attributes must dependent on and only on the entire primary key.

Suppose relation RR and a non-trivial functional dependency ABA \rightarrow B cause a violation of BCNF.

RR is decomposed into:

  • (AB)(A \cup B)
  • R(BA)R - (B - A)

If all the below conditions are met, the relation is in 4NF:

  • It’s in BCNF
  • All multivalued dependencies must be multivalued dependencies on the key.

If not, the relation must be decomposed into smaller relations.

If all the below conditions are met, the relation is in 5NF:

  • It’s in 4NF
  • It cannot be describable as the logical result of joining some other tables together.