Indexing and hashing is used to speed up data retrieval in databases. Instead of scanning an entire file, indexes provide fast access paths to records using keys. When a file is modified, every index on the file must be updated.
Search Key
Section titled “Search Key”An attribute or set of attributes used to look up records.
Index File
Section titled “Index File”A separate data structure that stores pairs (index records) of the form:
(search-key, pointer to record)Much smaller than the original file.
Index-Sequential File
Section titled “Index-Sequential File”Ordered sequential file with a primary index.
B+ Tree Index File
Section titled “B+ Tree Index File”Dynamic, self-balancing tree structure for indexing. An alternative to indexed-sequential files.
Automatically reorganizes after inserts/deletes (no file rebuild). Avoids block overflows. Keeps all keys in sorted order. Excellent for range and equality queries.
Slightly higher space and update overhead. Still, dominant indexing method in modern DBMS.
Index Evaluation Metrics
Section titled “Index Evaluation Metrics”When comparing index types, measure:
- Access time: how quickly data is found.
- Insertion/deletion time: update performance.
- Space overhead: extra storage used by index.
- Supported access types: exact-match or range-based.
Dense Index
Section titled “Dense Index”One index entry per search-key value. Points directly to the record. Faster lookups. More space. More maintenance cost.
Sparse Index
Section titled “Sparse Index”Indexes only some search-key values. Can only be used when data is physically ordered on key.
Less space. Easier maintenance. Slow lookups.
To find a key :
- Find the index entry with largest key .
- Scan sequentially in the file.
Ordered Index
Section titled “Ordered Index”Keys stored in sorted order.
Primary Index
Section titled “Primary Index”Aka. clustering index. Order of index matches file order. Usually (but not always) based on the primary key. One per file. Sequential scan is efficient.
Secondary Index
Section titled “Secondary Index”Aka. non-clustering index. Based on non-ordering attribute(s). Can be many per file. Must be dense. Sequential scan is inefficient.
Hash Index
Section titled “Hash Index”Keys mapped to buckets using a hash function.
Multilevel Index
Section titled “Multilevel Index”When a sparse index is built from a index file. Named outer index and inner index respectively.
All indices must be updated during insert/delete.
Used when the inner index is too large to fit in memory.
Bitmap Index
Section titled “Bitmap Index”A bitmap is an array of bits. Can only be used when the attribute has a small domain. One bitmap per attribute value.
Hashing
Section titled “Hashing”Static Hashing
Section titled “Static Hashing”Each record is stored in a bucket determined by a hash function:
- Each bucket is typically one disk block.
- Used for equality search (not range queries).
- Example:
h(DeptName) = sum(char codes) mod 10.
Bucket Overflow
Section titled “Bucket Overflow”Occurs when too few buckets or non-uniform data distribution.
Handling Overflow
Section titled “Handling Overflow”- Overflow chaining: overflow buckets linked in a list.
- Called closed hashing.
- Open hashing (without overflow buckets) is unsuitable for DBMS use.
Hash Function Design
Section titled “Hash Function Design”A good hash function:
- Distributes keys uniformly across buckets.
- Minimizes overflow chains.
- Avoids clustering of similar key values.
Poor hash functions map many keys to the same bucket → slow access.