Skip to content

Lab 06: Index Design & Query Performance

Field Details
Course ITEC-445 โ€” Advanced Database Management
Week 6
Difficulty โญโญโญ Advanced
Estimated Time 85 minutes
Topic Index Design & Query Performance
Prerequisites Lab 01 schema on Neon branch lab-06, 500+ rows of data (seed script below)
Deliverables Indexes created, EXPLAIN outputs captured, verify_lab06() PASS

Overview

Indexes are the single most impactful performance lever available to a database developer โ€” a well-designed index can reduce query cost from a full sequential scan (O(n)) to a B-tree lookup (O(log n)). In this lab you will bulk-seed the Frostburg database with realistic data, diagnose slow queries using EXPLAIN ANALYZE, design indexes to fix them, and document the before/after performance difference.


Branch Requirement

Create branch lab-06 with Lab 01 schema. Then run the bulk seed script below to get enough data for meaningful index analysis.


Part A โ€” Bulk Data Seed (run once)

SET search_path = fsu;

-- Generate 500 students using generate_series
INSERT INTO students (first_name, last_name, email, gpa, dept_id, enroll_year)
SELECT
    'Student' || g,
    'Last' || g,
    'student' || g || '@fsu.edu',
    ROUND((RANDOM() * 3.5 + 0.5)::NUMERIC, 2),
    (g % 5) + 1,   -- dept_id 1-5 cycling
    2018 + (g % 7)
FROM generate_series(1, 500) g
ON CONFLICT (email) DO NOTHING;

-- Generate enrollments: ~3 enrollments per student across 10 courses
INSERT INTO enrollments (student_id, course_id, semester, grade)
SELECT
    s.student_id,
    (g % 10) + 1,
    CASE (g % 4) WHEN 0 THEN '2024FA' WHEN 1 THEN '2024SP'
                 WHEN 2 THEN '2023FA' ELSE '2023SP' END,
    (ARRAY['A','A-','B+','B','B-','C+','C','C-','D','F'])[floor(RANDOM()*10)+1]
FROM students s
CROSS JOIN generate_series(1, 4) g
WHERE s.student_id > 12  -- skip original seed students
ON CONFLICT (student_id, course_id, semester) DO NOTHING;

SELECT 'Students: ' || COUNT(*) FROM students;
SELECT 'Enrollments: ' || COUNT(*) FROM enrollments;

You should have ~500 students and ~1500 enrollments.


Part B โ€” Baseline: Queries Without Indexes (20 pts)

First, drop any non-primary-key indexes to get a clean baseline:

-- Check existing indexes
SELECT indexname, indexdef FROM pg_indexes
WHERE schemaname = 'fsu' AND tablename IN ('students','enrollments','courses')
AND indexname NOT LIKE '%pkey%';

Run each query with EXPLAIN ANALYZE and record the results in lab06_analysis.md:

B1. Find all students in the Computer Science department:

EXPLAIN ANALYZE
SELECT student_id, first_name, last_name, gpa
FROM fsu.students
WHERE dept_id = 1
ORDER BY last_name;

B2. Find all enrollments for a specific student:

EXPLAIN ANALYZE
SELECT e.enrollment_id, c.course_code, e.semester, e.grade
FROM fsu.enrollments e
JOIN fsu.courses c ON e.course_id = c.course_id
WHERE e.student_id = 5;

B3. Find students by GPA range:

EXPLAIN ANALYZE
SELECT student_id, first_name, gpa
FROM fsu.students
WHERE gpa BETWEEN 3.5 AND 4.0
ORDER BY gpa DESC;

B4. Enrollment count per course per semester:

EXPLAIN ANALYZE
SELECT course_id, semester, COUNT(*) AS enrollment_count
FROM fsu.enrollments
WHERE semester = '2024FA'
GROUP BY course_id, semester
ORDER BY enrollment_count DESC;

Record for each: plan type (Seq Scan vs Index Scan), rows estimated, actual time.


Part C โ€” Add Indexes (30 pts)

Design and create indexes to fix each slow query:

C1. Single-column index for department lookup:

CREATE INDEX idx_students_dept_id ON fsu.students(dept_id);
EXPLAIN ANALYZE
SELECT student_id, first_name, last_name, gpa FROM fsu.students WHERE dept_id = 1 ORDER BY last_name;

C2. Covering index for enrollment lookups (avoid table heap access):

-- This index lets the query be satisfied entirely from the index
CREATE INDEX idx_enrollments_student ON fsu.enrollments(student_id)
    INCLUDE (course_id, semester, grade);

EXPLAIN ANALYZE
SELECT e.enrollment_id, e.course_id, e.semester, e.grade
FROM fsu.enrollments e
WHERE e.student_id = 5;

C3. Partial index for high-GPA students (only indexes rows we actually query):

-- Only indexes the ~10% of students with GPA >= 3.5
CREATE INDEX idx_students_high_gpa ON fsu.students(gpa DESC)
    WHERE gpa >= 3.5;

EXPLAIN ANALYZE
SELECT student_id, first_name, gpa FROM fsu.students WHERE gpa BETWEEN 3.5 AND 4.0 ORDER BY gpa DESC;

C4. Composite index for enrollment analytics:

-- Leftmost prefix: semester first (high cardinality filter)
CREATE INDEX idx_enrollments_semester_course ON fsu.enrollments(semester, course_id);

EXPLAIN ANALYZE
SELECT course_id, semester, COUNT(*) FROM fsu.enrollments WHERE semester='2024FA'
GROUP BY course_id, semester ORDER BY COUNT(*) DESC;

For each index: Record the new plan type and execution time. Compute the speedup ratio.


Part D โ€” Index Inspection (20 pts)

D1. Query pg_indexes and pg_stat_user_indexes to inventory all indexes:

SELECT
    i.indexname,
    i.tablename,
    i.indexdef,
    s.idx_scan       AS times_used,
    s.idx_tup_read   AS tuples_read,
    s.idx_tup_fetch  AS tuples_fetched,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size
FROM pg_indexes i
JOIN pg_stat_user_indexes s ON i.indexname = s.indexname
WHERE i.schemaname = 'fsu'
ORDER BY s.idx_scan DESC;

D2. Identify bloated or unused indexes:

-- Indexes with 0 scans since last stats reset
SELECT indexname, tablename, idx_scan
FROM pg_stat_user_indexes
WHERE schemaname = 'fsu' AND idx_scan = 0;

Answer in lab06_analysis.md: - Which indexes have been used (idx_scan > 0) after your EXPLAIN ANALYZE runs? - How large is each index relative to the table it indexes?

D3. Demonstrate the leftmost prefix rule. Explain why this query does NOT use idx_enrollments_semester_course:

EXPLAIN ANALYZE
SELECT * FROM fsu.enrollments WHERE course_id = 3;  -- course_id is 2nd column in index


Part E โ€” Functional Index (15 pts)

E1. Email lookup is case-insensitive. A regular index won't help WHERE LOWER(email) = 'x'. Create a functional index:

CREATE INDEX idx_students_email_lower ON fsu.students(LOWER(email));

-- Now this query uses the index:
EXPLAIN ANALYZE
SELECT * FROM fsu.students WHERE LOWER(email) = 'student1@fsu.edu';

E2. Index on enrollment year for recent-year filtering:

CREATE INDEX idx_students_enroll_year ON fsu.students(enroll_year)
    WHERE enroll_year >= 2022;

EXPLAIN ANALYZE
SELECT COUNT(*) FROM fsu.students WHERE enroll_year >= 2022;


Verification

SET search_path = fsu;

CREATE OR REPLACE FUNCTION verify_lab06()
RETURNS TABLE(check_id TEXT, description TEXT, result TEXT, points INT) AS $$
BEGIN
    RETURN QUERY SELECT '01', 'students table has 500+ rows',
        CASE WHEN (SELECT COUNT(*) FROM students) >= 500 THEN 'PASS' ELSE 'FAIL' END, 10;

    RETURN QUERY SELECT '02', 'enrollments table has 1000+ rows',
        CASE WHEN (SELECT COUNT(*) FROM enrollments) >= 1000 THEN 'PASS' ELSE 'FAIL' END, 10;

    RETURN QUERY SELECT '03', 'idx_students_dept_id exists',
        CASE WHEN EXISTS(SELECT 1 FROM pg_indexes WHERE indexname='idx_students_dept_id' AND schemaname='fsu')
             THEN 'PASS' ELSE 'FAIL' END, 10;

    RETURN QUERY SELECT '04', 'idx_enrollments_student exists',
        CASE WHEN EXISTS(SELECT 1 FROM pg_indexes WHERE indexname='idx_enrollments_student' AND schemaname='fsu')
             THEN 'PASS' ELSE 'FAIL' END, 10;

    RETURN QUERY SELECT '05', 'idx_students_high_gpa partial index exists',
        CASE WHEN EXISTS(SELECT 1 FROM pg_indexes WHERE indexname='idx_students_high_gpa' AND schemaname='fsu')
             THEN 'PASS' ELSE 'FAIL' END, 10;

    RETURN QUERY SELECT '06', 'idx_enrollments_semester_course exists',
        CASE WHEN EXISTS(SELECT 1 FROM pg_indexes WHERE indexname='idx_enrollments_semester_course' AND schemaname='fsu')
             THEN 'PASS' ELSE 'FAIL' END, 10;

    RETURN QUERY SELECT '07', 'idx_students_email_lower functional index exists',
        CASE WHEN EXISTS(SELECT 1 FROM pg_indexes WHERE indexname='idx_students_email_lower' AND schemaname='fsu')
             THEN 'PASS' ELSE 'FAIL' END, 10;

    RETURN QUERY SELECT '08', 'At least 4 non-PK indexes on fsu tables',
        CASE WHEN (SELECT COUNT(*) FROM pg_indexes WHERE schemaname='fsu'
                   AND indexname NOT LIKE '%pkey%') >= 4
             THEN 'PASS' ELSE 'FAIL' END, 10;

    RETURN QUERY SELECT '09', 'High-GPA partial index covers correct range',
        CASE WHEN (SELECT COUNT(*) FROM students WHERE gpa >= 3.5) > 0
             THEN 'PASS' ELSE 'FAIL' END, 10;

    RETURN QUERY SELECT '10', 'Functional index: email lookup works',
        CASE WHEN EXISTS(SELECT 1 FROM students WHERE LOWER(email) = 'student1@fsu.edu')
             THEN 'PASS' ELSE 'FAIL' END, 10;
END;
$$ LANGUAGE plpgsql;

SELECT check_id, description, result,
       CASE WHEN result = 'PASS' THEN points ELSE 0 END AS earned
FROM verify_lab06()
ORDER BY check_id;

Additional Requirement (20 pts)

Write lab06_index_report.md โ€” a structured index design document for the fsu schema:

Query Pattern Recommended Index Type Rationale Est. Speedup
Filter by dept_id idx_students_dept_id B-tree High cardinality filter 10x
...

Include: 1. Table of all 6 indexes you created with justification 2. For each index: before/after EXPLAIN ANALYZE times (copy from your terminal) 3. One index you deliberately chose NOT to create and why (e.g., a low-cardinality column) 4. Estimated storage overhead: total index size vs table size


Submission Checklist

  • [ ] Neon branch lab-06 with 500+ students, 1000+ enrollments
  • [ ] All 6 indexes created
  • [ ] lab06_analysis.md โ€” before/after EXPLAIN outputs for all 4 queries
  • [ ] lab06_index_report.md โ€” additional requirement
  • [ ] verify_lab06() screenshot โ€” all PASS

Grading

Component Points
Part B โ€” Baseline EXPLAIN outputs (4 queries, plan types recorded) 20
Part C โ€” 4 indexes created, after-EXPLAIN showing improvement 30
Part D โ€” Index inspection + leftmost prefix explanation 20
Part E โ€” Functional + partial indexes (2 indexes) 15
Additional requirement โ€” index design document 20
Total 105