Lab 11: Database Scripting & Automation¶
| Field | Details |
|---|---|
| Course | ITEC-445 โ Advanced Database Management |
| Week | 11 |
| Difficulty | โญโญโญ Advanced |
| Estimated Time | 80 minutes |
| Topic | Database Scripting & Automation |
| Prerequisites | Lab 01 schema on Neon branch lab-11, Python 3.10+, pip install psycopg2-binary pandas |
| Deliverables | Python scripts, idempotent migration, health check procedure, verify_lab11() PASS |
Overview¶
Manual database administration doesn't scale. Professional DBAs automate everything: health checks, batch jobs, data migrations, report generation, and schema changes. In this lab you will write production-quality Python automation scripts, build idempotent SQL migration scripts, and implement a database health-check procedure.
Branch Requirement
Create branch lab-11 with Lab 01 schema + Lab 06 seed data (500+ students).
Part A โ psql Non-Interactive Scripting (15 pts)¶
psql can be used as a scripting engine. Create the following shell scripts:
A1. db_summary.sh โ quick database stats via psql:
#!/bin/bash
# db_summary.sh โ Frostburg DB quick stats
set -e
DATABASE_URL="${DATABASE_URL:?DATABASE_URL must be set}"
echo "=== Frostburg University DB Summary ==="
echo "Timestamp: $(date)"
echo ""
psql "$DATABASE_URL" --tuples-only --no-align << 'EOF'
SET search_path = fsu;
SELECT 'Tables' AS category, COUNT(*)::TEXT AS value
FROM information_schema.tables WHERE table_schema = 'fsu' AND table_type = 'BASE TABLE'
UNION ALL
SELECT 'Students', COUNT(*)::TEXT FROM fsu.students
UNION ALL
SELECT 'Enrollments', COUNT(*)::TEXT FROM fsu.enrollments
UNION ALL
SELECT 'Avg GPA', ROUND(AVG(gpa)::NUMERIC,2)::TEXT FROM fsu.students WHERE gpa IS NOT NULL
UNION ALL
SELECT 'Indexes', COUNT(*)::TEXT FROM pg_indexes WHERE schemaname = 'fsu'
ORDER BY 1;
EOF
echo ""
echo "=== Done ==="
Run:
A2. run_sql.sh โ safe SQL file executor with logging:
#!/bin/bash
# run_sql.sh <sql_file> โ runs a SQL file and logs output + errors
SQL_FILE="${1:?Usage: run_sql.sh <file.sql>}"
LOG_DIR="./logs"
mkdir -p "$LOG_DIR"
LOG_FILE="$LOG_DIR/$(basename "$SQL_FILE" .sql)_$(date +%Y%m%d_%H%M%S).log"
echo "Running $SQL_FILE at $(date)" | tee "$LOG_FILE"
psql "$DATABASE_URL" \
--set ON_ERROR_STOP=1 \
--echo-all \
-f "$SQL_FILE" \
>> "$LOG_FILE" 2>&1
EXIT_CODE=$?
if [ $EXIT_CODE -eq 0 ]; then
echo "SUCCESS โ log: $LOG_FILE"
else
echo "FAILED (exit $EXIT_CODE) โ see $LOG_FILE"
exit $EXIT_CODE
fi
Part B โ Python Automation (35 pts)¶
B1. db_health_check.py โ comprehensive health check script:
#!/usr/bin/env python3
"""
db_health_check.py โ Frostburg University Database Health Check
Checks: connection, row counts, GPA anomalies, orphaned records, index coverage
"""
import os
import sys
import psycopg2
from datetime import datetime
def check(name: str, query: str, threshold=None, operator='>='):
"""Run a check query and report pass/fail."""
try:
cur.execute(query)
row = cur.fetchone()
value = row[0] if row else None
if threshold is not None:
if operator == '>=' and value >= threshold:
status = 'PASS'
elif operator == '<=' and value <= threshold:
status = 'PASS'
elif operator == '==' and value == threshold:
status = 'PASS'
else:
status = 'WARN'
else:
status = 'INFO'
icon = {'PASS': 'โ', 'WARN': 'โ ', 'INFO': 'โน'}[status]
print(f" [{icon}] {name}: {value} {'(threshold: ' + str(threshold) + ')' if threshold else ''}")
return status, value
except Exception as e:
print(f" [โ] {name}: ERROR โ {e}")
return 'ERROR', None
conn = psycopg2.connect(os.environ['DATABASE_URL'])
cur = conn.cursor()
cur.execute("SET search_path = fsu")
print(f"=== DB Health Check: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')} ===\n")
print("Row Counts:")
check("Students", "SELECT COUNT(*) FROM students", 500)
check("Enrollments", "SELECT COUNT(*) FROM enrollments", 1000)
check("Courses", "SELECT COUNT(*) FROM courses", 10)
check("Departments", "SELECT COUNT(*) FROM departments", 5)
print("\nData Quality:")
check("Students with NULL GPA", "SELECT COUNT(*) FROM students WHERE gpa IS NULL", 50, '<=')
check("Invalid GPA (>4.0)", "SELECT COUNT(*) FROM students WHERE gpa > 4.0", 0, '==')
check("Orphaned enrollments",
"SELECT COUNT(*) FROM enrollments e LEFT JOIN students s ON e.student_id=s.student_id WHERE s.student_id IS NULL",
0, '==')
check("Duplicate enrollments",
"SELECT COUNT(*) FROM (SELECT student_id,course_id,semester,COUNT(*) FROM enrollments GROUP BY 1,2,3 HAVING COUNT(*)>1) x",
0, '==')
print("\nIndex Coverage:")
check("Non-PK indexes",
"SELECT COUNT(*) FROM pg_indexes WHERE schemaname='fsu' AND indexname NOT LIKE '%pkey%'",
3)
print("\nRecent Activity:")
check("Audit log entries",
"SELECT COALESCE((SELECT COUNT(*) FROM audit_log),0)",
0)
conn.close()
print("\n=== Health check complete ===")
Run:
B2. batch_gpa_update.py โ batch GPA recalculation with progress reporting:
#!/usr/bin/env python3
"""
batch_gpa_update.py โ Recalculate GPAs for all students in batches
Uses chunked processing to avoid long-running transactions.
"""
import os, psycopg2
BATCH_SIZE = 50
conn = psycopg2.connect(os.environ['DATABASE_URL'])
conn.autocommit = False
cur = conn.cursor()
cur.execute("SET search_path = fsu")
# Get all student IDs
cur.execute("SELECT student_id FROM students ORDER BY student_id")
all_ids = [row[0] for row in cur.fetchall()]
total = len(all_ids)
updated = 0
unchanged = 0
print(f"Processing {total} students in batches of {BATCH_SIZE}...")
for i in range(0, total, BATCH_SIZE):
batch = all_ids[i:i+BATCH_SIZE]
for student_id in batch:
# Calculate new GPA from grade_points view
cur.execute("""
SELECT ROUND(AVG(points)::NUMERIC, 2)
FROM grade_points
WHERE student_id = %s AND points IS NOT NULL
""", (student_id,))
new_gpa = cur.fetchone()[0]
# Only update if different
cur.execute("SELECT gpa FROM students WHERE student_id = %s", (student_id,))
old_gpa = cur.fetchone()[0]
if new_gpa != old_gpa:
cur.execute("UPDATE students SET gpa = %s WHERE student_id = %s",
(new_gpa, student_id))
updated += 1
else:
unchanged += 1
conn.commit()
progress = min(i + BATCH_SIZE, total)
print(f" Batch complete: {progress}/{total} processed ({updated} updated so far)")
conn.close()
print(f"\nDone: {updated} updated, {unchanged} unchanged out of {total} students")
Part C โ Idempotent Migration Script (25 pts)¶
Professional schema changes must be idempotent โ safe to run multiple times. Write migrations/V003__add_scholarship_table.sql:
-- Migration: V003 โ Add scholarship tracking
-- Idempotent: safe to run multiple times
-- Author: [Your Name]
-- Date: [Date]
SET search_path = fsu;
-- Version tracking table (create if not exists)
CREATE TABLE IF NOT EXISTS schema_migrations (
version VARCHAR(20) PRIMARY KEY,
description TEXT,
applied_at TIMESTAMPTZ DEFAULT NOW(),
applied_by TEXT DEFAULT current_user
);
-- Guard: skip if already applied
DO $$
BEGIN
IF EXISTS(SELECT 1 FROM schema_migrations WHERE version = 'V003') THEN
RAISE NOTICE 'Migration V003 already applied โ skipping';
RETURN;
END IF;
-- Create scholarships table
CREATE TABLE IF NOT EXISTS scholarships (
scholarship_id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
amount NUMERIC(10,2) NOT NULL CHECK (amount > 0),
dept_id INT REFERENCES departments(dept_id),
min_gpa NUMERIC(3,2) DEFAULT 3.0,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Create student_scholarships junction table
CREATE TABLE IF NOT EXISTS student_scholarships (
student_id INT NOT NULL REFERENCES students(student_id),
scholarship_id INT NOT NULL REFERENCES scholarships(scholarship_id),
awarded_year SMALLINT NOT NULL,
amount_awarded NUMERIC(10,2),
PRIMARY KEY (student_id, scholarship_id, awarded_year)
);
-- Add index for scholarship lookups by department
CREATE INDEX IF NOT EXISTS idx_scholarships_dept ON scholarships(dept_id);
-- Seed initial scholarships
INSERT INTO scholarships (name, amount, dept_id, min_gpa) VALUES
('Dean''s Excellence Award', 5000.00, NULL, 3.8),
('CS Department Award', 2500.00, 1, 3.5),
('Math Department Award', 2500.00, 2, 3.5),
('General Merit Award', 1000.00, NULL, 3.0)
ON CONFLICT DO NOTHING;
-- Record migration as applied
INSERT INTO schema_migrations (version, description)
VALUES ('V003', 'Add scholarship and student_scholarships tables');
RAISE NOTICE 'Migration V003 applied successfully';
END;
$$;
-- Verify
SELECT version, description, applied_at FROM schema_migrations ORDER BY version;
Run it twice:
psql "$DATABASE_URL" -f migrations/V003__add_scholarship_table.sql
psql "$DATABASE_URL" -f migrations/V003__add_scholarship_table.sql # should skip
Second run should print: NOTICE: Migration V003 already applied โ skipping
Part D โ PL/pgSQL Health Check Procedure (15 pts)¶
Write a stored procedure db_health_report() that runs entirely in-database:
CREATE OR REPLACE FUNCTION db_health_report()
RETURNS TABLE(check_name TEXT, value TEXT, status TEXT) AS $$
BEGIN
-- Student count
RETURN QUERY SELECT 'Student count'::TEXT,
(SELECT COUNT(*)::TEXT FROM students),
CASE WHEN (SELECT COUNT(*) FROM students) >= 500 THEN 'OK' ELSE 'LOW' END;
-- GPA anomalies
RETURN QUERY SELECT 'Invalid GPA count'::TEXT,
(SELECT COUNT(*)::TEXT FROM students WHERE gpa > 4.0 OR gpa < 0),
CASE WHEN (SELECT COUNT(*) FROM students WHERE gpa > 4.0 OR gpa < 0) = 0
THEN 'OK' ELSE 'ERROR' END;
-- Orphaned enrollments
RETURN QUERY SELECT 'Orphaned enrollments'::TEXT,
(SELECT COUNT(*)::TEXT FROM enrollments e
WHERE NOT EXISTS(SELECT 1 FROM students s WHERE s.student_id = e.student_id)),
CASE WHEN (SELECT COUNT(*) FROM enrollments e
WHERE NOT EXISTS(SELECT 1 FROM students s WHERE s.student_id=e.student_id)) = 0
THEN 'OK' ELSE 'ERROR' END;
-- Index count
RETURN QUERY SELECT 'Non-PK index count'::TEXT,
(SELECT COUNT(*)::TEXT FROM pg_indexes WHERE schemaname='fsu' AND indexname NOT LIKE '%pkey%'),
CASE WHEN (SELECT COUNT(*) FROM pg_indexes WHERE schemaname='fsu' AND indexname NOT LIKE '%pkey%') >= 3
THEN 'OK' ELSE 'LOW' END;
END;
$$ LANGUAGE plpgsql;
SELECT * FROM db_health_report();
Verification¶
SET search_path = fsu;
CREATE OR REPLACE FUNCTION verify_lab11()
RETURNS TABLE(check_id TEXT, description TEXT, result TEXT, points INT) AS $$
BEGIN
RETURN QUERY SELECT '01', 'schema_migrations table exists',
CASE WHEN EXISTS(SELECT 1 FROM information_schema.tables
WHERE table_schema='fsu' AND table_name='schema_migrations')
THEN 'PASS' ELSE 'FAIL' END, 10;
RETURN QUERY SELECT '02', 'V003 migration applied',
CASE WHEN EXISTS(SELECT 1 FROM schema_migrations WHERE version='V003')
THEN 'PASS' ELSE 'FAIL' END, 10;
RETURN QUERY SELECT '03', 'scholarships table exists',
CASE WHEN EXISTS(SELECT 1 FROM information_schema.tables
WHERE table_schema='fsu' AND table_name='scholarships')
THEN 'PASS' ELSE 'FAIL' END, 10;
RETURN QUERY SELECT '04', 'scholarships has 4 seed rows',
CASE WHEN (SELECT COUNT(*) FROM scholarships) = 4
THEN 'PASS' ELSE 'FAIL' END, 10;
RETURN QUERY SELECT '05', 'student_scholarships table exists',
CASE WHEN EXISTS(SELECT 1 FROM information_schema.tables
WHERE table_schema='fsu' AND table_name='student_scholarships')
THEN 'PASS' ELSE 'FAIL' END, 10;
RETURN QUERY SELECT '06', 'db_health_report function exists',
CASE WHEN EXISTS(SELECT 1 FROM pg_proc WHERE proname='db_health_report')
THEN 'PASS' ELSE 'FAIL' END, 10;
RETURN QUERY SELECT '07', 'db_health_report returns rows',
CASE WHEN (SELECT COUNT(*) FROM db_health_report()) >= 4
THEN 'PASS' ELSE 'FAIL' END, 10;
RETURN QUERY SELECT '08', 'No orphaned enrollments detected',
CASE WHEN (
SELECT value FROM db_health_report() WHERE check_name='Orphaned enrollments'
) = '0' THEN 'PASS' ELSE 'FAIL' END, 10;
RETURN QUERY SELECT '09', 'idx_scholarships_dept index exists',
CASE WHEN EXISTS(SELECT 1 FROM pg_indexes WHERE indexname='idx_scholarships_dept')
THEN 'PASS' ELSE 'FAIL' END, 10;
RETURN QUERY SELECT '10', 'Students table has 500+ rows',
CASE WHEN (SELECT COUNT(*) FROM students) >= 500
THEN 'PASS' ELSE 'FAIL' END, 10;
END;
$$ LANGUAGE plpgsql;
SELECT check_id, description, result,
CASE WHEN result = 'PASS' THEN points ELSE 0 END AS earned
FROM verify_lab11()
ORDER BY check_id;
Additional Requirement (20 pts)¶
Write scholarship_award.py โ a Python script that automatically awards scholarships to eligible students:
- Queries all scholarships with their
min_gpaand optionaldept_id - For each scholarship, finds all eligible students (GPA >= min_gpa, matching dept if set)
- Awards the scholarship to students not already awarded in the current year
- Inserts rows into
student_scholarships - Prints a summary:
Scholarship X: awarded to N students
Submission Checklist¶
- [ ]
db_summary.shโ runs and shows stats - [ ]
run_sql.shโ runs and logs - [ ]
db_health_check.pyโ all checks pass/warn correctly - [ ]
batch_gpa_update.pyโ runs, reports updated/unchanged count - [ ]
migrations/V003__add_scholarship_table.sqlโ idempotent (run twice, second skips) - [ ]
scholarship_award.pyโ additional requirement - [ ]
verify_lab11()screenshot โ all PASS
Grading¶
| Component | Points |
|---|---|
| Part A โ Shell scripts (db_summary + run_sql, both run correctly) | 15 |
| Part B โ Python scripts (health_check + batch_gpa, correct output) | 35 |
| Part C โ Idempotent migration (runs twice, second is no-op) | 25 |
| Part D โ db_health_report() in-DB function | 15 |
| Additional requirement โ scholarship_award.py | 20 |
| Total | 110 |