Lab 07: Syscall Filtering with seccomp
| Course | SCIA-360 OS Security |
| Topic | Security Policies |
| Chapter | 7 |
| Difficulty | ⭐⭐ Intermediate |
| Estimated Time | 45–60 minutes |
| Prerequisites | Lab 06 completed; Docker installed and running |
Overview
seccomp (Secure Computing Mode) is a Linux kernel facility that filters system calls at kernel entry, before the requested handler runs. Docker applies a default seccomp profile that blocks roughly 44 dangerous syscalls in every container, even when the process runs as root (UID 0).
In this lab you will:
- Verify seccomp filter status via /proc/self/status
- Observe syscalls blocked by the default Docker profile
- Use strace to trace exactly which syscalls real programs make
- Build a syscall audit table classifying calls as allowed or blocked
Grading Rubric
| Component | Points |
|---|---|
| Screenshots (07a–07g) | 40 pts |
| Syscall classification table (completed: syscall name, number, allowed/blocked, reason) | 20 pts |
| Reflection questions (4 × 10 pts) | 40 pts |
| Total | 100 pts |
Background: The Three Layers of Container Isolation
| Layer | What It Restricts |
|---|---|
| Namespaces | Visibility: processes, network interfaces, mount points, user IDs |
| Capabilities | Privileged kernel operations a process may perform |
| seccomp | Which system call numbers the process may invoke at all |
seccomp operates at the lowest layer: its BPF (Berkeley Packet Filter) program runs in the kernel on syscall entry, before the system call handler. Even if a capability check would have allowed an operation, seccomp can block the syscall entirely.
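The ordering of the two checks can be modeled in a few lines of Python. This is a toy sketch, not real BPF: `SECCOMP_DENIED`, `REQUIRED_CAP`, and `dispatch` are hypothetical names, and the tables cover only syscalls used later in this lab.

```python
# Toy model of the order in which the two checks are applied.
# Not real BPF; every name here is an illustrative assumption.

# Syscalls the modeled seccomp filter rejects outright.
SECCOMP_DENIED = {"reboot", "kexec_load"}

# Capability each privileged syscall needs once it passes the filter.
REQUIRED_CAP = {"reboot": "CAP_SYS_BOOT", "kexec_load": "CAP_SYS_BOOT"}


def dispatch(syscall, caps, seccomp_on=True):
    """Return which layer (if any) stops the syscall."""
    # Layer 1: the seccomp filter runs before the syscall handler.
    if seccomp_on and syscall in SECCOMP_DENIED:
        return "EPERM (seccomp)"
    # Layer 2: the handler itself performs the capability check.
    needed = REQUIRED_CAP.get(syscall)
    if needed is not None and needed not in caps:
        return "EPERM (capability)"
    return "handler runs"


print(dispatch("reboot", {"CAP_SYS_BOOT"}))         # filter blocks despite the capability
print(dispatch("reboot", set(), seccomp_on=False))  # capability check still applies
print(dispatch("getpid", set()))                    # unprivileged call passes both layers
```

The first call shows the point of the paragraph: holding the capability does not help once the filter has rejected the syscall.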
Part 1: seccomp Status
Step 1.1: Verify the Filter is Active
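docker run --rm ubuntu:22.04 bash -c \
"grep Seccomp /proc/self/status"
Expected output:
Seccomp: 2
Seccomp_filters: 1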
seccomp mode values:
| Mode | Meaning |
|---|---|
| 0 | Disabled: all syscalls pass through |
| 1 | Strict: only read, write, exit, sigreturn allowed |
| 2 | Filter: a BPF program decides per-syscall (Docker default) |
📸 Screenshot checkpoint 07a: Terminal output showing Seccomp: 2 and Seccomp_filters: 1.
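If you want to read these fields programmatically rather than with grep, a minimal sketch follows. The helper name `parse_seccomp` and the sample text are assumptions for illustration; the `Seccomp_filters` field appears only on newer kernels, so the parser treats it as optional.

```python
# Parse the Seccomp fields from /proc/<pid>/status text.
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def parse_seccomp(status_text):
    """Extract 'Seccomp' and 'Seccomp_filters' as integers, if present."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in ("Seccomp", "Seccomp_filters"):
            fields[key] = int(value.strip())
    return fields


# Sample text matching the values expected in checkpoint 07a.
sample = "Name:\tbash\nSeccomp:\t2\nSeccomp_filters:\t1\n"
info = parse_seccomp(sample)
print(info, "->", SECCOMP_MODES[info["Seccomp"]])
```

Inside a container you would pass the contents of /proc/self/status instead of the sample string.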
Step 1.2: Disable seccomp and Compare
docker run --rm --security-opt seccomp=unconfined ubuntu:22.04 bash -c \
"grep Seccomp /proc/self/status"
Expected output:
Seccomp: 0
Seccomp_filters: 0
📸 Screenshot checkpoint 07b: Seccomp: 0 output, ideally placed next to or below 07a for direct comparison.
Never use seccomp=unconfined in production
Disabling seccomp removes the last kernel-level syscall barrier. The flag exists for debugging and for cases where a custom profile will be applied, not for leaving containers unfiltered in a live environment.
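For reference, a custom profile is a JSON document. The sketch below generates a deliberately small one; the helper `build_denylist_profile` is hypothetical, and the deny-list approach shown is weaker than Docker's own default profile, which denies by default and lists what is allowed. Treat it as an illustration of the file format only.

```python
import json


def build_denylist_profile(denied):
    """Build a minimal profile: allow everything except the named syscalls."""
    return {
        "defaultAction": "SCMP_ACT_ALLOW",
        "syscalls": [
            # SCMP_ACT_ERRNO makes the listed syscalls fail with an error code.
            {"names": sorted(denied), "action": "SCMP_ACT_ERRNO"},
        ],
    }


profile = build_denylist_profile({"reboot", "kexec_load"})
print(json.dumps(profile, indent=2))
```

Saved to a file, a profile like this is passed to Docker with --security-opt seccomp=/path/to/profile.json in place of seccomp=unconfined.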
Part 2: Observing Blocked Syscalls
Step 2.1: reboot Syscall Blocked by Default Profile
docker run --rm python:3.11-slim python3 -c "
import ctypes, ctypes.util, ctypes as ct
libc = ctypes.CDLL(ctypes.util.find_library('c'), use_errno=True)
ct.set_errno(0)
result = libc.syscall(169, 0xfee1dead, 0x28121969, 0x01234567, 0)
errno_val = ct.get_errno()
print(f'reboot syscall result={result}, errno={errno_val}')
if errno_val == 1: print('EPERM: blocked by seccomp!')
"
Expected output:
reboot syscall result=-1, errno=1
EPERM: blocked by seccomp!
Docker's default profile does not permit syscall 169 (reboot in the x86_64 numbering). The call returns -1 with errno = EPERM (1, Operation not permitted) before any reboot logic executes.
📸 Screenshot checkpoint 07c: result=-1, errno=1 and EPERM: blocked by seccomp! clearly visible.
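The numeric values in this step are easier to read with names attached. The sketch below decodes an errno with the standard library and labels the three constants passed to the reboot syscall using their names from the kernel header linux/reboot.h; `describe_errno` and `REBOOT_ARGS` are hypothetical names introduced here.

```python
import errno
import os

# Arguments passed to the reboot syscall in this step, with the names
# given to them in linux/reboot.h.
REBOOT_ARGS = {
    0xFEE1DEAD: "LINUX_REBOOT_MAGIC1",
    0x28121969: "LINUX_REBOOT_MAGIC2",
    0x01234567: "LINUX_REBOOT_CMD_RESTART",
}


def describe_errno(code):
    """Turn a numeric errno into 'NAME: message'."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


print(describe_errno(1))
for value, name in REBOOT_ARGS.items():
    print(f"{value:#010x} = {name}")
```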
Step 2.2: Benign Syscalls Are Allowed
docker run --rm python:3.11-slim python3 -c "
import ctypes, ctypes.util
libc = ctypes.CDLL(ctypes.util.find_library('c'))
pid = libc.getpid()
uid = libc.getuid()
print(f'getpid()={pid} getuid()={uid} (both allowed)')
"
Expected output: getpid()=1 getuid()=0 (both allowed)
getpid (syscall 39 on x86_64) and getuid (syscall 102) are harmless introspection calls that every process needs, so they pass through the seccomp filter without restriction.
📸 Screenshot checkpoint 07d: getpid() and getuid() values printed with (both allowed) suffix.
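Syscall numbers are architecture-specific, and the numbers used throughout this lab are the x86_64 ones. If your container runs on another architecture (for example an ARM64 host), the same numbers refer to different syscalls. A minimal guard, using a hypothetical `syscall_number` helper, refuses to guess:

```python
import platform

# Syscall numbers used in this lab; valid for x86_64 Linux only.
X86_64_SYSCALLS = {
    "getpid": 39, "getuid": 102, "syslog": 103,
    "reboot": 169, "time": 201, "kexec_load": 246,
}


def syscall_number(name, machine=""):
    """Look up a syscall number, refusing to guess on other architectures."""
    machine = machine or platform.machine()
    if machine not in ("x86_64", "AMD64"):
        raise RuntimeError(
            f"no syscall table for {machine!r}; numbers differ per architecture"
        )
    return X86_64_SYSCALLS[name]


print(syscall_number("getpid", machine="x86_64"))
```

Calling `syscall_number("reboot")` with no `machine` argument checks the machine the script is actually running on.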
Part 3: strace: Tracing Syscalls in Real Time
Step 3.1: Trace ls with strace
docker run --rm --cap-add SYS_PTRACE --security-opt seccomp=unconfined ubuntu:22.04 bash -c "
apt-get update -qq && apt-get install -y -qq strace 2>/dev/null
strace -e trace=openat,read,write,getdents64 ls /tmp 2>&1 | head -12"
Expected output: Lines like:
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libselinux.so.1", ...) = 3
getdents64(3, /* 0 entries */, 32768) = 0
strace intercepts every syscall the traced process makes. With the trace filtered to four syscall types, you can see that ls first opens its shared libraries and then calls getdents64 to read directory entries.
SYS_PTRACE + seccomp=unconfined
strace uses the ptrace syscall to attach to processes. This lab adds CAP_SYS_PTRACE and disables seccomp so that neither the capability check nor the filter blocks it. Use this combination only in controlled lab environments, never in production.
📸 Screenshot checkpoint 07e: strace output showing at least openat and getdents64 calls.
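Longer traces are easier to review as a summary. The sketch below counts syscalls by name in saved strace output. The regular expression and the `summarize` helper are assumptions that cover the simple one-line format shown in this step, not every form strace can print (unfinished or resumed calls are skipped).

```python
import re
from collections import Counter

# Matches lines such as: openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY) = 3
STRACE_LINE = re.compile(r"^(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>-?\d+)")


def summarize(trace):
    """Count how many times each syscall name appears in strace output."""
    counts = Counter()
    for line in trace.splitlines():
        match = STRACE_LINE.match(line.strip())
        if match:
            counts[match.group("name")] += 1
    return counts


# Sample lines taken from the expected output of this step.
sample = '''openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libselinux.so.1", ...) = 3
getdents64(3, /* 0 entries */, 32768) = 0'''
print(summarize(sample))
```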
Step 3.2: Trace Python Reading a File
docker run --rm --cap-add SYS_PTRACE --security-opt seccomp=unconfined python:3.11-slim bash -c "
apt-get update -qq && apt-get install -y -qq strace 2>/dev/null
echo 'import sys; print(open(\"/etc/hostname\").read().strip())' > /tmp/read_file.py
strace -e trace=openat,read python3 /tmp/read_file.py 2>&1 | grep -E 'openat.*hostname|^[a-f0-9]' | head -5"
Expected output: At least one openat call referencing /etc/hostname, confirming that Python's open() translates directly to an openat syscall.
📸 Screenshot checkpoint 07f: strace output showing the openat call for /etc/hostname.
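The file-descriptor interface that strace displays is also reachable from Python through the os module. The sketch below reads the same temporary file through the high-level open() and the low-level os.open()/os.read() pair. On Linux with glibc both paths are expected to reach the kernel as openat, though that is an implementation detail rather than a Python guarantee.

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on /etc/hostname.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("lab07")
    path = tmp.name

# High-level API: buffered text I/O.
with open(path) as fh:
    high_level = fh.read()

# Low-level API: a raw file descriptor, the form strace displays.
fd = os.open(path, os.O_RDONLY)
try:
    low_level = os.read(fd, 4096).decode()
finally:
    os.close(fd)

os.unlink(path)
print(high_level == low_level)
```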
Part 4: Syscall Audit
Step 4.1: Automated Syscall Allow/Block Test
docker run --rm python:3.11-slim python3 -c "
import ctypes, ctypes.util, ctypes as ct
libc = ctypes.CDLL(ctypes.util.find_library('c'), use_errno=True)
syscalls = [
('getpid', 39),
('getuid', 102),
('time', 201),
('reboot', 169),
('kexec_load', 246),
('syslog', 103),
]
for name, num in syscalls:
ct.set_errno(0)
result = libc.syscall(num, 0, 0, 0, 0)
errno_val = ct.get_errno()
blocked = errno_val == 1
status = 'BLOCKED (seccomp)' if blocked else 'allowed'
print(f'{name:15} (#{num:3}): {status}')
"
Expected output:
getpid (# 39): allowed
getuid (#102): allowed
time (#201): allowed
reboot (#169): BLOCKED (seccomp)
kexec_load (#246): BLOCKED (seccomp)
syslog (#103): allowed (may vary by kernel config)
📸 Screenshot checkpoint 07f: Full six-line syscall audit table output.
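The audit loop labels every EPERM as a seccomp block and ignores the return value. A slightly more careful classifier is sketched below, using a hypothetical `classify` helper: it treats a call as failed only when the result is -1 and keeps other error codes separate, because EPERM alone cannot tell a seccomp denial from a failed capability check.

```python
import errno


def classify(result, err):
    """Label one syscall attempt from its return value and errno."""
    if result != -1:
        return "allowed"
    if err == errno.EPERM:
        # EPERM alone cannot distinguish seccomp from a capability check.
        return "denied (EPERM: seccomp or capability)"
    if err == errno.ENOSYS:
        return "not implemented (ENOSYS)"
    return f"reached handler, failed with {errno.errorcode.get(err, err)}"


print(classify(1, 0))
print(classify(-1, errno.EPERM))
print(classify(-1, errno.EINVAL))
```

A call that fails with a code such as EINVAL or EFAULT did reach its handler, so it should count as allowed by the filter even though it did not succeed.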
Step 4.2: Default Profile vs. seccomp=unconfined
echo '=== Default seccomp ==='
docker run --rm python:3.11-slim python3 -c "
import ctypes, ctypes.util, ctypes as ct
libc = ctypes.CDLL(ctypes.util.find_library('c'), use_errno=True)
ct.set_errno(0); libc.syscall(169,0xfee1dead,0x28121969,0x01234567,0)
print('reboot:', 'BLOCKED' if ct.get_errno()==1 else 'reached kernel')"
echo '=== seccomp=unconfined ==='
docker run --rm --security-opt seccomp=unconfined python:3.11-slim python3 -c "
import ctypes, ctypes.util, ctypes as ct
libc = ctypes.CDLL(ctypes.util.find_library('c'), use_errno=True)
ct.set_errno(0); libc.syscall(169,0xfee1dead,0x28121969,0x01234567,0)
errno_v = ct.get_errno()
print('reboot:', 'BLOCKED (caps)' if errno_v==1 else f'reached kernel errno={errno_v}')"
Note: In the unconfined case the reboot syscall reaches its kernel handler but is still rejected by the capability check (CAP_SYS_BOOT is not in Docker's default capability set). This demonstrates that capabilities and seccomp are complementary but independent layers: removing seccomp does not remove capability checks.
📸 Screenshot checkpoint 07g: Both === Default seccomp === and === seccomp=unconfined === sections visible, showing the difference in where the block occurs.
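You can confirm the missing capability by decoding the CapEff bitmask from /proc/self/status. The sketch below assumes a sample mask of 00000000a80425fb, the value commonly reported for Docker's default capability set; your container may report a different one. The bit number for CAP_SYS_BOOT (22) comes from linux/capability.h, and the helper names are hypothetical.

```python
# Capability bit number from linux/capability.h.
CAP_SYS_BOOT = 22


def has_cap(cap_mask, cap_bit):
    """True if the given capability bit is set in a CapEff-style mask."""
    return bool((cap_mask >> cap_bit) & 1)


def parse_cap_eff(status_text):
    """Read the hexadecimal CapEff field from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split(":", 1)[1].strip(), 16)
    raise ValueError("CapEff not found")


# Assumed sample: mask commonly seen with Docker's default capabilities.
sample = "CapEff:\t00000000a80425fb\n"
mask = parse_cap_eff(sample)
print("CAP_SYS_BOOT present:", has_cap(mask, CAP_SYS_BOOT))
```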
Syscall Classification Table
Complete and submit this table with your lab report:
| Syscall | Number | Default Profile | Reason |
|---|---|---|---|
| getpid | 39 | ✅ Allowed | Every process needs its own PID |
| getuid | 102 | ✅ Allowed | Basic identity introspection |
| time | 201 | ✅ Allowed | Read-only clock access |
| reboot | 169 | ❌ Blocked | Could reboot or halt the host |
| kexec_load | 246 | ❌ Blocked | Loads a new kernel; severe host escape vector |
| syslog | 103 | ✅ / ❌ Varies | Reads the kernel ring buffer |
| (add two more) | | | |
Cleanup
Assessment
Screenshot Checklist
| ID | Required Content |
|---|---|
| 07a | Seccomp: 2 and Seccomp_filters: 1 from /proc/self/status |
| 07b | Seccomp: 0 with seccomp=unconfined |
| 07c | reboot syscall returning result=-1, errno=1 + EPERM: blocked by seccomp! |
| 07d | getpid() and getuid() returning valid values with (both allowed) |
| 07e | strace output showing openat and getdents64 calls from ls |
| 07f | Full six-line syscall audit table (allowed vs. blocked) |
| 07g | Default seccomp BLOCKED vs. unconfined reached kernel comparison |
Reflection Questions
Submission requirement
Answer each question in complete paragraphs (minimum 4–6 sentences each). Include technical specifics (syscall names, error codes, and mechanism details) where relevant.
Q1. What is a system call? Explain the concept using an analogy: a user-space program cannot directly access hardware; it must ask the kernel to act on its behalf. Using that analogy, explain seccomp's role as the gatekeeper of that conversation. How does the BPF program implement that gatekeeping at the kernel level?
Q2. Docker's default seccomp profile blocks syscalls including reboot, kexec_load, and create_module. An application container running a Python web service will never legitimately need any of these. Explain why each one exists in the kernel and what specific damage an attacker could cause if they were available inside a compromised container.
Q3. strace revealed that ls makes openat() and getdents64() syscalls, not the high-level readdir() C function you might expect. Why would a security analyst use strace when investigating a suspicious binary found on a compromised system? What would they be looking for, and what kind of malicious behaviour might be revealed in the syscall trace?
Q4. seccomp, Linux capabilities, and namespaces each contribute a distinct layer to container isolation. Explain precisely what each layer restricts and why all three are necessary. In other words, describe the specific attacks that would be possible if any single layer were removed even while the other two remained in place.