· Jonathan Cutrer · Engineering · 4 min read
Linux Performance Tools You Already Have Installed
Before you reach for Datadog or htop, there's a whole set of built-in utilities that tell you exactly what your system is doing.
When a server gets slow, the instinct is to install something. A monitoring agent, a dashboard, a profiling tool. That instinct is usually wrong — not because those tools are bad, but because the answer is almost always already in the system, if you know where to look.
Here are the tools I actually reach for first.
The Basics
Every tool in this post ships with a standard Linux installation. No apt install, no pip, no containers.
| Tool | What It Measures | When to Use It |
|---|---|---|
| top / htop | CPU, memory, process list | First look at a slow system |
| vmstat | CPU, memory, swap, I/O, context switches | Sustained load patterns over time |
| iostat | Disk I/O per device, wait times | Anything disk-bound |
| ss | Network socket state | Connection counts, port usage |
| sar | Historical CPU, memory, I/O, network | "When did this start happening?" |
| dmesg | Kernel ring buffer | Hardware errors, OOM kills |
| journalctl | Systemd logs | Service-level context |
| /proc/meminfo | Detailed memory breakdown | Exactly how memory is allocated |
| /proc/net/dev | Per-interface RX/TX stats | Network saturation |
| perf stat | Hardware counters (CPU cycles, cache misses) | CPU-bound bottlenecks |
htop is technically not always installed, but it usually is. sar requires the sysstat package on Debian/Ubuntu, which is worth installing.
vmstat for Load Patterns
vmstat 1 10 samples every second for ten intervals. The columns that matter:
```
vmstat 1 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 1842316 248784 3201840    0    0     0   124 1842 3201  8  2 90  0  0
 3  0      0 1841892 248784 3201840    0    0     0   208 1876 3318 12  3 85  0  0
```

- r — runnable processes (waiting for CPU). Consistently above your CPU count means CPU saturation.
- b — processes blocked on I/O. Non-zero here is a flag.
- si / so — swap in/out. Any sustained non-zero means you're out of RAM.
- wa — percent time waiting for I/O. Above 10% on a web server deserves investigation.
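Those thresholds are easy to script. A minimal sketch using awk over embedded sample output, so it runs anywhere; the 2-CPU count and the 10% wa threshold are assumptions. On a live box, pipe in vmstat 1 10 and use nproc for the CPU count.

```shell
cpus=2  # assumption: a 2-CPU box; use $(nproc) on a real system

# Embedded vmstat-style sample; replace with: vmstat 1 10
sample='procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 1842316 248784 3201840    0    0     0   124 1842 3201  8  2 90  0  0
 3  0      0 1841892 248784 3201840    0    0     0   208 1876 3318 12  3 85  0  0'

flags=$(printf '%s\n' "$sample" | awk -v cpus="$cpus" '
  NR > 2 {                                # skip the two header lines
    if ($1 > cpus) print "sample " NR-2 ": run queue " $1 " > " cpus " CPUs"
    if ($16 > 10)  print "sample " NR-2 ": wa " $16 "% (I/O wait)"
  }')
printf '%s\n' "$flags"
```

The second sample trips the run-queue check here because 3 runnable processes exceed the assumed 2 CPUs.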
iostat for Disk
iostat -xz 1 5 — extended stats, skip zero-activity devices, 1-second intervals:
```
iostat -xz 1 5
Device    r/s   w/s  rkB/s  wkB/s  await  r_await  w_await  %util
sda      0.50 32.00  20.00 128.00   4.32     2.10     4.50  12.40
nvme0n1  8.00 64.00 320.00 512.00   0.18     0.12     0.20   1.20
```

%util is the most useful single number — it's the percentage of time the device was busy. Above 80% is where you start seeing latency effects. await is the average I/O time in milliseconds; compare r_await against w_await to see whether reads or writes are the problem.
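The 80% rule of thumb is scriptable too. A sketch over embedded sample output (sdb is a hypothetical busy disk added for illustration); on a live system you would pipe in iostat -xz output instead.

```shell
# Embedded iostat -xz style sample; sdb is invented to show a flagged device
sample='Device    r/s   w/s  rkB/s  wkB/s  await  r_await  w_await  %util
sda      0.50 32.00  20.00 128.00   4.32     2.10     4.50  12.40
sdb      4.00 310.0 160.00 2480.0  18.70     3.40    19.10  92.50
nvme0n1  8.00 64.00 320.00 512.00   0.18     0.12     0.20   1.20'

# Column 9 is %util; flag anything above 80
busy=$(printf '%s\n' "$sample" | awk 'NR > 1 && $9 > 80 { print $1, $9 "%" }')
printf '%s\n' "$busy"
```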
ss for Network State
ss -s for a summary, ss -tnp for TCP connections with process names:
```
ss -tnp
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port   Process
ESTAB      0      0      10.0.0.5:443        203.0.113.5:51234   users:(("nginx",pid=1234,fd=18))
CLOSE-WAIT 8      0      10.0.0.5:5432       10.0.0.3:44821
```

CLOSE_WAIT connections that don't go away mean your application isn't closing database connections properly. I've diagnosed more than one memory leak by noticing the CLOSE_WAIT count climbing over hours.
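Watching that count climb is just a per-state tally. A sketch with embedded ss-style sample output (the addresses are illustrative); live, you would run ss -tan and pipe it into the same awk.

```shell
# Embedded ss -tan style sample; on a real host: ss -tan | awk ...
sample='State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
ESTAB      0      0      10.0.0.5:443        203.0.113.5:51234
ESTAB      0      0      10.0.0.5:443        203.0.113.9:40112
CLOSE-WAIT 8      0      10.0.0.5:5432       10.0.0.3:44821'

# Count sockets per TCP state (column 1), skipping the header
counts=$(printf '%s\n' "$sample" \
  | awk 'NR > 1 { c[$1]++ } END { for (s in c) print s, c[s] }' \
  | sort)
printf '%s\n' "$counts"
```

Run it in a loop every few minutes; a CLOSE-WAIT count that only goes up is the leak signature described above.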
/proc/meminfo for Exact Memory Accounting
free -h gives the overview, but /proc/meminfo has the detail:
```
grep -E 'MemTotal|MemFree|MemAvailable|Buffers|Cached|SwapTotal|SwapFree|Dirty|Writeback' /proc/meminfo
```

Dirty — data written to the filesystem but not yet flushed to disk. A very high dirty count under normal conditions means I/O is falling behind writes. MemAvailable is more useful than MemFree — it accounts for reclaimable cache and buffer memory.
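That MemAvailable-over-MemTotal ratio is the single number I compute most often. A sketch over an embedded /proc/meminfo excerpt (the sizes are invented); on a Linux host, read the real file instead.

```shell
# Embedded /proc/meminfo excerpt; values are illustrative
sample='MemTotal:        8046508 kB
MemFree:         1842316 kB
MemAvailable:    4938200 kB
Dirty:              1204 kB'

pct=$(printf '%s\n' "$sample" | awk '
  /^MemTotal:/     { total = $2 }   # field 2 is the size in kB
  /^MemAvailable:/ { avail = $2 }
  END { printf "%.0f", avail / total * 100 }')
echo "${pct}% of RAM available"
```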
dmesg for the Scary Stuff
When a process disappears or a server suddenly gets slow and nothing else explains it:
```
dmesg -T | grep -iE 'oom|killed|error|fail|warn' | tail -30
```

OOM kills show up here before anywhere else. Hardware errors — disk read failures, memory ECC corrections — also appear in the ring buffer. I've found bad RAM and failing drives through dmesg before any monitoring alert fired.
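When an OOM kill does show up, the line you want names the victim. A sketch that extracts the process name and pid, using an embedded sample line that mirrors the kernel's "Out of memory: Killed process" format (the timestamp, pid, and sizes are invented); live, feed it dmesg -T.

```shell
# Embedded dmesg -T style sample; on a real host: dmesg -T | awk ...
sample='[Mon Jan  8 03:12:44 2024] Out of memory: Killed process 4321 (java) total-vm:9482120kB, anon-rss:7123400kB
[Mon Jan  8 03:12:44 2024] oom_reaper: reaped process 4321 (java), now anon-rss:0kB'

kills=$(printf '%s\n' "$sample" | awk '/Out of memory: Killed process/ {
  for (i = 1; i <= NF; i++)
    if ($i == "process") {
      name = $(i + 2); gsub(/[()]/, "", name)   # strip parentheses around the name
      print "OOM-killed: " name " (pid " $(i + 1) ")"
      break
    }
}')
printf '%s\n' "$kills"
```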
What This Workflow Actually Looks Like
In practice, on a slow server I do this in order:
1. top — anything obviously wrong? (runaway process, memory full)
2. vmstat 1 10 — sustained pattern, any swap, I/O wait
3. iostat -xz 1 5 — which device, how bad
4. ss -tnp — connection state anomalies
5. dmesg -T | tail -50 — kernel-level events
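The five steps above drop naturally into one script. A sketch assuming only a POSIX shell; each step is guarded with command -v so it degrades gracefully on minimal systems rather than erroring out.

```shell
#!/bin/sh
# Quick triage pass: one snapshot from each tool, in the order above.
step() {
  echo "== $* =="
  if command -v "$1" >/dev/null 2>&1; then
    "$@" 2>/dev/null || echo "($1 failed; may need privileges)"
  else
    echo "($1 not installed)"
  fi
}

step top -b -n 1      # -b: batch mode, one snapshot instead of the interactive TUI
step vmstat 1 3       # trimmed from 1 10 so the whole pass stays fast
step iostat -xz 1 2
step ss -tnp

echo "== dmesg -T | tail -50 =="
if command -v dmesg >/dev/null 2>&1; then
  dmesg -T 2>/dev/null | tail -50
else
  echo "(dmesg not installed)"
fi
```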
Five commands, takes two minutes. Usually enough to identify whether the problem is CPU, memory, I/O, or network — and whether it’s hardware or software. Everything after that is narrowing down within that category.
I’ve worked on systems with full Datadog and Prometheus setups that were slower to diagnose than this because everyone was waiting for a dashboard to load instead of running a few commands.