· Jonathan Cutrer · Engineering · 4 min read
Linux Performance Tools You Already Have Installed
Before you reach for Datadog or htop, there's a whole set of built-in utilities that tell you exactly what your system is doing.
When a server gets slow, the instinct is to install something. A monitoring agent, a dashboard, a profiling tool. That instinct is usually wrong — not because those tools are bad, but because the answer is almost always already in the system, if you know where to look.
Here are the tools I actually reach for first.
The Basics
Every tool in this post ships with a standard Linux installation. No apt install, no pip, no containers.
| Tool | What It Measures | When to Use It |
|---|---|---|
| top / htop | CPU, memory, process list | First look at a slow system |
| vmstat | CPU, memory, swap, I/O, context switches | Sustained load patterns over time |
| iostat | Disk I/O per device, wait times | Anything disk-bound |
| ss | Network socket state | Connection counts, port usage |
| sar | Historical CPU, memory, I/O, network | "When did this start happening?" |
| dmesg | Kernel ring buffer | Hardware errors, OOM kills |
| journalctl | Systemd logs | Service-level context |
| /proc/meminfo | Detailed memory breakdown | Exactly how memory is allocated |
| /proc/net/dev | Per-interface RX/TX stats | Network saturation |
| perf stat | Hardware counters (CPU cycles, cache misses) | CPU-bound bottlenecks |
htop is technically not always installed, but it usually is. sar requires the sysstat package on Debian/Ubuntu, which is worth installing.
vmstat for Load Patterns
vmstat 1 10 samples every second for ten intervals. The columns that matter:
```
vmstat 1 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 1842316 248784 3201840    0    0     0   124 1842 3201  8  2 90  0  0
 3  0      0 1841892 248784 3201840    0    0     0   208 1876 3318 12  3 85  0  0
```

- r — runnable processes (waiting for CPU). Consistently above your CPU count means CPU saturation.
- b — processes blocked on I/O. Non-zero here is a flag.
- si / so — swap in/out. Any sustained non-zero means you're out of RAM.
- wa — percent time waiting for I/O. Above 10% on a web server deserves investigation.
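Those thresholds are easy to script. A minimal sketch using awk over embedded sample output, so it runs anywhere; the 2-CPU count and the 10% wa threshold are assumptions. On a live box, pipe in vmstat 1 10 and use nproc for the CPU count.

```shell
cpus=2  # assumption: a 2-CPU box; use $(nproc) on a real system

# Embedded vmstat-style sample; replace with: vmstat 1 10
sample='procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 1842316 248784 3201840    0    0     0   124 1842 3201  8  2 90  0  0
 3  0      0 1841892 248784 3201840    0    0     0   208 1876 3318 12  3 85  0  0'

flags=$(printf '%s\n' "$sample" | awk -v cpus="$cpus" '
  NR > 2 {                                # skip the two header lines
    if ($1 > cpus) print "sample " NR-2 ": run queue " $1 " > " cpus " CPUs"
    if ($16 > 10)  print "sample " NR-2 ": wa " $16 "% (I/O wait)"
  }')
printf '%s\n' "$flags"
```

The second sample trips the run-queue check here because 3 runnable processes exceed the assumed 2 CPUs.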
iostat for Disk
iostat -xz 1 5 — extended stats, skip zero-activity devices, 1-second intervals:
```
iostat -xz 1 5
Device    r/s   w/s  rkB/s  wkB/s  await  r_await  w_await  %util
sda      0.50 32.00  20.00 128.00   4.32     2.10     4.50  12.40
nvme0n1  8.00 64.00 320.00 512.00   0.18     0.12     0.20   1.20
```

%util is the most useful single number — it's the percentage of time the device was busy. Above 80% is where you start seeing latency effects. await is the average I/O time in milliseconds; compare r_await against w_await to see whether reads or writes are the problem.
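The 80% rule of thumb is scriptable too. A sketch over embedded sample output (sdb is a hypothetical busy disk added for illustration); on a live system you would pipe in iostat -xz output instead.

```shell
# Embedded iostat -xz style sample; sdb is invented to show a flagged device
sample='Device    r/s   w/s  rkB/s  wkB/s  await  r_await  w_await  %util
sda      0.50 32.00  20.00 128.00   4.32     2.10     4.50  12.40
sdb      4.00 310.0 160.00 2480.0  18.70     3.40    19.10  92.50
nvme0n1  8.00 64.00 320.00 512.00   0.18     0.12     0.20   1.20'

# Column 9 is %util; flag anything above 80
busy=$(printf '%s\n' "$sample" | awk 'NR > 1 && $9 > 80 { print $1, $9 "%" }')
printf '%s\n' "$busy"
```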
ss for Network State
ss -s for a summary, ss -tnp for TCP connections with process names:
```
ss -tnp
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port   Process
ESTAB      0      0      10.0.0.5:443        203.0.113.5:51234   users:(("nginx",pid=1234,fd=18))
CLOSE-WAIT 8      0      10.0.0.5:5432       10.0.0.3:44821
```

CLOSE_WAIT connections that don't go away mean your application isn't closing database connections properly. I've diagnosed more than one memory leak by noticing the CLOSE_WAIT count climbing over hours.
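Watching that count climb is just a per-state tally. A sketch with embedded ss-style sample output (the addresses are illustrative); live, you would run ss -tan and pipe it into the same awk.

```shell
# Embedded ss -tan style sample; on a real host: ss -tan | awk ...
sample='State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
ESTAB      0      0      10.0.0.5:443        203.0.113.5:51234
ESTAB      0      0      10.0.0.5:443        203.0.113.9:40112
CLOSE-WAIT 8      0      10.0.0.5:5432       10.0.0.3:44821'

# Count sockets per TCP state (column 1), skipping the header
counts=$(printf '%s\n' "$sample" \
  | awk 'NR > 1 { c[$1]++ } END { for (s in c) print s, c[s] }' \
  | sort)
printf '%s\n' "$counts"
```

Run it in a loop every few minutes; a CLOSE-WAIT count that only goes up is the leak signature described above.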
/proc/meminfo for Exact Memory Accounting
free -h gives the overview, but /proc/meminfo has the detail:
```
grep -E 'MemTotal|MemFree|MemAvailable|Buffers|Cached|SwapTotal|SwapFree|Dirty|Writeback' /proc/meminfo
```

Dirty — data written to the filesystem but not yet flushed to disk. A very high dirty count under normal conditions means I/O is falling behind writes. MemAvailable is more useful than MemFree — it accounts for reclaimable cache and buffer memory.
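That MemAvailable-over-MemTotal ratio is the single number I compute most often. A sketch over an embedded /proc/meminfo excerpt (the sizes are invented); on a Linux host, read the real file instead.

```shell
# Embedded /proc/meminfo excerpt; values are illustrative
sample='MemTotal:        8046508 kB
MemFree:         1842316 kB
MemAvailable:    4938200 kB
Dirty:              1204 kB'

pct=$(printf '%s\n' "$sample" | awk '
  /^MemTotal:/     { total = $2 }   # field 2 is the size in kB
  /^MemAvailable:/ { avail = $2 }
  END { printf "%.0f", avail / total * 100 }')
echo "${pct}% of RAM available"
```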
dmesg for the Scary Stuff
When a process disappears or a server suddenly gets slow and nothing else explains it:
```
dmesg -T | grep -iE 'oom|killed|error|fail|warn' | tail -30
```

OOM kills show up here before anywhere else. Hardware errors — disk read failures, memory ECC corrections — also appear in the ring buffer. I've found bad RAM and failing drives through dmesg before any monitoring alert fired.
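When an OOM kill does show up, the line you want names the victim. A sketch that extracts the process name and pid, using an embedded sample line that mirrors the kernel's "Out of memory: Killed process" format (the timestamp, pid, and sizes are invented); live, feed it dmesg -T.

```shell
# Embedded dmesg -T style sample; on a real host: dmesg -T | awk ...
sample='[Mon Jan  8 03:12:44 2024] Out of memory: Killed process 4321 (java) total-vm:9482120kB, anon-rss:7123400kB
[Mon Jan  8 03:12:44 2024] oom_reaper: reaped process 4321 (java), now anon-rss:0kB'

kills=$(printf '%s\n' "$sample" | awk '/Out of memory: Killed process/ {
  for (i = 1; i <= NF; i++)
    if ($i == "process") {
      name = $(i + 2); gsub(/[()]/, "", name)   # strip parentheses around the name
      print "OOM-killed: " name " (pid " $(i + 1) ")"
      break
    }
}')
printf '%s\n' "$kills"
```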
What This Workflow Actually Looks Like
In practice, on a slow server I do this in order:
1. top — anything obviously wrong? (runaway process, memory full)
2. vmstat 1 10 — sustained pattern, any swap, I/O wait
3. iostat -xz 1 5 — which device, how bad
4. ss -tnp — connection state anomalies
5. dmesg -T | tail -50 — kernel-level events
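The five steps above drop naturally into one script. A sketch assuming only a POSIX shell; each step is guarded with command -v so it degrades gracefully on minimal systems rather than erroring out.

```shell
#!/bin/sh
# Quick triage pass: one snapshot from each tool, in the order above.
step() {
  echo "== $* =="
  if command -v "$1" >/dev/null 2>&1; then
    "$@" 2>/dev/null || echo "($1 failed; may need privileges)"
  else
    echo "($1 not installed)"
  fi
}

step top -b -n 1      # -b: batch mode, one snapshot instead of the interactive TUI
step vmstat 1 3       # trimmed from 1 10 so the whole pass stays fast
step iostat -xz 1 2
step ss -tnp

echo "== dmesg -T | tail -50 =="
if command -v dmesg >/dev/null 2>&1; then
  dmesg -T 2>/dev/null | tail -50
else
  echo "(dmesg not installed)"
fi
```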
Five commands, takes two minutes. Usually enough to identify whether the problem is CPU, memory, I/O, or network — and whether it’s hardware or software. Everything after that is narrowing down within that category.
I’ve worked on systems with full Datadog and Prometheus setups that were slower to diagnose than this because everyone was waiting for a dashboard to load instead of running a few commands.