Jonathan Cutrer · Engineering · 4 min read

Linux Performance Tools You Already Have Installed

Before you reach for Datadog or htop, there's a whole set of built-in utilities that tell you exactly what your system is doing.

When a server gets slow, the instinct is to install something. A monitoring agent, a dashboard, a profiling tool. That instinct is usually wrong — not because those tools are bad, but because the answer is almost always already in the system, if you know where to look.

Here are the tools I actually reach for first.

The Basics

Every tool in this post ships with a standard Linux installation. No apt install, no pip, no containers.

| Tool | What It Measures | When to Use It |
| --- | --- | --- |
| top / htop | CPU, memory, process list | First look at a slow system |
| vmstat | CPU, memory, swap, I/O, context switches | Sustained load patterns over time |
| iostat | Disk I/O per device, wait times | Anything disk-bound |
| ss | Network socket state | Connection counts, port usage |
| sar | Historical CPU, memory, I/O, network | "When did this start happening?" |
| dmesg | Kernel ring buffer | Hardware errors, OOM kills |
| journalctl | Systemd logs | Service-level context |
| /proc/meminfo | Detailed memory breakdown | Exactly how memory is allocated |
| /proc/net/dev | Per-interface RX/TX stats | Network saturation |
| perf stat | Hardware counters (CPU cycles, cache misses) | CPU-bound bottlenecks |

htop is technically not always installed, but it usually is. sar requires the sysstat package on Debian/Ubuntu — worth installing.
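Once sysstat is in place, sar is what answers "when did this start?" — it reads the daily archives sysstat keeps in the background. A sketch, with the caveat that archive paths and retention vary by distro (Debian/Ubuntu typically uses /var/log/sysstat, RHEL /var/log/sa):

```shell
# Guarded so this is a no-op on systems without sysstat installed.
if command -v sar >/dev/null 2>&1; then
  sar -u 1 3                  # live CPU usage: 3 one-second samples
  sar -q -s 09:00:00 || true  # run-queue history since 09:00 today (needs today's archive)
fi
```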

vmstat for Load Patterns

vmstat 1 10 samples every second for ten intervals. The columns that matter:

vmstat 1 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 1842316 248784 3201840    0    0     0   124 1842 3201  8  2 90  0  0
 3  0      0 1841892 248784 3201840    0    0     0   208 1876 3318 12  3 85  0  0
  • r — runnable processes (waiting for CPU). Consistently above your CPU count means CPU saturation.
  • b — processes blocked on I/O. A consistently non-zero value here is a red flag.
  • si/so — swap in/out. Any sustained non-zero means you’re out of RAM.
  • wa — percent time waiting for I/O. Above 10% on a web server deserves investigation.
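The run-queue check above is easy to script. This is a minimal sketch — `check_runqueue` is a hypothetical helper, not a standard tool — that reads vmstat-style output on stdin and flags samples where `r` exceeds the CPU count:

```shell
# Flag CPU saturation from vmstat output: warn whenever the run queue
# (column r) exceeds the number of CPUs. Reads vmstat-style lines on
# stdin, so you can pipe live output into it.
check_runqueue() {
  ncpu=$1
  awk -v ncpu="$ncpu" '
    # NR > 2 skips vmstat'\''s two header lines; $1 is the r column
    NR > 2 && $1 > ncpu { print "saturated: r=" $1 " > " ncpu " CPUs" }
  '
}

# Live usage (assumes nproc is available):
#   vmstat 1 10 | check_runqueue "$(nproc)"
```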

iostat for Disk

iostat -xz 1 5 — extended stats, skip zero-activity devices, 1-second intervals:

iostat -xz 1 5
Device    r/s  w/s  rkB/s  wkB/s  await  r_await  w_await  %util
sda      0.50 32.00  20.00 128.00   4.32     2.10     4.50   12.40
nvme0n1  8.00 64.00 320.00 512.00   0.18     0.12     0.20    1.20

%util is the most useful single number — it’s the percentage of time the device was busy. Above 80% is where you start seeing latency effects. await is average I/O time in milliseconds. Compare it to r_await and w_await to see if reads or writes are the problem.
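The %util threshold is also scriptable. A sketch, assuming the extended iostat layout shown above where %util is the last column (`busy_devices` is a hypothetical helper):

```shell
# Print any device from `iostat -xz` output whose %util (last column)
# exceeds a threshold. Device lines start with a lowercase name;
# header and avg-cpu lines are skipped.
busy_devices() {
  thresh=$1
  awk -v t="$thresh" '
    /^[a-z]/ && $1 !~ /:/ && $NF + 0 > t { print $1, $NF "% busy" }
  '
}

# Live usage:
#   iostat -xz 1 5 | busy_devices 80
```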

ss for Network State

ss -s for a summary, ss -tnp for TCP connections with process names:

ss -tnp
State   Recv-Q  Send-Q  Local Address:Port    Peer Address:Port  Process
ESTAB   0       0       10.0.0.5:443          203.0.113.5:51234   users:(("nginx",pid=1234,fd=18))
CLOSE-WAIT 8    0       10.0.0.5:5432         10.0.0.3:44821

CLOSE_WAIT connections that don’t go away mean your application isn’t closing database connections properly. I’ve diagnosed more than one memory leak by noticing the CLOSE_WAIT count climbing over hours.
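Watching that count climb is a one-liner. A sketch (`count_close_wait` is a hypothetical helper) that counts CLOSE-WAIT sockets from `ss -tn` output — run it every few minutes and log the number:

```shell
# Count CLOSE-WAIT sockets from `ss -tn` output. NR > 1 skips the
# header line; $1 is the State column.
count_close_wait() {
  awk 'NR > 1 && $1 == "CLOSE-WAIT" { n++ } END { print n + 0 }'
}

# Live usage, timestamped so you can see the trend over hours:
#   echo "$(date +%T) $(ss -tn | count_close_wait)"
```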

/proc/meminfo for Exact Memory Accounting

free -h gives the overview, but /proc/meminfo has the detail:

grep -E 'MemTotal|MemFree|MemAvailable|Buffers|Cached|SwapTotal|SwapFree|Dirty|Writeback' /proc/meminfo

Dirty — data written to the filesystem but not yet flushed to disk. A persistently high Dirty count means the disks are falling behind incoming writes. MemAvailable is more useful than MemFree — it accounts for reclaimable cache and buffer memory.
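Since /proc/meminfo is just text, the useful ratios fall out of a little awk. A sketch (`mem_avail_pct` is a hypothetical helper) that reports MemAvailable as a percentage of MemTotal:

```shell
# Percentage of RAM still available, computed from /proc/meminfo.
# Takes the file as an optional argument so it can be tested against
# sample data; defaults to the real /proc/meminfo.
mem_avail_pct() {
  awk '
    /^MemTotal:/     { total = $2 }   # both values are in kB
    /^MemAvailable:/ { avail = $2 }
    END { if (total) printf "%.1f%% available\n", 100 * avail / total }
  ' "${1:-/proc/meminfo}"
}

# Live usage:
#   mem_avail_pct
```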

dmesg for the Scary Stuff

When a process disappears or a server suddenly gets slow and nothing else explains it:

dmesg -T | grep -E 'OOM|killed|error|fail|warn' | tail -30

OOM kills show up here before anywhere else. Hardware errors — disk read failures, memory ECC corrections — also appear in the ring buffer. I’ve found bad RAM and failing drives through dmesg before any monitoring alert fired.
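When the grep above does turn something up, it helps to isolate just the OOM-kill lines. A sketch (`oom_report` is a hypothetical helper) that reads from stdin, so it works on either dmesg or journalctl output:

```shell
# Extract OOM-kill lines from kernel log text on stdin. Kernel
# messages for OOM events include phrases like "Out of memory" and
# "Killed process".
oom_report() {
  awk '/Out of memory|oom-kill|Killed process/ { print }'
}

# Live usage:
#   dmesg -T | oom_report
#   journalctl -k --since "1 hour ago" | oom_report
```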

What This Workflow Actually Looks Like

In practice, on a slow server I do this in order:

  1. top — anything obviously wrong? (runaway process, memory full)
  2. vmstat 1 10 — sustained pattern, any swap, I/O wait
  3. iostat -xz 1 5 — which device, how bad
  4. ss -tnp — connection state anomalies
  5. dmesg -T | tail -50 — kernel-level events

Five commands, takes two minutes. Usually enough to identify whether the problem is CPU, memory, I/O, or network — and whether it’s hardware or software. Everything after that is narrowing down within that category.
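The five-step pass above can be bundled into a snapshot script — useful when you want to diff two snapshots taken minutes apart. A sketch; the flags assume the GNU procps and sysstat versions of these tools:

```shell
#!/bin/sh
# One-shot triage snapshot: run the five first-pass commands in order
# and capture everything (including errors from missing tools) to a
# timestamped file.
out="triage-$(date +%Y%m%d-%H%M%S).txt"
for cmd in \
    "top -bn1 | head -20" \
    "vmstat 1 3" \
    "iostat -xz 1 2" \
    "ss -s" \
    "dmesg -T | tail -50"; do
  printf '\n===== %s =====\n' "$cmd" >> "$out"
  sh -c "$cmd" >> "$out" 2>&1   # stderr captured too, so gaps are visible
done
echo "wrote $out"
```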

I’ve worked on systems with full Datadog and Prometheus setups that were slower to diagnose than this because everyone was waiting for a dashboard to load instead of running two commands.
