Performance Tuning a Web Server: Part 3

by Mike on February 19, 2009

in Web Server

Memory Evaluation
A web server runs as a set of processes, each in its own address space and isolated from the others for security.  Because each process has a separate address space, each can make progress independently of the others.  If one process is blocked, for example while waiting for a write to disk, the others can continue.  On SMP systems, multiple processes can run in parallel.

As workloads increase, memory requirements increase as well.  For some services, such as Apache, memory is usually the first resource to limit performance.

There are several kinds of memory available on a Linux machine.  Physical memory is the RAM installed in the machine.  With the low cost of RAM, this is the best investment that you can make in a server.  Swap space, the disk-backed part of virtual memory, comes into play when physical memory begins to run low and data is moved to the hard drive.  This has several serious consequences.  First, a hard drive is far slower than RAM, by several orders of magnitude for random access, so a considerable slowdown will be noticeable.  Second, swapping increases the wear and tear on a hard drive and produces a considerable amount of heat.
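
To check whether a machine is actually swapping, the kernel's cumulative swap counters can be read from /proc/vmstat.  The following is a minimal sketch, assuming a Linux kernel that exposes the pswpin and pswpout counters; swap_activity is a hypothetical helper name, not a standard tool.

```shell
#!/bin/sh
# swap_activity (hypothetical helper): print the cumulative number of pages
# swapped in and out since boot. Reads a vmstat-format file, /proc/vmstat
# by default.
swap_activity() {
    awk '$1 == "pswpin"  { in_pages  = $2 }
         $1 == "pswpout" { out_pages = $2 }
         END { printf "swapped in: %d pages, swapped out: %d pages\n", in_pages, out_pages }' "${1:-/proc/vmstat}"
}

# Only run against the live system if the file is available.
if [ -r /proc/vmstat ]; then
    swap_activity
fi
```

Run it twice a few seconds apart.  If the numbers rise between runs, the machine is swapping right now; counters that stay flat only record swapping that happened earlier.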

There are several ways to inspect memory usage.  One is to view /proc/meminfo.

cat /proc/meminfo

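MemTotal is the total amount of usable physical memory.  MemFree is memory that is currently unused.  Buffers is the buffer cache used for block-device I/O, and Cached is the page cache, which holds the contents of files read from disk.  SwapCached is memory that was swapped out and has since been read back in but still has a copy in swap space.  SwapTotal is the total disk space available for swapping, and SwapFree is the part of it not in use.  On 32-bit systems, HighTotal is memory above roughly 860 MB of physical memory, which the kernel cannot map directly, while LowTotal is memory that the kernel can address directly and use for its own data structures.  Mapped is memory used by files that have been mapped into processes, such as shared libraries.  Slab is memory used for kernel data structures.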

Example output from a Xeon machine:

MemTotal:      2075080 kB
MemFree:        346272 kB
Buffers:        318880 kB
Cached:        1025860 kB
SwapCached:          0 kB
Active:        1121580 kB
Inactive:       526460 kB
HighTotal:     1179072 kB
HighFree:       167556 kB
LowTotal:       896008 kB
LowFree:        178716 kB
SwapTotal:     5108616 kB
SwapFree:      5108508 kB
Dirty:             220 kB
Writeback:           0 kB
AnonPages:      303168 kB
Mapped:          17464 kB
Slab:            67940 kB
PageTables:       2916 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   6146156 kB
Committed_AS:   665660 kB
VmallocTotal:   114680 kB
VmallocUsed:      6288 kB
VmallocChunk:   108272 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     4096 kB
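
Because Buffers and Cached can be reclaimed when applications need memory, MemFree alone understates how much memory is really available.  The sketch below estimates application memory as MemTotal minus MemFree, Buffers and Cached, and reports swap in use.  It is a rough approximation that ignores Slab and other kernel usage, and mem_summary is a hypothetical helper name.

```shell
#!/bin/sh
# mem_summary (hypothetical helper): rough estimate of memory used by
# applications and of swap in use, from a meminfo-format file
# (/proc/meminfo by default). All values are in kB.
mem_summary() {
    awk '$1 == "MemTotal:"  { total = $2 }
         $1 == "MemFree:"   { free = $2 }
         $1 == "Buffers:"   { buffers = $2 }
         $1 == "Cached:"    { cached = $2 }
         $1 == "SwapTotal:" { swap_total = $2 }
         $1 == "SwapFree:"  { swap_free = $2 }
         END {
             printf "used by applications: %d kB\n", total - free - buffers - cached
             printf "swap in use: %d kB\n", swap_total - swap_free
         }' "${1:-/proc/meminfo}"
}

# Only run against the live system if the file is available.
if [ -r /proc/meminfo ]; then
    mem_summary
fi
```

For the output above, this gives 2075080 - 346272 - 318880 - 1025860 = 384068 kB used by applications and 5108616 - 5108508 = 108 kB of swap in use.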

ps aux

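You can see which processes are using the most memory with the ps aux command.  The %MEM column shows the percentage of physical memory a process is using, VSZ is its virtual memory size, and RSS (resident set size) is the physical memory it occupies.  VSZ and RSS are reported in kilobytes.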

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0 0.0     3056  1888 ?        Ss   02:53   0:01 /sbin/init
root         2  0.0 0.0        0     0 ?        S<   02:53   0:00 [kthreadd]
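
On a busy server the full listing is long, so it helps to sort by RSS and keep only the largest entries.  This is a small sketch that assumes the standard ps aux column layout, where RSS is the sixth column; top_rss is a hypothetical helper name.

```shell
#!/bin/sh
# top_rss (hypothetical helper): read `ps aux` output on stdin, drop the
# header line, and print the N processes with the largest RSS (default 5).
top_rss() {
    awk 'NR > 1' | sort -k6,6nr | head -n "${1:-5}"
}

# Only run against the live system if ps is installed.
if command -v ps >/dev/null 2>&1; then
    ps aux | top_rss 5
fi
```

The procps version of ps can also sort on its own with ps aux --sort=-rss, which keeps the header line in place.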

Practical Application:
Evaluate the ssh daemon.  The first step is to find its process ID in the ps aux output.

root      5220  0.0  0.0   5396  1032 ?        Ss   02:54   0:00 /usr/sbin/sshd
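
One way to pull the process ID out of the listing is to match on the COMMAND column.  The sketch below assumes the daemon shows up with the exact path /usr/sbin/sshd, as in the line above; pid_of is a hypothetical helper name.

```shell
#!/bin/sh
# pid_of (hypothetical helper): read `ps aux` output on stdin and print the
# PID (column 2) of every process whose command (column 11) equals $1.
pid_of() {
    awk -v cmd="$1" 'NR > 1 && $11 == cmd { print $2 }'
}

# Only run against the live system if ps is installed.
if command -v ps >/dev/null 2>&1; then
    ps aux | pid_of /usr/sbin/sshd
fi
```

If pgrep is installed, pgrep -x sshd is a shorter alternative, although it can also list the per-session sshd processes.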

In the ps output you can see that sshd is using 5396 KB of virtual memory and 1032 KB of physical memory.  Now that you know the process ID is 5220, you can see specifically how this memory is used by looking at the maps of the process.

cat /proc/5220/maps

b7a3a000-b7a44000 r-xp 00000000 08:02 8352       /lib/tls/i686/cmov/libnss_files-2.8.90.so
b7a44000-b7a45000 r--p 00009000 08:02 8352       /lib/tls/i686/cmov/libnss_files-2.8.90.so
b7a45000-b7a46000 rw-p 0000a000 08:02 8352       /lib/tls/i686/cmov/libnss_files-2.8.90.so
b7a46000-b7a4f000 r-xp 00000000 08:02 8354       /lib/tls/i686/cmov/libnss_nis-2.8.90.so
b7a4f000-b7a50000 r--p 00008000 08:02 8354       /lib/tls/i686/cmov/libnss_nis-2.8.90.so
b7a50000-b7a51000 rw-p 00009000 08:02 8354       /lib/tls/i686/cmov/libnss_nis-2.8.90.so
b7a51000-b7a58000 r-xp 00000000 08:02 8350       /lib/tls/i686/cmov/libnss_compat-2.8.90.so
b7a58000-b7a59000 r--p 00006000 08:02 8350       /lib/tls/i686/cmov/libnss_compat-2.8.90.so
b7a59000-b7a5a000 rw-p 00007000 08:02 8350       /lib/tls/i686/cmov/libnss_compat-2.8.90.so
b7a5a000-b7a5c000 rw-p b7a5a000 00:00 0
b7a5c000-b7a71000 r-xp 00000000 08:02 8378       /lib/tls/i686/cmov/libpthread-2.8.90.so
b7a71000-b7a72000 r--p 00014000 08:02 8378       /lib/tls/i686/cmov/libpthread-2.8.90.so
b7a72000-b7a73000 rw-p 00015000 08:02 8378       /lib/tls/i686/cmov/libpthread-2.8.90.so
b7a73000-b7a75000 rw-p b7a73000 00:00 0
b7a75000-b7a77000 r-xp 00000000 08:02 350263     /lib/libkeyutils-1.2.so
b7a77000-b7a79000 rw-p 00001000 08:02 350263     /lib/libkeyutils-1.2.so
b7a79000-b7a80000 r-xp 00000000 08:02 57481      /usr/lib/libkrb5support.so.0.1
b7a80000-b7a81000 r--p 00006000 08:02 57481      /usr/lib/libkrb5support.so.0.1
b7a81000-b7a82000 rw-p 00007000 08:02 57481      /usr/lib/libkrb5support.so.0.1
b7a82000-b7bda000 r-xp 00000000 08:02 8343       /lib/tls/i686/cmov/libc-2.8.90.so
b7bda000-b7bdc000 r--p 00158000 08:02 8343       /lib/tls/i686/cmov/libc-2.8.90.so
b7bdc000-b7bdd000 rw-p 0015a000 08:02 8343       /lib/tls/i686/cmov/libc-2.8.90.so
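
Each line of the maps file starts with an address range in hexadecimal, so the size of a mapping is the end address minus the start address.  The sketch below adds up the mapped size per file and skips anonymous mappings (lines with no path).  It assumes paths without spaces, and map_sizes is a hypothetical helper name.

```shell
#!/bin/sh
# map_sizes (hypothetical helper): read maps-format lines on stdin and print
# the total mapped size in kB for each file, largest first.
map_sizes() {
    while read -r range perms offset dev inode path; do
        [ -n "$path" ] || continue      # skip anonymous mappings
        start=${range%-*}               # text before the dash
        end=${range#*-}                 # text after the dash
        echo "$(( (0x$end - 0x$start) / 1024 )) $path"
    done |
    awk '{ kb[$2] += $1 }
         END { for (p in kb) printf "%6d kB  %s\n", kb[p], p }' |
    sort -rn
}

# Usage (needs permission to read the process's maps file):
#   map_sizes < /proc/5220/maps
```

For the listing above, the three libnss_files mappings cover 40 + 4 + 4 = 48 kB.  This is mapped address space, not resident memory, so the totals correspond to VSZ rather than RSS.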

Web servers are very sensitive to memory, so add enough RAM that the server does not use swap except in extreme situations.

Tuning TCP Sockets

The tcp_max_syn_backlog setting is the maximum number of connection requests (SYN packets) that the server will queue while waiting for clients to complete the handshake; requests beyond that are dropped.  Here you can see the default, which can be increased to 30000.
cat /proc/sys/net/ipv4/tcp_max_syn_backlog
1024

echo 30000 > /proc/sys/net/ipv4/tcp_max_syn_backlog

With a web server you will see a lot of TCP connections in the TIME_WAIT state.  TIME_WAIT is the state in which a socket waits after close to handle packets still in the network.  The tcp_max_tw_buckets setting limits how many sockets the system will hold in TIME_WAIT at once, and it should also be increased.  Here is the default.

cat /proc/sys/net/ipv4/tcp_max_tw_buckets
180000

echo 2000000 > /proc/sys/net/ipv4/tcp_max_tw_buckets
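
Before raising the limit, it is worth checking how many sockets are actually in TIME_WAIT.  This sketch counts them from netstat -ant output, assuming the usual layout where the state is the sixth column; count_time_wait is a hypothetical helper name.

```shell
#!/bin/sh
# count_time_wait (hypothetical helper): read `netstat -ant` output on stdin
# and print the number of sockets in the TIME_WAIT state.
count_time_wait() {
    awk '$6 == "TIME_WAIT" { n++ } END { print n + 0 }'
}

# Only run against the live system if netstat is installed.
if command -v netstat >/dev/null 2>&1; then
    netstat -ant | count_time_wait
fi
```

If the count stays far below the value in tcp_max_tw_buckets, this limit is not what is holding the server back.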

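The netdev_max_backlog setting is the number of incoming packets that can be queued when an interface receives packets faster than the kernel can process them.  It should be increased from the default of 1000 to 50000.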

cat /proc/sys/net/core/netdev_max_backlog
1000

echo 50000 > /proc/sys/net/core/netdev_max_backlog

Tuning these TCP socket settings can produce significant increases in speed.  Be sure to test before and after to verify that the changes do what you expect.
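
To keep a record for the before and after comparison, the three settings can be printed together.  This is a minimal sketch; show_tuning is a hypothetical helper name, and the optional directory argument exists only so the function can be pointed somewhere other than /proc/sys.

```shell
#!/bin/sh
# show_tuning (hypothetical helper): print the three settings discussed
# above as "key = value" lines. $1 is the root directory to read from,
# /proc/sys by default.
show_tuning() {
    root=${1:-/proc/sys}
    for key in net/ipv4/tcp_max_syn_backlog \
               net/ipv4/tcp_max_tw_buckets \
               net/core/netdev_max_backlog; do
        if [ -r "$root/$key" ]; then
            # Convert the path form (net/ipv4/...) to the dotted sysctl name.
            printf '%s = %s\n' "$(echo "$key" | tr / .)" "$(cat "$root/$key")"
        fi
    done
}

show_tuning
```

Values written with echo are lost at reboot.  The output lines use the same key = value form as /etc/sysctl.conf, so once the new values have been tested they can be added to that file and loaded with sysctl -p.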
