Squid Proxy: Introduction

Squid is a caching proxy server that can improve performance for HTTP and FTP traffic. By caching commonly accessed sites, Squid can improve the performance of an Internet connection by roughly 10-20%.


Here is the Official Squid Site: http://www.squid-cache.org/

Squid grew out of the Harvest cache project and uses the Internet Cache Protocol (ICP) to ask peer, parent, and child caches which objects they hold. Squid can accelerate traffic from the inside network to the Internet, or it can act as a front-end accelerator (reverse proxy) for a web server, speeding up access to the pages on that server.
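
To make ICP more concrete, here is a minimal Python sketch that builds an ICP version 2 query and decodes the fixed header, following the message layout in RFC 2186: a 20-byte header (opcode, version, message length, request number, options, option data, sender host address) followed, for a query, by the requester's address and a NUL-terminated URL. The function names are hypothetical and this is not Squid's own code; it only assembles bytes and sends nothing.

```python
import socket
import struct

ICP_OP_QUERY = 1   # RFC 2186 opcode: "do you have this URL?"
ICP_OP_HIT = 2     # reply: yes
ICP_OP_MISS = 3    # reply: no
ICP_VERSION = 2

# Fixed header in network byte order: opcode (1 byte), version (1),
# message length (2), request number (4), options (4), option data (4),
# sender host address (4) = 20 bytes in total.
HEADER = struct.Struct("!BBHIII4s")


def build_icp_query(request_number: int, url: str,
                    sender_ip: str = "0.0.0.0",
                    requester_ip: str = "0.0.0.0") -> bytes:
    """Build an ICP_OP_QUERY datagram (hypothetical helper, sketch only)."""
    # Query payload: requester host address, then the URL terminated by NUL.
    payload = socket.inet_aton(requester_ip) + url.encode("ascii") + b"\x00"
    length = HEADER.size + len(payload)  # length covers header + payload
    header = HEADER.pack(ICP_OP_QUERY, ICP_VERSION, length, request_number,
                         0, 0, socket.inet_aton(sender_ip))
    return header + payload


def parse_icp_header(datagram: bytes) -> dict:
    """Decode the fixed 20-byte header shared by all ICP messages."""
    opcode, version, length, reqnum, _options, _optdata, sender = \
        HEADER.unpack_from(datagram)
    return {"opcode": opcode, "version": version, "length": length,
            "request_number": reqnum, "sender": socket.inet_ntoa(sender)}


packet = build_icp_query(7, "http://www.squid-cache.org/")
print(parse_icp_header(packet))
```

A peer receiving such a query over UDP (Squid listens on port 3130 by default, set by `icp_port`) replies with ICP_OP_HIT or ICP_OP_MISS and the same request number, which tells the querying cache whether to fetch the object from that peer over HTTP.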

Here is what Squid can do:

  1. Accelerate Internet Connections for Internal Network

  2. Protect the Internal Network When Surfing the Internet

  3. Create Detailed Information About User Activity on the Internet

  4. Prevent Inappropriate Activity by Users on the Internet

  5. Enforce Use by Authorized Users Only

  6. Filter Sensitive Material

  7. Accelerate Web Server Pages

Squid acts both as a proxy, working on behalf of a user, and as a cache. When Squid works as a proxy and a user requests a web site, Squid retrieves the web page and then passes it to the user. The user never connects to the Internet directly: the proxy server retrieves the sites the user requests and caches the ones it can.

Important Locations

Once Squid is installed, you should be familiar with the following locations.

Location                            Purpose
/etc/rc.d/init.d/squid              init script (start/stop/restart/status)
/etc/squid                          configuration directory (squid.conf)
/usr/share/doc/squid-2.6.STABLE6    HTML documentation
/usr/lib/squid                      support files
/usr/sbin/squid                     Squid daemon
/var/log/squid                      log files
/var/spool/squid                    cache directory
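
If you want to confirm these locations on a particular machine, the following Python sketch checks whether each one exists. The paths are copied from the list above and match the CentOS 5.2 / Squid 2.6.STABLE6 setup this guide covers; on other systems they will differ. The `root` parameter is a hypothetical convenience that lets you point the check at another directory tree.

```python
from pathlib import Path

# Paths from the list above (CentOS 5.2, Squid 2.6.STABLE6);
# other distributions and versions use different locations.
SQUID_LOCATIONS = {
    "/etc/rc.d/init.d/squid": "init script",
    "/etc/squid": "configuration directory",
    "/usr/share/doc/squid-2.6.STABLE6": "HTML documentation",
    "/usr/lib/squid": "support files",
    "/usr/sbin/squid": "Squid daemon",
    "/var/log/squid": "log files",
    "/var/spool/squid": "cache directory",
}


def check_locations(root: str = "/") -> dict:
    """Return {path: exists} for each expected location under root."""
    base = Path(root)
    return {p: (base / p.lstrip("/")).exists() for p in SQUID_LOCATIONS}


if __name__ == "__main__":
    for path, present in check_locations().items():
        status = "ok" if present else "MISSING"
        print(f"{status:8} {path:36} {SQUID_LOCATIONS[path]}")
```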

Set Squid to Run at Startup (CentOS 5.2)

Use the chkconfig command to set Squid to run each time the server is started.

chkconfig --level 35 squid on

This will cause Squid to run whether the server starts in runlevel 3 or 5.
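
To confirm the change, run `chkconfig --list squid`, which prints the service name followed by one runlevel:state pair per runlevel. The Python sketch below parses such a line; it assumes that output format (as on CentOS 5), and the function names are hypothetical.

```python
def parse_chkconfig_line(line: str) -> dict:
    """Parse one `chkconfig --list` line into {runlevel: enabled}."""
    fields = line.split()
    service, pairs = fields[0], fields[1:]
    runlevels = {}
    for pair in pairs:
        level, state = pair.split(":")   # e.g. "3:on"
        runlevels[int(level)] = state == "on"
    return {"service": service, "runlevels": runlevels}


def enabled_runlevels(line: str) -> list:
    """Runlevels in which the service is set to start."""
    parsed = parse_chkconfig_line(line)
    return sorted(lvl for lvl, on in parsed["runlevels"].items() if on)


# Sample line in the assumed format, as expected after the command above.
sample = "squid          \t0:off\t1:off\t2:off\t3:on\t4:off\t5:on\t6:off"
print(enabled_runlevels(sample))  # prints [3, 5]
```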

Start / Stop / Restart / Status (CentOS)

The Squid init script is located at /etc/rc.d/init.d/squid. It accepts four arguments: start, stop, restart, and status. Use the full path, for example:

/etc/rc.d/init.d/squid start

Alternatively, use the service command as root:

service squid stop

Start / Stop / Restart / Status (Ubuntu)

On Ubuntu, the Squid init script is located at /etc/init.d/squid. It accepts the same start, stop, restart, and status arguments. Use the full path, for example:

/etc/init.d/squid start
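
When scripting these commands, for example from a maintenance tool, a small helper can pick the right init script for each distribution and reject unsupported actions. This Python sketch only builds the command; the distro keys and the helper name are hypothetical, and actually running the result (for example with `subprocess.run`) requires root.

```python
# Init script paths from the sections above; adjust for your system.
INIT_SCRIPTS = {
    "centos": "/etc/rc.d/init.d/squid",
    "ubuntu": "/etc/init.d/squid",
}
ACTIONS = ("start", "stop", "restart", "status")


def squid_command(distro: str, action: str) -> list:
    """Build the argv for controlling Squid through its init script."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    try:
        script = INIT_SCRIPTS[distro.lower()]
    except KeyError:
        raise ValueError(f"unknown distro: {distro!r}") from None
    return [script, action]


print(squid_command("CentOS", "status"))
```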

Hardware Requirements

The hardware requirements are not as large as you might expect. The most important consideration is the RAM available to Squid, because Squid keeps a small amount of memory for every object in the cache. A common rule of thumb is 32 MB of RAM for every GB of cache disk space. If Squid runs out of physical memory and the system starts swapping, performance drops significantly.
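
The rule of thumb above is simple arithmetic: a 20 GB cache, for example, calls for about 20 × 32 = 640 MB of RAM. The Python sketch below wraps that calculation. The function names and the `reserved_mb` allowance for the operating system are hypothetical placeholders, and the estimate covers only what the rule describes, not Squid's separate in-memory object cache (set with `cache_mem`).

```python
RAM_MB_PER_CACHE_GB = 32  # rule of thumb from this guide


def ram_for_cache(cache_disk_gb: float,
                  mb_per_gb: float = RAM_MB_PER_CACHE_GB) -> float:
    """Estimate the RAM (MB) needed for a disk cache of the given size."""
    if cache_disk_gb < 0:
        raise ValueError("cache size must be non-negative")
    return cache_disk_gb * mb_per_gb


def fits_in_ram(cache_disk_gb: float, installed_ram_mb: float,
                reserved_mb: float = 256) -> bool:
    """True if the estimate plus a reserve for other processes fits.

    reserved_mb=256 is an arbitrary placeholder, not a Squid recommendation.
    """
    return ram_for_cache(cache_disk_gb) + reserved_mb <= installed_ram_mb


print(ram_for_cache(20))          # 640
print(fits_in_ram(40, 1024))      # False: 1280 + 256 > 1024
```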

The other major consideration is the disks: the faster the disks read and write, the faster Squid operates. SCSI disks have traditionally been a good choice for a proxy server because of their speed, and a single SCSI channel can host up to 7 drives that can service reads and writes concurrently. With ATA drives, by contrast, multiple drives on one channel must take turns, because the channel can access only one drive at a time. SATA drives, however, give each drive its own connection, keep getting faster, and are much cheaper.

Several variables affect Squid's speed and the hardware it requires. One is object size: larger objects take more memory each, which can raise memory requirements. Another is the number of concurrent users; the difference between 5 and 105 concurrent users is considerable. Plan for growth and estimate concurrent users on the high side so you do not have to come back later and upgrade.

Web Caching

Web caching means the server stores web pages and images that clients have accessed so it can reuse them for later requests. If a user visits a site such as cnn.com, those pages are saved (cached), so that when the next user visits cnn.com the pages are delivered from the cache rather than from cnn.com. Squid can also check with the origin server that a cached page has not changed since it was stored.
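
The hit/miss flow described above can be modeled in a few lines of Python. This toy class is not how Squid is implemented: the `fetch` function is a hypothetical stand-in for a real request to the origin server, and the sketch ignores expiry, revalidation, and cacheability rules.

```python
from typing import Callable, Dict, Tuple


class TinyCache:
    """Toy caching proxy: serve from the cache when possible,
    otherwise fetch from the origin server and keep a copy."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self.fetch = fetch                 # stand-in for an HTTP request
        self.store: Dict[str, bytes] = {}  # URL -> cached body
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> Tuple[bytes, str]:
        if url in self.store:              # cache hit: no trip to the origin
            self.hits += 1
            return self.store[url], "HIT"
        self.misses += 1                   # cache miss: go to the origin
        body = self.fetch(url)
        self.store[url] = body             # save it for the next client
        return body, "MISS"


cache = TinyCache(lambda url: b"<html>page for " + url.encode() + b"</html>")
print(cache.get("http://cnn.com/")[1])  # MISS: first user
print(cache.get("http://cnn.com/")[1])  # HIT: next user
```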

When viewing the logs you will see several terms you need to understand to know what is happening on the Squid server. A cache hit means the requested page was served from the cache. The cache hit ratio is the percentage of requests that were served from the cache. The byte hit ratio is the percentage of the data volume (bytes) that was served from the cache.
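
As a worked illustration, the Python sketch below computes both ratios from lines in Squid's native access.log format (on CentOS, /var/log/squid/access.log), where the fourth field is the result code, such as TCP_HIT/200 or TCP_MISS/200, and the fifth is the number of bytes sent to the client. Treating every result code that contains HIT as a hit is a simplification, and the sample log lines are invented.

```python
def hit_ratios(log_lines):
    """Return (request hit ratio %, byte hit ratio %) for access.log lines."""
    requests = hits = 0
    total_bytes = hit_bytes = 0
    for line in log_lines:
        fields = line.split()
        if len(fields) < 5:
            continue                          # skip blank or malformed lines
        action = fields[3].split("/")[0]      # "TCP_MISS/200" -> "TCP_MISS"
        size = int(fields[4])                 # bytes delivered to the client
        requests += 1
        total_bytes += size
        if "HIT" in action:                   # simplification: any *HIT* code
            hits += 1
            hit_bytes += size
    if requests == 0:
        return 0.0, 0.0
    byte_ratio = 100.0 * hit_bytes / total_bytes if total_bytes else 0.0
    return 100.0 * hits / requests, byte_ratio


SAMPLE_LOG = [  # invented example lines
    "1225893466.123    245 192.168.1.10 TCP_MISS/200 10000 GET http://www.cnn.com/ - DIRECT/203.0.113.5 text/html",
    "1225893470.456      3 192.168.1.11 TCP_HIT/200 10000 GET http://www.cnn.com/ - NONE/- text/html",
    "1225893471.789      1 192.168.1.12 TCP_MEM_HIT/200 500 GET http://www.cnn.com/logo.gif - NONE/- image/gif",
    "1225893472.000    120 192.168.1.12 TCP_MISS/200 9500 GET http://example.com/ - DIRECT/203.0.113.9 text/html",
]

req, byt = hit_ratios(SAMPLE_LOG)
print(f"request hit ratio: {req:.1f}%, byte hit ratio: {byt:.1f}%")
```

Here 2 of 4 requests were hits (50%), but they account for only 10,500 of 30,000 bytes (35%), which is why the two ratios are tracked separately.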

A cache miss means the request could not be served from the cache and Squid had to fetch the page from the origin web server.

The term uncachable refers to data that cannot be cached, either because the responding web server instructs Squid not to cache it or because Squid itself is configured not to cache that kind of data. For example, Squid may be configured not to cache large files such as QuickTime movies.
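
The two reasons above (instructions from the origin server and local policy) can be expressed as a simplified Python check. It looks only at the Cache-Control directives `no-store` and `private`, a maximum object size, and a list of refused content types. The 4 MB limit and the `video/quicktime` entry are placeholders chosen to match the QuickTime example; real Squid applies many more rules, and its size limit is set with the `maximum_object_size` directive.

```python
def is_cacheable(headers: dict, body_size: int,
                 max_object_bytes: int = 4 * 1024 * 1024,  # placeholder
                 denied_types=("video/quicktime",)) -> bool:
    """Simplified cacheability check (illustrative, not Squid's real rules)."""
    h = {k.lower(): v for k, v in headers.items()}
    cache_control = h.get("cache-control", "").lower()
    directives = {d.strip().split("=")[0]
                  for d in cache_control.split(",") if d.strip()}
    # 1. The origin server says a shared cache must not store the response.
    if "no-store" in directives or "private" in directives:
        return False
    # 2. Local policy: the object is too large ...
    if body_size > max_object_bytes:
        return False
    # 3. ... or has a content type the administrator chose not to cache.
    content_type = h.get("content-type", "").split(";")[0].strip().lower()
    if content_type in denied_types:
        return False
    return True


print(is_cacheable({"Content-Type": "text/html"}, 12000))
print(is_cacheable({"Content-Type": "video/quicktime"}, 12000))
```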

Cache validation is how Squid makes sure it serves current rather than stale information. Each time Squid saves an object to the cache, it records timestamps with it. Before serving the object again, Squid uses those timestamps to decide whether the copy is still fresh; if it may be out of date, Squid checks with the origin server (for example with a conditional If-Modified-Since request) and replaces the copy if it has changed.
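
The Python sketch below models this under simplifying assumptions: an explicit max-age or Expires value sets how long the object stays fresh; otherwise the object is assumed fresh for a fraction of the time since it was last modified, similar in spirit to the percentage in Squid's `refresh_pattern` directive (0.2 here is a placeholder). A stale object is revalidated with an If-Modified-Since request. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from email.utils import formatdate
from typing import Optional


@dataclass
class CachedObject:
    stored_at: float                 # when the object was cached (epoch s)
    last_modified: Optional[float]   # origin's Last-Modified, if any
    max_age: Optional[float] = None  # Cache-Control: max-age, if any
    expires: Optional[float] = None  # Expires header as epoch s, if any


def freshness_lifetime(obj: CachedObject, lm_factor: float = 0.2) -> float:
    """Seconds the object may be served without revalidation."""
    if obj.max_age is not None:
        return obj.max_age
    if obj.expires is not None:
        return max(0.0, obj.expires - obj.stored_at)
    if obj.last_modified is not None:
        # Heuristic: something unchanged for a long time is assumed
        # to stay unchanged for a fraction of that period.
        return lm_factor * max(0.0, obj.stored_at - obj.last_modified)
    return 0.0  # no timing information: treat as stale immediately


def needs_revalidation(obj: CachedObject, now: float) -> bool:
    return (now - obj.stored_at) >= freshness_lifetime(obj)


def conditional_headers(obj: CachedObject) -> dict:
    """Headers for a revalidation request to the origin server."""
    if obj.last_modified is None:
        return {}
    return {"If-Modified-Since": formatdate(obj.last_modified, usegmt=True)}


page = CachedObject(stored_at=1000.0, last_modified=0.0)  # fresh ~200 s
print(needs_revalidation(page, now=1100.0), needs_revalidation(page, now=1250.0))
```

If the origin server answers a conditional request with 304 Not Modified, the cached copy can be served again; otherwise the new response replaces it.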