Squid (web browsing) circuit

Web browsing involves the following subsystems:

  1. The squid proxy cache: Used for caching web pages. It can also be used for logging and for restricting access to certain pages.


    1. Caching web pages: When a user requests a web page for the first time, squid fetches it from the original web server and caches it on disk. If the page is requested again, squid supplies it from its cache, thus saving a slow request to an external web site. N.B. Even for accesses served out of its cache, squid still asks the web server for the page's last-modified date. Such a date verification query is significantly faster than a full query. Thus, no modifications are lost, while the advantage of having a cache is kept.
    2. Logging: Page URLs, IP addresses of requesting machines, and user names are logged. Only the resource part of the URL (up to the question mark) is logged; query parameters (if any) are not logged, for privacy reasons.
    3. Access control: Using ltnb10's webmin, it is possible to restrict access for certain users or to certain pages. For technical reasons, URLs which contain a raw IP address (rather than a name such as http://www.pt.lu) are restricted if this address cannot be resolved back to a hostname. This is done to prevent users from connecting to blocked web sites by manually supplying the IP address rather than the name. Legitimate sites which use such addresses are extremely rare; when this does happen with a legitimate site, it is usually due to a misconfiguration at the remote site, which is usually fixed within days.
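
The revalidation behaviour described under "Caching web pages" can be sketched as follows. This is a minimal Python illustration, not squid's actual implementation: the names CachedPage, get_page and the fetch callback are hypothetical, and the sketch assumes the origin server answers a conditional request with HTTP status 304 when the page has not changed since the given date.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class CachedPage:
    body: bytes
    last_modified: str  # date reported by the origin server


# Hypothetical callback: fetch(url, if_modified_since) -> (status, body, last_modified)
Fetch = Callable[[str, Optional[str]], Tuple[int, bytes, str]]


def get_page(url: str, cache: Dict[str, CachedPage], fetch: Fetch) -> bytes:
    """Return the page body, revalidating any cached copy with the origin."""
    entry = cache.get(url)
    if entry is None:
        # First request: full fetch, then store the page in the cache.
        _status, body, last_modified = fetch(url, None)
        cache[url] = CachedPage(body, last_modified)
        return body

    # Later requests: cheap date check instead of a full transfer.
    status, body, last_modified = fetch(url, entry.last_modified)
    if status == 304:
        # Not modified: serve the cached copy.
        return entry.body

    # Page changed at the origin: replace the cached copy.
    cache[url] = CachedPage(body, last_modified)
    return body
```

The fetch callback is passed in so that the cache logic can be exercised without any network access.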
    1. Local browser: Squid serves web pages to the local browsers; when a user logs in from a Windows workstation in the classrooms, squid asks for a proxy password to authenticate the user.
    2. Samba: As squid has no direct access to the password database, it cannot itself verify the username and password. Instead, it connects to samba and attempts to mount the proxyauth share from ltnb0. If this succeeds, it means that the password is correct, and the user is granted access. This action is performed by the /usr/bin/smb_auth and /usr/bin/smb_auth.sh scripts.
    3. Identd: If the user connects from a Unix workstation (athos/aramis/portos/torr), password authentication is not necessary, as Unix has a service (identd) to query for the owner of a network connection.
    4. DNS: Squid uses DNS to resolve the names of external web servers to IP addresses (and vice versa, if the user supplied an IP address).
    5. Remote Web server: Squid communicates with the remote web server to fetch its pages, and to verify the dates of already cached pages.
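
The logging and raw-IP rules described above can be illustrated with a short Python sketch. It is not squid's own code: the function names are hypothetical, and the reverse lookup is passed in as a callback so that the rule itself can be checked without touching DNS. system_reverse_lookup shows one way such a callback could be built on the standard socket module.

```python
import ipaddress
import socket
from typing import Callable, Optional
from urllib.parse import urlsplit, urlunsplit


def loggable_url(url: str) -> str:
    """Drop the query string (and fragment) so only the resource part is logged."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def raw_ip_host(url: str) -> Optional[str]:
    """Return the host if the URL uses a raw IP address, else None."""
    host = urlsplit(url).hostname
    if host is None:
        return None
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return None  # an ordinary hostname
    return host


def system_reverse_lookup(ip: str) -> Optional[str]:
    """Resolve an IP address back to a hostname, or None if that fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def is_restricted(url: str, reverse_lookup: Callable[[str], Optional[str]]) -> bool:
    """Restrict raw-IP URLs whose address does not resolve back to a hostname."""
    ip = raw_ip_host(url)
    if ip is None:
        return False
    return reverse_lookup(ip) is None
```

For example, loggable_url("http://www.pt.lu/search?q=secret") returns "http://www.pt.lu/search", and is_restricted(url, system_reverse_lookup) applies the rule using the system resolver.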
  2. The Apache webserver

    This hosts the school's own web page and the students' pages. It also holds the browser's "automatic" proxy configuration, which is stored in /home/admin/public_html/proxy.pac. In addition, it serves empty images and empty JavaScript files for certain blocked ad servers (such as ad.doubleclick.net). This block works by having the DNS answer with www.ltnb.lu's address rather than the advertisement server's real address. These advertisements are blocked for two reasons:

    Configuration of this feature is in /etc/httpd/spamcontrol.conf (on ltnb0) and /etc/named.conf (on ltnb10).
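
The DNS-based ad block can be pictured with the following Python sketch. It only illustrates the idea and does not reproduce the contents of /etc/named.conf: the function name, the blocked set, and the address 192.0.2.10 standing in for www.ltnb.lu are all assumptions made for the example.

```python
from typing import Callable, Set


def resolve(host: str,
            blocked: Set[str],
            local_address: str,
            real_resolver: Callable[[str], str]) -> str:
    """Answer with the local web server's address for blocked ad hosts."""
    name = host.lower().rstrip(".")
    if name in blocked:
        # The browser then fetches an empty image or script from the
        # local web server instead of the advertisement.
        return local_address
    return real_resolver(name)


# Hypothetical data for illustration only.
BLOCKED = {"ad.doubleclick.net"}
LOCAL_ADDRESS = "192.0.2.10"  # placeholder for www.ltnb.lu's address
```

With these values, resolve("ad.doubleclick.net", BLOCKED, LOCAL_ADDRESS, some_resolver) returns the placeholder local address, while any other name is passed on to some_resolver unchanged.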
  3. Firewall: The firewall blocks any direct connection from browsers to outside servers, in order to prevent users from changing their browser configuration to go directly to the remote site rather than through squid.
  4. The local browser: Connects to apache (local pages, proxy config, ad redirects) and squid (remote pages).
  5. Samba, identd: Used for user authentication.
  6. DNS: Supplies IP-hostname mappings of remote sites to squid.

Needed Cisco ports

The following ports need to be open on the Cisco for outside access: