mod_dnsbl 0.12

Internet content management for the rest of us. Together with mod_clamav and mod_authz_ldap, mod_dnsbl provides a fairly complete solution for web content management. We even provide a URL database for testing, but please don't use it in production.

The Problem

Many corporations want, or are all but forced by the legal situation in their countries, to censor the internet access of their employees. For example, allowing access to pornographic sites may be construed as sexual harassment. For this reason, most administrators have their time-proven sets of regular expressions to match against the URI. However, there are some problems with this approach:

  1. Duplication of work: Censoring isn't really fun work, so many administrators duplicate work that has already been done. But since different companies usually have different policies, pattern files cannot simply be shared. Furthermore, many organisations have several proxies, which means that rule files must be distributed to several hosts.
  2. Unsuspecting URLs: Many sites nowadays hide behind names that don't really tell anything about their contents. Such web site names lead to long regular expression lists, which are hard to maintain.
  3. False positives: A pattern that matches sex also matches msexchange. While open source advocates don't really mind if this particular string is blocked, the reason is quite different from the reason behind the original pattern.
  4. Administrative overhead: Usually, the people making the censoring decisions are nontechnical people, and the technicians then have to implement their decisions. An administrator may be tempted to write a web-based GUI for this purpose.
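
The false positive from item 3 is easy to reproduce. The following is a minimal Python sketch; the host names are made up for illustration, and the label-anchored pattern is just one possible workaround, not something mod_dnsbl uses.

```python
import re

# A naive substring pattern of the kind described in item 3.
naive = re.compile(r"sex")

# Hypothetical host names, used only for illustration.
hosts = ["sex.example.com", "msexchange.example.com"]

for host in hosts:
    # Both hosts match, although only the first one was intended.
    print(host, bool(naive.search(host)))

# Anchoring the pattern to a whole DNS label avoids this particular
# false positive, but makes every pattern longer and harder to maintain.
by_label = re.compile(r"(^|\.)sex(\.|$)")
```

Every such refinement has to be repeated for each pattern, which is exactly the maintenance burden described above.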

In a way, the problem is similar to the lists of notorious spammers that administrators have come to maintain.

A Solution

A solution to this problem would have the following features:

Of course, this sounds very much like a description of the DNS. Nor is this a new idea: spam fighters already use the DNS as a distributed database to detect open relays or spam outfits.

The mod_dnsbl Apache module proposes to extend the idea of DNS blacklists to website classification. Here is how it works.

The idea to also place the catalog in the DNS was rejected because a simple mapping from IP addresses to host names would block part of the namespace 127.in-addr.arpa for a very special purpose, which is not acceptable.

There are several performance features that make this solution technically more compelling than e.g. the Websense product. Large sites can easily do a zone transfer of the zone. Using a caching-only DNS server, the round trip time for a site can be reduced to a local DNS round trip. Most systems nowadays have a name service cache daemon that is even faster. These caches often also do negative caching, so that lookups for hosts that are not classified only seldom suffer a penalty.

The administration can also be simplified. Nowadays most sites have fancy GUI tools for their DNS administration, and these tools can also be given to nontechnical people. If you define canonical names for the category titles, the content administrator can just enter CNAME records for the sites he wants blocked. He does not need access to any of the proxies to activate the filtering.

An implementation

The Apache module mod_dnsbl is an implementation of this idea. It is distributed under the Apache license, and can be downloaded from http://software.othello.ch/mod_dnsbl/mod_dnsbl-0.12.tar.gz. It is installed like most other modules:

./configure --with-apxs=/usr/local/apache/bin/apxs
make
make install

The module is added to an Apache-based proxy and configured as follows:

DnsblSuffix	dnsbl.othello.ch
DnsblContact	webmaster@yourdom.ain
DnsblTemplate	/usr/local/apache/htdocs/dnsbl/template.html
DnsblDefaultAction	pass
# actions for squidguard lists
DnsblAction     127.0.0.1       block   Advertising
DnsblAction     127.0.0.2       block   Aggressive
DnsblAction     127.0.0.3       block   Audio-Video
DnsblAction     127.0.0.4       block   Drugs
DnsblAction     127.0.0.5       block   Gambling
DnsblAction     127.0.0.6       block   Hacking
DnsblAction     127.0.0.7       block   Proxy
DnsblAction     127.0.0.8       block   Violence
DnsblAction     127.0.0.9       block   "Illegal Software"
DnsblAction     127.0.0.10      pass    Mail
DnsblAction     127.0.0.11      block   Porn

# actions for adult list
DnsblAction	127.0.0.12	block	Adult

# our own actions
DnsblAction	127.1.0.1	skipauth	Open
DnsblAction	127.1.0.2	noscan		Virus-free

DnsblRecursionDepth	4

Please check the Apache documentation for the proxy configuration.

The implementation as an Apache module adds additional functionality not present in the regular expression list approach:

  1. We can allow access to certain resources without authentication.
  2. We can block or pass resources only at certain times during the day.
  3. We can control whether resources should be scanned or not.

Squidguard Blacklists

The squidguard project has generated nice blacklists, and the mod_dnsbl distribution includes a script blacklist2zone that converts the squidguard blacklists to a DNS zone. This zone is available under the suffix dnsbl.othello.ch. The sample configuration includes actions for all IP addresses used for squidguard categories.

Another very large blacklist can be found at http://cri.univ-tlse1.fr/documentations/cache/squidguard_en.html. It is particularly rich in adult URLs. The blacklist2zone script can also include this list in the same zone.

Since the squidguard blacklists also include some URLs that cannot be processed by mod_dnsbl, we have added edited versions of these lists to the distribution directory:

To use them, extract the files into the contrib directory, edit the blacklist2zone script to contain the names of your name servers, and import the zone file generated by blacklist2zone into your name servers.
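
To give an idea of what such a conversion involves, here is a small Python sketch. It is not the blacklist2zone script and does not reproduce its output format; it only assumes that a category file lists one domain per line and that each domain becomes a relative A record pointing at the category's IP address. The function name zone_records is made up for this example.

```python
from typing import Iterable, List


def zone_records(domains: Iterable[str], address: str) -> List[str]:
    """Turn a list of domain names into A records for one category.

    Assumption: one domain per line, blank lines and lines starting
    with '#' are ignored. This is an illustration only, not the
    format written by blacklist2zone.
    """
    records = []
    for line in domains:
        domain = line.strip().lower()
        if not domain or domain.startswith("#"):
            continue
        # The owner name has no trailing dot, so it is relative and the
        # name server appends the zone origin (the DNSBL suffix).
        records.append(f"{domain}\tIN\tA\t{address}")
    return records


if __name__ == "__main__":
    for record in zone_records(["ads.example.net", "Example.COM"], "127.0.0.1"):
        print(record)
```

With the sample configuration above, 127.0.0.1 would correspond to the Advertising category.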

You may use the zone dnsbl.othello.ch on timon.othello.ch for testing only. That zone is not kept current on purpose (to make it unattractive for production use and to protect my rather weak internet link).

Squid Redirector

Also included with the distribution is a squid redirector that offers the same basic filtering as the mod_dnsbl module. Please consult the manual page dnsbl_redirector(1) for details. This redirector is currently not as functional as the Apache module. In particular, it knows nothing about authentication or virus scanning.

Apache Server Versions

Apache 1.3.x was replaced by Apache 2.x a long time ago, so it no longer makes sense to support Apache 1.3.x in such a small module. Therefore, starting with release 0.11, only Apache 2.2.x is supported (this makes the code somewhat cleaner and easier to maintain).

How to use dnsbl.othello.ch

Note that the zone dnsbl.othello.ch is not to be used for production. This zone has not been updated in the last 5 years, and it will not be updated in the foreseeable future. This is painfully evident from the zone's SOA record. Please create your own local DNS zone, with your own import of the blacklisted URLs. A very convenient way to do this is to use the PowerDNS server. It allows you to use MySQL or another database backend, which makes it easy to add your own URLs on the fly without restarting the server.

Please do not use dnsbl.othello.ch in production. It has happened that the traffic generated by installations using my blacklist zone in production consumed more than 50% of my upstream bandwidth. If you use this zone in production, I'll add your DNS servers to a black list, and my DNS server will return a blocking IP address for all queries from your server. This means that all URLs will be blocked, completely disrupting your proxy service. Since no provider ever cooperated in finding and educating abusers of dnsbl.othello.ch, I was unfortunately forced to use these somewhat draconian measures.

Configuration Reference

DnsblSuffix

Syntax: DnsblSuffix dnsbl.dom.ain ...
Default: none
Context: server config

This directive sets the DNS suffixes within which the module looks up a given host name. The suffixes are checked in the order given, i.e. if some suffix turns up a classification which leads to a pass rating, no later domain can interfere with that. This can be used to override the classifications of a public DNS zone with those of a private one.
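
The ordered lookup can be sketched in a few lines of Python. This is only an illustration, not the module's code: it assumes the query name is formed as host name plus suffix, and it stops at the first suffix that returns any answer, which is a simplification (the text above only guarantees this for pass ratings). The names classify and default_resolver are made up for this example; the resolver is a parameter so that the logic can be tried without network access.

```python
import socket
from typing import Callable, Iterable, Optional


def default_resolver(name: str) -> Optional[str]:
    """Return the first IPv4 address for name, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


def classify(
    host: str,
    suffixes: Iterable[str],
    resolver: Callable[[str], Optional[str]] = default_resolver,
) -> Optional[str]:
    """Look up host under each suffix in order and return the first answer."""
    for suffix in suffixes:
        # Assumption: the query name is "<host>.<suffix>".
        answer = resolver(f"{host.rstrip('.')}.{suffix}")
        if answer is not None:
            return answer
    # The host is not classified under any suffix.
    return None
```

Listing a private zone before a public one then lets the private classification win, as described above.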

DnsblContact

Syntax: DnsblContact content@dom.ain
Default: none
Context: server config

This directive sets the email address that is inserted into the error page that informs the user that she is not allowed to view the page.

DnsblTemplate

Syntax: DnsblTemplate /path/to/file.html
Default: none
Context: server config

The file /path/to/file.html is used as a template for the page that tells the user that she is not allowed to view a page, and why. The following replacements are performed before the page is sent to the user: '%%' is replaced by '%', '%u' is replaced by the requested URL, '%r' is replaced by the reason, and '%c' is replaced by the contact address specified with DnsblContact.
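
The replacement rules can be illustrated with a short Python sketch. It is not taken from the module; the function name expand_template is made up, and leaving unknown sequences such as '%x' untouched is an assumption, since the text does not say what happens to them.

```python
import re


def expand_template(template: str, url: str, reason: str, contact: str) -> str:
    """Apply the %%, %u, %r and %c replacements in a single pass."""
    values = {"%": "%", "u": url, "r": reason, "c": contact}

    def substitute(match: "re.Match[str]") -> str:
        # Assumption: unknown sequences such as "%x" are left unchanged.
        return values.get(match.group(1), match.group(0))

    # A single left-to-right pass ensures that "%%u" becomes the literal
    # text "%u" and is not expanded a second time.
    return re.sub(r"%(.)", substitute, template)


if __name__ == "__main__":
    print(expand_template(
        "Access to %u was denied (%r). Contact: %c",
        "http://www.example.com/",
        "Gambling",
        "webmaster@example.com",
    ))
```

Doing all replacements in one pass matters: replacing '%%' first and '%u' afterwards in separate steps would wrongly expand a template that contains '%%u'.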

DnsblDefaultAction

Syntax: DnsblDefaultAction { action }
Default: pass
Context: server config

By default, the module allows all requests. However, in some applications it might be desirable to use the system as a whitelist, in which case the default action should be block. See below for possible actions.

DnsblAction

Syntax: DnsblAction ip { action } [ reason ]
Default: none
Context: server config

This directive adds a blocking (or passing) rule to the rule table of the module. If the DNS query returns ip, the module will react according to the second argument. The block page displayed will include the string given as third argument for the reason, or the IP address if no reason is specified. See below for possible actions.
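
Conceptually, the rule table is a mapping from returned IP addresses to actions. The following Python sketch shows the idea; it is not the module's data structure. The entry for 127.0.0.200 is hypothetical and only there to show the fallback to the IP address as reason, and applying the default action to answers without a rule is an assumption.

```python
from typing import Dict, Optional, Tuple

# Corresponds to DnsblDefaultAction.
DEFAULT_ACTION = "pass"

# Corresponds to a few DnsblAction lines: ip -> (action, reason).
RULES: Dict[str, Tuple[str, Optional[str]]] = {
    "127.0.0.10": ("pass", "Mail"),
    "127.0.0.11": ("block", "Porn"),
    "127.1.0.1": ("skipauth", "Open"),
    "127.0.0.200": ("block", None),  # hypothetical rule without a reason
}


def decide(answer: Optional[str]) -> Tuple[str, Optional[str]]:
    """Map a DNS answer to (action, reason)."""
    # Assumption: unclassified hosts and answers without a rule
    # both get the default action.
    if answer is None or answer not in RULES:
        return (DEFAULT_ACTION, None)
    action, reason = RULES[answer]
    # If no reason was configured, the IP address is shown instead.
    return (action, reason if reason is not None else answer)
```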

DnsblAuthoritative

Syntax: DnsblAuthoritative { on | off }
Default: off
Context: server config

This directive turns mod_dnsbl into a dummy authentication module, which accepts every user as authenticated provided the action for the URL is skipauth. This makes it possible to build proxies that accept connections to some sites without authentication, while others still require authentication.

DnsblRecursionDepth

Syntax: DnsblRecursionDepth depth
Default: 0
Context: server config

This directive makes mod_dnsbl analyze URL paths as well. If a name returns the IP address 127.255.255.255, the module maps at most depth components of the URL path to a DNS name and tries to find an action in the DNS. If set to 0, no URL path matching is done.

DnsblMessage

Syntax: DnsblMessage message string
Default: none
Context: server config

Use the contents of this string as a template for the block page. The same replacements as with the DnsblTemplate directive are used.

Actions

With every IP address, we can associate an action string. An action string consists of comma-separated rules that tell the module what to do with the request. Each rule is composed of an action verb and an optional time specification. For example, the following action string

pass/12:00-14:00,noscan

means that this resource should be passed between 12:00 and 14:00 local time, and nothing should be scanned for viruses. Or

block/08:00-17:00,pass,scan

means that the resource should be blocked during office hours, and after hours, everything should be scanned for viruses, even items that are normally considered safe. Note that time specifications are always in the form HH:MM-HH:MM.
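
To make the format concrete, here is a Python sketch that parses and evaluates such action strings. It is an interpretation of the two examples above, not the module's parser. In particular, it assumes that the first active block or pass rule wins, that a time window includes its start and excludes its end, and it leaves out skipauth for brevity. All function names are made up for this example.

```python
from datetime import time
from typing import List, Optional, Tuple

# A rule is an action verb plus an optional (start, end) time window.
Rule = Tuple[str, Optional[Tuple[time, time]]]


def parse_time(text: str) -> time:
    """Parse a time of day in the form HH:MM."""
    hours, minutes = text.split(":")
    return time(int(hours), int(minutes))


def parse_action(action: str) -> List[Rule]:
    """Split an action string into (verb, window) rules."""
    rules: List[Rule] = []
    for item in action.split(","):
        verb, _, window = item.strip().partition("/")
        if window:
            start, end = window.split("-")
            rules.append((verb, (parse_time(start), parse_time(end))))
        else:
            rules.append((verb, None))
    return rules


def active(rule: Rule, now: time) -> bool:
    """A rule without a time window is always active."""
    window = rule[1]
    # Assumption: the window includes its start and excludes its end.
    return window is None or window[0] <= now < window[1]


def evaluate(action: str, now: time) -> Tuple[Optional[str], Optional[str]]:
    """Return (access, scanning); each is a verb, or None if no rule applies."""
    access: Optional[str] = None
    scanning: Optional[str] = None
    for rule in parse_action(action):
        if not active(rule, now):
            continue
        verb = rule[0]
        if verb in ("block", "pass") and access is None:
            access = verb  # assumption: the first active rule wins
        elif verb in ("scan", "noscan") and scanning is None:
            scanning = verb
    return (access, scanning)
```

Under these assumptions, the second example yields block at 09:30 and pass at 20:00, with scanning requested in both cases. A result of None means that no rule applied at that time.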

The following action verbs are known to the module:

block
  Block this resource unconditionally.
pass
  Pass this resource.
skipauth
  If the resource is passed, also don't ask for authentication.
scan
  Always perform virus scanning, even if mod_clamav would normally not scan this resource. This makes it possible to scan images normally deemed harmless if they come from certain sites.
noscan
  Don't scan this resource for viruses. This may be necessary in cases where downloading virus patterns through a proxy may trigger the virus scanner.

© 2003-2011 Prof. Dr. Andreas Müller, Beratung und Entwicklung and Prof. Dr. Andreas Müller