Internet content management for the rest of us. Together with mod_clamav and mod_authz_ldap, mod_dnsbl provides a fairly complete solution for web content management. We even provide a URL database for testing, but please don't use it in production.
Many corporations want to censor the internet access of their employees, or are almost forced to do so by the legal situation in their countries. For example, allowing access to pornographic sites may be construed as sexual harassment. For this reason, most administrators have their time-proven sets of regular expressions to match against the URI. However, there are some problems with this approach:
A pattern such as sex also matches msexchange. While open source advocates don't really mind if this particular string is blocked, the reason is quite different from the reason behind the original pattern.
In a way, the problem is similar to that of the lists of notorious spammers that administrators have come to maintain.
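The false positive described above is easy to reproduce. The following is a minimal sketch, assuming a blocklist that consists of the single pattern sex and two made-up URIs on example.com; it is not taken from any real filter configuration.

```python
import re

# Hypothetical blocklist pattern of the kind described above.
pattern = re.compile(r"sex")

# Illustrative URIs, invented for this example.
uris = [
    "http://www.example.com/sex/index.html",
    "http://intranet.example.com/msexchange/inbox",
]

# An unanchored substring match cannot tell the two cases apart.
matches = [uri for uri in uris if pattern.search(uri)]
for uri in matches:
    print("blocked:", uri)
```

Both URIs end up in the blocked list, although only the first one is what the pattern was written for.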
A solution to this problem would have the following features:
Of course, this sounds very much like a description of the DNS. And this isn't a new idea: spam fighters already use the DNS as a distributed database to detect open relays or spam outfits.
The mod_dnsbl Apache module proposes to extend the idea of DNS blacklists to website classification. Here is how it works.
A domain such as dnsbl.othello.ch (with name server on dnsbl.othello.ch) is used as the base for all queries.
A URL like www.dom.ain is stored as www.dom.ain.dnsbl.othello.ch in the database, together with an IP address in the 127.0.0.0 network indicating its classification.
There is a catalog of what each IP address means.
127.in-addr.arpa for a very special purpose, which is not acceptable.
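The lookup scheme described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the module's code: the function names, the stub resolver and the unlisted.example host are assumptions made for this example, and the small catalog is an excerpt of the sample configuration shown further down.

```python
import socket

# Excerpt of the address catalog, taken from the sample configuration.
CATALOG = {
    "127.0.0.5": "Gambling",
    "127.0.0.10": "Mail",
    "127.0.0.11": "Porn",
}

def query_name(host, suffix="dnsbl.othello.ch"):
    """Append the blacklist suffix to the host name to be classified."""
    return f"{host.rstrip('.')}.{suffix}"

def classify(host, suffix="dnsbl.othello.ch", resolve=socket.gethostbyname):
    """Return the category of host, or None if the host is not listed."""
    try:
        address = resolve(query_name(host, suffix))
    except OSError:  # resolver errors such as socket.gaierror
        return None
    # Fall back to the raw address when the catalog has no entry for it.
    return CATALOG.get(address, address)

def stub_resolve(name):
    """Offline stand-in for a real resolver (hypothetical data)."""
    table = {"www.dom.ain.dnsbl.othello.ch": "127.0.0.11"}
    if name not in table:
        raise socket.gaierror("name not found")
    return table[name]

print(query_name("www.dom.ain"))
print(classify("www.dom.ain", resolve=stub_resolve))
print(classify("unlisted.example", resolve=stub_resolve))
```

Passing the resolver as a parameter keeps the sketch testable without network access; leaving it at its default uses the system resolver, and with it any local DNS cache.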
There are several performance features that make this solution technically more compelling than e.g. the Websense product. Large sites can easily do a zone transfer of the zone. Using a caching-only DNS server, the round trip time for a site can be reduced to a local DNS round trip. Most systems nowadays have a name service cache daemon that is even faster. These caches often also do negative caching, so that lookups for hosts that are not classified only seldom suffer a penalty.
The administration can also be simplified. Nowadays most sites have fancy GUI tools for their DNS administration, and these can also be handed to nontechnical people. If you define canonical names for the category titles, the content administrator can just enter CNAME records for the sites he wants blocked. He does not need access to any of the proxies to activate the filtering.
The Apache mod_dnsbl module is an implementation of this idea. It is distributed under the Apache license and can be downloaded from http://software.othello.ch/mod_dnsbl/mod_dnsbl-0.12.tar.gz.
It is installed like most other modules:
    ./configure --with-apxs=/usr/local/apache/bin/apxs
    make
    make install
The module is added to an Apache-based proxy and configured as follows:
    DnsblSuffix dnsbl.othello.ch
    DnsblContact webmaster@yourdom.ain
    DnsblTemplate /usr/local/apache/htdocs/dnsbl/template.html
    DnsblDefaultAction pass
    # actions for squidguard lists
    DnsblAction 127.0.0.1 block Advertising
    DnsblAction 127.0.0.2 block Aggressive
    DnsblAction 127.0.0.3 block Audio-Video
    DnsblAction 127.0.0.4 block Drugs
    DnsblAction 127.0.0.5 block Gambling
    DnsblAction 127.0.0.6 block Hacking
    DnsblAction 127.0.0.7 block Proxy
    DnsblAction 127.0.0.8 block Violence
    DnsblAction 127.0.0.9 block "Illegal Software"
    DnsblAction 127.0.0.10 pass Mail
    DnsblAction 127.0.0.11 block Porn
    # actions for adult list
    DnsblAction 127.0.0.12 block Adult
    # our own actions
    DnsblAction 127.1.0.1 skipauth Open
    DnsblAction 127.1.0.2 noscan Virus-free
    DnsblRecursionDepth 4
Please check the Apache documentation for the proxy configuration.
The implementation as an Apache module adds functionality not present in the regular expression list approach:
The squidguard project has generated nice blacklists, and the mod_dnsbl distribution includes a script, blacklist2zone, that converts the squidguard blacklists to a DNS zone. This zone is available under the suffix dnsbl.othello.ch. The sample configuration includes actions for all IP addresses used for squidguard categories.
Another very large blacklist can be found at http://cri.univ-tlse1.fr/documentations/cache/squidguard_en.html; it is particularly rich in adult URLs. The blacklist2zone script can also include this list in the same zone.
Since the squidguard blacklists also include some URLs that cannot be processed by mod_dnsbl, we have added edited versions of these lists to the distribution directory:
contrib directory, modify the blacklist2zone script with the names of your name servers and import the zone file generated by blacklist2zone into your name servers.
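To give an idea of what such a conversion involves, here is a minimal sketch in Python. It is not the blacklist2zone script itself: the function name, the assumption that the input is a plain list with one domain per line, and the choice of 127.0.0.11 (the Porn category from the sample configuration) are all made up for illustration.

```python
def zone_records(domains, address):
    """Turn blacklist lines into A records relative to the zone origin."""
    records = []
    for line in domains:
        name = line.strip().lower().rstrip(".")
        # Skip blank lines and comment lines.
        if not name or name.startswith("#"):
            continue
        # No trailing dot: the name is relative to the zone, so it ends up
        # as e.g. www.dom.ain.dnsbl.othello.ch when loaded into that zone.
        records.append(f"{name}\tIN\tA\t{address}")
    return records

sample = ["www.dom.ain\n", "\n", "# a comment\n", "Other.Dom.Ain.\n"]
for record in zone_records(sample, "127.0.0.11"):
    print(record)
```

A real import would also need the SOA and NS records with the names of your own name servers, which is the part of the script the text above asks you to adapt.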
You may use the zone dnsbl.othello.ch on timon.othello.ch for testing only. That zone is deliberately not kept current (to make it unattractive for production use and to protect my rather weak internet link).
Also included with the distribution is a squid redirector with the same functionality as the mod_dnsbl module. Please consult the manual page dnsbl_redirector(1) for details. This redirector is currently not as functional as the Apache module; it does not, of course, understand anything about authentication or viruses.
Apache 1.3.x was replaced by Apache 2.x a long time ago, so it no longer makes sense to support Apache 1.3.x in such a small module. Therefore, starting with release 0.11, only Apache 2.2.x is supported (this makes the code somewhat cleaner and easier to maintain).
Note that the zone dnsbl.othello.ch is not to be used for production. This zone has not been updated in the last 5 years, and it will not be updated in the foreseeable future. This is painfully evident from the zone's SOA record. Please create your own local DNS zone, with your own import of the blacklisted URLs. A very convenient way to do this is to use the PowerDNS server. It allows you to use MySQL or another database as a backend, which makes it easy to add your own URLs on the fly without restarting the server.
Please do not use dnsbl.othello.ch in production. Traffic generated by installations using my blacklist zone in production has at times consumed more than 50% of my upstream bandwidth. If you use this zone in production, I'll add your DNS servers to a blacklist, and my DNS server will return a blocking IP address for all queries from your server. This means that all URLs will be blocked, completely disrupting your proxy service. Since no provider ever cooperated in finding and educating abusers of dnsbl.othello.ch, I was unfortunately forced to use these somewhat draconian measures.
This directive sets the DNS suffixes within which one should look for a given host name. The suffixes are later checked in the order given, i.e. if some suffix turns up a classification which leads to a pass rating, no later domain can interfere with that. This can be used to override the classifications provided by some public DNS zone with those of a private one.
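The ordering rule can be illustrated with a short Python sketch. It assumes that the first suffix returning any classification decides the outcome; the text above only guarantees this for classifications that lead to a pass rating, so treat the behaviour for other ratings as an assumption. The private suffix dnsbl.internal.example, the zone contents and the function name are hypothetical.

```python
def first_classification(host, suffixes, resolve):
    """Return (suffix, address) for the first suffix that lists host."""
    for suffix in suffixes:
        address = resolve(f"{host}.{suffix}")
        if address is not None:
            # Assumption: the first hit wins, later suffixes are not asked.
            return suffix, address
    return None

# Hypothetical zone data: the private zone lists mail.dom.ain as Mail
# (127.0.0.10), the public zone lists it as Porn (127.0.0.11).
ZONES = {
    "mail.dom.ain.dnsbl.internal.example": "127.0.0.10",
    "mail.dom.ain.dnsbl.othello.ch": "127.0.0.11",
    "www.dom.ain.dnsbl.othello.ch": "127.0.0.11",
}

suffixes = ["dnsbl.internal.example", "dnsbl.othello.ch"]
override = first_classification("mail.dom.ain", suffixes, ZONES.get)
fallback = first_classification("www.dom.ain", suffixes, ZONES.get)
print(override)
print(fallback)
```

Listing the private suffix first is what lets the local classification take precedence over the public one.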
This directive sets the email address that is inserted into the error page that informs the user that she is not allowed to view the page.
The file /path/to/file.html is used as a template to inform the user that she is not allowed to view a page, and why. The following replacements are performed before the page is sent to the user: '%%' is replaced by '%', '%u' is replaced by the requested URL, '%r' is replaced by the reason, and '%c' is replaced by the contact address specified with DnsblContact.
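The replacement rules can be sketched as follows. This is an illustration in Python, not the module's implementation: the function name is hypothetical, and leaving unknown sequences such as '%x' untouched is an assumption, since the text does not say how the module treats them.

```python
import re

def expand_template(template, url, reason, contact):
    """Expand %%, %u, %r and %c in a single left-to-right pass."""
    values = {"%": "%", "u": url, "r": reason, "c": contact}
    # A single pass ensures that '%%u' yields the literal text '%u'
    # instead of being expanded a second time.
    return re.sub(r"%([%urc])", lambda m: values[m.group(1)], template)

page = expand_template(
    "Access to %u was denied (%r). Contact %c. This is 100%% intentional.",
    url="http://www.dom.ain/",
    reason="Porn",
    contact="webmaster@yourdom.ain",
)
print(page)
```

Note that the sketch inserts the values verbatim; whether the URL should be HTML-escaped before it is placed in the page is not covered here.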
By default, the module allows all requests. However, in some applications it might be desirable to use the system as a whitelist, in which case the default action should be block. See below for possible actions.
This directive adds a blocking (or passing) rule to the rule table of the module. If the DNS query returns ip, the module will react according to the second argument. The block page displayed will include the string given as third argument as the reason, or the IP address if no reason is specified. See below for possible actions.
This directive turns mod_dnsbl into a dummy authentication module, which accepts every user as authenticated provided the action for the URL is skipauth. This makes it possible to build proxies that accept connections to some sites without authentication, while others still require authentication.
This directs mod_dnsbl to also analyze URLs. If a name returns the IP address 127.255.255.255, then the module will map at most depth components of the URL path to a DNS name and will try to find an action in the DNS. If set to 0, no URL path matching is done.
Use the contents of this string as a template for the block page. The same replacements as with the DnsblTemplate directive are used.
With every IP address, we can associate an action string. An action string consists of comma-separated rules describing what the module should do with the request. Each rule is composed of an action verb and an optional time specification. E.g. the following action string
    pass/12:00-14:00,noscan
means that this resource should be passed between 12:00 and 14:00 local time, and nothing should be scanned for viruses. Or
    block/08:00-17:00,pass,scan
means that the resource should be blocked during office hours, and after hours, everything should be scanned for viruses, even items that are normally considered safe. Note that time specifications are always in the form HH:MM-HH:MM.
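The structure of an action string can be made concrete with a small parser. The following Python sketch only splits the string into rules; it does not reproduce how the module evaluates them, and the Rule type and function names are hypothetical.

```python
from datetime import time
from typing import NamedTuple, Optional

class Rule(NamedTuple):
    verb: str
    start: Optional[time]  # None when the rule has no time specification
    end: Optional[time]

def parse_hhmm(text):
    """Parse a time in HH:MM form."""
    hours, minutes = text.split(":")
    return time(int(hours), int(minutes))

def parse_action(action):
    """Split an action string into its comma-separated rules."""
    rules = []
    for part in action.split(","):
        verb, _, window = part.strip().partition("/")
        if window:
            start_text, end_text = window.split("-")
            rules.append(Rule(verb, parse_hhmm(start_text), parse_hhmm(end_text)))
        else:
            rules.append(Rule(verb, None, None))
    return rules

for rule in parse_action("block/08:00-17:00,pass,scan"):
    print(rule)
```

For the second example above, this yields a block rule limited to 08:00 to 17:00, followed by pass and scan rules without a time window.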
The following action verbs are known to the module: