https://github.com/opencoff/unbound-adblock

Generate ad-serving and malware list for unbound

===============================================
Script to generate Ad-block domains for unbound
===============================================

Take a list of known malware and ad-serving domains and generate an
amalgamated configuration file **fragment** for unbound_. This fragment, when
included in the main body of *unbound.conf*, will block the hosts and
domains that serve malware and/or intrusive ads.

Usage
-----
You will need GNU Make (any recent version) and a recent Go toolchain
(>1.11). Assuming GNU Make is available as ``gmake``, type::

    gmake

This will generate two config file fragments for unbound_:

- *bad-hosts.conf*: Config file fragment with a few trackers; the list of
  blocklist items is in *myfeed.txt*.
- *big.conf*: Very large list of blocklist domains and hosts (~30MB, ~700k
  entries). The blocklist feed comes from *bigfeed.txt* (auto-generated).

Include one of the config files (*bad-hosts.conf* or *big.conf*) in your *unbound.conf*
as follows::

    # include auto-generated ad-block/malware list
    include: /path/to/bad-hosts.conf

Then reload the unbound configuration to use the new blocklist.

Details
-------
The blocklist is generated by a Go program in the `blgen` directory. It is
built using the shell script `build`. The output binary is placed in a
platform-specific directory (`bin/$os-$arch/blgen`). Usage::

    blgen [options] [blocklist ...]

Read one or more blocklist files and generate a composite file containing
blocked hosts and domains. The final output is written to STDOUT or to
an output file.

blgen can optionally read a feed (a text file) of well-known third-party
malware and tracker URLs. The feed file has a simple format:

- Each line starts with either 'txt' or 'json', followed by a URL.
- The keyword 'txt' or 'json' identifies the format of the content returned
  by the URL.

Example::

    txt http://pgl.yoyo.org/files/adhosts/plaintext
    txt http://mirror2.malwaredomains.com/files/justdomains
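
The feed format described above can be parsed in a few lines of Go. The
following is a minimal sketch and not the actual `blgen` code: the names
``feedEntry`` and ``parseFeed`` are hypothetical, and skipping blank lines and
``#`` comment lines is an assumption that the format description does not
confirm.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// feedEntry is a hypothetical type holding one line of a feed file.
type feedEntry struct {
	Kind string // "txt" or "json"
	URL  string
}

// parseFeed reads feed lines of the form "<txt|json> <URL>".
// Assumption: blank lines and lines starting with '#' are skipped.
func parseFeed(text string) ([]feedEntry, error) {
	var out []feedEntry
	sc := bufio.NewScanner(strings.NewReader(text))
	for n := 1; sc.Scan(); n++ {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		f := strings.Fields(line)
		if len(f) != 2 || (f[0] != "txt" && f[0] != "json") {
			return nil, fmt.Errorf("line %d: expected 'txt URL' or 'json URL'", n)
		}
		out = append(out, feedEntry{Kind: f[0], URL: f[1]})
	}
	return out, sc.Err()
}

func main() {
	feed := "txt http://pgl.yoyo.org/files/adhosts/plaintext\n" +
		"txt http://mirror2.malwaredomains.com/files/justdomains\n"
	entries, err := parseFeed(feed)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Println(e.Kind, e.URL)
	}
}
```

Rejecting unknown keywords early, as this sketch does, makes a typo in the
feed file visible before any download is attempted.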

Options::

    -c, --cache-dir D       Use 'D' as the cache directory ["."]
    -F, --feed F            Read blocklists from feed file 'F' [""]
        --no-cache          Ignore the cache and re-fetch every blocklist [false]
    -o, --output-file F     Write output to file 'F' [""]
    -f, --output-format T   Set output format to 'T' (text or unbound) [""]
    -v, --verbose           Show verbose output [false]
    -W, --allowlist F       Add allowlist entries from file 'F' [[]]

The `-W` flag can be used multiple times to add multiple allowlist sources.
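
To illustrate how several blocklists and allowlists might combine into one
output, here is a rough Go sketch. It is not the `blgen` implementation:
``merge``, ``normalize`` and ``render`` are hypothetical names, allowlist
matching is exact-name only, and the ``local-zone ... always_nxdomain`` line
is an assumed choice of unbound directive, since the text above does not state
which directives the ``unbound`` output format emits.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// normalize lowercases a name and strips whitespace and a trailing dot.
func normalize(s string) string {
	return strings.TrimSuffix(strings.ToLower(strings.TrimSpace(s)), ".")
}

// merge combines blocklists, drops allowlisted names (exact match only,
// an assumption), and returns the remaining names sorted and de-duplicated.
func merge(blocklists [][]string, allow []string) []string {
	skip := make(map[string]bool, len(allow))
	for _, a := range allow {
		skip[normalize(a)] = true
	}
	seen := make(map[string]bool)
	var out []string
	for _, list := range blocklists {
		for _, name := range list {
			n := normalize(name)
			if n == "" || skip[n] || seen[n] {
				continue
			}
			seen[n] = true
			out = append(out, n)
		}
	}
	sort.Strings(out)
	return out
}

// render writes one name per line ("text") or one unbound local-zone
// line per name ("unbound"). The directive used here is an assumption.
func render(names []string, format string) string {
	var b strings.Builder
	for _, n := range names {
		if format == "unbound" {
			fmt.Fprintf(&b, "local-zone: %q always_nxdomain\n", n)
		} else {
			b.WriteString(n + "\n")
		}
	}
	return b.String()
}

func main() {
	lists := [][]string{
		{"Ads.Example.com", "tracker.example.net."},
		{"ads.example.com", "ok.example.org"},
	}
	names := merge(lists, []string{"ok.example.org"})
	fmt.Print(render(names, "unbound"))
}
```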

Caching
~~~~~~~
`blgen` caches the downloaded blocklists and refreshes them only once a day.
In the default invocation of `blgen` in *GNUmakefile*, the
cache-dir is the current directory. Each cache file uses the URL as the prefix
and a truncated SHA256 sum of the URL as the suffix. The cache can be ignored
via the `--no-cache` option.
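
The naming and refresh rules above can be sketched in Go as follows. This is
an illustration under stated assumptions, not the code `blgen` uses:
``cacheFileName`` and ``isStale`` are hypothetical names, and the way the URL
is sanitized into a prefix as well as the 8-byte truncation of the digest are
guesses.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"net/url"
	"strings"
	"time"
)

// cacheFileName builds "<url-derived prefix>-<truncated sha256>".
// Assumptions: the prefix is host+path with '/' and ':' replaced by '_',
// and the digest is truncated to its first 8 bytes (16 hex characters).
func cacheFileName(rawURL string) string {
	sum := sha256.Sum256([]byte(rawURL))
	prefix := rawURL
	if u, err := url.Parse(rawURL); err == nil && u.Host != "" {
		prefix = u.Host + u.Path
	}
	prefix = strings.NewReplacer("/", "_", ":", "_").Replace(prefix)
	return fmt.Sprintf("%s-%x", prefix, sum[:8])
}

// isStale reports whether a cached copy fetched at 'fetched' is more
// than one day old at time 'now' and should be downloaded again.
func isStale(fetched, now time.Time) bool {
	return now.Sub(fetched) > 24*time.Hour
}

func main() {
	u := "http://pgl.yoyo.org/files/adhosts/plaintext"
	fmt.Println(cacheFileName(u))

	now := time.Now()
	fmt.Println(isStale(now.Add(-25*time.Hour), now)) // older than a day
	fmt.Println(isStale(now.Add(-time.Hour), now))    // still fresh
}
```

Deriving the suffix from a hash of the full URL keeps two feeds with the same
host and path but different query strings from sharing a cache file.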

.. _unbound: https://unbound.net/

Guide to source code
====================
The Go program is organized as follows:

- *internal/blgen*: contains the implementation of the blocklist DB,
  fetching host-lists etc.
- *blgen/*: contains the driver program ("main()") along with a few helper
  routines to generate the output.

.. vim: ft=rst:sw=4:ts=4:expandtab:tw=78: