https://github.com/ameshkov/circumvention-monitor

Circumvention ad requests monitor

# Circumvention monitor

[![Build Status](https://travis-ci.com/ameshkov/circumvention-monitor.svg?branch=master)](https://travis-ci.com/ameshkov/circumvention-monitor)

Ad networks often switch to random new domains to evade blocking, which makes them hard to keep track of.
This crawler automates the process of discovering those domains.

- [Circumvention monitor](#circumvention-monitor)
- [Reports](#reports)
- [How to configure it](#how-to-configure-it)
- [How to run it](#how-to-run-it)
- [TODO](#todo)

## Reports

Every day the circumvention monitor runs automatically and generates two files:

- [report/report.md](report/report.md) - human-readable report.
- [report/rules.txt](report/rules.txt) - blocking rules for the domains discovered by the crawler.

## How to configure it

To monitor a new ad system, add a new JSON object to the [configuration](conf/configuration.json).

```json
{
  "name": "AD SYSTEM NAME",
  "criteria": [
    {
      "urlPattern": "URL PATTERN",
      "contentPattern": "CONTENT PATTERN",
      "contentType": "script",
      "thirdParty": true,
      "ruleProperties": {
        "modifiers": ["third-party"],
        "scope": "registeredDomain"
      }
    }
  ],
  "pages": ["https://example.net/", "https://example.com/"]
}
```
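For tooling written in TypeScript, one entry of the configuration can be described with the types below, together with a small validator that catches a few common mistakes before a crawl. This is a sketch inferred from the JSON example above and the field descriptions that follow, not the project's own type definitions: `AdSystemConfig`, `Criterion`, `RuleProperties`, `validateAdSystem`, and `isAdSystemConfig` are hypothetical names, and the validator only checks `name`, `pages`, `thirdParty`, and `scope`.

```typescript
// Hypothetical types mirroring one entry of conf/configuration.json.
// Inferred from the README; not copied from the project's source.
type Scope = "domain" | "registeredDomain" | "domainAndPath";

interface RuleProperties {
  modifiers?: string[]; // e.g. ["third-party"]
  scope?: Scope;
}

interface Criterion {
  urlPattern?: string;
  contentPattern?: string;
  contentType?: string; // a Puppeteer resource type, e.g. "script"
  thirdParty?: boolean;
  ruleProperties?: RuleProperties;
}

interface AdSystemConfig {
  name: string;
  criteria: Criterion[];
  pages: string[];
}

const SCOPES: readonly string[] = ["domain", "registeredDomain", "domainAndPath"];

// Returns the problems found in a parsed entry; an empty list means it passed.
// Only a few constraints are checked here.
function validateAdSystem(value: unknown): string[] {
  if (typeof value !== "object" || value === null) {
    return ["entry must be an object"];
  }
  const cfg = value as Record<string, unknown>;
  const errors: string[] = [];
  if (typeof cfg.name !== "string" || cfg.name === "") {
    errors.push("name must be a non-empty string");
  }
  if (!Array.isArray(cfg.pages) || !cfg.pages.every((p) => typeof p === "string")) {
    errors.push("pages must be an array of strings");
  }
  if (!Array.isArray(cfg.criteria)) {
    errors.push("criteria must be an array");
    return errors;
  }
  cfg.criteria.forEach((raw: unknown, i: number) => {
    if (typeof raw !== "object" || raw === null) {
      errors.push(`criteria[${i}] must be an object`);
      return;
    }
    const c = raw as Criterion;
    if (c.thirdParty !== undefined && typeof c.thirdParty !== "boolean") {
      errors.push(`criteria[${i}].thirdParty must be a boolean`);
    }
    const scope = c.ruleProperties?.scope;
    if (scope !== undefined && !SCOPES.includes(scope)) {
      errors.push(`criteria[${i}].ruleProperties.scope is invalid: ${scope}`);
    }
  });
  return errors;
}

// Type guard; it is only as strict as validateAdSystem above.
function isAdSystemConfig(value: unknown): value is AdSystemConfig {
  return validateAdSystem(value).length === 0;
}
```

Returning a list of messages rather than throwing on the first problem lets a maintainer see every mistake in an entry at once.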

- `name` - ad system name. It is used in the report to identify this ruleset.
- `criteria` - a list of criteria used to identify ad requests. Each criterion can have the following properties:
  - `urlPattern` _(optional)_ - the ad request URL must match this pattern. It can be a string, a wildcard, or a regular expression.

    Examples:

    - `test` - string; matches all URLs that contain this string.
    - `*test*test*` - wildcard; the URL must match this wildcard.
    - `/.*test.*/` - regular expression. Note that the enclosing `/` characters only mark the pattern as a regular expression and are not part of it.

  - `contentPattern` _(optional)_ - the response content must match this pattern. Just like `urlPattern`, it can be a string, a wildcard, or a regular expression.
  - `contentType` _(optional)_ - one of the resource types from [this list](https://github.com/puppeteer/puppeteer/blob/v3.0.2/docs/api.md#requestresourcetype).
  - `thirdParty` _(optional)_ - if specified, the request must be third-party (`true`) or first-party (`false`).
  - `ruleProperties` _(optional)_ - additional properties for the rules generated by the compiler:
    - `modifiers` _(optional)_ - an array of modifiers that should be added to the generated rule.
    - `scope` _(optional)_ - the rule scope. Possible values are:
      - `domain` - full domain name (`||exact.domain.name^`)
      - `registeredDomain` - registered domain name (eTLD+1) (`||domain.name^`)
      - `domainAndPath` - domain + path (`||exact.domain.name/path/without/query`)
- `pages` - a list of web pages that will be crawled to extract this ad system's domains.
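The matching and rule-generation behaviour described above can be sketched in TypeScript as follows. This is one possible reading of the README, not the crawler's actual code: a pattern wrapped in `/` is treated as a regular expression, a pattern containing `*` as a wildcard that must match the whole string, and anything else as a substring. `CapturedRequest`, `patternToRegExp`, `matchesCriterion`, `buildRule`, and `naiveRegisteredDomain` are hypothetical names. The third-party check and the `registeredDomain` scope use a naive "last two labels" approximation of eTLD+1 instead of the Public Suffix List, and `domain` is assumed when `scope` is omitted.

```typescript
// Sketch only: hypothetical names, not the project's implementation.
type Scope = "domain" | "registeredDomain" | "domainAndPath";

interface RuleProperties {
  modifiers?: string[];
  scope?: Scope;
}

interface Criterion {
  urlPattern?: string;
  contentPattern?: string;
  contentType?: string;
  thirdParty?: boolean;
  ruleProperties?: RuleProperties;
}

// Hypothetical shape of a request captured while crawling a page.
interface CapturedRequest {
  url: string;
  pageUrl: string;
  resourceType: string; // e.g. the value of Puppeteer's request.resourceType()
  content: string;
}

// Converts a string, wildcard, or "/regex/" pattern into a RegExp.
function patternToRegExp(pattern: string): RegExp {
  if (pattern.length > 2 && pattern.startsWith("/") && pattern.endsWith("/")) {
    // The enclosing slashes only mark a regular expression.
    return new RegExp(pattern.slice(1, -1));
  }
  const escape = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  if (pattern.includes("*")) {
    // Wildcard: "*" matches any characters, including newlines; the whole text must match.
    return new RegExp("^" + pattern.split("*").map(escape).join("[\\s\\S]*") + "$");
  }
  // Plain string: matches if the text contains it.
  return new RegExp(escape(pattern));
}

// ASSUMPTION: naive eTLD+1 that keeps the last two labels.
// A real version needs the Public Suffix List ("example.co.uk" would come out wrong).
function naiveRegisteredDomain(hostname: string): string {
  return hostname.split(".").slice(-2).join(".");
}

// A request matches when every property present in the criterion matches.
function matchesCriterion(req: CapturedRequest, c: Criterion): boolean {
  if (c.urlPattern !== undefined && !patternToRegExp(c.urlPattern).test(req.url)) return false;
  if (c.contentPattern !== undefined && !patternToRegExp(c.contentPattern).test(req.content)) return false;
  if (c.contentType !== undefined && c.contentType !== req.resourceType) return false;
  if (c.thirdParty !== undefined) {
    const isThirdParty =
      naiveRegisteredDomain(new URL(req.url).hostname) !==
      naiveRegisteredDomain(new URL(req.pageUrl).hostname);
    if (isThirdParty !== c.thirdParty) return false;
  }
  return true;
}

// Builds a blocking rule for a matched request; "domain" scope is assumed by default.
function buildRule(requestUrl: string, props: RuleProperties = {}): string {
  const url = new URL(requestUrl);
  let rule: string;
  if (props.scope === "registeredDomain") {
    rule = `||${naiveRegisteredDomain(url.hostname)}^`;
  } else if (props.scope === "domainAndPath") {
    rule = `||${url.hostname}${url.pathname}`; // pathname excludes the query string
  } else {
    rule = `||${url.hostname}^`;
  }
  const modifiers = props.modifiers ?? [];
  return modifiers.length > 0 ? rule + "$" + modifiers.join(",") : rule;
}
```

Compiling all three pattern kinds to a `RegExp` keeps matching behind a single `test()` call; a production crawler would probably cache the compiled expressions instead of rebuilding them for every request.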

## How to run it

- `yarn install` - install dependencies
- `yarn monitor` - run the crawler with default arguments

Run `yarn monitor -v` to print a verbose log.

## TODO

- [x] Make basic rule modifiers configurable (see report.js)
- [ ] Allow monitoring DOM state (I need examples where this is needed)
- [ ] "criteria" should allow blocking or adding custom CSS to test pages so that we can trigger circumvention scripts