https://github.com/ameshkov/circumvention-monitor
Circumvention ad requests monitor
- Host: GitHub
- URL: https://github.com/ameshkov/circumvention-monitor
- Owner: ameshkov
- License: mit
- Created: 2020-04-27T17:41:41.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2023-01-06T05:11:09.000Z (over 2 years ago)
- Last Synced: 2025-03-24T17:55:14.618Z (about 2 months ago)
- Topics: ad-blocking, filter-lists
- Language: JavaScript
- Homepage:
- Size: 1.19 MB
- Stars: 5
- Watchers: 3
- Forks: 1
- Open Issues: 16
Metadata Files:
- Readme: README.md
- License: LICENSE
# Circumvention monitor
A common problem with ad networks is that they frequently switch to new, random domains, which makes them hard to keep track of.
This crawler is meant to automate that process.

- [Circumvention monitor](#circumvention-monitor)
- [Reports](#reports)
- [How to configure it](#how-to-configure-it)
- [How to run it](#how-to-run-it)
- [TODO](#todo)

## Reports

Every day the circumvention monitor runs automatically and generates two files:
- [report/report.md](report/report.md) - human-readable report.
- [report/rules.txt](report/rules.txt) - blocking rules for the domains discovered by the crawler.

## How to configure it

To monitor a new ad system, add a new JSON object describing it to the [configuration](conf/configuration.json).

```json
{
  "name": "AD SYSTEM NAME",
  "criteria": [
    {
      "urlPattern": "URL PATTERN",
      "contentPattern": "CONTENT PATTERN",
      "contentType": "script",
      "thirdParty": true,
      "ruleProperties": {
        "modifiers": ["third-party"],
        "scope": "registeredDomain"
      }
    }
  ],
  "pages": ["https://example.net/", "https://example.com/"]
}
```

- `name` - ad system name. It is used in the report to identify this ruleset.
- `criteria` - a list of criteria used to identify ad requests. Each criterion supports the following properties:
  - `urlPattern` _(optional)_ - the ad request URL must match this pattern. It can be a string, a wildcard, or a regular expression. Examples:
    - `test` - string; matches all URLs that contain this string.
    - `*test*test*` - wildcard; the URL must match this wildcard.
    - `/.*test.*/` - regular expression. Note that the enclosing `/` characters are delimiters, not part of the regular expression.
  - `contentPattern` _(optional)_ - the response content must match this pattern. Just like `urlPattern`, it can be a string, a wildcard, or a regular expression.
  - `contentType` _(optional)_ - one of the resource types from this [list](https://github.com/puppeteer/puppeteer/blob/v3.0.2/docs/api.md#requestresourcetype).
  - `thirdParty` _(optional)_ - if specified, the crawler checks whether the request is third-party or not.
  - `ruleProperties` _(optional)_ - additional properties for the rules generated by the compiler:
    - `modifiers` _(optional)_ - an array of modifiers to add to the rule.
    - `scope` _(optional)_ - the rule scope. Possible values are:
      - `domain` - full domain name (`||exact.domain.name^`)
      - `registeredDomain` - registered domain name (eTLD+1) (`||domain.name^`)
      - `domainAndPath` - domain + path (`||exact.domain.name/path/without/query`)
- `pages` - a list of webpages that will be crawled to extract this ad system's domains.

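The pattern and criteria rules above can be illustrated with a minimal JavaScript sketch. It is not the project's implementation: `compilePattern`, `matchesCriterion`, and the request object shape (`url`, `content`, `resourceType`, `thirdParty`) are hypothetical names, and the sketch assumes that a value wrapped in `/` is a regular expression, that a value containing `*` is a wildcard matched against the whole string, and that anything else is a plain substring.

```javascript
// Hypothetical sketch, not the project's source code.
// Assumed pattern syntax:
//   "/.../"       -> regular expression (slashes are delimiters only)
//   contains "*"  -> wildcard matched against the whole string
//   anything else -> plain substring match
function compilePattern(pattern) {
  if (pattern.length > 2 && pattern.startsWith('/') && pattern.endsWith('/')) {
    const re = new RegExp(pattern.slice(1, -1));
    return (text) => re.test(text);
  }
  if (pattern.includes('*')) {
    // Escape regex metacharacters in each literal piece, then join with ".*".
    const source = pattern
      .split('*')
      .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
      .join('.*');
    const re = new RegExp(`^${source}$`);
    return (text) => re.test(text);
  }
  return (text) => text.includes(pattern);
}

// Hypothetical request shape: { url, content, resourceType, thirdParty }.
// Every property the criterion specifies must match; omitted ones are skipped.
function matchesCriterion(request, criterion) {
  if (criterion.urlPattern && !compilePattern(criterion.urlPattern)(request.url)) {
    return false;
  }
  if (criterion.contentPattern && !compilePattern(criterion.contentPattern)(request.content)) {
    return false;
  }
  if (criterion.contentType && criterion.contentType !== request.resourceType) {
    return false;
  }
  if (typeof criterion.thirdParty === 'boolean' && criterion.thirdParty !== request.thirdParty) {
    return false;
  }
  return true;
}

const request = {
  url: 'https://ads.example.org/test.js',
  content: 'console.log("ad")',
  resourceType: 'script',
  thirdParty: true,
};
console.log(matchesCriterion(request, { urlPattern: '*test*', contentType: 'script', thirdParty: true })); // true
```

Under these assumptions a criterion matches only when every property it specifies matches, which is one reading of the _(optional)_ markers in the list above.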
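The three `scope` values can also be sketched as a function that turns a matched request URL into a blocking rule. This is a hypothetical illustration, not the project's rule compiler: `buildRule` is an invented name, the default scope of `domain` is an assumption, and `registeredDomain` is deliberately simplified to the last two hostname labels. A real eTLD+1 lookup needs the Public Suffix List; with this shortcut, `a.example.co.uk` would wrongly become `co.uk`. Modifiers are assumed to be appended in the usual adblock `$modifier` syntax.

```javascript
// Hypothetical sketch, not the project's rule compiler.
// Simplification: "registeredDomain" keeps the last two hostname labels,
// which is NOT a real eTLD+1 lookup (that needs the Public Suffix List).
function buildRule(requestUrl, ruleProperties = {}) {
  const { hostname, pathname } = new URL(requestUrl);
  const scope = ruleProperties.scope || 'domain'; // assumed default
  let rule;
  switch (scope) {
    case 'domain':
      rule = `||${hostname}^`;
      break;
    case 'registeredDomain':
      rule = `||${hostname.split('.').slice(-2).join('.')}^`;
      break;
    case 'domainAndPath':
      // URL#pathname excludes the query string and the fragment.
      rule = `||${hostname}${pathname}`;
      break;
    default:
      throw new Error(`Unknown scope: ${scope}`);
  }
  // Modifiers are appended as "$modifier1,modifier2" (assumed syntax).
  const modifiers = ruleProperties.modifiers || [];
  return modifiers.length > 0 ? `${rule}$${modifiers.join(',')}` : rule;
}

const url = 'https://cdn.ads.example.com/a/b.js?x=1';
console.log(buildRule(url, { scope: 'domain' }));        // ||cdn.ads.example.com^
console.log(buildRule(url, { scope: 'domainAndPath' })); // ||cdn.ads.example.com/a/b.js
console.log(buildRule(url, { scope: 'registeredDomain', modifiers: ['third-party'] })); // ||example.com^$third-party
```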
## How to run it
- `yarn install` - install dependencies
- `yarn monitor` - run the crawler with default arguments

Run `yarn monitor -v` to make it print the verbose log.

## TODO
- [x] Make basic rules modifiers configurable (see report.js)
- [ ] Allow monitoring DOM state (I need examples where this is needed)
- [ ] `criteria` should allow blocking requests or injecting custom CSS into test pages so that we can trigger circumvention scripts