https://github.com/macbre/3pc
3rd party web content database
- Host: GitHub
- URL: https://github.com/macbre/3pc
- Owner: macbre
- License: bsd-2-clause
- Created: 2014-03-07T21:33:38.000Z (over 12 years ago)
- Default Branch: master
- Last Pushed: 2018-09-03T13:56:00.000Z (almost 8 years ago)
- Last Synced: 2025-09-20T02:24:38.790Z (9 months ago)
- Language: Python
- Homepage: https://www.npmjs.com/package/3pc
- Size: 65.4 KB
- Stars: 3
- Watchers: 3
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README
3pc
===
Third party web content database
## What is it?
`3pc` is meant to be a data provider of URL lists for Content Delivery Networks, tracking services, social media widgets and third-party JS libraries.
Inspired by [this PerfPlanet article](http://calendar.perfplanet.com/2013/thirdpartycontent/).
## Usage
`3pc` is built as a Node.js module:
```
npm install 3pc
```
It provides "raw" data and some helper functions:
```js
var thirdParty = require('3pc');

// check whether the given URL is served via a Content Delivery Network
console.log(thirdParty.cdn.matchByUrl('http://example.com/foo.js'));
// false
console.log(thirdParty.cdn.matchByUrl('http://vignette3.wikia.nocookie.net/nordycka/images/e/ee/Tj%C3%B8rnuv%C3%ADk.jpg/revision/latest/scale-to-width-down/640?cb=20150904165805&path-prefix=pl'));
// "Fastly"

// check whether the given URL is a tracking code
console.log(thirdParty.trackers.matchByUrl('http://edge.quantserve.com/quant.js'));
// "Quantcast"
```
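To give a feel for what a `matchByUrl`-style helper does, here is a minimal sketch that matches a URL's hostname against a list of suffixes. It is not the module's actual implementation: the `patterns` table, its entries and the suffix-matching rule are all assumptions made for illustration.

```js
// Hypothetical pattern table: hostname suffix -> provider name.
// These entries are illustrative and are not taken from the 3pc database.
var patterns = [
  { suffix: '.example-cdn.net', name: 'Example CDN' },
  { suffix: '.static.example.org', name: 'Example Static' }
];

// Return the provider name for the first matching suffix, or false
// when nothing matches (mirroring the return values shown above).
function matchByUrl(url) {
  var hostname;
  try {
    hostname = new URL(url).hostname;
  } catch (e) {
    return false; // not a parseable absolute URL
  }
  // Prefix a dot so that a bare "example-cdn.net" also matches ".example-cdn.net"
  var host = '.' + hostname.toLowerCase();
  for (var i = 0; i < patterns.length; i++) {
    if (host.endsWith(patterns[i].suffix)) {
      return patterns[i].name;
    }
  }
  return false;
}

console.log(matchByUrl('http://img.example-cdn.net/a.png'));
console.log(matchByUrl('http://example.com/foo.js'));
```

Matching on hostname suffixes keeps the lookup independent of the path and query string, which is why a long image URL and a short script URL on the same host would resolve to the same provider in this sketch.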
## Data sources
`3pc` currently uses the following data sources:
* [CDN providers from webpagetest](https://raw.githubusercontent.com/WPO-Foundation/webpagetest/master/agent/wpthook/cdn.h) (125 entries)
* [Tracking services from Ghostery](https://raw.githubusercontent.com/jonpierce/ghostery/master/firefox/ghostery-statusbar/ghostery/chrome/content/ghostery-bugs.js) (219 entries)
These sources are parsed by a Python script, and the result is stored as JSON files in the `./db` directory. To regenerate them, run:
```
make generate
```
As a result, the database can be used by any technology that can read and parse JSON files.
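As a rough illustration of reading such a file directly, without the module's helpers, the sketch below parses a JSON file from disk. The file name and the key/value layout are hypothetical stand-ins; check the generated files in `./db` for their real names and schema.

```js
var fs = require('fs');
var os = require('os');
var path = require('path');

// Parse a JSON file from disk and return the decoded value.
function loadDatabase(file) {
  return JSON.parse(fs.readFileSync(file, 'utf8'));
}

// Stand-in for a generated file; the real file names and schema in ./db may differ.
var sample = path.join(os.tmpdir(), '3pc-sample.json');
fs.writeFileSync(sample, JSON.stringify({ '.example-cdn.net': 'Example CDN' }));

var db = loadDatabase(sample);
console.log(Object.keys(db).length + ' entries loaded');
```

The same two steps (read the file, decode the JSON) apply in any other language with a JSON parser.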