Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mdamien/chrome-extensions-archive
:pager: Archive all the chrome extensions (until Feb 4. 2019)
- Host: GitHub
- URL: https://github.com/mdamien/chrome-extensions-archive
- Owner: mdamien
- License: mit
- Created: 2016-02-26T20:29:52.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2019-05-08T06:14:31.000Z (almost 6 years ago)
- Last Synced: 2025-02-06T14:15:49.881Z (14 days ago)
- Language: Python
- Homepage: https://crx.dam.io
- Size: 32.6 MB
- Stars: 385
- Watchers: 39
- Forks: 71
- Open Issues: 14
- Metadata Files:
  - Readme: README.md
  - License: LICENSE.md
Awesome Lists containing this project
- jimsghstars - mdamien/chrome-extensions-archive - :pager: Archive all the chrome extensions (until Feb 4. 2019) (Python)
README
# Chrome Extensions Archive: No updates since Feb 4. 2019
**Under maintenance: the disk is full! (2 TB)**

The goal is to provide a complete archive of the Chrome Web Store with version
history. You can see the current status of what's archived and download the files here:
[dam.io/chrome-extensions-archive/](http://dam.io/chrome-extensions-archive/)

## Installing the extensions
To install an extension, go to `chrome://extensions/` and drop the file onto the page.
To avoid auto-updates, [load it as an unpacked extension](http://stackoverflow.com/a/24577660/1075195).
Files are named `.zip`, but they are the exact same `.crx` files stored on the store.
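Since the downloaded files are raw `.crx` data, they usually need to be extracted before Chrome will load them unpacked. Here is a minimal sketch, assuming a downloaded file named `extension.zip` (the filename is hypothetical) and relying on Python's `zipfile`, which generally tolerates the extra CRX header placed before the ZIP data:

```python
import zipfile
from pathlib import Path

# Hypothetical file name: any .zip downloaded from the archive.
archive = Path("extension.zip")
target = Path("unpacked_extension")
target.mkdir(exist_ok=True)

# zipfile usually copes with the CRX header that precedes the ZIP payload;
# if it raises BadZipFile, the header has to be stripped off first.
with zipfile.ZipFile(archive) as zf:
    zf.extractall(target)

print("Extracted to", target.resolve(),
      "- load it with 'Load unpacked' on chrome://extensions/")
```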
## Running the scripts
**The scripts are Python 3.5+ only.**
Install dependencies: `pip3 install -r req.txt`
Create some folders and initialize some files:
```
mkdir data
mkdir crawled
mkdir crawled/sitemap
mkdir crawled/pages
mkdir crawled/crx
mkdir crawled/tmp
mkdir ../site
mkdir ../site/chrome-extensions-archive
mkdir ../site/chrome-extensions-archive/ext
echo "{}" > data/not_in_sitemap.json
```
Crawling:
- `crawl_sitemap.py`: gets you the list of all the extensions in `data/sitemap.json`
- `crawl_crx.py`: uses `data/sitemap.json` to download the crx files (see the sketch below)
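For context, here is a rough sketch of the kind of request a crawler like `crawl_crx.py` can make to fetch a `.crx` for a single extension ID. The update endpoint, its query parameters, and the example ID are assumptions on my part (they are not taken from this repository's code, and the endpoint may have changed since 2019):

```python
import urllib.request

# Example extension ID (uBlock Origin); real IDs come from data/sitemap.json.
ext_id = "cjpalhdlnbpafiamejdnhcphjbkeiagm"

# Commonly documented Chrome Web Store update endpoint (an assumption here,
# not necessarily what crawl_crx.py actually uses).
url = (
    "https://clients2.google.com/service/update2/crx"
    "?response=redirect&prodversion=80.0&acceptformat=crx2,crx3"
    "&x=id%3D" + ext_id + "%26uc"
)

with urllib.request.urlopen(url) as resp:
    data = resp.read()

with open(ext_id + ".crx", "wb") as out:
    out.write(data)

print("Saved", len(data), "bytes to", ext_id + ".crx")
```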
Site & stats:
- `scan_pages_history_to_big_list.py`: makes `data/PAGES.json` by scanning the pages
you crawled
- `crx_stats.py`: makes `data/crx_stats.json` (what's currently stored)
- `make_site.py`: uses `data/crx_stats.json` + `data/PAGES.json` to generate the site
- `make_json_site.py`: uses `data/crx_stats.json` + `data/PAGES.json` to generate JSON

Then I serve the files directly with nginx (see the nginx.conf file for an example).
## Helping out
I have a few things in mind for the future:
- diff of extension versions as a web interface
- malware/adware analysis
- running an alternative web store (better search, Firefox support, ...)

Don't hesitate to reach out (here on issues, [email protected] or @dam_io on Twitter).
To propose changes, just open a PR.