https://github.com/logicalhacking/extensioncrawler
A collection of utilities for downloading and analyzing browser extensions from the Chrome Web Store.
- Host: GitHub
- URL: https://github.com/logicalhacking/extensioncrawler
- Owner: logicalhacking
- License: gpl-3.0
- Created: 2016-09-08T22:04:47.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2023-10-10T19:46:51.000Z (about 2 years ago)
- Last Synced: 2025-04-15T04:17:48.845Z (8 months ago)
- Topics: chrome, chrome-extension
- Language: Python
- Homepage: https://git.logicalhacking.com/BrowserSecurity/ExtensionCrawler
- Size: 833 KB
- Stars: 19
- Watchers: 4
- Forks: 5
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# ExtensionCrawler
A collection of utilities for downloading and analyzing browser
extensions from the Chrome Web Store.
* `crawler`: A crawler for extensions from the Chrome Web Store.
* `crx-tool`: A tool for analyzing and extracting `*.crx` files
  (i.e., Chrome extensions). Calling `crx-tool.py` on a `.crx` file
  will check the integrity of the extension.
* `crx-extract`: A simple tool for extracting `*.crx` files from the
tar-based archive hierarchy.
* `crx-jsinventory`: Build a JavaScript inventory of a `*.crx` file using a
JavaScript decomposition analysis.
* `crx-jsstrings`: A tool for extracting code blocks, comment blocks, and
string literals from JavaScript.
* `create-db`: A tool for updating a remote MariaDB from already
existing extension archives.
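A `*.crx` file is a ZIP archive prefixed by a signed header that begins with the magic bytes `Cr24` followed by a format version. The sketch below illustrates the kind of header check a tool like `crx-tool` performs; the helper is hypothetical and not part of this project's API:

```python
import struct

def check_crx_header(data: bytes) -> int:
    """Return the CRX format version if the header looks valid.

    Illustrative helper only, not part of ExtensionCrawler.
    """
    # Every CRX file starts with the 4-byte magic "Cr24".
    if len(data) < 8 or data[:4] != b"Cr24":
        raise ValueError("not a CRX file (bad magic number)")
    # The next 4 bytes are a little-endian format version (2 or 3).
    version = struct.unpack("<I", data[4:8])[0]
    if version not in (2, 3):
        raise ValueError(f"unsupported CRX version: {version}")
    return version
```

A full integrity check would additionally verify the signature in the header against the embedded ZIP payload.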
The utilities store the extensions in the following directory
hierarchy:
```shell
archive
├── conf
│   └── forums.conf
├── data
│   └── ...
└── log
    └── ...
```
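The skeleton above can be created with a few lines of Python (illustrative only; this is not a script shipped with the project):

```python
from pathlib import Path

def create_archive_skeleton(root: str) -> Path:
    """Create the archive/{conf,data,log} layout used by the crawler."""
    archive = Path(root) / "archive"
    for sub in ("conf", "data", "log"):
        (archive / sub).mkdir(parents=True, exist_ok=True)
    # forums.conf is optional; create an empty one as a starting point.
    (archive / "conf" / "forums.conf").touch()
    return archive
```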
The crawler downloads the most recent version of each extension (i.e.,
the `*.crx` file) as well as its overview page. In addition, the `conf`
directory may contain a file called `forums.conf` that lists the IDs of
extensions for which the forums and support pages should be downloaded
as well. The `data` directory will contain the downloaded extensions.
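Assuming `forums.conf` simply lists one extension ID per line (the exact file format is an assumption here, including the comment syntax), reading it could look like this:

```python
def read_forum_ids(path: str) -> list[str]:
    """Parse a forums.conf-style file: one extension ID per line;
    blank lines and '#' comments are ignored (assumed format)."""
    ids = []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#"):
                ids.append(line)
    return ids
```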
The `crawler` and `create-db` scripts will access and update a MariaDB.
They will use the host, database, and credentials found in `~/.my.cnf`.
Since they make use of various JSON features, it is recommended to use at
least version 10.2.8 of MariaDB.
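`~/.my.cnf` is a standard MySQL/MariaDB option file with an INI-like layout. A sketch of reading the `[client]` section with the standard library (illustrative; the scripts themselves hand the option file to the database client library rather than parsing it like this):

```python
import configparser
from pathlib import Path

def read_mycnf(path: str = "~/.my.cnf") -> dict:
    """Extract host, database, and credentials from a MySQL option file.

    Illustrative sketch only; assumes a plain INI-style file with a
    [client] section and no include directives.
    """
    parser = configparser.ConfigParser()
    parser.read(Path(path).expanduser())
    client = parser["client"] if "client" in parser else {}
    return {key: client.get(key) for key in ("host", "database", "user", "password")}
```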
All utilities are written in Python 3.7. The required modules are listed
in the file `requirements.txt`.
## Installation
Clone and use pip3 to install as a package.
```shell
git clone git@logicalhacking.com:BrowserSecurity/ExtensionCrawler.git
pip3 install --user -e ExtensionCrawler
```
## Team
* [Achim D. Brucker](http://www.brucker.ch/)
* [Michael Herzberg](http://www.dcs.shef.ac.uk/cgi-bin/makeperson?M.Herzberg)
### Contributors
* Mehmet Balande
## License
This project is licensed under the GPL 3.0 (or any later version).
SPDX-License-Identifier: GPL-3.0-or-later
## Master Repository
The master git repository for this project is hosted by the [Software
Assurance & Security Research Team](https://logicalhacking.com) at
<https://git.logicalhacking.com/BrowserSecurity/ExtensionCrawler>.