https://github.com/paceaux/selector-finder
Find a CSS selector on a public site
https://github.com/paceaux/selector-finder
crawler css javascript nodejs screenshots selector-finder
Last synced: 5 months ago
JSON representation
Find a CSS selector on a public site
- Host: GitHub
- URL: https://github.com/paceaux/selector-finder
- Owner: paceaux
- License: mit
- Created: 2021-01-07T13:13:46.000Z (over 5 years ago)
- Default Branch: develop
- Last Pushed: 2025-10-20T14:40:09.000Z (8 months ago)
- Last Synced: 2025-10-20T16:32:47.525Z (8 months ago)
- Topics: crawler, css, javascript, nodejs, screenshots, selector-finder
- Language: JavaScript
- Homepage:
- Size: 879 KB
- Stars: 25
- Watchers: 1
- Forks: 2
- Open Issues: 10
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE.md
Awesome Lists containing this project
README
# Selector Hound: sniff out CSS Selectors on a site
`SelectorHound` lets you find CSS selectors on a public or local site. Rename, refactor, and delete unused CSS with a bloodhound on your side.
A sitemap or a site URL is enough to get started. Provide a single CSS selector, a comma separated string of selectors, or even a stylesheet.
Pages with _zero_ matches aren't put in the results. Pages with at least _one_ match are in the result, and you find out which CSS selectors _aren't_ used. (That's right, it's a selector _finder_ **and** ... "not founder").
Optionally take a screenshot of the elements (though it may hurt performance).
Pages not showing up that should? Check the `log.txt` for any issues.
## Installation
### Prerequisites
- Node LTS (as of August 2024, Node 20.16.0)
#### Some possible Puppeteer setup for Mac Users
If you want to use the `-d` or `-c` (`--isSpa` and `--takeScreenshots` ) options, this requires Puppeteer which in turn requires Chromium.
You may (or may not) need `PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true` and `PUPPETEER_EXECUTABLE_PATH` environment variables set. They were necessary for older versions of Puppeteer and seem to be unnecessary for new ones.
If you're having issues, run `printenv` in your terminal to see if those variables are set. If they are, you may need to unset them with `unset PUPPETEER_SKIP_CHROMIUM_DOWNLOAD` and `unset PUPPETEER_EXECUTABLE_PATH`. Then `source ~/.bashrc` or `source ~/.zshrc` to be sure, and run `printenv` once again.
But if they aren't set, you may need to do this.
After you play with those variables, reinstall this package.
### Running on-demand
Download this package. Then run
```shell
npm install
```
### Globally via NPM
```shell
npm i -g selector-hound
```
## Usage
### Basic Scanning
Only scan the first 20 pages for `.yourthing`
```shell
SelectorHound --sitemap=https://wherever.com/xml --limit=20 --selector=".yourthing"
SelectorHound -u https://wherever.com/xml -l 20 -s ".yourthing"
```
### Re-using, regenerating, and providing a list of links
Before the site scanning begins, this generates a `.sitemap.json` file containing all of the links it will scan. This file is generated from the `sitemap.xml` file you provided **or** from crawling the site looking for links. To improve performance, SelectorHound will look for this file _first_ before attempting to retrieve/generate a sitemap.
If you want to re-generate this `.sitemap.json` file, you can force it:
```shell
SelectorHound --sitemap=https://wherever.com/xml --selector=".yourthing" --dontUseExportedSitemap
SelectorHound -u https://mysite.com/landing -r -s '.yourThing' -X
```
#### Formatting
By default, SelectorHound will generate a format that's based off of how sitemap XML looks, which is an array of objects with a `loc` property:
```JavaScript
[
{
'loc': 'https://mysite.com/path'
},
{
'loc': 'https://mysite.com/another'
}
]
```
However, you can also provide your own list of links as just an array of strings:
```JavaScript
[
"https://mysite.com/path",
"https://mysite.com/another"
]
```
### Crawling instead of using a sitemap
Crawl the site, starting from a landing page.
```shell
SelectorHound -u https://mysite.com/landing -r -s ".myClass"
```
### Taking Screenshots or dealing with SPAs
Scan the first 20 pages and take screenshots
```shell
SelectorHound -u https://wherever.com/xml -l 20 -s ".yourthing" -c
```
Scan those pages, but treat them like Single Page Applications (`-d`), and search for all the selectors in `mystyles.css`
```shell
SelectorHound -u https://wherever.com/xml -f "mystyles.css" -d
```
### Options
| Option | Alias | Description | Defaults |
|---|---|---|---|
| `--sitemap` |`-u` | Must be fully qualified URL to an XML Sitemap **or** fully qualified URL to a page **if** `crawl` is `true`. Required. | `https://frankmtaylor.com/sitemap.xml` |
| `--dontUseExportedSitemap` |`-X` | if a `.sitemap.json` file has been already been created, ignore it and generate a new one. Optional. | `false` |
| `--limit` | `-l` | Maximum number of pages to crawl. Optional. | `0` |
| `--selector` | `-s` | A valid CSS selector. Required. | `.title` |
| `--cssFile` | `-f` | A CSS file to use instead of a single selector. Optional. | |
| `--isSpa`| `-d` | Uses Puppeteer instead of Cheerio (in case some content is dynamic). Optional. | `false`|
| `--takeScreenshots`| `-c` | Takes screenshots with Puppeteer. Optional. | `false` |
| `--outputFileName` | `-o` | A provided value will be prepended to `pages.json` and will be output in your current directory. Оptional. | `pages.json` |
| `--showElementDetails` | `-e` | Show details for elements that match result (`tag`, `innerText`, `attributes`) . Optional. | `false` |
| `--showHtml` | `-m` | Shows HTML of the elements that match the result. Optional. | `true` |
| `--crawl`| `-r` | Crawls the site instead of using a sitemap. Outputs a file called `.sitemap.json`. Optional. | `false` |