https://github.com/gregors/seekr
- Host: GitHub
- URL: https://github.com/gregors/seekr
- Owner: gregors
- License: mit
- Created: 2022-09-10T01:54:19.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-09-10T15:33:11.000Z (over 3 years ago)
- Last Synced: 2025-02-08T13:44:00.108Z (11 months ago)
- Size: 140 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# seekr
A tool for finding pages that contain words or phrases in the DOM.
## Overview
seekr is a tool for finding pages that contain words or phrases in the DOM.
It is a command line tool that takes a list of words or phrases from a file and
traverses pages, searching each page's DOM for those terms. seekr can also
expand each word in the list into its variants formed by single-character additions,
deletions, substitutions, and transpositions, and search for those as well. The
tool outputs a list of the pages that contain the terms.
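The variant expansion described above corresponds to the single-edit neighborhood used by many spelling correctors. The sketch below shows one way to generate it; `singleEditVariants` and `expandTerms` are hypothetical names, and the lowercase `a` to `z` alphabet is an assumption rather than something this README specifies.

```typescript
// Hypothetical alphabet; seekr may use a different character set.
const ALPHABET = "abcdefghijklmnopqrstuvwxyz";

// Return every string exactly one edit away from `word`.
function singleEditVariants(word: string, alphabet: string = ALPHABET): Set<string> {
  const variants = new Set<string>();
  for (let i = 0; i <= word.length; i++) {
    const left = word.slice(0, i);
    const right = word.slice(i);
    // Additions: insert one letter at position i.
    for (const c of alphabet) variants.add(left + c + right);
    if (right.length > 0) {
      // Deletions: drop the character at position i.
      variants.add(left + right.slice(1));
      // Substitutions: replace the character at position i.
      for (const c of alphabet) variants.add(left + c + right.slice(1));
    }
    if (right.length > 1) {
      // Transpositions: swap the characters at positions i and i + 1.
      variants.add(left + right[1] + right[0] + right.slice(2));
    }
  }
  // Substituting a letter with itself (or swapping equal letters) reproduces the word.
  variants.delete(word);
  return variants;
}

// Expand a term list: each lowercased term plus all of its single-edit variants.
function expandTerms(terms: string[]): Set<string> {
  const all = new Set<string>();
  for (const term of terms) {
    const w = term.toLowerCase();
    all.add(w);
    for (const v of singleEditVariants(w)) all.add(v);
  }
  return all;
}
```

For a word of length n this generates at most 54n + 25 candidates before de-duplication, so enabling expansion multiplies the number of terms each page is checked against.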
seekr is currently wired to the Internet Computer as its source of pages to
search. It can be easily modified to search other sources.
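One way to keep the page source swappable is to hide it behind a small interface. The sketch below is hypothetical: `PageSource`, `InMemorySource`, and `findPagesContaining` are illustrative names, not seekr's actual code, and it matches terms against plain page text with a case-insensitive substring check instead of walking a real DOM.

```typescript
// Hypothetical abstraction over "where pages come from".
interface PageSource {
  // Return the text of the page at `url`, or undefined if it is unavailable.
  getText(url: string): string | undefined;
}

// A stand-in source backed by a map, useful for testing the search step.
class InMemorySource implements PageSource {
  private readonly pages: Map<string, string>;

  constructor(pages: Map<string, string>) {
    this.pages = pages;
  }

  getText(url: string): string | undefined {
    return this.pages.get(url);
  }
}

// Return the URLs whose text contains at least one term (case-insensitive).
function findPagesContaining(source: PageSource, urls: string[], terms: string[]): string[] {
  const needles = terms.map((t) => t.toLowerCase());
  return urls.filter((url) => {
    const text = source.getText(url)?.toLowerCase();
    return text !== undefined && needles.some((n) => text.includes(n));
  });
}
```

Pointing seekr at a different source would then mean writing another `PageSource` implementation while leaving the search logic unchanged.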
## Requirements
seekr requires:
- [node.js](https://nodejs.org/)
- [npm](https://www.npmjs.com/)
- [typescript](https://www.typescriptlang.org/)
- [yarn](https://yarnpkg.com/)
## Installation
From the root of the project, run:
```bash
yarn
```
## Usage
Create a file with a list of words or phrases to search for. For example, `dictionary.txt`:
```text
cabbage
lettuce
```
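A term file like the one above can be read as one term per line. The sketch below assumes that multi-word phrases stay whole, that surrounding whitespace is trimmed, and that blank lines are skipped; `parseTermList` and `loadTermList` are hypothetical helpers, and this README does not confirm how seekr itself parses the file.

```typescript
import { readFileSync } from "node:fs";

// Split file contents into terms: one per line, trimmed, blank lines dropped.
function parseTermList(contents: string): string[] {
  return contents
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Read and parse a term file such as dictionary.txt.
function loadTermList(path: string): string[] {
  return parseTermList(readFileSync(path, "utf8"));
}
```

For example, `loadTermList("dictionary.txt")` would return `["cabbage", "lettuce"]` for the file shown above.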
Create a file with a list of domains that are crawlable. For example, `interesting_domains.txt`:
```text
google.com
wikipedia.org
```
Entries in `interesting_domains.txt` should be in the format `domain.com` or `subdomain.domain.com`.
Any link found during the crawl whose domain is listed in `interesting_domains.txt` will be searched as well.
To start the search, run:
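Deciding whether a discovered link should be followed comes down to comparing its hostname with the listed domains. The sketch below assumes exact hostname matching, so `wikipedia.org` does not implicitly allow `en.wikipedia.org`; `toDomainSet` and `isInterestingLink` are hypothetical helpers, not part of seekr's documented code.

```typescript
// Normalize file entries (e.g. from interesting_domains.txt) into a lookup set.
function toDomainSet(entries: string[]): Set<string> {
  return new Set(entries.map((e) => e.trim().toLowerCase()).filter((e) => e.length > 0));
}

// Assumption: a link qualifies only if its hostname exactly matches an entry.
function isInterestingLink(link: string, domains: Set<string>): boolean {
  let hostname: string;
  try {
    // The URL parser lowercases hostnames for http(s) URLs.
    hostname = new URL(link).hostname;
  } catch {
    return false; // relative or malformed link
  }
  return domains.has(hostname);
}
```

If subdomains should be crawled automatically, the check could additionally accept any hostname ending in `"." + entry`; exact matching is used here because the README lists `subdomain.domain.com` as its own entry format.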
```bash
npm run cli seek
```