https://github.com/extractus/extractus
- Host: GitHub
- URL: https://github.com/extractus/extractus
- Owner: extractus
- License: MIT
- Created: 2023-06-01T08:19:18.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-07-25T11:57:12.000Z (almost 2 years ago)
- Last Synced: 2025-06-07T04:35:01.163Z (about 1 year ago)
- Language: HTML
- Size: 1.01 MB
- Stars: 14
- Watchers: 1
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## Installation
```shell
pnpm i @extractus/extractus
```
## Usage
#### Extract HTML with the [default extractors, transformer, and selector](packages/defaults)
```typescript
import { extract } from '@extractus/extractus'
extract(htmlString, options)
```
## Reference
### Extractor
Extracts strings from the HTML. Each extractor receives the HTML string and returns a string or `undefined`.
Example: [packages/defaults/extractors.ts](packages/defaults/extractors.ts)
```typescript
type Extractor =
| ((input: string, context?: ExtractContext) => string | undefined)
| ((input: string) => string | undefined)
```
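As an illustration of the signature above, here is a minimal sketch of a custom extractor. `titleExtractor` is a hypothetical name and is not part of the library, and the `ExtractContext` type declared here is only a stand-in so the snippet compiles on its own; the real type may have a different shape. The regular expression is a simplification, not a full HTML parser.

```typescript
// Assumption: stand-in for the library's ExtractContext; the real shape may differ.
type ExtractContext = { url?: string }

type Extractor =
  | ((input: string, context?: ExtractContext) => string | undefined)
  | ((input: string) => string | undefined)

// Hypothetical extractor: returns the raw text of the first <title> element,
// or undefined when the document has no <title>.
const titleExtractor: Extractor = (input: string) => {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(input)
  return match ? match[1] : undefined
}

const sampleHtml = '<html><head><title> Hello </title></head><body></body></html>'
const rawTitle = titleExtractor(sampleHtml)
const missingTitle = titleExtractor('<html><body>No title here</body></html>')
```

The extractor deliberately leaves whitespace untouched, since cleanup is the transformer's job in this design.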
### Transformer
Transforms the extracted strings, for example by normalizing URLs or filtering out blank strings.
Example: [packages/defaults/transformer.ts](packages/defaults/transformer.ts)
```typescript
type Transformer =
  | ((input: Iterable<string>, context?: ExtractContext) => Iterable<string>)
  | ((input: Iterable<string>) => Iterable<string>)
```
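A transformer along these lines might trim each value and drop the blank ones. This is a sketch only: `trimAndDropBlank` is a hypothetical name, `ExtractContext` is a local stand-in for the library's type, and the snippet assumes the iterables carry strings (`Iterable<string>`).

```typescript
// Assumption: stand-in for the library's ExtractContext; the real shape may differ.
type ExtractContext = { url?: string }

// Assumption: the transformer consumes and produces iterables of strings.
type Transformer =
  | ((input: Iterable<string>, context?: ExtractContext) => Iterable<string>)
  | ((input: Iterable<string>) => Iterable<string>)

// Hypothetical transformer: trims every value and removes blank strings.
const trimAndDropBlank: Transformer = (input: Iterable<string>) =>
  Array.from(input)
    .map((value) => value.trim())
    .filter((value) => value !== '')

const cleaned = Array.from(trimAndDropBlank(['  Hello ', '', '   ', 'World']))
```

Returning an array works here because arrays are iterable; a generator function would be an equally valid choice if lazy evaluation matters.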
### Selector
Selects a single value from the transformed values, for example by taking the first title or converting a string to a `Date` object.
Example: [packages/defaults/selector.ts](packages/defaults/selector.ts)
```typescript
type Selector<T> =
  | ((input: Iterable<string>, context?: ExtractContext) => T)
  | ((input: Iterable<string>) => T)
```
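The two cases mentioned above (taking the first value, and turning a string into a date) could be sketched as follows. `firstValue` and `firstAsDate` are hypothetical names, `ExtractContext` is a local stand-in, and the snippet assumes the selector is generic over its result type `T` and receives an `Iterable<string>`.

```typescript
// Assumption: stand-in for the library's ExtractContext; the real shape may differ.
type ExtractContext = { url?: string }

// Assumption: the selector is generic over its result type T.
type Selector<T> =
  | ((input: Iterable<string>, context?: ExtractContext) => T)
  | ((input: Iterable<string>) => T)

// Hypothetical selector: picks the first transformed value, if there is one.
const firstValue: Selector<string | undefined> = (input: Iterable<string>) =>
  Array.from(input)[0]

// Hypothetical selector: parses the first value as a date and
// returns undefined when the value is missing or not a valid date.
const firstAsDate: Selector<Date | undefined> = (input: Iterable<string>) => {
  const first = Array.from(input)[0]
  if (first === undefined) return undefined
  const date = new Date(first)
  return isNaN(date.getTime()) ? undefined : date
}

const pickedTitle = firstValue(['First title', 'Second title'])
const pickedDate = firstAsDate(['2024-07-25T11:57:12.000Z'])
const invalidDate = firstAsDate(['not a date'])
```

Returning `undefined` for missing or unparseable input keeps the selector total, so callers can handle the absence of a value explicitly.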
## Development
This project uses [pnpm](https://pnpm.io) to manage the workspace.
- Clone the repository
- Open the project in a terminal or IDE
- Run `pnpm i` at the root of the project
## Roadmap
https://github.com/orgs/extractus/projects/2/views/1