https://github.com/tiagodanin/scraperscript
ScraperScript is a query language for Web Scraping
language query scraper scraping scrapper-script
- Host: GitHub
- URL: https://github.com/tiagodanin/scraperscript
- Owner: TiagoDanin
- License: mit
- Created: 2018-10-17T13:54:52.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2025-03-28T16:57:47.000Z (about 2 months ago)
- Last Synced: 2025-04-14T21:53:15.015Z (about 1 month ago)
- Topics: language, query, scraper, scraping, scrapper-script
- Language: JavaScript
- Homepage: https://tiagodanin.github.io/ScraperScript/
- Size: 255 KB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 24
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
# ScraperScript
ScraperScript is a query language for Web Scraping
## Installation
Module available through the [npm registry](https://www.npmjs.com/). It can be installed using the [`npm`](https://docs.npmjs.com/getting-started/installing-npm-packages-locally) or [`yarn`](https://yarnpkg.com/en/) command line tools.
```sh
# NPM
npm install scraperscript --global
# Or Using Yarn
yarn global add scraperscript
```
## Documentation
Run a query file from the command line with `scraperscript myfile`, or use the server.
Example file:
```markdown
@https://helloword.site/list
!! A comment ...
- names: html >> body >> div >> h2 @> {number, text, bold} :array
- hasTitle: html >> head >> title == " my string " :boolean
- title: html >> head >> title :string
```
This returns JSON:
```json
{
  "error": false,
  "errorsMsg": [],
  "names": [
    {
      "number": 0,
      "text": "Tiago"
    },
    {
      "number": 0,
      "text": "James"
    }
  ],
  "hasTitle": true,
  "title": "my string"
}
```
## Syntax
Place the URL on the first line: `@http://myurl.com`

Other lines: `- key: query :type`

Note: spaces are significant.
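The line format above can be illustrated with a small sketch. This is not ScraperScript's own parser; the function name `parseLine`, the `kind` labels, and the regular expression are assumptions made here for illustration, based only on the line shapes shown in this README.

```javascript
// Illustrative sketch only: NOT the parser used by ScraperScript.
// parseLine, TYPES, and the `kind` labels are hypothetical names.
const TYPES = ['array', 'object', 'boolean', 'string', 'number'];

function parseLine(line) {
  // First line of a file: "@" followed by the URL.
  if (line.startsWith('@')) {
    return {kind: 'url', url: line.slice(1)};
  }

  // Comment line: starts with "!!".
  if (line.startsWith('!!')) {
    return {kind: 'comment', text: line.slice(2).trim()};
  }

  // Field line: "- key: query :type", with single spaces between parts.
  const match = /^- ([^:\s]+): (.+) :(\w+)$/.exec(line);
  if (!match || !TYPES.includes(match[3])) {
    return {kind: 'invalid', text: line};
  }

  return {kind: 'field', key: match[1], query: match[2], type: match[3]};
}

const example = [
  '@https://helloword.site/list',
  '!! A comment ...',
  '- hasTitle: html >> head >> title == " my string " :boolean',
  '- title: html >> head >> title :string'
];

const parsed = example.map(parseLine);
console.log(parsed);
```

Because the type is matched at the very end of the line, a query may itself contain colons or quoted strings without confusing this sketch.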
### Key
Name of the output field.

Rules:
- Use at the beginning of the line
- Format `- key:`

Example: `- name:`
### Type
Return type.

Rules:
- Use at the end of the line
- Format `:type`

Types:
- array
- object
- boolean
- string
- number

Example: `:string`
### Query
**String**
`" my string "`
NOTE: `"my string"` (without a space after the opening quote and before the closing quote) is invalid
**Comment**
`!! my comment in ScraperScript`
**Elements**
`nameOfHtmlElementOne >> nameOfHtmlElementTwo`
**Map elements [String]**
`nameOfHtmlElementOne @> nameOfSubHtmlElement`
**Map elements [Array]**
`nameOfHtmlElementOne @> [nameOfSubHtmlElement]`
**Map elements [Object]**
`nameOfHtmlElementOne @> {nameOfIndex, nameOfData, nameOfSubHtmlElement}`
**Addition**
`nameOfHtmlElementOne ++ nameOfHtmlElementTwo`
**Replace**
`nameOfHtmlElementOne -- nameOfHtmlElementTwo`
**Equal or different comparison**
`nameOfHtmlElementOne == nameOfHtmlElementTwo`
`nameOfHtmlElementOne ~= nameOfHtmlElementTwo`
**OR**
`nameOfHtmlElementOne || nameOfHtmlElementTwo`
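
As a rough illustration of the `>>` element chain, the sketch below splits a query into its element names. The function names are hypothetical, and the mapping to a CSS descendant selector is an assumption made here; this README does not state how ScraperScript resolves the chain internally.

```javascript
// Illustrative sketch only: splitElements and toDescendantSelector are
// hypothetical helpers, not part of ScraperScript.

// Split an element chain such as "html >> body >> div" into names.
function splitElements(query) {
  return query.split(' >> ').map(name => name.trim());
}

// ASSUMPTION: treat the chain as a CSS descendant selector.
// ScraperScript may resolve the chain differently.
function toDescendantSelector(query) {
  return splitElements(query).join(' ');
}

console.log(splitElements('html >> body >> div >> h2'));
// prints [ 'html', 'body', 'div', 'h2' ]
console.log(toDescendantSelector('html >> body >> div >> h2'));
// prints html body div h2
```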
## Tests
To run the test suite, first install the dependencies, then run `test`:
```sh
# NPM
npm test
# Or Using Yarn
yarn test
```
## Dependencies
- [axios](https://ghub.io/axios): Promise based HTTP client for the browser and node.js
- [cheerio](https://ghub.io/cheerio): Tiny, fast, and elegant implementation of core jQuery designed specifically for the server
## Dev Dependencies
- [body-parser](https://ghub.io/body-parser): Node.js body parsing middleware
- [express](https://ghub.io/express): Fast, unopinionated, minimalist web framework
- [mocha](https://ghub.io/mocha): simple, flexible, fun test framework
- [xo](https://ghub.io/xo): JavaScript happiness style linter ❤️
## Contributors
Pull requests and stars are always welcome. For bugs and feature requests, please [create an issue](https://github.com/TiagoDanin/ScraperScript/issues). [List of all contributors](https://github.com/TiagoDanin/ScraperScript/graphs/contributors).
## License
[MIT](LICENSE) © [Tiago Danin](https://TiagoDanin.github.io)