Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ronniery/crawler.synom
A crawler for the sinonimo.com.br website that saves the words into a MongoDB database.
- Host: GitHub
- URL: https://github.com/ronniery/crawler.synom
- Owner: ronniery
- License: apache-2.0
- Created: 2018-10-24T17:37:42.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2024-12-10T04:04:37.000Z (18 days ago)
- Last Synced: 2024-12-10T05:18:21.319Z (18 days ago)
- Topics: bot, crawler, html, html5, javascript, mongodb, nodejs, nosql, npm, scraper, thesaurus, typescript, web, website, xml
- Language: TypeScript
- Homepage:
- Size: 12.8 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 24
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
Crawler.synom
Only PT-BR words

## About The Project
I created this project when my team lead needed a large set of synonyms (in PT-BR) to load into our MSSQL database so we could enable text markup for our users. The website www.sinonimo.com.br contains a lot of synonyms, so this project crawls it and collects every word together with its synonyms; afterwards you can generate a thesaurus.xml file and import it (if you're in the Microsoft ecosystem).
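For context, SQL Server full-text search thesaurus files group synonyms into `<expansion>` entries; a generated file could look roughly like the sketch below (the Portuguese words are illustrative placeholders, and the exact structure this project emits is not shown in the README):

```xml
<XML ID="Microsoft Search Thesaurus">
  <thesaurus xmlns="x-schema:tsSchema.xml">
    <diacritics_sensitive>0</diacritics_sensitive>
    <!-- Each expansion block lists words treated as synonyms of each other. -->
    <expansion>
      <sub>feliz</sub>
      <sub>contente</sub>
      <sub>alegre</sub>
    </expansion>
  </thesaurus>
</XML>
```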
## Getting Started
Follow the steps below to run the application.
### Prerequisites
To run the project correctly, make sure you have the following dependencies installed on your machine:
* npm
* mongodb

You also **need** the `pm2` package so the crawler runs under a process manager and is restarted if needed:
```sh
npm install -g pm2
# or: yarn global add pm2
```

### Installation
1. Clone the repo
```sh
git clone https://github.com/ronniery/crawler.synom
```
2. Go into the project folder
```sh
cd crawler.synom
```
3. Open the `.env` file in the root of the project and set the `DB_HOST`, `DB_USER`, and `DB_PASSWORD` variables (see the example `.env` after these steps).
4. Install NPM packages
```sh
npm install
```
Or
```sh
yarn install
```
5. Start the crawler with PM2:
```sh
pm2 start ecosystem.config.js
```

The PM2 package will handle the crawler execution for you.
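For reference, a minimal `.env` for step 3 might look like the sketch below; the variable names come from the README, but the values are placeholders, not defaults shipped with the project.

```sh
# Placeholder values only; point these at your own MongoDB instance.
DB_HOST=localhost
DB_USER=crawler
DB_PASSWORD=changeme
```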
## Flags
There are two command-line arguments you can start the application with (see the usage sketch below):

**--run-crawler** or **--run-crawler=true**: starts the application and runs only the crawler.

**--run-xml-builder** or **--run-xml-builder=true**: starts the application and generates the Thesaurus XML file.
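As a usage sketch, the flags can be appended when running the compiled script directly with Node, or set in the `args` field of `ecosystem.config.js` when running under PM2. The `dist/index.js` entry path below is an assumption, not something confirmed by the README:

```sh
# Run only the crawler (entry file path is a guess; adjust to the project's build output).
node dist/index.js --run-crawler

# Generate the Thesaurus XML file instead.
node dist/index.js --run-xml-builder
```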
## License

Distributed under the MIT License.