https://github.com/ronniery/crawler.synom

A crawler for the sinonimo.com.br website that saves the words into a MongoDB database.
bot crawler html html5 javascript mongodb nodejs nosql npm scraper thesaurus typescript web website xml

Crawler.synom

Only PT-BR (Brazilian Portuguese) words

## About The Project

I created this project when my team lead needed a large set of PT-BR synonyms to use inside our MSSQL database, to enable some text markups for our users. I solved that with the website www.sinonimo.com.br, which contains a lot of synonyms. With this project you can collect all the words and their synonyms from that site, and afterwards generate a `thesaurus.xml` file to import (if you're in the Microsoft ecosystem).
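To make the `thesaurus.xml` step more concrete, here is a minimal sketch of how crawled words could be turned into a SQL Server full-text thesaurus file. It is not the project's actual builder: the `SynonymEntry` shape and the `buildThesaurusXml` and `escapeXml` names are hypothetical, and both the `diacritics_sensitive` value of `0` and the use of expansion sets (rather than replacement sets) are assumptions.

```typescript
// Hypothetical shape of one crawled entry; the real MongoDB schema may differ.
interface SynonymEntry {
  word: string;
  synonyms: string[];
}

// Escape the five characters that are special in XML text.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a thesaurus document with one <expansion> set per entry.
// Entries without synonyms are skipped, since they would add nothing.
function buildThesaurusXml(entries: SynonymEntry[]): string {
  const lines: string[] = [
    '<XML ID="Microsoft Search Thesaurus">',
    '  <thesaurus xmlns="x-schema:tsSchema.xml">',
    // Assumption: accent-insensitive matching; set to 1 for accent-sensitive.
    "    <diacritics_sensitive>0</diacritics_sensitive>",
  ];
  for (const entry of entries) {
    if (entry.synonyms.length === 0) continue;
    lines.push("    <expansion>");
    for (const term of [entry.word, ...entry.synonyms]) {
      lines.push(`      <sub>${escapeXml(term)}</sub>`);
    }
    lines.push("    </expansion>");
  }
  lines.push("  </thesaurus>", "</XML>");
  return lines.join("\n");
}
```

For example, `buildThesaurusXml([{ word: "casa", synonyms: ["lar", "moradia"] }])` yields a single expansion set in which searching for any of the three terms matches the other two.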

## Getting Started

Follow the steps below to run the application.

### Prerequisites

To run the project correctly, make sure the following dependencies are installed on your machine.
* npm
* mongodb

You also **need** the pm2 package, which keeps the crawler running under management and restarts it if needed.
```sh
npm install -g pm2
# or: yarn global add pm2
```

### Installation

1. Clone the repo
```sh
git clone https://github.com/ronniery/crawler.synom
```
2. Go into the project folder
```sh
cd crawler.synom
```
3. Open the `.env` file in the root of the project and set the variables `DB_HOST`, `DB_USER` and `DB_PASSWORD`.
4. Install NPM packages
```sh
npm install
```
Or
```sh
yarn install
```
5. Start the crawler from bash
```sh
pm2 start ecosystem.config.js
```

PM2 will handle the crawler execution for you.
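As an illustration of how the three `.env` variables from step 3 might be combined, the sketch below builds a standard MongoDB connection string from them. How the project actually consumes `DB_HOST`, `DB_USER` and `DB_PASSWORD` is not documented here, so the `buildMongoUri` function and the `Env` type are hypothetical.

```typescript
// Minimal view of the environment: variable name -> value (undefined when unset).
type Env = Record<string, string | undefined>;

// Build a `mongodb://` connection string from DB_HOST, DB_USER and DB_PASSWORD.
// Credentials are percent-encoded so characters such as "@" or ":" in a
// password do not break the URI.
function buildMongoUri(env: Env): string {
  const host = env.DB_HOST;
  if (!host) {
    throw new Error("DB_HOST is not set");
  }
  const user = env.DB_USER;
  const password = env.DB_PASSWORD;
  if (user && password) {
    const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
    return `mongodb://${credentials}@${host}`;
  }
  // Assumption: fall back to an unauthenticated connection when no credentials are given.
  return `mongodb://${host}`;
}
```

In a Node.js process this could be called as `buildMongoUri(process.env)` once the `.env` file has been loaded. A missing `DB_HOST` fails fast with a clear error instead of surfacing later as a connection timeout.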

## Flags

There are 2 command-line arguments that you can start the application with:

* **--run-crawler** or **--run-crawler=true**: Starts the application to run the crawler only.
* **--run-xml-builder** or **--run-xml-builder=true**: Starts the application to generate the thesaurus XML file.
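The flag handling described above can be sketched roughly as follows. This is not the project's actual parser: `RunFlags`, `hasFlag` and `parseRunFlags` are hypothetical names, and treating `--run-xml-builder=true` like `--run-crawler=true` is an assumption made by analogy. Any other value, such as `=false`, is treated as the flag being absent, which is also an assumption.

```typescript
// Hypothetical result type: which modes were requested on the command line.
interface RunFlags {
  runCrawler: boolean;
  runXmlBuilder: boolean;
}

// True when `--name` or `--name=true` appears among the arguments.
function hasFlag(args: string[], name: string): boolean {
  return args.some((arg) => arg === `--${name}` || arg === `--${name}=true`);
}

// Map the raw argument list onto the two documented flags.
function parseRunFlags(args: string[]): RunFlags {
  return {
    runCrawler: hasFlag(args, "run-crawler"),
    runXmlBuilder: hasFlag(args, "run-xml-builder"),
  };
}
```

In a Node.js entry point the arguments would typically come from `process.argv.slice(2)`, for example `parseRunFlags(process.argv.slice(2))`.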

## License

Distributed under the MIT License.