https://github.com/mtnmunuklu/lescatit
Crawls and categorizes URL addresses
- Host: GitHub
- URL: https://github.com/mtnmunuklu/lescatit
- Owner: mtnmunuklu
- License: mit
- Created: 2021-06-14T15:08:27.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2024-05-08T06:49:06.000Z (almost 2 years ago)
- Last Synced: 2024-05-08T07:43:07.589Z (almost 2 years ago)
- Topics: docker, fiber, go, golang, grpc, kubernetes, mongo, nlp, protocol-buffers, rest-api, traefik
- Language: Go
- Homepage:
- Size: 263 MB
- Stars: 10
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
# Lescatit (Let's categorize it)
Lescatit is a URL crawling and categorization project built with **Go**, **MongoDB**, **Docker**, **Kubernetes**, **gRPC** and **Fiber**.
## Table of Contents
* [Features](#features)
* [Setup](#setup)
* [Usage](#usage)
* [License](#license)
## Features
Lescatit offers the following features:
* Getting user(s) information
* Deleting a user
* Changing user roles
* Updating user passwords
* Updating user email addresses
* Updating usernames
* Getting content of URL(s)
* Crawling URL(s)
* Categorizing URL(s)
* Generating a classification model
* Getting the classification model
* Updating the classification model
* Deleting classification model(s)
* Listing all classification models
* Getting URL categories
* Updating URL categories
* Reporting miscategorization
* Adding URL addresses
* Deleting URL(s)
* Listing all URLs
## Setup
To set up Lescatit, follow these steps:
1. Download the latest version:
```
# Look up the tag of the latest release, then download its source archive.
LATEST_VERSION=$(wget -qO - https://api.github.com/repos/mtnmunuklu/lescatit/releases/latest \
| grep tag_name \
| cut -d '"' -f 4)
curl -LJO "https://github.com/mtnmunuklu/lescatit/archive/refs/tags/$LATEST_VERSION.tar.gz"
```
2. Extract the downloaded file:
```
# Strip the leading "v" from the tag to get the archive name.
FILE_NAME=lescatit-${LATEST_VERSION#v}
tar -xvf "$FILE_NAME.tar.gz"
```
3. Execute the setup scripts:
```
cd $FILE_NAME/scripts
# Run on both worker and control plane servers.
bash tools/setup_tools.sh
bash k8s/setup_k8s.sh
# Run only on the first control plane server.
# This creates setup_k8s_control_plane.sh and setup_k8s_worker.sh,
# which join a node to the Kubernetes cluster as a control plane or a worker.
# Run the matching script on each new node you add later.
bash k8s/setup_k8s_first_control_plane.sh
# Run only on the first control plane server.
bash setup_lescatit.sh
```
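Steps 1 and 2 above can also be sketched in Go, which may help if you want to script the download from a Go tool. This is a minimal sketch, not part of Lescatit: the function names `parseTag`, `archiveURL` and `dirName` are hypothetical, and the sample tag is made up. It only shows how the tag is read from the release payload and how the archive URL and directory name are derived from it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// release holds the single field read from the "latest release" JSON payload.
type release struct {
	TagName string `json:"tag_name"`
}

// parseTag extracts tag_name from the payload (hypothetical helper).
func parseTag(payload []byte) (string, error) {
	var r release
	if err := json.Unmarshal(payload, &r); err != nil {
		return "", err
	}
	if r.TagName == "" {
		return "", fmt.Errorf("tag_name missing from payload")
	}
	return r.TagName, nil
}

// archiveURL builds the tarball URL downloaded in step 1.
func archiveURL(tag string) string {
	return "https://github.com/mtnmunuklu/lescatit/archive/refs/tags/" + tag + ".tar.gz"
}

// dirName mirrors step 2: the leading "v" is dropped from the tag.
func dirName(tag string) string {
	return "lescatit-" + strings.TrimPrefix(tag, "v")
}

func main() {
	// Sample payload; the tag value is invented for illustration.
	payload := []byte(`{"tag_name": "v1.2.3"}`)
	tag, err := parseTag(payload)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(archiveURL(tag))
	fmt.Println(dirName(tag))
}
```

Decoding the JSON is less fragile than the `grep` and `cut` pipeline, which depends on how the response happens to be formatted.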
## Usage
Lescatit consists of six services: [authentication](authentication), [crawler](crawler), [categorizer](categorizer), [categorization](categorization), [api](api) and [web](web). Requests to the web service are routed to it directly, while requests to the other services pass through the API service. The requested URL is used to decide where a request is routed.
To understand the features of each service, the available endpoints, how to make requests, and the expected responses, refer to the [api.pdf](docs/api/api.pdf) file under the [docs](docs) folder.
You can also access the documents describing the `software structure` of each service under the [docs](docs) folder.
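The URL-based routing described in this section can be illustrated with a small Go sketch. It is an assumption-laden illustration, not Lescatit's actual routing: the `/api` prefix and the `target` function are hypothetical, and in a real deployment this decision may be made by a reverse proxy rather than by application code.

```go
package main

import (
	"fmt"
	"strings"
)

// apiPrefix is a hypothetical path prefix, not taken from the Lescatit configuration.
const apiPrefix = "/api"

// target picks the service that should receive a request, based only on its URL path.
// Paths under apiPrefix go to the API service; everything else goes to the web service.
func target(path string) string {
	if path == apiPrefix || strings.HasPrefix(path, apiPrefix+"/") {
		return "api"
	}
	return "web"
}

func main() {
	for _, p := range []string{"/", "/login", "/api/users", "/apidocs"} {
		fmt.Printf("%-12s -> %s\n", p, target(p))
	}
}
```

Matching on `apiPrefix + "/"` instead of the bare prefix keeps a path such as `/apidocs` from being mistaken for an API request.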
## License
Lescatit is licensed under the MIT License. See [LICENSE](LICENSE) for the full text of the license.