https://github.com/isspek/news_categorizer_api
Api for categorising news given url or given content of the news
application bert dataminig machinelearning newsapi
- Host: GitHub
- URL: https://github.com/isspek/news_categorizer_api
- Owner: isspek
- License: mit
- Created: 2020-04-19T12:13:11.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2021-06-10T15:47:07.000Z (over 3 years ago)
- Last Synced: 2023-08-20T09:21:22.111Z (about 1 year ago)
- Topics: application, bert, dataminig, machinelearning, newsapi
- Language: Python
- Size: 16.6 KB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# News Categorizer
News Categorizer is a Docker-based web service that:
- provides the category of a news article given its URL
- provides the category of a news article given its body text

### Installation
Edit `docker-compose.yml` to match your server configuration, then run the following command:
```sh
docker-compose up
```

### Dataset
The predictive model is trained on the [BBC News](https://www.kaggle.com/c/learn-ai-bbc/data) dataset. We split ``BBC News Train.csv`` with random seed 42 into 20% validation data, 10% test data, and the remaining 70% as the training set.

### Technologies
The service uses [BERT](https://arxiv.org/abs/1810.04805) to predict the category from the news content. BERT is fine-tuned with the [ktrain](https://github.com/amaiya/ktrain) library in [Colab](https://colab.research.google.com/drive/1NjjO7oGoKtXuPKSsFgLRW_1z_fd5r4mb). You may use the same Colab scripts to train your own model. Make sure that your trained files replace `model` and `model.preproc` under the `model` directory in the source code.

For URL-based detection, it uses a rule-based approach.
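
The README does not spell out the URL rules, so the following is only a hypothetical sketch of what a rule-based URL categorizer could look like. It assumes the five categories of the BBC News dataset and matches them against tokens in the URL path. The `CATEGORY_KEYWORDS` table and the `categorize_url` function are illustrative names, not part of this project.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical rules: BBC News dataset categories mapped to URL keywords.
# The keyword lists are illustrative assumptions, not the project's actual rules.
CATEGORY_KEYWORDS = {
    "business": ("business", "economy", "markets"),
    "entertainment": ("entertainment", "arts", "culture"),
    "politics": ("politics", "election"),
    "sport": ("sport", "sports", "football"),
    "tech": ("tech", "technology"),
}


def categorize_url(url: str) -> Optional[str]:
    """Return the first category whose keyword equals a URL path token, else None."""
    path = urlparse(url).path.lower()
    # Split on "/" and "-" so that "uk-politics-123" yields the token "politics".
    tokens = [t for segment in path.split("/") for t in segment.split("-") if t]
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(token in keywords for token in tokens):
            return category
    return None
```

With these assumed rules, `categorize_url("https://www.bbc.com/news/uk-politics-12345678")` returns `"politics"`, and a URL with no matching token returns `None`.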
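
When retraining, the split described in the Dataset section (20% validation, 10% test, the rest for training, random seed 42) could be reproduced along these lines. This is a minimal standard-library sketch, not the Colab script itself: `split_rows` is a hypothetical helper, and a seeded shuffle here will not yield the same rows as whatever splitting utility the original notebook uses.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_rows(
    rows: Sequence[T],
    val_frac: float = 0.2,
    test_frac: float = 0.1,
    seed: int = 42,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle rows with a fixed seed and return (train, validation, test)."""
    shuffled = list(rows)
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test
```

The rows could be, for example, the records read from ``BBC News Train.csv`` with `csv.DictReader`; calling `split_rows` twice on the same input gives the same three subsets.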

License
----
MIT