https://github.com/onetail/textanalytics
tku homework
kafka python3 text-analytics
- Host: GitHub
- URL: https://github.com/onetail/textanalytics
- Owner: Onetail
- Created: 2018-04-11T17:30:58.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2018-04-08T12:04:54.000Z (about 8 years ago)
- Last Synced: 2025-07-04T19:43:43.024Z (12 months ago)
- Topics: kafka, python3, text-analytics
- Language: Python
- Size: 2.84 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README
# Text Analytics System
## Basic Environment
* System: Ubuntu 16.04.4 LTS
* Python: 3.5.2
* Kafka: 0.11.0.2 (Scala 2.11 build, `kafka_2.11-0.11.0.2`)
* Network: Alibaba Cloud (behind the Great Firewall)
## Flow
* Install Kafka, following [How to setup Kafka on Ubuntu 16.04](https://hevodata.com/blog/how-to-set-up-kafka-on-ubuntu-16-04/)
* Create the topics from the `kafka/bin` directory, replacing `topicName` with the topic to create
```
sudo ./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic topicName
```
* Write the Python web crawlers, using [pykafka](https://github.com/Parsely/pykafka) to talk to Kafka
* Build a [TF-IDF](https://www.jianshu.com/p/edf666d3995f) model for each article
* Pipe the messages to the Python text analytics step
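
The TF-IDF step in the flow above can be sketched in plain Python as follows. This is a minimal illustration, not the code in `nlp.py`: it assumes each article has already been split into tokens (word segmentation is out of scope here), and the function name `tf_idf` and the sample articles are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    tf  = term count / number of tokens in the document
    idf = log(N / number of documents containing the term)
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on an empty article
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    # Hypothetical, already tokenized articles.
    articles = [
        ["election", "vote", "party", "vote"],
        ["film", "actor", "award"],
        ["election", "film", "debate"],
    ]
    for scores in tf_idf(articles):
        # Show the two highest-weighted terms of each article.
        print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:2])
```

A term that occurs in every article gets a weight of zero under this unsmoothed formula, which is why very common words drop out of the ranking.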
## Usage
* Get the latest news for the politics or entertainment category
```
python3 crawler.py politics
python3 crawler.py entertainment
```
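
As a rough sketch of how a crawler could hand articles to Kafka through pykafka: the broker address `127.0.0.1:9092`, the JSON message layout, and the helper names `encode_article` and `publish` are all assumptions for illustration and are not taken from `crawler.py`.

```python
import json


def encode_article(category, title, body):
    """Serialize one article to UTF-8 JSON bytes (Kafka messages are bytes)."""
    payload = {"category": category, "title": title, "body": body}
    # ensure_ascii=False keeps non-ASCII text readable in the stored message.
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def publish(articles, topic_name, hosts="127.0.0.1:9092"):
    """Send (title, body) pairs to a Kafka topic with pykafka."""
    # Imported here so encode_article works without pykafka installed.
    from pykafka import KafkaClient

    client = KafkaClient(hosts=hosts)
    topic = client.topics[topic_name.encode("utf-8")]
    # The synchronous producer blocks until each message is acknowledged.
    with topic.get_sync_producer() as producer:
        for title, body in articles:
            # Assumption: the topic is named after the news category.
            producer.produce(encode_article(topic_name, title, body))
```

Calling `publish([("headline", "text")], "politics")` requires a running broker and an existing `politics` topic.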
* Get messages from Kafka
```
python3 mqtool.py
```
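
Reading messages back with pykafka might look like the sketch below. It assumes messages are UTF-8 JSON and that the broker listens on `127.0.0.1:9092`; the message format actually used by `mqtool.py` is not documented here, and `decode_message` and `read_topic` are hypothetical names.

```python
import json


def decode_message(raw):
    """Turn raw Kafka bytes back into a dict; return None for malformed input."""
    try:
        return json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return None


def read_topic(topic_name, hosts="127.0.0.1:9092", timeout_ms=5000):
    """Yield decoded messages until the topic is idle for timeout_ms."""
    # Imported here so decode_message works without pykafka installed.
    from pykafka import KafkaClient

    client = KafkaClient(hosts=hosts)
    topic = client.topics[topic_name.encode("utf-8")]
    # consumer_timeout_ms ends the iteration instead of blocking forever.
    consumer = topic.get_simple_consumer(consumer_timeout_ms=timeout_ms)
    for message in consumer:
        if message is not None:
            article = decode_message(message.value)
            if article is not None:
                yield article
```

Skipping malformed messages rather than raising keeps one bad record from stopping the whole analysis run.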
* Analyze the messages
```
python3 nlp.py politics
python3 nlp.py entertainment
```
* Schedule automatic runs by editing the crontab
```
$ crontab -e
```
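
`crontab -e` only opens the editor; the entries themselves still have to be written. The helper below is one hypothetical way to generate them. The 30-minute interval and the checkout path `/home/user/textanalytics` are placeholders, not values from this project.

```python
def cron_entry(every_minutes, workdir, script, category):
    """Build one crontab line that runs `script category` every N minutes."""
    if not 1 <= every_minutes <= 59:
        raise ValueError("every_minutes must be between 1 and 59")
    schedule = "*/{} * * * *".format(every_minutes)
    # cd first so the script resolves relative paths from the project directory.
    command = "cd {} && python3 {} {}".format(workdir, script, category)
    return "{} {}".format(schedule, command)


if __name__ == "__main__":
    # Hypothetical checkout location and interval; adjust both.
    for category in ("politics", "entertainment"):
        print(cron_entry(30, "/home/user/textanalytics", "crawler.py", category))
```

Paste the printed lines into the editor opened by `crontab -e`.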