Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/offthetab/vkapi-ml-dataharvester
Pipeline to harvest data via VK API for ML analysis with hadoop and spark
hadoop hdfs hive linux mariadb python requests spark sqoop
- Host: GitHub
- URL: https://github.com/offthetab/vkapi-ml-dataharvester
- Owner: offthetab
- Created: 2024-07-24T23:33:47.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-08-01T09:33:10.000Z (4 months ago)
- Last Synced: 2024-10-11T02:41:22.105Z (about 1 month ago)
- Topics: hadoop, hdfs, hive, linux, mariadb, python, requests, spark, sqoop
- Language: Jupyter Notebook
- Homepage:
- Size: 6.69 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# VKAPI-ML-DataHarvester
A project for automatically collecting data about the posts in a group, the group's users, and those users' posts. The data processing pipeline consists of a Python script that collects data through the VK API, plus HDFS, MariaDB, Sqoop, and Spark.
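
The collection step could look roughly like the sketch below. It is not the repository's actual script: it assumes the public VK `wall.get` method, API version `5.131`, and a valid access token, and the names `iter_wall_posts` and `http_get_json` are hypothetical. The HTTP call is passed in as a function so the paging logic can be exercised without network access.

```python
from typing import Callable, Dict, Iterator

VK_WALL_GET_URL = "https://api.vk.com/method/wall.get"
API_VERSION = "5.131"  # assumption: any supported VK API version would do
PAGE_SIZE = 100        # wall.get returns at most 100 posts per request


def http_get_json(url: str, params: Dict[str, object]) -> dict:
    """Default fetcher: one GET request via requests, decoded as JSON."""
    import requests  # imported lazily so the paging logic works offline

    resp = requests.get(url, params=params, timeout=10)
    resp.raise_for_status()
    return resp.json()


def iter_wall_posts(
    group_id: int,
    token: str,
    fetch: Callable[[str, Dict[str, object]], dict] = http_get_json,
    limit: int = 1000,
) -> Iterator[dict]:
    """Yield up to `limit` posts from a group's wall, page by page."""
    offset = 0
    while offset < limit:
        params = {
            "owner_id": -abs(group_id),  # VK addresses groups by negative id
            "count": min(PAGE_SIZE, limit - offset),
            "offset": offset,
            "access_token": token,
            "v": API_VERSION,
        }
        payload = fetch(VK_WALL_GET_URL, params)
        if "error" in payload:
            # VK reports failures inside the JSON body, not via HTTP status
            raise RuntimeError(payload["error"].get("error_msg", "VK API error"))
        items = payload["response"]["items"]
        if not items:
            return  # wall exhausted before reaching the limit
        yield from items
        offset += len(items)
```

A caller would iterate over `iter_wall_posts(group_id, token)` and write each post to a file or table before the later pipeline stages (HDFS, MariaDB, Sqoop, Spark) pick the data up; how the original project hands data over between those stages is not stated here.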