https://github.com/turboway/pybigdata
Using Python to work with various big data components
- Host: GitHub
- URL: https://github.com/turboway/pybigdata
- Owner: TurboWay
- License: mit
- Created: 2020-05-27T01:54:38.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2023-02-17T04:11:03.000Z (about 2 years ago)
- Last Synced: 2025-03-26T19:45:41.506Z (about 1 month ago)
- Topics: elasticsearch, hadoop, hbase, hive, impala, kafka, mapreduce, spark
- Language: Python
- Homepage:
- Size: 85 KB
- Stars: 62
- Watchers: 4
- Forks: 18
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# pybigdata
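Which programming language should you learn to build big data applications? Do you have to learn Java? No: Python is also a very good choice.
So let's have fun with big data using Python!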
# install
```shell
pip install -r requirements.txt
pip install --no-deps thrift-sasl==0.2.1
```
# list
| Big data component | Python example | Docs |
|------------------|------------------------------------------------------------------------------------------------------------------------------------| --------------- |
| hadoop | [ctrl_hdfs.py](hadoop/ctrl_hdfs.py) | [hdfs](https://hdfscli.readthedocs.io/en/latest/) |
| hadoop-mapreduce | [mapreduce](hadoop/mapreduce/wordcount) | [mapreduce.md](hadoop/mapreduce/wordcount/wordcount.md) |
| hive | [ctrl_hive.py](hive/ctrl_hive.py)<br>[one-in, one-out UDF](hive/hive-udf)<br>[many-in, one-out UDAF](hive/hive-udaf)<br>[one-in, many-out UDTF](hive/hive-udtf) | [impyla](https://github.com/cloudera/impyla) |
| impala | [ctrl_impala.py](impala/ctrl_impala.py) | [impyla](https://github.com/cloudera/impyla) |
| hbase | [ctrl_hbase.py](hbase/ctrl_hbase.py) | [happybase](https://happybase.readthedocs.io/en/latest/user.html#retrieving-data) |
| kafka | [demo_producer.py](kafka/demo_producer.py)<br>[demo_consumer.py](kafka/demo_consumer.py) | [kafka](https://kafka-python.readthedocs.io/en/master/) |
| elasticsearch | [ctrl_elasticsearch.py](elasticsearch/ctrl_elasticsearch.py) | [elasticsearch](https://elasticsearch-py.readthedocs.io/en/7.7.1/) |
| spark | [demo_spark.py](spark/demo_spark.py) | [pyspark](http://spark.apache.org/docs/latest/api/python/getting_started/index.html) |
| flink | [flink-sql](flink/flink-sql) | [Flink practice series 2: flinksql](http://blog.turboway.top/article/flinksql/) |
| doris | [ctrl_doris](doris/ctrl_doris.py) | [DorisClient](https://github.com/TurboWay/DorisClient) |
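The hadoop-mapreduce row points to a word count job. A common way to run Python on MapReduce is Hadoop Streaming, where the mapper and reducer are plain scripts that read lines on stdin and write tab-separated key/value lines on stdout. The following is a minimal sketch of that approach, assuming Hadoop Streaming; it is not the code in `hadoop/mapreduce/wordcount`, and the file name `wordcount.py` and its `map`/`reduce` arguments are hypothetical.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Map step: emit 'word<TAB>1' for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Reduce step: sum the counts per word.

    Hadoop sorts mapper output by key before the reduce phase, so equal
    words arrive on consecutive lines and groupby can aggregate them.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical CLI: `python3 wordcount.py map` or `python3 wordcount.py reduce`
STAGES = {"map": mapper, "reduce": reducer}

if __name__ == "__main__":
    if len(sys.argv) == 2 and sys.argv[1] in STAGES:
        for record in STAGES[sys.argv[1]](sys.stdin):
            print(record)
    else:
        print("usage: wordcount.py map|reduce", file=sys.stderr)
```

To check it locally, pipe through `sort` to imitate the shuffle: `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`. On a cluster, the same script is handed to the streaming jar with `-files wordcount.py -mapper "python3 wordcount.py map" -reducer "python3 wordcount.py reduce"` plus `-input` and `-output`; the jar's location depends on the Hadoop installation.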
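The Hive UDF and UDTF rows can also be approached with Hive's `TRANSFORM ... USING` clause, which streams each row to an external script as a tab-separated line on stdin and reads tab-separated lines back. The sketch below assumes that mechanism (the linked `hive-udf` and `hive-udtf` directories may implement it differently); the script name `hive_transform.py`, the `upper`/`explode` arguments and the example table are hypothetical.

```python
import sys


def to_upper(fields):
    """UDF style (one row in, one row out): upper-case the first column."""
    yield [fields[0].upper(), *fields[1:]]


def explode_tags(fields):
    """UDTF style (one row in, many rows out): one output row per comma-separated tag."""
    key, tags = fields[0], fields[1]
    for tag in tags.split(","):
        yield [key, tag]


def run(transform, lines):
    """Split tab-separated input lines, apply transform, re-join each output row."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        for row in transform(fields):
            yield "\t".join(row)


# Hypothetical CLI: `python3 hive_transform.py upper` or `python3 hive_transform.py explode`
TRANSFORMS = {"upper": to_upper, "explode": explode_tags}

if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in TRANSFORMS:
    for out_line in run(TRANSFORMS[sys.argv[1]], sys.stdin):
        print(out_line)
```

In Hive, the script would be registered with `ADD FILE hive_transform.py;` and called as, for example, `SELECT TRANSFORM(name, age) USING 'python3 hive_transform.py upper' AS (name, age) FROM users;`, where `users`, `name` and `age` are placeholders. NULL columns arrive as the string `\N` by default, so a production script should handle that value.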
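happybase, linked in the hbase row, talks to HBase through its Thrift server and works with bytes for row keys, column names (`family:qualifier`) and values. A minimal sketch, assuming a Thrift server on `localhost:9090` (the usual default) and an existing table with an `info` column family; the table name, row keys and helper names are hypothetical and not taken from `ctrl_hbase.py`.

```python
def encode_columns(columns):
    """Convert {'family:qualifier': value} into the bytes mapping Table.put expects."""
    return {name.encode("utf-8"): str(value).encode("utf-8") for name, value in columns.items()}


def decode_row(row):
    """Convert a bytes -> bytes mapping returned by Table.row / Table.scan back to str."""
    return {name.decode("utf-8"): value.decode("utf-8") for name, value in row.items()}


def put_and_scan(table_name, rows, prefix, host="localhost", port=9090):
    """Write {row_key: columns} into the table, then read back rows whose key starts with prefix."""
    import happybase  # imported lazily so the helpers above work without happybase installed

    connection = happybase.Connection(host, port=port)
    try:
        table = connection.table(table_name)
        for row_key, columns in rows.items():
            table.put(row_key.encode("utf-8"), encode_columns(columns))
        return {
            key.decode("utf-8"): decode_row(data)
            for key, data in table.scan(row_prefix=prefix.encode("utf-8"))
        }
    finally:
        connection.close()
```

A call such as `put_and_scan("user", {"user_001": {"info:name": "alice"}}, prefix="user_")` would write one row and return every `user_*` row as plain strings. Keeping the bytes conversion in small helpers makes it easy to reuse with `Table.row` and `Table.scan`.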
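For the kafka row, producers and consumers built on kafka-python usually pass serializer callables so application code deals with Python objects instead of raw bytes. A minimal sketch, assuming a broker at `localhost:9092` and JSON message values; the topic, group id and function names are hypothetical, not the ones used in `demo_producer.py` / `demo_consumer.py`.

```python
import json


def encode_value(value):
    """value_serializer: Python object -> UTF-8 JSON bytes."""
    return json.dumps(value, ensure_ascii=False).encode("utf-8")


def decode_value(raw):
    """value_deserializer: UTF-8 JSON bytes -> Python object."""
    return json.loads(raw.decode("utf-8"))


def produce(topic, records, servers="localhost:9092"):
    from kafka import KafkaProducer  # lazy import: kafka-python is only needed here

    producer = KafkaProducer(bootstrap_servers=servers, value_serializer=encode_value)
    try:
        for record in records:
            producer.send(topic, record)  # asynchronous; returns a future
        producer.flush()  # block until every buffered record has been sent
    finally:
        producer.close()


def consume(topic, servers="localhost:9092", group_id="demo-group"):
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=servers,
        group_id=group_id,
        auto_offset_reset="earliest",  # start from the oldest offset if the group has none
        value_deserializer=decode_value,
        consumer_timeout_ms=10_000,  # stop iterating after 10 s with no new messages
    )
    try:
        for message in consumer:
            yield message.value
    finally:
        consumer.close()
```

For example, `produce("demo-topic", [{"id": 1}])` followed by `for value in consume("demo-topic"): print(value)` would round-trip a message. `ensure_ascii=False` keeps non-ASCII text (such as Chinese) readable in the stored JSON.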
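For bulk loading with elasticsearch-py (the 7.x docs are linked in the elasticsearch row), `helpers.bulk` accepts an iterable of action dicts, so documents can be streamed instead of indexed one request at a time. A minimal sketch, assuming a 7.x client and a node at `http://localhost:9200`; the index name, the `id` field convention and the helper names are hypothetical and not taken from `ctrl_elasticsearch.py`.

```python
def build_actions(index, docs, id_field="id"):
    """Yield one bulk 'index' action per document; reuse doc[id_field] as _id if present."""
    for doc in docs:
        action = {"_index": index, "_source": doc}
        if id_field in doc:
            action["_id"] = doc[id_field]
        yield action


def match_query(field, text):
    """Request body for a full-text match query on one field."""
    return {"query": {"match": {field: text}}}


def bulk_index_and_search(index, docs, field, text, hosts=("http://localhost:9200",)):
    from elasticsearch import Elasticsearch, helpers  # lazy import: only needed here

    es = Elasticsearch(list(hosts))
    indexed, _errors = helpers.bulk(es, build_actions(index, docs))
    es.indices.refresh(index=index)  # make the new documents visible to search
    response = es.search(index=index, body=match_query(field, text))
    return indexed, [hit["_source"] for hit in response["hits"]["hits"]]
```

Because `build_actions` is a generator, large inputs are not held in memory all at once. With an 8.x client, the search call takes `query=` instead of the deprecated `body=` argument.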