{"id":20450021,"url":"https://github.com/turboway/pybigdata","last_synced_at":"2025-07-23T08:05:35.920Z","repository":{"id":37086746,"uuid":"267195490","full_name":"TurboWay/pybigdata","owner":"TurboWay","description":"Use Python to work with various big data components","archived":false,"fork":false,"pushed_at":"2023-02-17T04:11:03.000Z","size":87,"stargazers_count":63,"open_issues_count":3,"forks_count":18,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-13T02:17:18.245Z","etag":null,"topics":["elasticsearch","hadoop","hbase","hive","impala","kafka","mapreduce","spark"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TurboWay.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-27T01:54:38.000Z","updated_at":"2025-04-07T13:38:57.000Z","dependencies_parsed_at":"2024-11-15T10:49:48.540Z","dependency_job_id":"dae9cff9-dd11-4f7a-a35b-30553a277c24","html_url":"https://github.com/TurboWay/pybigdata","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/TurboWay/pybigdata","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TurboWay%2Fpybigdata","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TurboWay%2Fpybigdata/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TurboWay%2Fpybigdata/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TurboWay%2Fpybigdata/manifests","owner_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/owners/TurboWay","download_url":"https://codeload.github.com/TurboWay/pybigdata/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TurboWay%2Fpybigdata/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266640820,"owners_count":23960808,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-23T02:00:09.312Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["elasticsearch","hadoop","hbase","hive","impala","kafka","mapreduce","spark"],"created_at":"2024-11-15T10:49:40.706Z","updated_at":"2025-07-23T08:05:35.891Z","avatar_url":"https://github.com/TurboWay.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pybigdata\n![](https://img.shields.io/badge/python-3.6%2B-brightgreen)\n\nWhich programming language do you need to learn to build big data applications? Does it have to be Java? No, Python is also a good choice.\n\nSo let's use Python to work with big data together.\n\n\n\n# install\n\n```shell\npip install -r requirements.txt\npip install --no-deps thrift-sasl==0.2.1\n```\n\n\n\n# list\n\n| Big data component | Python example                                                                                                                     | Docs |\n|------------------|------------------------------------------------------------------------------------------------------------------------------------| --------------- |\n| hadoop           | [ctrl_hdfs.py](hadoop/ctrl_hdfs.py)                  
                                                                              | [hdfs](https://hdfscli.readthedocs.io/en/latest/) |\n| hadoop-mapreduce | [mapreduce](hadoop/mapreduce/wordcount)                                                                                            | [mapreduce.md](hadoop/mapreduce/wordcount/wordcount.md)    |\n| hive             | [ctrl_hive.py](hive/ctrl_hive.py) \u003cbr\u003e [one-in, one-out UDF](hive/hive-udf) \u003cbr\u003e [many-in, one-out UDAF](hive/hive-udaf) \u003cbr\u003e [one-in, many-out UDTF](hive/hive-udtf) | [impyla](https://github.com/cloudera/impyla)                |\n| impala           | [ctrl_impala.py](impala/ctrl_impala.py)                                                                                            | [impyla](https://github.com/cloudera/impyla) |\n| hbase            | [ctrl_hbase.py](hbase/ctrl_hbase.py)                                                                                               | [happybase](https://happybase.readthedocs.io/en/latest/user.html#retrieving-data) |\n| kafka            | [demo_producer.py](kafka/demo_producer.py) \u003cbr\u003e [demo_consumer.py](kafka/demo_consumer.py)                                         | [kafka](https://kafka-python.readthedocs.io/en/master/) |\n| elasticsearch    | [ctrl_elasticsearch.py](elasticsearch/ctrl_elasticsearch.py)                                                                       | [elasticsearch](https://elasticsearch-py.readthedocs.io/en/7.7.1/) |\n| spark            | [demo_spark.py](spark/demo_spark.py)                                                                                               | [pyspark](http://spark.apache.org/docs/latest/api/python/getting_started/index.html)                |\n| flink            | [flink-sql](flink/flink-sql)                                                                                                       |  [Flink in practice series 2: flinksql](http://blog.turboway.top/article/flinksql/)                |\n| doris            
| [ctrl_doris](doris/ctrl_doris.py)                                                                                                  |  [DorisClient](https://github.com/TurboWay/DorisClient)                |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fturboway%2Fpybigdata","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fturboway%2Fpybigdata","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fturboway%2Fpybigdata/lists"}