{"id":22372496,"url":"https://github.com/itielshwartz/python-station-backend","last_synced_at":"2025-03-26T17:19:42.133Z","repository":{"id":94506011,"uuid":"96611008","full_name":"itielshwartz/python-station-backend","owner":"itielshwartz","description":"A full pipeline for downloading, cleaning and enriching the history of planetpython.org","archived":false,"fork":false,"pushed_at":"2017-09-05T18:49:59.000Z","size":5,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-31T22:11:49.577Z","etag":null,"topics":["backend","beautifulsoup","pipeline","praw","python","python-station"],"latest_commit_sha":null,"homepage":"http://python-station.etlsh.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/itielshwartz.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-07-08T10:05:51.000Z","updated_at":"2023-02-20T15:19:21.000Z","dependencies_parsed_at":null,"dependency_job_id":"cde85424-9080-4fca-8e9a-f205472eaffb","html_url":"https://github.com/itielshwartz/python-station-backend","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itielshwartz%2Fpython-station-backend","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itielshwartz%2Fpython-station-backend/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itielshwartz%2Fpython-station-backend/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itielshwartz%2Fpython-station-backend/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/itielshwartz","download_url":"https://codeload.github.com/itielshwartz/python-station-backend/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245699263,"owners_count":20657987,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["backend","beautifulsoup","pipeline","praw","python","python-station"],"created_at":"2024-12-04T20:43:12.149Z","updated_at":"2025-03-26T17:19:42.117Z","avatar_url":"https://github.com/itielshwartz.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Python station backend\n# About\n* The backend behind [python-station]\n\n* A full data pipeline to scrape \u003chttp://planetpython.org\u003e\n\n* Output: every GitHub (Python) project featured throughout the history of planetpython.org.\n\n* Also enriches the data using the GitHub, Reddit and Hacker News APIs.\n\n## How does it work?\n1. Download the pages from the planetpython.org clone\n\n2. Use [BeautifulSoup] to transform the raw pages into posts\n\n3. Use the [GitHub API] to get basic project data (and filter out non-Python projects)\n\n4. Use [Praw] (Reddit), the [HN API] and [GitHub Trending] to enrich the data\n\n5. Show the data using [GitHub Pages + Vue.js]\n\n# How to run?\n  - Clone the project\n  - `python3 -m venv ./venv \u0026\u0026 source venv/bin/activate \u0026\u0026 pip install -r requirements.txt`\n  - `venv/bin/python pipeline.py --pages-to-download 5`\n  - To download Reddit data, fill in your Reddit credentials in `requests_utils.py`\n  - If you hit the GitHub API rate limit, fill in your GitHub credentials in `requests_utils.py`\n\n# Pipeline Flow chart\n```\n+-------------------+\n| Download Pages    |\n+---------+---------+\n          |\n+---------v---------+\n|Transform to Posts |\n+---------+---------+\n          |\n+---------v---------+\n|Extract projects   |\n+---------+---------+\n          |\n+---------v---------+\n|Enrich Using APIs  |\n+---------+---------+\n          |\n+---------v----------+\n|Deploy Using GitHub |\n| Pages              |\n+--------------------+\n```\n\n\n### Development\n\nWant to contribute? Great!\nFeel free to open a PR or an issue :)\n\nLicense\n----\n\nMIT - **Free Software, Hell Yeah!**\n\n[//]: #URLs\n\n   [python-station]: \u003chttp://python-station.etlsh.com/\u003e\n   [nginx]: \u003chttps://www.nginx.com/resources/wiki/\u003e\n   [BeautifulSoup]: \u003chttps://www.crummy.com/software/BeautifulSoup/bs4/doc/\u003e\n   [GitHub API]: \u003chttps://developer.github.com/v3/\u003e\n   [Praw]: \u003chttps://github.com/praw-dev/praw\u003e\n   [HN API]: \u003chttps://github.com/HackerNews/API\u003e\n   [GitHub Trending]: \u003chttps://github.com/trending/python?since=daily\u003e\n   [GitHub Pages + Vue.js]: https://github.com/itielshwartz/python-station-website\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fitielshwartz%2Fpython-station-backend","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fitielshwartz%2Fpython-station-backend","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fitielshwartz%2Fpython-station-backend/lists"}