{"id":17124335,"url":"https://github.com/wittline/sparksql-with-python","last_synced_at":"2026-04-29T11:02:22.865Z","repository":{"id":111710473,"uuid":"291857557","full_name":"Wittline/SparkSQL-with-Python","owner":"Wittline","description":"This repository has some examples of using Spark and SparkSQL with Python through PySpark","archived":false,"fork":false,"pushed_at":"2020-11-25T05:12:47.000Z","size":4135,"stargazers_count":2,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-29T09:39:24.277Z","etag":null,"topics":["flask-api","python","spark","sparksql"],"latest_commit_sha":null,"homepage":"https://wittline.github.io/SparkSQL-with-Python/","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Wittline.png","metadata":{"files":{"readme":"docs/Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-09-01T00:44:08.000Z","updated_at":"2023-04-23T16:06:15.000Z","dependencies_parsed_at":"2023-04-17T21:16:38.243Z","dependency_job_id":null,"html_url":"https://github.com/Wittline/SparkSQL-with-Python","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Wittline%2FSparkSQL-with-Python","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Wittline%2FSparkSQL-with-Python/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Wittline%2FSparkSQL-with-Python/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Wittl
ine%2FSparkSQL-with-Python/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Wittline","download_url":"https://codeload.github.com/Wittline/SparkSQL-with-Python/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245203069,"owners_count":20577095,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["flask-api","python","spark","sparksql"],"created_at":"2024-10-14T18:42:24.171Z","updated_at":"2026-04-29T11:02:17.812Z","avatar_url":"https://github.com/Wittline.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SparkSQL with Python\n\n\nThis repository has some examples of using Spark and SparkSQL with Python through PySpark.\n\n## Profeco\n\nWe will work with the Profeco dataset, which you can download here: [Profeco](https://drive.google.com/uc?export=download\u0026id=0B-4W2dww7ELNazFfOFVhNG5vckE). It is a daily historical record of more than 2,000 products, as of 2015, in various establishments in Mexico.\n\n\u003ca href=\"https://wittline.github.io/SparkSQL-with-Python/Profeco.html\"\u003eCheck the code here\u003c/a\u003e\n\n* How many records are there?\n* How many categories are there?\n* How many trade chains are being monitored (and therefore reported in that database)?\n* What are the most monitored products in each state of the country?\n* What is the trade chain with the greatest variety of monitored products?\n\n\n## Countries airports\n\n\n\u003ca href=\"https://wittline.github.io/SparkSQL-with-Python/Airports.html\"\u003eCheck the code 
here\u003c/a\u003e\n\n## API to count the number of tweets in a radius of 1km\n\nI will separate all the different tweets, with their geographic data, into another file, \"tweets_geo.csv\". This makes the data easier to manipulate in a SparkSQL query.\n\n\u003ca href=\"https://wittline.github.io/SparkSQL-with-Python/Tweet_Count.html\"\u003eCheck the data preparation code here\u003c/a\u003e\n\n\nThe code for the REST API is in the API folder of this repository.\n\n\n![alt text](https://wittline.github.io/SparkSQL-with-Python/images/api1.PNG)\n\n\n![alt text](https://wittline.github.io/SparkSQL-with-Python/images/api2.PNG)\n\n\n\n![alt text](https://wittline.github.io/SparkSQL-with-Python/images/api3.PNG)\n\n\n\n\n# Contributing and Feedback\nAny ideas or feedback about this repository? Help me improve it.\n\n# Authors\n- Created by \u003ca href=\"https://www.linkedin.com/in/ramsescoraspe\"\u003e\u003cstrong\u003eRamses Alexander Coraspe Valdez\u003c/strong\u003e\u003c/a\u003e\n- Created in 2020\n\n# License\nThis project is licensed under the terms of the MIT license.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwittline%2Fsparksql-with-python","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwittline%2Fsparksql-with-python","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwittline%2Fsparksql-with-python/lists"}