https://github.com/wittline/sparksql-with-python
This repository has some examples of using Spark and SparkSQL with Python through PySpark
- Host: GitHub
- URL: https://github.com/wittline/sparksql-with-python
- Owner: Wittline
- Created: 2020-09-01T00:44:08.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-11-25T05:12:47.000Z (about 4 years ago)
- Last Synced: 2024-12-01T08:21:46.060Z (2 months ago)
- Topics: flask-api, python, spark, sparksql
- Language: HTML
- Homepage: https://wittline.github.io/SparkSQL-with-Python/
- Size: 3.94 MB
- Stars: 2
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: docs/Readme.md
# SparkSQL with Python
This repository has some examples of using Spark and SparkSQL with Python through PySpark
## Profeco
We will work with the Profeco dataset, which you can download here: [Profeco](https://drive.google.com/uc?export=download&id=0B-4W2dww7ELNazFfOFVhNG5vckE). It is a daily historical record, as of 2015, of more than 2,000 products in various establishments in Mexico.
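As a rough sketch of how this dataset could be queried with SparkSQL, the snippet below registers the CSV as a temporary view and keeps one query per question. The column names (`categoria`, `cadenaComercial`, `estado`, `producto`), the view name `profeco`, and the helper names are assumptions made for illustration; check them against the header of the downloaded file and against the repository's own code, which may differ.

```python
# Minimal sketch, not the repository's implementation.
# ASSUMPTION: the CSV has a header row with the columns
# categoria, cadenaComercial, estado and producto.

QUERIES = {
    # Total number of records
    "records": "SELECT COUNT(*) AS n FROM profeco",
    # Number of distinct categories
    "categories": "SELECT COUNT(DISTINCT categoria) AS n FROM profeco",
    # Number of distinct trade chains
    "chains": "SELECT COUNT(DISTINCT cadenaComercial) AS n FROM profeco",
    # Most monitored product(s) in each state (ties are all returned)
    "top_product_by_state": """
        SELECT estado, producto, n
        FROM (
            SELECT estado, producto, n,
                   RANK() OVER (PARTITION BY estado ORDER BY n DESC) AS rk
            FROM (
                SELECT estado, producto, COUNT(*) AS n
                FROM profeco
                GROUP BY estado, producto
            ) counts
        ) ranked
        WHERE rk = 1
    """,
    # Trade chain with the greatest variety of monitored products
    "chain_with_most_products": """
        SELECT cadenaComercial, COUNT(DISTINCT producto) AS n
        FROM profeco
        GROUP BY cadenaComercial
        ORDER BY n DESC
        LIMIT 1
    """,
}


def load_profeco(spark, csv_path):
    """Read the CSV with an existing SparkSession and expose it as the view 'profeco'."""
    df = spark.read.csv(csv_path, header=True)
    df.createOrReplaceTempView("profeco")
    return df


def answer_all(spark):
    """Run every query and return {question_name: list of Row objects}."""
    return {name: spark.sql(sql).collect() for name, sql in QUERIES.items()}
```

With a `SparkSession` named `spark`, calling `load_profeco(spark, path_to_csv)` followed by `answer_all(spark)` would return one result per question. The questions these queries correspond to are: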
* How many records are there?
* How many categories are there?
* How many trade chains are being monitored (and therefore reported in that database)?
* What are the most monitored products in each state of the country?
* What is the trade chain with the greatest variety of monitored products?
## Countries airports
## API to count the number of tweets in a radius of 1km
I will extract all the distinct tweets, together with their geographic data, into a separate file, "tweets_geo.csv". This makes the data easier to manipulate in a SparkSQL query.
Check the data preparation code here
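Once "tweets_geo.csv" exists, counting the tweets within 1 km of a point could be written as a SparkSQL query built on the haversine great-circle distance. The sketch below is one possible formulation, not necessarily the one used in this repository. The column names `latitude` and `longitude`, the view name `tweets_geo`, and the function names are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def radius_count_query(lat, lon, radius_km=1.0, view="tweets_geo"):
    """Build a SparkSQL query counting rows of `view` within radius_km of (lat, lon).

    ASSUMPTION: the view has numeric columns named latitude and longitude.
    The inputs are converted to float before being placed in the query text.
    """
    lat, lon, radius_km = float(lat), float(lon), float(radius_km)
    return f"""
        SELECT COUNT(*) AS n
        FROM {view}
        WHERE 2 * {EARTH_RADIUS_KM} * ASIN(SQRT(
                  POW(SIN(RADIANS(latitude - ({lat})) / 2), 2)
                + COS(RADIANS({lat})) * COS(RADIANS(latitude))
                  * POW(SIN(RADIANS(longitude - ({lon})) / 2), 2)
              )) <= {radius_km}
    """


def count_tweets(spark, lat, lon, csv_path="tweets_geo.csv"):
    """Load the CSV with an existing SparkSession and count tweets within 1 km."""
    tweets = spark.read.csv(csv_path, header=True, inferSchema=True)
    tweets.createOrReplaceTempView("tweets_geo")
    return spark.sql(radius_count_query(lat, lon)).first()["n"]
```

The pure-Python `haversine_km` mirrors the SQL expression, so it can be used to spot-check the query on a handful of points without starting Spark.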
The code for the REST API is in the API folder of this repository.
![alt text](https://wittline.github.io/SparkSQL-with-Python/images/api1.PNG)
![alt text](https://wittline.github.io/SparkSQL-with-Python/images/api2.PNG)
![alt text](https://wittline.github.io/SparkSQL-with-Python/images/api3.PNG)
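For orientation, a Flask endpoint of this kind could look roughly like the sketch below. It is an illustration under stated assumptions and not a copy of the code in the API folder: the route `/tweets/count`, the query parameters `lat` and `lon`, and the injected `count_tweets` callable are all hypothetical names.

```python
def parse_point(args):
    """Read and validate 'lat' and 'lon' from a mapping of query parameters."""
    lat = float(args["lat"])
    lon = float(args["lon"])
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinates out of range")
    return lat, lon


def create_app(count_tweets):
    """Build a Flask app around a callable count_tweets(lat, lon) -> int.

    ASSUMPTION: count_tweets wraps the SparkSQL radius query; it is passed in
    so that the web layer does not depend on how the count is computed.
    """
    # Imported here so parse_point can be used and tested without Flask installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/tweets/count")
    def tweets_count():
        try:
            lat, lon = parse_point(request.args)
        except (KeyError, ValueError) as exc:
            # Missing or malformed coordinates produce a 400 response.
            return jsonify(error=str(exc)), 400
        return jsonify(lat=lat, lon=lon, radius_km=1.0,
                       count=count_tweets(lat, lon))

    return app
```

Under these assumptions, `create_app(my_counter).run()` would serve requests such as `/tweets/count?lat=19.4326&lon=-99.1332` and return the count as JSON.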
# Contributing and Feedback
Any ideas or feedback about this repository? Help me improve it.
# Authors
- Created by Ramses Alexander Coraspe Valdez
- Created in 2020
# License
This project is licensed under the terms of the MIT license.