Building a Modern Uber Eats Data Pipeline
https://github.com/stefen-taime/myubereats_datapipeline
- Host: GitHub
- URL: https://github.com/stefen-taime/myubereats_datapipeline
- Owner: Stefen-Taime
- Created: 2023-11-28T16:30:49.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-28T18:17:39.000Z (almost 2 years ago)
- Last Synced: 2025-01-17T05:09:55.564Z (9 months ago)
- Topics: airflow, api, data, datawarehouse, mongodb, pipeline, powerbi, snowflake
- Language: Python
- Homepage:
- Size: 308 KB
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## Building a Modern Uber Eats Data Pipeline
## Dependencies
- `google-auth`
- `google-auth-oauthlib`
- `google-auth-httplib2`
- `google-api-python-client`
- `pymongo`
- `schedule`

You can install the necessary dependencies via pip:
```bash
pip install google-auth google-auth-oauthlib google-auth-httplib2 google-api-python-client pymongo schedule
```
## Setup

1. Create a project on the [Google Cloud Platform Console](https://console.cloud.google.com/).
2. Enable the Gmail API for your project.
3. Download the client configuration file `credentials.json` and place it in the root directory of this script.
4. Set up a MongoDB Atlas cluster and obtain the connection string.
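For reference, here is a minimal sketch of the OAuth flow that steps 1–3 enable, assuming the standard `gmail.readonly` scope and a local `token.json` cache; `getGmail.py` may handle these details differently:

```python
# Minimal sketch of the Gmail OAuth flow (assumed scope and token caching;
# the repository's getGmail.py may differ).
import os.path

from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/gmail.readonly"]

def get_gmail_service():
    creds = None
    # token.json caches the access/refresh tokens after the first run.
    if os.path.exists("token.json"):
        creds = Credentials.from_authorized_user_file("token.json", SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file("credentials.json", SCOPES)
            creds = flow.run_local_server(port=0)
        with open("token.json", "w") as token:
            token.write(creds.to_json())
    return build("gmail", "v1", credentials=creds)
```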
## Gmail Configuration Completion

Once the Gmail configuration is complete, the real challenge begins. It's unclear how Uber Eats email receipts look in different countries, but here is what mine looks like:

Make sure your receipts have the same format; otherwise, you might encounter errors when attempting to retrieve Uber Eats emails. Also, if your Uber Eats emails are in English, replace the French field names with their English equivalents from your receipts.
After launching the `getGmail.py` script, approximately 500 Uber Eats emails are retrieved. You can adjust this number. They are sent to MongoDB as documents resembling this:

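As a rough sketch of that retrieval-and-load step, assuming a Gmail search query and database/collection names that the actual `getGmail.py` may not use:

```python
# Sketch of fetching Uber Eats receipt emails and storing them in MongoDB.
# The query string and the database/collection names are assumptions.
from pymongo import MongoClient

def load_receipts(service, mongodb_uri, max_results=500):
    client = MongoClient(mongodb_uri)
    collection = client["ubereats"]["emails"]  # hypothetical names

    # Search Gmail for Uber Eats receipts (query is an assumption).
    resp = service.users().messages().list(
        userId="me", q="from:uber.com subject:(receipt)", maxResults=max_results
    ).execute()

    for ref in resp.get("messages", []):
        msg = service.users().messages().get(
            userId="me", id=ref["id"], format="full"
        ).execute()
        # Store the raw message; field parsing happens downstream.
        collection.insert_one({"gmail_id": ref["id"], "payload": msg})
```

Note that the Gmail API caps `maxResults` at 500 per page, which matches the roughly 500 emails retrieved above; paginate with `pageToken` if you need more.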
The ultimate goal is to have a star schema model like this:

## Setting Up Environment Variables
Navigate to the `models` folder, create a `.env` file, and fill in these values:
## Configuring Connections
To configure connections to MongoDB and Snowflake, follow the steps below.
### MongoDB and Snowflake Connection Details

Add these details to your configuration file or environment:
```plaintext
MONGODB_URI=mongodb+srv://username:password@atlascluster.xxxxxx.mongodb.net/admin?retryWrites=true&w=majority
SNOWFLAKE_USER=
SNOWFLAKE_PASSWORD=
SNOWFLAKE_ACCOUNT=xxxx.us-east-1
SNOWFLAKE_WAREHOUSE=
SNOWFLAKE_DATABASE=
SNOWFLAKE_SCHEMA=PUBLIC
```

Note: You can create free accounts on Snowflake and MongoDB Atlas to have ready-to-use servers.
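How the scripts in `models` read these values isn't shown here; a minimal sketch, assuming `python-dotenv` (which is not in the dependency list above):

```python
# Sketch of loading the .env values; assumes python-dotenv is installed
# (pip install python-dotenv).
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

MONGODB_URI = os.environ["MONGODB_URI"]
SNOWFLAKE_USER = os.environ["SNOWFLAKE_USER"]
SNOWFLAKE_PASSWORD = os.environ["SNOWFLAKE_PASSWORD"]
SNOWFLAKE_ACCOUNT = os.environ["SNOWFLAKE_ACCOUNT"]
```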
## Testing Connections

Once set up, launch the `mongodb_connection.py` and `snowflake_connection.py` files to test the connections. At this stage, everything should be okay.
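The two test scripts aren't reproduced here; a sketch of what such connection tests typically look like, assuming `snowflake-connector-python` is installed:

```python
# Sketch of the two connection tests; assumes snowflake-connector-python
# is installed (pip install snowflake-connector-python).
import os

import snowflake.connector
from pymongo import MongoClient

# MongoDB: the ping command round-trips to the server.
mongo = MongoClient(os.environ["MONGODB_URI"])
print(mongo.admin.command("ping"))  # {'ok': 1.0} on success

# Snowflake: run a trivial query.
conn = snowflake.connector.connect(
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    warehouse=os.environ["SNOWFLAKE_WAREHOUSE"],
    database=os.environ["SNOWFLAKE_DATABASE"],
    schema=os.environ["SNOWFLAKE_SCHEMA"],
)
print(conn.cursor().execute("SELECT CURRENT_VERSION()").fetchone())
```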
## Creating Tables
Create the dimension tables by executing `create_table_snowflake.py`, then check in Snowflake that the three dimension tables and the fact table were created.
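The actual schema is defined in `create_table_snowflake.py`; purely as an illustration, DDL for a hypothetical layout with three dimensions and one fact might look like this:

```python
# Hypothetical star-schema DDL; the real table and column names live in
# create_table_snowflake.py. `conn` is a Snowflake connection as above.
def create_star_schema(conn):
    ddl = [
        """CREATE TABLE IF NOT EXISTS DIM_RESTAURANT (
               restaurant_id INTEGER AUTOINCREMENT PRIMARY KEY,
               name STRING)""",
        """CREATE TABLE IF NOT EXISTS DIM_DATE (
               date_id INTEGER AUTOINCREMENT PRIMARY KEY,
               order_date DATE)""",
        """CREATE TABLE IF NOT EXISTS DIM_LOCATION (
               location_id INTEGER AUTOINCREMENT PRIMARY KEY,
               address STRING)""",
        """CREATE TABLE IF NOT EXISTS FACT_ORDERS (
               order_id INTEGER AUTOINCREMENT PRIMARY KEY,
               restaurant_id INTEGER,
               date_id INTEGER,
               location_id INTEGER,
               total NUMBER(10, 2))""",
    ]
    cur = conn.cursor()
    for stmt in ddl:
        cur.execute(stmt)
```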
## Populating Tables
Execute `model.py` to populate the fact and dimension tables, preparing your dimensional model for analysis. You can then use Power BI or Tableau, or create views, for dashboarding.
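A sketch of what that populate step might look like under the same hypothetical schema; the field names on the MongoDB side are assumptions about the parsed receipt shape:

```python
# Sketch of moving parsed receipts from MongoDB into the hypothetical
# FACT_ORDERS table; document field names are assumptions.
def populate_fact_orders(collection, conn):
    cur = conn.cursor()
    for doc in collection.find():
        cur.execute(
            "INSERT INTO FACT_ORDERS (restaurant_id, date_id, location_id, total) "
            "VALUES (%s, %s, %s, %s)",
            (doc["restaurant_id"], doc["date_id"], doc["location_id"], doc["total"]),
        )
    conn.commit()
```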
## Manual Execution
For manual execution, follow the steps above.
## Automatic Execution with Airflow
For automatic execution:
1. Navigate to the `airflow_docker` folder.
2. Launch a local server by executing: `docker-compose up --build -d`.
4. Once it is ready, go to `localhost:8080` and log in with username/password `airflow`.
4. Before anything else, add connections for Snowflake and MongoDB in the Admin tab.
5. Start the main DAG; a sketch of what such a DAG might look like follows below. If everything is okay, all tasks should complete successfully.
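The repository's DAG isn't reproduced here; as a rough illustration, a daily pipeline wiring the steps above together might look like this (the DAG id, schedule, and task callables are all assumptions):

```python
# Hypothetical shape of the main DAG; the real one lives in airflow_docker.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder callables standing in for the repository's actual scripts.
def get_emails(): ...
def create_tables(): ...
def populate_model(): ...

with DAG(
    dag_id="ubereats_pipeline",      # hypothetical id
    start_date=datetime(2023, 11, 1),
    schedule_interval="@daily",      # assumed schedule
    catchup=False,
) as dag:
    fetch = PythonOperator(task_id="get_gmail", python_callable=get_emails)
    create = PythonOperator(task_id="create_tables", python_callable=create_tables)
    load = PythonOperator(task_id="populate_model", python_callable=populate_model)

    fetch >> create >> load
```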