Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dime-worldbank/colombiatransmilenio
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/dime-worldbank/colombiatransmilenio
- Owner: dime-worldbank
- Created: 2023-09-08T03:01:48.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-05T14:00:16.000Z (about 2 months ago)
- Last Synced: 2024-11-05T14:49:51.054Z (about 2 months ago)
- Language: Python
- Size: 144 KB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
README
# Bogotá TuLlave Smartcard Data Analysis
## File structure
## Code structure
#### 1. Download newest data: `data-fetch`
- From TransMilenio GCloud API
- Scheduled job that runs automatically every Monday
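
As a rough illustration of the weekly schedule, a guard like the one below could decide whether a run is due. This is only a sketch: `should_fetch` and `fetch_newest_data` are hypothetical names, and the actual call to the TransMilenio GCloud API is deliberately left out because its details are not documented here.

```python
from datetime import date

MONDAY = 0  # date.weekday() returns 0 for Monday


def should_fetch(today: date) -> bool:
    """Return True when the weekly download is due (Mondays)."""
    return today.weekday() == MONDAY


def fetch_newest_data(today: date) -> str:
    """Hypothetical entry point for the weekly job."""
    if not should_fetch(today):
        return "skipped"
    # Placeholder: the real download from the TransMilenio GCloud API
    # would happen here; it is not shown in this sketch.
    return f"fetch due for {today.isoformat()}"
```

In practice the schedule would more likely live in the job scheduler itself (for example a cron entry), with a guard like this serving only as a safety check.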
#### 2. Put together old and new data: `data-organize`
- Create a Workspace folder and move data from:
  - Ingestion Point (Documents folder)
  - Downloads Point (Data folder)
- _TBC data before 2020_
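
The move into a single Workspace folder could look roughly like this. It is a minimal sketch, not the repository's implementation: the function name `organize` and the collision rule (prefixing the source folder name instead of overwriting) are assumptions, and the folder paths are supplied by the caller.

```python
import shutil
from pathlib import Path


def organize(sources: list[Path], workspace: Path) -> list[Path]:
    """Move every file found in the source folders into one workspace folder."""
    workspace.mkdir(parents=True, exist_ok=True)
    moved = []
    for source in sources:
        # sorted() materializes the listing before any file is moved
        for path in sorted(source.iterdir()):
            if not path.is_file():
                continue
            target = workspace / path.name
            if target.exists():
                # Assumption: keep both copies rather than overwrite,
                # by prefixing the name of the folder the file came from.
                target = workspace / f"{source.name}_{path.name}"
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

A call such as `organize([Path("Documents"), Path("Data")], Path("Workspace"))` would merge the two points listed above; those literal folder names are placeholders.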
#### 3. Clean data: `data-clean`
- Uniform structure for all data
  1. Classify raw data files based on their **headers**, and reorganize them into folders by header.
  2. Each header has a specific format. Import each group with a different spark_handlers entry so that the right transformations are applied to each.
- _TBC data before 2020_
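
The first cleaning step, grouping raw files by header, can be sketched in plain Python as follows. The names `header_key` and `classify_by_header` are hypothetical, as are the choices to look only at `*.csv` files and to name each folder after a short hash of the header line; the Spark import step is not shown.

```python
import hashlib
import shutil
from pathlib import Path


def header_key(path: Path) -> str:
    """Short, stable identifier derived from the first line of a file."""
    with path.open(encoding="utf-8", errors="replace") as f:
        header = f.readline().strip()
    return hashlib.md5(header.encode("utf-8")).hexdigest()[:8]


def classify_by_header(raw_dir: Path, out_dir: Path) -> dict[str, list[Path]]:
    """Move each raw file into a subfolder shared by files with the same header."""
    groups: dict[str, list[Path]] = {}
    for path in sorted(raw_dir.glob("*.csv")):
        key = header_key(path)
        dest = out_dir / key
        dest.mkdir(parents=True, exist_ok=True)
        target = dest / path.name
        shutil.move(str(path), str(target))
        groups.setdefault(key, []).append(target)
    return groups
```

The returned mapping has one entry per distinct header, so the second step could look up a handler per key and apply its transformations to every file in that group.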
## Questions to ask TM:
- Some dates have "UTC" at the end and others don't. Can we assume the ones without it are also in UTC, or should we assume they are in Colombia time?
- Can the same card number, if not used for a while, later be assigned to another person?
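
Until the timezone question is answered, timestamps could be parsed in a way that keeps both interpretations open. The sketch below is illustrative only: the `FORMAT` string is a hypothetical layout rather than the format of the real files, `parse_timestamp` is an invented helper, and Colombia time is modeled as a fixed UTC-5 offset on the assumption that no daylight saving time applies.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Colombia time is a fixed UTC-5 offset (no daylight saving time).
COLOMBIA = timezone(timedelta(hours=-5), "COT")
# Hypothetical layout; adjust to the format found in the real files.
FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_timestamp(raw: str, naive_tz: timezone = timezone.utc) -> tuple[datetime, bool]:
    """Return (timestamp converted to UTC, whether the text ended in "UTC").

    naive_tz is the timezone assumed for values without the "UTC" suffix.
    """
    text = raw.strip()
    had_suffix = text.upper().endswith("UTC")
    if had_suffix:
        text = text[:-3].strip()
        tz = timezone.utc
    else:
        tz = naive_tz
    parsed = datetime.strptime(text, FORMAT).replace(tzinfo=tz)
    return parsed.astimezone(timezone.utc), had_suffix
```

Keeping the suffix flag next to each parsed value makes it possible to re-run the conversion with `naive_tz=COLOMBIA` later; under that assumption the unsuffixed values shift by five hours relative to the UTC reading.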