https://github.com/victor-antoniassi/junior_analytics_engineer_test_01
Solution built for a Junior Analytics Engineer technical assessment
- Host: GitHub
- URL: https://github.com/victor-antoniassi/junior_analytics_engineer_test_01
- Owner: victor-antoniassi
- License: gpl-3.0
- Created: 2024-03-21T19:41:12.000Z
- Default Branch: master
- Last Pushed: 2024-11-27T21:14:46.000Z
- Last Synced: 2024-11-27T22:21:03.369Z
- Topics: analytics-engineer, code-challenge, data-engineering, etl, hiring-challenge, practical-test, python, sql, sqlite
- Language: Python
- Homepage:
- Size: 34 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# School Supplies Market Data Preparation
Solution built for a Junior Analytics Engineer technical assessment. Read the complete challenge proposal [here](technical_challenge_proposal.md).
## 📊 About
This project prepared educational data for analysis as part of a technical assessment. It processed two datasets, both limited to the city of São Paulo:
- Student profiles (2021-2022)
- School information (2021-2022)
## 🛠️ Technical Stack
- Python
- Pandas
- SQLite
- Google Sheets
## 🔄 Data Pipeline
1. **Quality Check**
   - Analyzed data quality in CSV files
   - Used Python, Pandas and Google Sheets
2. **Header Validation**
   - Developed a [Python script](data_preparation/scripts/compare_delimited_file_headers) to validate file headers
   - Compared against data dictionaries
3. **Manual Corrections**
   - Fixed field names
   - Adjusted column positions
   - Removed empty or invalid columns
4. **Data Preparation**
   - Created a [Python script](data_preparation/scripts/datasets_to_sqlite) for data cleaning
   - Prepared data for SQLite storage
5. **Database Creation**
   - Created [SQLite database](sqlite_db) with tables:
     - `educandos`: Student profiles (2021-2022)
     - `escolas`: Municipal schools data (2021-2022)
     - `escolas_educandos`: Relationship between schools and students
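
The header validation and database loading steps above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual scripts: the function names (`compare_headers`, `load_rows`), the semicolon delimiter, and the sample column names are all assumptions made for the example. It uses only the standard library's `csv` and `sqlite3` modules instead of Pandas so that it runs without extra dependencies.

```python
import csv
import sqlite3


def compare_headers(lines, expected, delimiter=";"):
    """Compare the header row of a delimited file with a data dictionary.

    `lines` is any iterable of text lines (an open file or a list of strings).
    Returns a dict listing missing, unexpected, and misplaced column names.
    """
    header = [name.strip() for name in next(csv.reader(lines, delimiter=delimiter))]
    return {
        "missing": [c for c in expected if c not in header],
        "unexpected": [c for c in header if c not in expected],
        "misplaced": [
            c for i, c in enumerate(header)
            if c in expected and expected.index(c) != i
        ],
    }


def load_rows(conn, table, lines, delimiter=";"):
    """Load a delimited file into a SQLite table, skipping all-empty columns."""
    rows = list(csv.DictReader(lines, delimiter=delimiter))
    if not rows:
        return 0
    # Keep only named columns that hold a non-blank value in at least one row.
    columns = [
        c for c in rows[0]
        if c and any((r.get(c) or "").strip() for r in rows)
    ]
    quoted = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({quoted})')
    conn.executemany(
        f'INSERT INTO "{table}" ({quoted}) VALUES ({placeholders})',
        [[r[c] for c in columns] for r in rows],
    )
    conn.commit()
    return len(rows)


# Hypothetical sample: the column names and rows are made up for illustration.
sample = [
    "cd_escola;nome;vazio\n",
    "1;Escola A;\n",
    "2;Escola B;\n",
]
expected = ["cd_escola", "nome"]  # stand-in for a data dictionary

report = compare_headers(sample, expected)
conn = sqlite3.connect(":memory:")
loaded = load_rows(conn, "escolas", sample)
print(report, loaded)
```

In this sketch the empty `vazio` column is reported as unexpected by the header check and dropped during loading, which mirrors the "removed empty or invalid columns" step, although the project performed those corrections manually.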
## 📈 Analysis Opportunities
The prepared database enabled various analyses for sales planning:
1. **Demographics**: Student distribution by race, gender, and age
   - Helped understand customer diversity
   - Guided product development
2. **Special Education**: Distribution of students with special needs
   - Identified opportunities for specialized products
   - Supported inclusive product planning
3. **Trends**: Year-over-year comparison (2021-2022)
   - Helped predict future demand
   - Guided inventory planning
4. **School Clusters**: Groups of schools by characteristics
   - Location-based analysis
   - Size-based segmentation
5. **Market Segments**: Identified distinct customer groups
   - Customized product strategies
   - Targeted marketing approaches
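
As an illustration of the year-over-year comparison listed above, the following sketch queries a SQLite table with conditional aggregation. Only the table name `educandos` comes from this project. The columns (`ano`, `grupo`, `qtd_alunos`) and the rows are hypothetical placeholders, so the real schema will need a different query.

```python
import sqlite3

# In-memory stand-in for the project's database; the schema is assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE educandos (ano INTEGER, grupo TEXT, qtd_alunos INTEGER)"
)
conn.executemany(
    "INSERT INTO educandos VALUES (?, ?, ?)",
    [  # made-up rows, for illustration only
        (2021, "A", 100),
        (2022, "A", 110),
        (2021, "B", 50),
        (2022, "B", 40),
    ],
)

# One row per group, with the 2021 and 2022 totals side by side.
QUERY = """
SELECT grupo,
       SUM(CASE WHEN ano = 2021 THEN qtd_alunos ELSE 0 END) AS total_2021,
       SUM(CASE WHEN ano = 2022 THEN qtd_alunos ELSE 0 END) AS total_2022
FROM educandos
GROUP BY grupo
ORDER BY grupo
"""


def year_over_year(connection):
    """Return (group, total_2021, total_2022, change) tuples."""
    return [
        (grupo, t21, t22, t22 - t21)
        for grupo, t21, t22 in connection.execute(QUERY)
    ]


trend = year_over_year(conn)
for row in trend:
    print(row)
```

The same pattern could be applied to any categorical column, such as race or gender, by changing the grouping column.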
---
*Note: This project was developed as part of a technical assessment for a Junior Analytics Engineer position. Some details have been modified to maintain confidentiality.*