Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/lisabensoussan/bigdata_midterm
This project focuses on analyzing Stack Overflow data related to JavaScript and Python questions using a combination of SQL queries (Google BigQuery) and Unix shell commands. The aim is to explore trends, activity patterns, and user behavior around these popular programming languages through data wrangling and querying techniques.
bigquery data-cleaning sql unix-command unix-shell
- Host: GitHub
- URL: https://github.com/lisabensoussan/bigdata_midterm
- Owner: lisabensoussan
- Created: 2024-09-11T10:18:30.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-09-11T10:52:07.000Z (2 months ago)
- Last Synced: 2024-09-30T10:41:31.995Z (about 2 months ago)
- Topics: bigquery, data-cleaning, sql, unix-command, unix-shell
- Homepage:
- Size: 230 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Big Data Mining
## MidTerm 52019/52002 2023-24
### Students:
- Lisa Mechaly Bensoussan
- Jeremy Hakoune

### Project Overview:
This project consists of SQL queries and Unix shell commands for analyzing large Stack Overflow datasets with Google BigQuery and Unix-based operations. The tasks focus on extracting, cleaning, and analyzing data related to JavaScript and Python questions on Stack Overflow. The goal is to explore patterns, trends, and user behavior related to these programming languages through querying and data wrangling.

### Tasks:
1. **SQL Queries using BigQuery**:
   - Querying Stack Overflow data to identify the most popular JavaScript-related posts.
   - Statistical analysis of post activity across different days of the week.
   - Cross-analysis between JavaScript- and Python-related questions.
2. **Unix Shell Commands**:
   - Working with large text files from Stack Overflow.
   - Counting words and handling large-scale text data using shell commands and Python scripts.
   - Splitting files based on years and performing word frequency analysis.

### Proof of Work:
For each task, the corresponding SQL queries, shell commands, and Python scripts are provided, along with outputs that demonstrate the correctness of the solutions.

### Remarks:
- All shell commands and SQL queries were run using the provided dataset.
- The data was cleaned and processed to handle commas and special characters for accurate results.
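
As an illustration of the cleaning and word-frequency steps mentioned above, here is a minimal Python sketch. It is not the project's actual script: the column names `creation_date` and `title`, the `SAMPLE` rows, and the helper names `tokenize` and `word_counts_by_year` are all hypothetical, chosen only to show one way commas inside quoted fields and special characters might be handled before counting words per year.

```python
import csv
import io
import re
from collections import Counter, defaultdict

# Hypothetical sample rows. The real dataset's columns are not described
# in this README, so `creation_date` and `title` are assumptions.
SAMPLE = '''creation_date,title
2015-03-01 10:00:00,"How do I sort a list, in Python?"
2015-07-12 08:30:00,"JavaScript closures: why do they work?"
2016-01-05 17:45:00,"Python list comprehension vs. map()"
'''


def tokenize(text):
    """Lowercase the text and return runs of letters/digits, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_counts_by_year(csv_text):
    """Return a dict mapping each year (as a string) to a Counter of title words.

    The csv module keeps commas inside quoted fields intact, so a title
    such as "a list, in Python?" is not split into extra columns.
    """
    counts = defaultdict(Counter)
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        year = row["creation_date"][:4]  # assumes an ISO-like timestamp
        counts[year].update(tokenize(row["title"]))
    return dict(counts)


if __name__ == "__main__":
    result = word_counts_by_year(SAMPLE)
    for year in sorted(result):
        print(year, result[year].most_common(3))
```

Grouping by the first four characters of the timestamp stands in for the "split files by year" step; with real files, the same loop could write each row to a per-year output file instead of updating a counter.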