https://github.com/lisabensoussan/bigdata_midterm

This project focuses on analyzing Stack Overflow data related to JavaScript and Python questions using a combination of SQL queries (Google BigQuery) and Unix shell commands. The aim is to explore trends, activity patterns, and user behavior around these popular programming languages through data wrangling and querying techniques.

# Big Data Mining

## Midterm 52019/52002, 2023-24

### Students:
- Lisa Mechaly Bensoussan
- Jeremy Hakoune

### Project Overview:
This project uses SQL queries on Google BigQuery and Unix shell commands to analyze large Stack Overflow datasets. The tasks focus on extracting, cleaning, and analyzing data about JavaScript and Python questions, with the goal of uncovering patterns, trends, and user behavior around these languages through querying and data wrangling.

### Tasks:

1. **SQL Queries using BigQuery**:
- Querying Stack Overflow data to identify the most popular JavaScript-related posts.
- Statistical analysis of post activity across the days of the week.
- Cross-analysis of JavaScript- and Python-related questions.
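The three BigQuery tasks above can be illustrated with a minimal, self-contained sketch. It uses Python's built-in `sqlite3` as a local stand-in for BigQuery and a small hypothetical `posts` table; the table, its columns, and the sample rows are illustrative assumptions, not the project's data or queries. Tags are assumed to be stored as a pipe-delimited string such as `javascript|closures` (the format of the public `bigquery-public-data.stackoverflow.posts_questions` table, though this should be checked against the dataset actually used). Dialects differ: SQLite's `strftime('%w', ...)` numbers weekdays 0 (Sunday) to 6, whereas BigQuery's `EXTRACT(DAYOFWEEK FROM ...)` returns 1 (Sunday) to 7.

```python
import sqlite3

# Hypothetical sample questions: (id, title, tags, view_count, creation_date).
# Tags are assumed to be pipe-delimited, e.g. "javascript|closures".
SAMPLE_POSTS = [
    (1, "How do closures work?", "javascript|closures", 900, "2023-01-02"),  # Monday
    (2, "Sort a dict by value", "python|dictionary", 1200, "2023-01-03"),  # Tuesday
    (3, "Call Python from Node.js", "javascript|python|node.js", 300, "2023-01-03"),
    (4, "What does 'this' refer to?", "javascript|this", 1500, "2023-01-08"),  # Sunday
]

# Wrapping both sides in '|' makes LIKE match whole tags only, so "java"
# does not match "javascript".
HAS_TAG = "('|' || tags || '|') LIKE ('%|' || ? || '|%')"


def build_db(rows):
    """Load the sample rows into an in-memory SQLite table named posts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (id INTEGER, title TEXT, tags TEXT, "
        "view_count INTEGER, creation_date TEXT)"
    )
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def top_posts_for_tag(conn, tag, limit=3):
    """Most-viewed questions carrying `tag`, as (id, view_count) pairs."""
    return conn.execute(
        f"SELECT id, view_count FROM posts WHERE {HAS_TAG} "
        "ORDER BY view_count DESC LIMIT ?",
        (tag, limit),
    ).fetchall()


def posts_per_weekday(conn, tag):
    """Question counts per weekday; SQLite's %w gives 0 = Sunday ... 6 = Saturday."""
    return conn.execute(
        "SELECT CAST(strftime('%w', creation_date) AS INTEGER) AS dow, COUNT(*) "
        f"FROM posts WHERE {HAS_TAG} GROUP BY dow ORDER BY dow",
        (tag,),
    ).fetchall()


def posts_with_both(conn, tag_a, tag_b):
    """Ids of questions tagged with both tags (the cross-analysis case)."""
    return conn.execute(
        f"SELECT id FROM posts WHERE {HAS_TAG} AND {HAS_TAG} ORDER BY id",
        (tag_a, tag_b),
    ).fetchall()


if __name__ == "__main__":
    conn = build_db(SAMPLE_POSTS)
    print(top_posts_for_tag(conn, "javascript"))  # [(4, 1500), (1, 900), (3, 300)]
    print(posts_per_weekday(conn, "javascript"))  # [(0, 1), (1, 1), (2, 1)]
    print(posts_with_both(conn, "javascript", "python"))  # [(3,)]
```

Matching on delimited whole tags, rather than a bare `LIKE '%javascript%'`, avoids counting related but distinct tags as JavaScript questions.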

2. **Unix Shell Commands**:
- Working with large text files from Stack Overflow.
- Counting words and handling large-scale text data using shell commands and Python scripts.
- Splitting files by year and performing word-frequency analysis.
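The file-splitting and word-counting steps can be sketched in Python as follows. This is a minimal sketch, not the project's scripts: it assumes a hypothetical line format in which each record starts with an ISO date and a comma (e.g. `2021-05-03,How do I sort a list`), and the function names and tokenizer regex are illustrative choices. Lines are streamed one at a time, so the input never has to fit in memory. In the shell, a comparable word count is often written as a pipeline such as `tr -s '[:space:]' '\n' | sort | uniq -c | sort -rn`.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path

# Hypothetical input: each record starts with an ISO date, then a comma, then text.
SAMPLE_LINES = [
    "date,text\n",
    "2021-05-03,How do I sort a list\n",
    "2022-01-10,Sort a list in place\n",
    "2022-02-11,Reverse a list\n",
]

# Simple tokenizer: runs of lower-case letters, digits, or apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def split_by_year(lines, out_dir: Path) -> dict[str, int]:
    """Stream lines into out_dir/<year>.txt and return the line count per year."""
    out_dir.mkdir(parents=True, exist_ok=True)
    handles = {}
    counts = Counter()
    try:
        for line in lines:
            year = line[:4]
            if not year.isdigit():
                continue  # skip the header and malformed lines
            if year not in handles:
                handles[year] = (out_dir / f"{year}.txt").open("w", encoding="utf-8")
            handles[year].write(line)
            counts[year] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return dict(counts)


def top_words(lines, n: int = 10) -> list[tuple[str, int]]:
    """The n most frequent lower-cased words in the text after each line's date."""
    counter = Counter()
    for line in lines:
        text = line.split(",", 1)[-1].lower()
        counter.update(WORD_RE.findall(text))
    return counter.most_common(n)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        print(split_by_year(SAMPLE_LINES, out))  # {'2021': 1, '2022': 2}
        with (out / "2022.txt").open(encoding="utf-8") as f:
            print(top_words(f, 2))  # [('a', 2), ('list', 2)]
```

Keeping one open handle per year means the source file is read only once, however many years it spans.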

### Proof of Work:
For each task, the relevant SQL queries, shell commands, and Python scripts are provided, together with their outputs, to demonstrate that the solutions are correct.

### Remarks:
- All shell commands and SQL queries were run on the provided dataset.
- The data was cleaned to handle embedded commas and special characters so that the results are accurate.
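Embedded commas are a common source of wrong counts: splitting each line on `,` breaks apart any quoted field that itself contains a comma. A minimal sketch using Python's standard `csv` module and a hypothetical one-row sample (not taken from the project's data) shows the difference.

```python
import csv
import io

# Hypothetical CSV export: the quoted title contains commas.
RAW = 'id,title,tags\n42,"Python, JavaScript, or both?",python|javascript\n'


def naive_fields(text: str) -> list[list[str]]:
    """Split every line on commas, ignoring quotes (the error-prone approach)."""
    return [line.split(",") for line in text.splitlines()]


def csv_fields(text: str) -> list[list[str]]:
    """Parse with the csv module, which keeps quoted fields intact."""
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    print(naive_fields(RAW)[1])
    # ['42', '"Python', ' JavaScript', ' or both?"', 'python|javascript']
    print(csv_fields(RAW)[1])
    # ['42', 'Python, JavaScript, or both?', 'python|javascript']
```

When reading from files rather than strings, the `csv` documentation recommends opening them with `newline=""` so that line breaks inside quoted fields are handled correctly.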