https://github.com/dwija12903/bda-lab

This repository contains various lab files from my Big Data Analytics coursework

graphx networkx pysaprk-sql pyspark pyspark-mllib scala

Host: GitHub
URL: https://github.com/dwija12903/bda-lab
Owner: dwija12903
Created: 2024-09-09T10:32:07.000Z (about 2 months ago)
Default Branch: main
Last Pushed: 2024-09-09T10:36:37.000Z (about 2 months ago)
Last Synced: 2024-10-01T15:59:15.798Z (about 1 month ago)
Topics: graphx, networkx, pysaprk-sql, pyspark, pyspark-mllib, scala
Language: Jupyter Notebook
Homepage:
Size: 267 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# 📊 Big Data Analytics Lab Files

This repository contains various lab files from my **Big Data Analytics** coursework, covering topics such as **Scala**, **PySpark**, **RDD**, **SQL**, **GraphX**, **NetworkX**, **PageRank**, **Linear Regression**, and **Random Forest**.

## 💻 Lab Files Included

1. **Scala Basics** 🚀
   - Scala programming language fundamentals for big data.

2. **RDD (Resilient Distributed Dataset)** ⚙️
   - Operations and transformations on RDDs in PySpark and Scala.

3. **SQL in PySpark** 🧮
   - Performing SQL operations on big data using PySpark SQL.

4. **PySpark** 🔥
   - Handling distributed data processing using PySpark.

5. **GraphX** 🔗
   - Working with graph data using GraphX in Scala.

6. **NetworkX** 🌐
   - Graph-based data analysis using NetworkX with PySpark.

7. **PageRank Algorithm** 📈
   - Implementation of the PageRank algorithm for ranking web pages using PySpark.

8. **Linear Regression** 📉
   - Machine learning implementation of Linear Regression for predictive analysis using PySpark MLlib.

9. **Random Forest** 🌲
   - Implementing the Random Forest algorithm for classification and regression tasks in PySpark MLlib.
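
To give a feel for what the PageRank lab computes, here is a minimal sketch of the iteration in plain Python. It is not the code from the lab files and it does not use PySpark. The function name `pagerank`, the toy graph, the damping factor of 0.85, and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank scores by repeated rank redistribution.

    `links` maps each page to the list of pages it links to.
    (Hypothetical helper for illustration, not taken from the lab files.)
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(ranks[p] for p in pages if not links.get(p))
        new_ranks = {
            p: (1.0 - damping) / n + damping * dangling / n for p in pages
        }
        # Each page passes an equal share of its rank to every page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Toy graph (assumed data): D links to C, but nothing links to D.
toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_graph)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

A distributed version follows the same pattern: the per-page contributions become a map step over the link table and the summation becomes a reduce by key, which is why the algorithm fits PySpark well.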

## 🚀 How to Run

1. Clone the repository:
```bash
git clone https://github.com/dwija12903/bda-lab.git
cd bda-lab
```

2. Open the folder for the lab you want to run and execute its code with **PySpark** or **Scala**.

## ⚙️ Prerequisites

- **Scala** installed.
- **PySpark** environment set up.
- Python packages required by the labs, such as **PySpark** and **NetworkX**.
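
One quick way to confirm the prerequisites is a small check script like the sketch below. It relies only on the Python standard library. The helper name `check_environment` is hypothetical, and the `java` check is an assumption added here because Spark runs on the JVM; this README does not list it.

```python
import importlib.util
import shutil


def check_environment(packages=("pyspark", "networkx"),
                      commands=("scala", "java")):
    """Return a dict mapping each requirement to True if it was found.

    Hypothetical helper: the default names are assumptions based on the
    prerequisites listed above, plus `java` for Spark's JVM dependency.
    """
    found = {}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        found[f"python:{name}"] = importlib.util.find_spec(name) is not None
    for cmd in commands:
        # shutil.which returns None when the command is not on PATH.
        found[f"command:{cmd}"] = shutil.which(cmd) is not None
    return found


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {requirement}")
```

Anything reported as missing needs to be installed before running the corresponding lab. Missing Python packages can usually be installed with `pip install pyspark networkx`.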

## 👩‍💻 Contributing

Feel free to contribute by improving the code, adding new features, or enhancing documentation. Fork the repository, create a branch, and submit a pull request.