https://github.com/dwija12903/bda-lab
This repository contains various lab files from my Big Data Analytics coursework
graphx networkx pysaprk-sql pyspark pyspark-mllib scala
- Host: GitHub
- URL: https://github.com/dwija12903/bda-lab
- Owner: dwija12903
- Created: 2024-09-09T10:32:07.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2024-09-09T10:36:37.000Z (about 2 months ago)
- Last Synced: 2024-10-01T15:59:15.798Z (about 1 month ago)
- Topics: graphx, networkx, pysaprk-sql, pyspark, pyspark-mllib, scala
- Language: Jupyter Notebook
- Homepage:
- Size: 267 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# 📊 Big Data Analytics Lab Files
This repository contains various lab files from my **Big Data Analytics** coursework, covering topics such as **Scala**, **PySpark**, **RDD**, **SQL**, **GraphX**, **NetworkX**, **PageRank**, **Linear Regression**, and **Random Forest**.
### 💻 Lab Files Included:
1. **Scala Basics** 🚀
- Scala programming language fundamentals for big data.
2. **RDD (Resilient Distributed Dataset)** ⚙️
- Operations and transformations on RDDs in PySpark and Scala.
3. **SQL in PySpark** 🧮
- Performing SQL operations on big data using PySpark SQL.
4. **PySpark** 🔥
- Handling distributed data processing using PySpark.
5. **GraphX** 🔗
- Working with graph data using GraphX in Scala.
6. **NetworkX** 🌐
- Graph-based data analysis using NetworkX with PySpark.
7. **PageRank Algorithm** 📈
- Implementation of the PageRank algorithm for ranking web pages using PySpark.
8. **Linear Regression** 📉
- Machine learning implementation of Linear Regression for predictive analysis using PySpark MLlib.
9. **Random Forest** 🌲
- Implementing the Random Forest algorithm for classification and regression tasks in PySpark MLlib.
## 🚀 How to Run
1. Clone the repository:
```bash
git clone https://github.com/dwija12903/bda-lab.git
cd bda-lab
```
2. Navigate to the respective folder and run the code using **PySpark** or **Scala**.
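Before opening a notebook, it can help to confirm that the Python-side packages are importable. The snippet below is a minimal sketch and not part of the repository: the helper name `missing_modules` and the module list are assumptions based on the packages named in this README, and it does not check the Scala or Spark installation.

```python
from importlib.util import find_spec

# Import names assumed from the packages this README mentions; extend as needed.
REQUIRED_MODULES = ("pyspark", "networkx")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed Python packages are importable.")
```

If anything is reported missing, installing it with `pip install pyspark networkx` is one common option.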
## ⚙️ Prerequisites
- **Scala** installed.
- **PySpark** environment set up.
- Python packages for **PySpark**, **NetworkX**, etc.
## 👩‍💻 Contributing
Feel free to contribute by improving the code, adding new features, or enhancing documentation. Fork the repository, create a branch, and submit a pull request.