Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/yyassif/docker-spark-hadoop-for-recommendation

Using All Big Data Technologies In Order To Apply ALS Algorithm To Recommend Amazon Prodcuts
https://github.com/yyassif/docker-spark-hadoop-for-recommendation

Last synced: 28 days ago
JSON representation

Using All Big Data Technologies In Order To Apply ALS Algorithm To Recommend Amazon Prodcuts

Host: GitHub
URL: https://github.com/yyassif/docker-spark-hadoop-for-recommendation
Owner: yyassif
Created: 2023-12-21T22:51:02.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2024-08-01T20:34:25.000Z (5 months ago)
Last Synced: 2024-10-15T03:52:47.612Z (2 months ago)
Language: Scala
Size: 25.4 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# How to use HDFS/Spark

## Installation

### Base Images

```sh
docker pull bde2020/hadoop-namenode:1.1.0-hadoop2.8-java8
docker pull bde2020/hadoop-datanode:1.1.0-hadoop2.8-java8
docker pull bde2020/spark-base:2.1.0-hadoop2.8-hive-java8
docker pull bde2020/spark-master:2.1.0-hadoop2.8-hive-java8
docker pull bde2020/spark-worker:2.1.0-hadoop2.8-hive-java8
docker pull bde2020/spark-notebook:2.1.0-hadoop2.8-hive
docker pull bde2020/hdfs-filebrowser:3.11
```

### Docker Compose File

To start an HDFS/Spark Workbench, run:

```sh
docker-compose up -d
```

## Interfaces

- Namenode: http://localhost:50070
- Datanode: http://localhost:50075
- Spark-master: http://localhost:8080
- Spark-notebook: http://localhost:9001
- Hue (HDFS Filebrowser): http://localhost:8088/home

# Recommendation Spark Application

Jar file is packaged under the jarfile directory.

- Compiled with scala 2.11.11
- For Spark 2.1.0

## How to run (using Makefile)

To Run the make the make command I've come up with this order which seem very mandatory to properly have the job done.

### Start the Preprocessing

```
make prepare-raw-dataset
```

### Start the Data Ingestion into HDFS

```
make ingest-hdfs
```

### Create the Fat-JAR File

```
make jar
```

### Start the Prediction

```
make prediction
```

### Save the Results into a result directory

```
make prediction-result
```

### Clean the Output directory in HDFS

```
make clean-output
```

### Clean the Input directory in HDFS

```
make clean-input
```