https://github.com/binhfdv/ds200.l21_bigdata

big-data data-preprocessing machinelearning-python pyspark

Last synced: about 1 year ago

Host: GitHub
URL: https://github.com/binhfdv/ds200.l21_bigdata
Owner: binhfdv
Created: 2021-05-17T03:04:40.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2021-08-29T03:22:43.000Z (almost 5 years ago)
Last Synced: 2025-06-01T22:44:48.122Z (about 1 year ago)
Topics: big-data, data-preprocessing, machinelearning-python, pyspark
Language: Jupyter Notebook
Homepage:
Size: 11.7 MB
Stars: 0
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# DS200.L21 / Big data

## About

* This is a college course project about applying big data tools to solve real-life problems.

* The project uses Apache Spark to classify credit scores.

## Table of contents

* [DS200.L21 / Big data](#ds200l21--big-data)

* [About](#about)

* [Table of contents](#table-of-contents)

* [Data source](#data-source)

* [Experiment pipelines](#experiment-pipelines)

* [Feature extraction pipelines](#feature-extraction-pipelines)

* [Code](#code)

* [Presentation slides and Report](#presentation-slides-and-report)

* [References](#references)

## Data source

* klp's credit scoring challenge for students

## Experiment pipelines

![](images/experimentalprocedure.png)

## Feature extraction pipelines

![](images/TransPipeline.png)

## Code

* Feature extraction, model training, and the related steps in this repo are implemented in Google Colab.

* All code is organized in `name.ipynb` files.

## Presentation slides and Report

* report_slides.pdf

* report.pdf

## References

* Machine Learning-Based Empirical Investigation for Credit Scoring in Vietnam’s Banking

* Spark ML Programming Guide
