https://github.com/binhfdv/ds200.l21_bigdata



# DS200.L21 / Big data

## About

* This is a college course project about applying big data tools to solve real-life problems.
* The project uses Apache Spark to classify credit scores.

## Table of contents

> * [DS200.L21 / Big data](#ds200l21--big-data)
* [About](#about)
* [Table of contents](#table-of-contents)
* [Data source](#data-source)
* [Experiment pipelines](#experiment-pipelines)
* [Feature extraction pipelines](#feature-extraction-pipelines)
* [Code](#code)
* [Presentation slides and Report](#presentation-slides-and-report)
* [References](#references)

## Data source

* klp's credit scoring challenge for students

## Experiment pipelines
![Experiment pipeline](images/experimentalprocedure.png)

## Feature extraction pipelines
![Feature extraction pipeline](images/TransPipeline.png)

## Code

* Feature extraction, model training, and the other steps in this repo are implemented in Google Colab.
* All code is organized in `name.ipynb` notebook files.

## Presentation slides and Report

* report_slides.pdf
* report.pdf

## References

* Machine Learning-Based Empirical Investigation for Credit Scoring in Vietnam’s Banking
* Spark ML Programming Guide