https://github.com/binhfdv/ds200.l21_bigdata
- Host: GitHub
- URL: https://github.com/binhfdv/ds200.l21_bigdata
- Owner: binhfdv
- Created: 2021-05-17T03:04:40.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2021-08-29T03:22:43.000Z (over 4 years ago)
- Last Synced: 2025-06-01T22:44:48.122Z (10 months ago)
- Topics: big-data, data-preprocessing, machinelearning-python, pyspark
- Language: Jupyter Notebook
- Homepage:
- Size: 11.7 MB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# DS200.L21 / Big data
## About
* This is a college course project about applying big data tools to solve real-life problems.
* The project uses Apache Spark to classify credit scores.
## Table of contents
* [DS200.L21 / Big data](#ds200l21--big-data)
  * [About](#about)
  * [Table of contents](#table-of-contents)
  * [Data source](#data-source)
  * [Experiment pipelines](#experiment-pipelines)
  * [Feature extraction pipelines](#feature-extraction-pipelines)
  * [Code](#code)
  * [Presentation slides and Report](#presentation-slides-and-report)
  * [References](#references)
## Data source
* klp's credit scoring challenge for students
## Experiment pipelines

## Feature extraction pipelines

## Code
* Feature extraction, model training, and the remaining steps in this repo are implemented in Google Colab.
* All code is organized in `name.ipynb` notebook files.
## Presentation slides and Report
* report_slides.pdf
* report.pdf
## References
* Machine Learning-Based Empirical Investigation for Credit Scoring in Vietnam’s Banking
* Spark ML Programming Guide