{"id":28322638,"url":"https://github.com/binhfdv/ds200.l21_bigdata","last_synced_at":"2025-06-23T21:31:32.138Z","repository":{"id":47496045,"uuid":"368039415","full_name":"binhfdv/DS200.L21_BigData","owner":"binhfdv","description":null,"archived":false,"fork":false,"pushed_at":"2021-08-29T03:22:43.000Z","size":12224,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-06-01T22:44:48.122Z","etag":null,"topics":["big-data","data-preprocessing","machinelearning-python","pyspark"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/binhfdv.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-05-17T03:04:40.000Z","updated_at":"2022-02-27T13:38:46.000Z","dependencies_parsed_at":"2022-09-06T05:22:16.536Z","dependency_job_id":null,"html_url":"https://github.com/binhfdv/DS200.L21_BigData","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/binhfdv/DS200.L21_BigData","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binhfdv%2FDS200.L21_BigData","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binhfdv%2FDS200.L21_BigData/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binhfdv%2FDS200.L21_BigData/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binhfdv%2FDS200.L21_BigData/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/binhfdv","download_url":"https://codeload.github.com/binhfdv/DS200.L21_Big
Data/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binhfdv%2FDS200.L21_BigData/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261558807,"owners_count":23177095,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data","data-preprocessing","machinelearning-python","pyspark"],"created_at":"2025-05-25T14:10:41.267Z","updated_at":"2025-06-23T21:31:32.129Z","avatar_url":"https://github.com/binhfdv.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DS200.L21 / Big data\n\n## About\n\n* This is a college course project about applying big data tools to solve real-life problems.\n* The project uses Apache Spark to classify credit scores.\n\n## Table of contents\n\n* [DS200.L21 / Big data](#ds200l21--big-data)\n* [About](#about)\n* [Table of contents](#table-of-contents)\n* [Data source](#data-source)\n* [Experiment pipelines](#experiment-pipelines)\n* [Feature extraction pipelines](#feature-extraction-pipelines)\n* [Code](#code)\n* [Presentation slides and Report](#presentation-slides-and-report)\n* [References](#references)\n\n## Data source\n\n* \u003ca href=\"https://www.kaggle.com/cuongvc93/klps-creditscring-challenge-for-students\" target=\"_blank\"\u003eklp's creditscring challenge for students\u003c/a\u003e\n\n## Experiment pipelines\n\n![Experiment pipeline](images/experimentalprocedure.png)\n\n## Feature extraction pipelines\n\n![Feature extraction pipeline](images/TransPipeline.png)\n\n## Code\n\n* Feature extraction, model training, and related steps in this repo are implemented in Google Colab.\n* All code is organized in Jupyter notebook (`.ipynb`) files.\n\n## Presentation slides and Report\n\n* \u003ca href=\"https://github.com/githubbinh/DS200.L21_BigData/blob/master/report_slides.pdf\" target=\"_blank\"\u003ereport_slides.pdf\u003c/a\u003e\n* \u003ca href=\"https://github.com/githubbinh/DS200.L21_BigData/blob/master/report.pdf\" target=\"_blank\"\u003ereport.pdf\u003c/a\u003e\n\n## References\n\n* \u003ca href=\"https://link.springer.com/chapter/10.1007%2F978-3-030-79463-7_48\" target=\"_blank\"\u003eMachine Learning-Based Empirical Investigation for Credit Scoring in Vietnam’s Banking\u003c/a\u003e\n* \u003ca href=\"https://spark.apache.org/docs/1.2.2/ml-guide.html\" target=\"_blank\"\u003eSpark ML Programming Guide\u003c/a\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbinhfdv%2Fds200.l21_bigdata","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbinhfdv%2Fds200.l21_bigdata","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbinhfdv%2Fds200.l21_bigdata/lists"}