{"id":22613021,"url":"https://github.com/leonism/dataiku-fifa","last_synced_at":"2025-10-08T23:21:21.019Z","repository":{"id":96669524,"uuid":"275495505","full_name":"leonism/dataiku-FIFA","owner":"leonism","description":"Within this repository, delve into a treasure trove of my personal projects spanning Machine Learning, exploratory data analysis (EDA), Python Jupyter Notebooks, and an assortment of visualizations crafted using Dataiku Platform's exported standard files.","archived":false,"fork":false,"pushed_at":"2020-06-28T04:04:19.000Z","size":4910,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-28T23:46:50.559Z","etag":null,"topics":["data-science","dataiku","machine-learning","machine-learning-algorithms","machinelearning"],"latest_commit_sha":null,"homepage":"https://leonism.github.io/dataiku-FIFA/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/leonism.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-06-28T03:02:00.000Z","updated_at":"2024-09-27T22:36:38.000Z","dependencies_parsed_at":null,"dependency_job_id":"2cfa040c-d63d-4585-8d39-567225582dc3","html_url":"https://github.com/leonism/dataiku-FIFA","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/leonism/dataiku-FIFA","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leonism%2Fdataiku-FIFA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leonis
m%2Fdataiku-FIFA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leonism%2Fdataiku-FIFA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leonism%2Fdataiku-FIFA/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/leonism","download_url":"https://codeload.github.com/leonism/dataiku-FIFA/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leonism%2Fdataiku-FIFA/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279000735,"owners_count":26082862,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-08T02:00:06.501Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","dataiku","machine-learning","machine-learning-algorithms","machinelearning"],"created_at":"2024-12-08T17:14:47.741Z","updated_at":"2025-10-08T23:21:21.014Z","avatar_url":"https://github.com/leonism.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"![FIFA Logo](/images/fifa.png)\n\n# Dataiku's Conundrum Challenge on FIFA Dataset.\n\n# Introduction\nIf an interesting dataset and a good football match have one thing in common, it is that both keep everyone safe at home during the pandemic. 
And in all honesty, I'm neither a data scientist nor a developer. I was first exposed to Machine Learning and Artificial Intelligence in general, and to Dataiku in particular, a little over 3 months ago, but anyway, here's my take on this Conundrum challenge.\n\n# Installation\nIn this repository, you'll find my personal projects related to Machine Learning, EDA, Python Jupyter Notebooks, and a couple of visualizations based on the Dataiku Platform's exported standard files. Most of the datasets I've been working with were downloaded from the Conundrum site. Installation is pretty straightforward. Simply download the whole set as a single project in one ZIP file. Everything has been flattened into plain text files and no SQL dump is involved, so there shouldn't be any missing system dependency issues. Then import the downloaded ZIP file into your working project.\n\n# Jupyter Notebooks\n- [Correlations analysis on Conundrum_13_Data_prepared (admin).ipynb](https://github.com/leonism/dataiku-FIFA/blob/master/ipython_notebooks/Correlations%20analysis%20on%20Conundrum_13_Data_prepared%20(admin).ipynb) \n- [Correlations analysis on Conundrum_13_Data_prepared_scored (admin).ipynb](https://github.com/leonism/dataiku-FIFA/blob/master/ipython_notebooks/Correlations%20analysis%20on%20Conundrum_13_Data_prepared_scored%20(admin).ipynb) \n- [High dimensional data visualization (t-SNE) on Conundrum_13_Data_prepared_scored (admin).ipynb](https://github.com/leonism/dataiku-FIFA/blob/master/ipython_notebooks/High%20dimensional%20data%20visualization%20(t-SNE)%20on%20Conundrum_13_Data_prepared_scored%20(admin).ipynb)\n- [PCA on Conundrum_13_Data_prepared_scored (admin).ipynb](https://github.com/leonism/dataiku-FIFA/blob/master/ipython_notebooks/PCA%20on%20Conundrum_13_Data_prepared_scored%20(admin).ipynb)\n- [Statistics and tests on a single population on Conundrum_13_Data_prepared_scored 
(admin).ipynb](https://github.com/leonism/dataiku-FIFA/blob/master/ipython_notebooks/Statistics%20and%20tests%20on%20a%20single%20population%20on%20Conundrum_13_Data_prepared_scored%20(admin).ipynb)\n- [Statistics and tests on multiple populations on Conundrum_13_Data_prepared_scored (admin).ipynb](https://github.com/leonism/dataiku-FIFA/blob/master/ipython_notebooks/Statistics%20and%20tests%20on%20multiple%20populations%20on%20Conundrum_13_Data_prepared_scored%20(admin).ipynb)\n- [Topic modeling on Conundrum_13_Data_prepared_scored (admin).ipynb](https://github.com/leonism/dataiku-FIFA/blob/master/ipython_notebooks/Topic%20modeling%20on%20Conundrum_13_Data_prepared_scored%20(admin).ipynb)\n\n# Data Flow\nThe challenge is not to 'predict' anything, but rather to group/cluster the players' skill sets in relation to their wages. Here's what my current flow looks like. Don't worry about the 2 additional datasets; they're merely exported from the existing model so that I can explore them further.\n![main-flow.png](/images/main-flow.png)\n\n# Prepare Recipes\nAnd here's how I went about the prepare recipes, nothing out of the ordinary. 
Just converting *categorical* to *numerical* values with one-hot encoding and filling the *'NaN'* values with the median, while grouping the steps for better clarity in case I ever need to go back and revise anything.\n![recipes-prepared.jpg](/images/recipes-prepared.jpg)\n\n# Modeling \u0026 Training \nFor the modeling/training step, I chose '_Interactive Clustering_', which in return delivered a sufficient score.\n![model-score.png](/images/model-score.png)\n\n\n# Clustering Classification\nAs for the cluster names, I simply labeled them as grades, starting from '_Grading A_' for the top-notch performers, all the way down to '_Grading E_' for the least performing ones.\n![clusttering.jpg](/images/clusttering.jpg)\n\n\n# Cluster Plot\nAnd here's what my _cluster plot_ looks like. Obviously, the better the grade, the fewer players fall into it.\n\n**Acceleration x Wage**\n![scatter-plot-a.png](/images/scatter-plot-a.png)\n\n**Sliding Tackle x Wage**\n![scateter-plot-b.png](/images/scateter-plot-b.png)\n\n# Variables Significant Level\nAnd for sure, those who sit at the 'Grading A' level stand above the average threshold (though that's not always the case with other variables).\u003cbr /\u003e\u003cbr /\u003e\n![grading-a-variables.jpg](/images/grading-a-variables.jpg)\n\n\n# Value Proposition\nComing back to the initial question, _\"creating a flow that outputs a value proposition in term of their wages\"_: I didn't include the players' names and nationalities in my modeling, for a couple of reasons. In my opinion, those two variables are just too subjective to be included. 
In the sense that you could be a top-notch player regardless of what your 'Name' sounds like or what your 'Nationality' is.\n\nSo I've done the DSS flow diagram, and the following is my list of 'value propositions' that contribute to being a '_Grading-A_' player on the field.\n\n**Top 5 Values Proposition**\n![fig1.png](/images/fig1.png)\n\n**Top 5 Values Proposition By Distribution.**\n![fig2.png](/images/fig2.png)\n\n**Top 5 Values Proposition By Grade.**\n![fig3.png](/images/fig3.png)\n\n# Correlation Matrix\n\nThe very first correlation analysis consists of plotting the \"Correlation matrix\" for numerical variables. For each pair of numerical variables, this computes the \"strength\" of the correlation (called the Pearson coefficient):\n\n- 1.0 means a perfect correlation\n- 0.0 means no correlation\n- -1.0 means a perfect \"inverse\" correlation\n\nSince it does not really make sense to print this correlation plot for hundreds of variables, we are restricting it to the first 50 numerical variables of the dataset.\n![download-1.png](/images/download-1.png)\n![download.png](/images/download.png)\n\n\nI've enjoyed exploring this dataset, and it was certainly fun doing it. Stay safe, everyone! 😊\n\n\n# Disclaimer\nAnd please remember, this is only a weekend pet project, which I'm doing for my personal interest only.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fleonism%2Fdataiku-fifa","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fleonism%2Fdataiku-fifa","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fleonism%2Fdataiku-fifa/lists"}