{"id":15493105,"url":"https://github.com/pchampio/egc-2020","last_synced_at":"2025-03-28T16:43:25.227Z","repository":{"id":80616597,"uuid":"153779960","full_name":"pchampio/EGC-2020","owner":"pchampio","description":" :mag:  :chart_with_upwards_trend: Using techniques of knowledge discovery and text mining the goal is to explain the structure and the evolution of the EGC community","archived":false,"fork":false,"pushed_at":"2018-11-22T08:52:27.000Z","size":18957,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-02-02T17:18:28.386Z","etag":null,"topics":["data-mining","knowledge-discovery","univ-lemans"],"latest_commit_sha":null,"homepage":"http://www.egc.asso.fr/manifestations/defi-egc/defi-egc-2020-20-ans-dhistoire-pour-quel-avenir.html","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pchampio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-10-19T12:33:31.000Z","updated_at":"2019-02-27T15:10:11.000Z","dependencies_parsed_at":null,"dependency_job_id":"cb8fadea-26c1-4555-b8f0-b703d1a0ca56","html_url":"https://github.com/pchampio/EGC-2020","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pchampio%2FEGC-2020","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pchampio%2FEGC-2020/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pchampio%2FEGC-2020/releas
es","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pchampio%2FEGC-2020/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pchampio","download_url":"https://codeload.github.com/pchampio/EGC-2020/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246068266,"owners_count":20718501,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-mining","knowledge-discovery","univ-lemans"],"created_at":"2024-10-02T08:04:23.167Z","updated_at":"2025-03-28T16:43:25.184Z","avatar_url":"https://github.com/pchampio.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# EGC_2020\nEGC 2020 Challenge: 20 years of history, for what future?\n\nThe goal of this challenge is to take stock of the evolution of the EGC community over the past 20\nyears and try to predict its future. The principle is to apply techniques of knowledge discovery and\ndata mining to explain its structure and evolution.\n\n## Dataset\n\nThe dataset consists of 1200 titles and abstracts of articles published at the EGC conference between 2004 and 2018.  
\nFields:\n  - years\n  - title\n  - abstract\n  - authors\n\n\n## Pipeline\n\n- filter_extreme\n- tf-idf\n- LDA (Coherence Score)\n- K-Means (Silhouette scores)\n\n## Cluster (Topics) evolution / Time\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://raw.githubusercontent.com/Drakirus/EGC_2020/master/plots/distribution.png\"\u003e\n    \u003cimg alt=\"ScreenShot\" src=\"https://raw.githubusercontent.com/Drakirus/EGC_2020/master/plots/distribution.png\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\nOur system detected a sharp increase in articles related to social network analysis over the past 20 years (Label 1).  \nOn the other hand, rule-based algorithms seem to have declined drastically (Label 6).\n\n## Evaluation (Hyper-parameters defined in the [Jupyter notebook](./EGC.ipynb))\nThe pipeline used in this project finds little structure in one cluster (Label 9). Unfortunately, this cluster represents ~30% of our training data (see the silhouette plot below).\n\u003cdetails\u003e\n\u003csummary\u003eSilhouette plot for 10 clusters\u003c/summary\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://raw.githubusercontent.com/Drakirus/EGC_2020/master/plots/silhouette.png\"\u003e\n    \u003cimg alt=\"ScreenShot\" src=\"https://raw.githubusercontent.com/Drakirus/EGC_2020/master/plots/silhouette.png\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\u003c/details\u003e\n\n#### There is still room for improvement. \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpchampio%2Fegc-2020","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpchampio%2Fegc-2020","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpchampio%2Fegc-2020/lists"}