https://github.com/pchampio/egc-2020

:mag: :chart_with_upwards_trend: Using knowledge discovery and text mining techniques to explain the structure and evolution of the EGC community

data-mining knowledge-discovery univ-lemans


# EGC_2020
EGC 2020 Challenge: 20 years of history for which future?

The goal of this challenge is to take stock of the evolution of the EGC community over the past 20
years and to try to predict its future. The principle is to apply knowledge discovery and
data mining techniques to explain the community's structure and evolution.

## Dataset

The dataset consists of 1200 titles and abstracts from articles published at the EGC conference between 2004 and 2018.

Fields:
- years
- title
- abstract
- authors

## Pipeline

- filter_extreme
- tf-idf
- LDA (Coherence Score)
- K-Means (Silhouette scores)
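
The first two steps can be illustrated with a minimal, standard-library sketch. It is not the notebook's code: the README does not say which library performs `filter_extreme` (the name resembles gensim's `Dictionary.filter_extremes`), so the function names, the `no_below` / `no_above` thresholds, and the toy documents below are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def filter_extremes(docs, no_below=2, no_above=0.5):
    """Return the tokens kept after dropping very rare and very common ones.

    A token is kept if it occurs in at least `no_below` documents and in at
    most a fraction `no_above` of all documents (hypothetical thresholds).
    """
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    n_docs = len(docs)
    return {
        tok
        for tok, df in doc_freq.items()
        if df >= no_below and df / n_docs <= no_above
    }


def tf_idf(docs, vocab):
    """Compute one sparse tf-idf vector (a dict) per document.

    tf is the relative frequency of the token within the document,
    idf is log(N / document frequency); both restricted to `vocab`.
    """
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc) if tok in vocab)
    vectors = []
    for doc in docs:
        counts = Counter(tok for tok in doc if tok in vocab)
        total = sum(counts.values())
        vectors.append(
            {tok: (c / total) * math.log(n_docs / doc_freq[tok])
             for tok, c in counts.items()}
        )
    return vectors


# Toy tokenized "abstracts" (made up, not taken from the EGC data).
docs = [
    "mining social network graphs".split(),
    "association rule mining".split(),
    "social network community detection".split(),
    "rule based classification mining".split(),
]

vocab = filter_extremes(docs, no_below=2, no_above=0.5)
vectors = tf_idf(docs, vocab)
print(sorted(vocab))
print(vectors[0])
```

In this toy corpus, "mining" is dropped because it appears in more than half of the documents, and tokens seen only once are dropped as too rare. The resulting vectors are the kind of input that the LDA and K-Means steps would then consume.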

## Cluster (Topics) evolution / Time


 
Figure: cluster (topic) evolution over time

Our system detected a sharp increase in articles related to social network analysis over the past 20 years (Label 1).
On the other hand, rule-based algorithms seem to have declined drastically (Label 6).
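
A trend like this can be read from the share of each cluster label per publication year. The sketch below shows one way to compute that share, assuming each article already carries a year and a cluster label; the function names and the toy values are hypothetical and do not come from the EGC data or the notebook.

```python
from collections import Counter, defaultdict


def cluster_counts_by_year(years, labels):
    """Count how many articles fall into each cluster label for each year."""
    counts = defaultdict(Counter)
    for year, label in zip(years, labels):
        counts[year][label] += 1
    return counts


def cluster_share_by_year(years, labels):
    """Convert per-year counts into fractions of that year's articles."""
    counts = cluster_counts_by_year(years, labels)
    return {
        year: {label: n / sum(per_label.values())
               for label, n in per_label.items()}
        for year, per_label in sorted(counts.items())
    }


# Toy assignments (made up): label 1 grows, label 6 shrinks.
years = [2004, 2004, 2004, 2018, 2018, 2018, 2018]
labels = [6, 6, 1, 1, 1, 1, 6]

shares = cluster_share_by_year(years, labels)
for year, per_label in shares.items():
    print(year, per_label)
```

Using shares rather than raw counts keeps years with many published articles from dominating the comparison.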

## Evaluation (Hyper-parameters defined in the [Jupyter notebook](./EGC.ipynb))
The pipeline used in this project doesn't seem to find much structure for one cluster (Label 9). Unfortunately, this cluster represents ~30% of our training data (see the silhouette plot below).

Figure: silhouette plot for 10 clusters
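
To make the per-cluster reading of a silhouette plot concrete, here is a small standard-library sketch of the silhouette coefficient averaged per cluster. It is an illustration under assumptions, not the notebook's implementation: it uses Euclidean distance, requires at least two clusters, and runs on made-up 2D points (on real data, scikit-learn's `silhouette_samples` would be the usual choice).

```python
import math
from collections import defaultdict


def silhouette_samples(points, labels):
    """Silhouette coefficient s = (b - a) / max(a, b) for every point.

    a: mean distance to the other points of the same cluster.
    b: smallest mean distance to the points of any other cluster.
    Points in singleton clusters get a score of 0.
    """
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(math.dist(points[i], points[j]) for j in own if j != i)
        a /= len(own) - 1
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def mean_silhouette_per_cluster(points, labels):
    """Average the per-point silhouette scores inside each cluster."""
    grouped = defaultdict(list)
    for score, label in zip(silhouette_samples(points, labels), labels):
        grouped[label].append(score)
    return {label: sum(vals) / len(vals) for label, vals in grouped.items()}


# Toy data (made up): clusters 0 and 1 are tight, cluster 9 is spread out.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 0.5), (0.5, 0.5), (9.5, 0.5)]
labels = [0, 0, 1, 1, 9, 9, 9]

per_cluster = mean_silhouette_per_cluster(points, labels)
for label, score in sorted(per_cluster.items()):
    print(label, round(score, 3))
```

A cluster whose mean score is near zero or negative, like the spread-out cluster in this toy example, is one whose members sit as close to other clusters as to their own, which is the pattern described for Label 9.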

#### There is still room for improvement.