https://github.com/younesbelkada/altegrad_challenge
Altegrad 2021-2022 - Citation Prediction Challenge - A complete guide and code to crack the citation prediction altegrad challenge - https://www.kaggle.com/c/altegrad-2021/ - MVA Masters program 2021-2022
- Host: GitHub
- URL: https://github.com/younesbelkada/altegrad_challenge
- Owner: younesbelkada
- License: apache-2.0
- Created: 2022-01-22T17:23:28.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-02-24T12:45:29.000Z (over 3 years ago)
- Last Synced: 2025-02-08T10:32:03.428Z (8 months ago)
- Topics: article, deep-learning, feature-extraction, graph-neural-networks, huggingface-transformers, keybert, link-prediction, sentence-transformers
- Language: Python
- Homepage:
- Size: 9.83 MB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Altegrad 2021-2022 - Citation Prediction Challenge
> Authors: [Apavou Clément](https://github.com/clementapa) & [Belkada Younes](https://github.com/younesbelkada) & [Zucker Arthur](https://github.com/ArthurZucker)


The Kaggle challenge is available here: https://www.kaggle.com/c/altegrad-2021/leaderboard
## :mag_right: Introduction

In this challenge, we are given a large scientific citation graph in which each node corresponds to an article. The dataset consists of 138,499 vertices, i.e. articles, each with its associated abstract and list of authors. The goal is to predict whether two nodes cite each other, given all this information. In the next sections, we elaborate on the intuitions behind our approaches and present the results obtained, along with possible interpretations for each observation. The provided code is the code we used for the best model (i.e. [the right commit](https://github.com/younesbelkada/altegrad_challenge/tree/best-model)).
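To make the setup concrete, here is a minimal sketch (not the repository's actual pipeline) of how citation prediction can be framed as binary classification on node pairs. The file name `edgelist.txt` and the sampling scheme are illustrative assumptions.

```python
# Sketch: frame citation prediction as binary classification on node pairs.
# "edgelist.txt" is an assumed file name, not necessarily the dataset's.
import random
import networkx as nx

# Load the citation graph from an edge list.
G = nx.read_edgelist("edgelist.txt", delimiter=",", create_using=nx.Graph())
nodes = list(G.nodes())

# Positive examples: pairs of articles that actually cite each other.
positives = [(u, v, 1) for u, v in G.edges()]

# Negative examples: sample an equal number of non-adjacent pairs.
negatives = []
while len(negatives) < len(positives):
    u, v = random.sample(nodes, 2)
    if not G.has_edge(u, v):
        negatives.append((u, v, 0))

pairs = positives + negatives
random.shuffle(pairs)
print(f"{G.number_of_nodes()} nodes, {len(pairs)} labelled pairs")
```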
## :hammer: Getting started
Install the requirements:
```
pip3 install -r requirements.txt
```
Then, download the data:
```
sh download_data.sh
```
Finally, run the main script:
```
python3 main.py
```

## :round_pushpin: Tips
To reproduce the best model, use the [`best-model`](https://github.com/younesbelkada/altegrad_challenge/tree/best-model) branch, as it does not use this implementation of the code.
The present branch is the final version of the code: it allows customization of the various embeddings and corresponds to the latest state of the repository.

## :mag_right: Results
| Model | Validation loss | Test loss (private leaderboard) | Run |
|---|---|---|---|
| Best model | 0.07775 | 0.07939 | [wandb run](https://wandb.ai/altegrad-gnn-link-prediction/test-altegrad/runs/1cwlegzz?workspace=user-clementapa) |

All experiments are available on wandb:
- [altegrad_challenge](https://wandb.ai/altegrad-gnn-link-prediction/altegrad_challenge?workspace=user-clementapa)
- [test-altegrad](https://wandb.ai/altegrad-gnn-link-prediction/test-altegrad?workspace=user-clementapa)

## :diamonds: Best MLP architecture
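The exact architecture is given in the report below. For orientation, here is a minimal sketch of an MLP head for link prediction, assuming each pair is represented by the concatenation of its two node embeddings; the layer sizes and dropout rate are illustrative, not the actual best configuration.

```python
# Sketch: MLP link-prediction head over concatenated node embeddings.
# Dimensions and dropout are assumptions for illustration only.
import torch
import torch.nn as nn

class LinkPredictorMLP(nn.Module):
    def __init__(self, emb_dim: int = 768, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden),  # concatenated pair embedding
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden, 1),            # single logit per pair
        )

    def forward(self, emb_u: torch.Tensor, emb_v: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([emb_u, emb_v], dim=-1)).squeeze(-1)

# Trained with binary cross-entropy, consistent with the log-loss metric above.
model = LinkPredictorMLP()
criterion = nn.BCEWithLogitsLoss()
logits = model(torch.randn(4, 768), torch.randn(4, 768))
loss = criterion(logits, torch.ones(4))
```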
## :paperclip: Presentation of our work
[Report](https://github.com/younesbelkada/altegrad_challenge/blob/main/assets/Report.pdf) & [Slides](https://github.com/younesbelkada/altegrad_challenge/blob/main/assets/Slides.pdf)
## :wrench: Some tools used
## Some citations
```bibtex
@misc{cohan2020specter,
  title={SPECTER: Document-level Representation Learning using Citation-informed Transformers},
  author={Arman Cohan and Sergey Feldman and Iz Beltagy and Doug Downey and Daniel S. Weld},
  year={2020},
  eprint={2004.07180},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
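For reference, here is a minimal sketch of computing SPECTER document embeddings with `huggingface-transformers` (one of the repository's topics), following the `allenai/specter` model card. Feeding raw abstracts alone is an assumption for illustration; this codebase may preprocess the inputs differently.

```python
# Sketch: SPECTER paper embeddings via huggingface-transformers.
# Input formatting (abstracts only) is an assumption, not this repo's pipeline.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("allenai/specter")
model = AutoModel.from_pretrained("allenai/specter")

abstracts = [
    "We present a method for document-level representation learning...",
    "Graph neural networks have emerged as a powerful tool...",
]
inputs = tokenizer(abstracts, padding=True, truncation=True,
                   max_length=512, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# SPECTER uses the first ([CLS]) token as the document embedding.
embeddings = out.last_hidden_state[:, 0, :]  # shape: (batch, 768)
```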