https://github.com/gsarti/paperpile-notion
Integrating Paperpile with Notion Databases 🔄
academia automation notion notion-api paperpile papers
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/gsarti/paperpile-notion
- Owner: gsarti
- License: apache-2.0
- Created: 2021-08-17T11:03:23.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-03-03T09:11:27.000Z (11 months ago)
- Last Synced: 2024-10-24T04:53:20.260Z (3 months ago)
- Topics: academia, automation, notion, notion-api, paperpile, papers
- Language: TeX
- Homepage: https://papers.gsarti.com
- Size: 11.4 MB
- Stars: 50
- Watchers: 3
- Forks: 10
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# Paperpile Notion Integration
This repository provides a simple integration between [Paperpile](https://paperpile.com/) and [Notion](https://www.notion.so) using the new [Notion API](https://developers.notion.com/). Its purpose is to make it easy to periodically sync a list of papers in Paperpile to a Notion database.
This is a work in progress, and is currently intended for personal use only (no support, no warranty, no liability, etc.).
**New**: The WIP script `download_paperpile_folder.py` provides an automated way to download a folder from a Paperpile account. It uses Chromium and Selenium, so the ChromeDriver executable must be available on your `PATH`. Check the script's command-line arguments for more information.
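Before running the download script, it can help to confirm that a driver is actually discoverable. The snippet below is a minimal sketch, assuming the driver binary is named `chromedriver`; the helper `driver_on_path` is hypothetical and not part of this repository.

```python
import shutil


def driver_on_path(name: str = "chromedriver") -> bool:
    """Return True if an executable called `name` can be found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    if driver_on_path():
        print("chromedriver found at", shutil.which("chromedriver"))
    else:
        print("chromedriver not found; add its folder to PATH first")
```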
## Installation
Simply clone the repo locally and install the dependencies, preferably in a virtualenv:
```shell
git clone https://github.com/gsarti/paperpile-notion.git
cd paperpile-notion
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```
## Requirements
To run the script, you will need the following things:
1. A CSV file exported from Paperpile containing the list of papers and their metadata. [data.csv](data.csv) is an example of an exported CSV. For now, this needs to be manually downloaded and moved to this folder since Paperpile does not provide any API for exporting data.
2. A configuration file to map categories, journals and conferences to their acronyms. [config.yaml](config.yaml) is an example of a configuration file containing major AI and NLP conferences and journals.
3. A database id for the Notion database you want to sync to. To retrieve the database id, follow the directions provided [here](https://developers.notion.com/docs/working-with-databases). The database must contain at least the following columns:
- `Item type` ( `select` ): Corresponds to the `Item type` field in the Paperpile export (e.g. `Conference Paper`, `Journal Article`, etc.).
- `Title` ( `title` ): The title of the paper.
- `Status` ( `select` ): Set to `Done` when the paper has been read, empty otherwise. Can take other values. Managed by using a "Read" and a "To Read" folder inside Paperpile.
- `Authors` ( `multi_select` ): The paper's authors. Corresponds to the `Authors` field in the Paperpile export, with only last names and first-name initials.
- `Venues` ( `multi_select` ): The venues in which the paper was published. Based on the config sections for mapping names to acronyms. Multiselect to specify e.g. conference + arXiv.
- `Date` ( `date` ): The date the paper was published.
- `Link` ( `url` ): Link to the paper. If multiple links are available, arXiv links are preferred.
- `Categories` ( `multi_select` ): The categories the paper belongs to. Define the macro-fields to which the paper belongs. These are extracted from the labels that were assigned to the paper on Paperpile.
- `Methods` ( `multi_select` ): The methods and aspects investigated in the paper. Can be anything, from architectures (e.g. CNN, Transformer) to sub-topics. On Paperpile, these correspond to labels with the following format: `category_shortname - method_name` (e.g. probing tasks for interpretability research could be `INT - Probing`). Refer to the CSV file for an example.
4. A Notion API key. To retrieve the API key, follow the directions provided in the [Notion API Getting Started](https://developers.notion.com/docs/getting-started). You will also need to add permission for the integration on the database from the previous point.
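The column layout above can be sketched as a Notion `properties` payload. This is a rough illustration under stated assumptions, not the logic of `update_notion_db.py`: the helper names (`split_labels`, `multi_select`, `build_properties`) are hypothetical, inputs are assumed to be already parsed into Python lists, labels containing ` - ` are assumed to be methods and all others categories, and the sample values are made up.

```python
from typing import Dict, List, Tuple


def split_labels(labels: List[str]) -> Tuple[List[str], List[str]]:
    """Split Paperpile labels into (categories, methods)."""
    categories: List[str] = []
    methods: List[str] = []
    for label in labels:
        label = label.strip()
        if not label:
            continue
        # Assumption: "INT - Probing"-style labels are methods.
        (methods if " - " in label else categories).append(label)
    return categories, methods


def multi_select(names: List[str]) -> Dict:
    """Build a Notion multi_select property value."""
    return {"multi_select": [{"name": name} for name in names]}


def build_properties(title: str, item_type: str, authors: List[str],
                     venues: List[str], date: str, link: str,
                     labels: List[str], read: bool = False) -> Dict:
    """Assemble property values for the columns listed above."""
    categories, methods = split_labels(labels)
    props = {
        "Title": {"title": [{"text": {"content": title}}]},
        "Item type": {"select": {"name": item_type}},
        "Authors": multi_select(authors),
        "Venues": multi_select(venues),
        "Categories": multi_select(categories),
        "Methods": multi_select(methods),
    }
    if date:  # ISO 8601 date string, e.g. "2021-08-17"
        props["Date"] = {"date": {"start": date}}
    if link:
        props["Link"] = {"url": link}
    if read:  # left unset for unread papers
        props["Status"] = {"select": {"name": "Done"}}
    return props


if __name__ == "__main__":
    print(build_properties(
        title="An Example Paper",
        item_type="Conference Paper",
        authors=["Doe J", "Roe R"],
        venues=["ACL", "arXiv"],
        date="2021-08-17",
        link="https://example.org/paper",
        labels=["Interpretability", "INT - Probing"],
        read=True,
    ))
```

A dictionary shaped like this is what the Notion API accepts as the `properties` field when creating a page inside a database; the property names must match the database columns exactly.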
## Usage
Once everything is in place, simply run the script as:
```shell
python update_notion_db.py \
    --input data.csv \
    --config config.yaml \
    --database <YOUR_DATABASE_ID> \
    --token <YOUR_NOTION_API_KEY>
```
The experimental script to auto-download a folder from Paperpile can be run as:
```shell
python download_paperpile_folder.py \
    --username <YOUR_PAPERPILE_USERNAME> \
    --password <YOUR_PAPERPILE_PASSWORD> \
    --folder_id <FOLDER_ID> # e.g. pp-folder-2cb1833f-582f-0000-ad59-567be5718692
```
This will download the folder contents in CSV format to the default download location.
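Since the exported file lands in the browser's download directory, one way to locate it before passing it to `--input` is to pick the most recently modified CSV there. This is a minimal sketch: the helper `newest_csv` is hypothetical, and `~/Downloads` is only an assumed default location.

```python
from pathlib import Path
from typing import Optional


def newest_csv(folder: Path) -> Optional[Path]:
    """Return the most recently modified .csv file in `folder`, or None."""
    csv_files = [p for p in folder.glob("*.csv") if p.is_file()]
    return max(csv_files, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    downloads = Path.home() / "Downloads"  # assumed download location
    print(newest_csv(downloads))
```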
Example output, adding a new paper to the database:
![Console output](img/output.png)
Example resulting database on Notion:
![Notion result](img/notion_result.png)