Neo4J Property Graph Model of Semantic Scholar Papers Dataset
https://github.com/qasimkhan5x/semantic-scholar-property-graph-neo4j
- Host: GitHub
- URL: https://github.com/qasimkhan5x/semantic-scholar-property-graph-neo4j
- Owner: QasimKhan5x
- License: MIT
- Created: 2024-03-28T20:08:47.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-04-08T21:16:06.000Z (9 months ago)
- Last Synced: 2024-11-03T08:42:01.375Z (2 months ago)
- Topics: neo4j, property-graph, semantic-scholar
- Language: Jupyter Notebook
- Size: 1.17 MB
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Neo4J Property Graph Model of Semantic Scholar Papers Dataset
This README outlines the steps needed to set up and run the project environment, including installing necessary libraries, processing data, and loading it into a Neo4j database.
## Environment Setup
Ensure you have Python 3.9 installed on your system. You can check your Python version by running:
```bash
python --version
```

### Install Libraries
First, install the required libraries listed in `requirements.txt` by running the following command:
```bash
pip install -r requirements.txt
```

## Data Preparation
### Download Dataset
Download the dataset from the provided hyperlink. Ensure the downloaded dataset is saved in the project's root directory.
[Download Dataset](https://drive.google.com/drive/folders/19AC2DGWyPLi7hWFlZ3iDD1OgRLlNzxE9?usp=sharing)
### Data Transformation
Run the `notebooks/etl.ipynb` Jupyter notebook to create necessary files from the dataset. Initially, `all_data.csv` will be used.
#### Skipping Keyword Creation
The keyword creation part of the ETL process is time-consuming. If you prefer to skip this step, directly use `all_data_with_keywords.csv` in the notebook instead of `all_data.csv`.
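In the notebook, this switch amounts to pointing the initial read at the precomputed file. A minimal sketch of the idea (only the two CSV file names come from this README; the flag and helper function are hypothetical):

```python
# Sketch of the input switch described above. Only the two CSV file
# names come from the README; the flag and helper are hypothetical.
SKIP_KEYWORD_CREATION = True  # set to False to rebuild keywords from scratch

def pick_input(skip_keywords: bool) -> str:
    """Choose the CSV that the ETL notebook should start from."""
    return "all_data_with_keywords.csv" if skip_keywords else "all_data.csv"

print(pick_input(SKIP_KEYWORD_CREATION))  # all_data_with_keywords.csv
```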
## Neo4j Database Setup
After processing the data, set up a Neo4j DBMS and obtain the path to its DBMS directory. On macOS, the path looks like this:
```
/Users/user/Library/Application Support/Neo4j Desktop/Application/relate-data/dbmss/<dbms-id>/import/csv_path.txt
```

Replace `<dbms-id>` with the appropriate folder name for your DBMS.
### Moving CSV Files
Move all the CSV files generated by `etl.ipynb` to the directory path you obtained in the previous step.
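The move can also be scripted. A sketch in Python, where both directory paths are placeholders to adjust to your own layout:

```python
import shutil
from pathlib import Path

# Placeholder paths; point these at your actual locations.
csv_dir = Path("csv_output")       # where etl.ipynb wrote its CSV files
import_dir = Path("neo4j_import")  # the DBMS import directory from the previous step

import_dir.mkdir(parents=True, exist_ok=True)
for csv_file in csv_dir.glob("*.csv"):
    shutil.copy2(csv_file, import_dir / csv_file.name)
    print(f"copied {csv_file.name}")
```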
## Running the Data Pipeline
Execute the data pipeline with the following bash script command. If necessary, replace `config.ini` with the path to your configuration file, and first set the username, password, and database in that file.
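The repository's config file contents are not shown here; a hypothetical `config.ini` layout (section and key names are assumptions — only username, password, and database are mentioned in this README):

```ini
; Hypothetical layout -- section and key names are assumptions.
[neo4j]
uri = bolt://localhost:7687
username = neo4j
password = your-password
database = neo4j
```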
```bash
bash run_loader.sh --config config.ini
```

## Results
Below are the sample results we obtained from running the pipeline:
![Part A Results](images/partA.png)
![Part B Results](images/partB.png)
![Part C Results](images/partC.png)
![Part D Results 1](images/partD-1.png)
![Part D Results 2](images/partD-2.png)