https://github.com/26hzhang/dblpparser

A python parser for DBLP dataset

Host: GitHub
URL: https://github.com/26hzhang/dblpparser
Owner: 26hzhang
License: mit
Created: 2018-04-25T12:36:38.000Z (about 8 years ago)
Default Branch: master
Last Pushed: 2019-03-20T06:48:02.000Z (over 7 years ago)
Last Synced: 2025-03-29T09:21:48.555Z (about 1 year ago)
Topics: dblp-dataset, python3
Language: Python
Size: 543 KB
Stars: 45
Watchers: 1
Forks: 17
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE


# DBLP Dataset Parser

![Author](https://img.shields.io/badge/Author-Zhang%20Hao%20(Isaac%20Changhau)-blue.svg) ![Python](https://img.shields.io/badge/Python-3.6.5-brightgreen.svg)

This is a Python parser for the [DBLP dataset](https://dblp.uni-trier.de/). The XML dump can be downloaded [here](http://dblp.org/xml/) from the [DBLP homepage](https://dblp.org/).

This parser requires the `dtd` file, so make sure you have both `dblp-XXX.xml` (the dataset) and `dblp-XXX.dtd`. Both files must be in the same directory, and the name of the `dtd` file must match the name given in the `<!DOCTYPE>` declaration of the `xml` file. You can check this with the `head dblp-XXX.xml` command, as shown below:

```xml
<!DOCTYPE dblp SYSTEM "dblp-XXX.dtd">
<dblp>
...
<author>Carmen Heine</author>
<title>Modell zur Produktion von Online-Hilfen.</title>
...
```
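The same check can be scripted. The following is a minimal sketch, not part of the parser itself: the helper names `declared_dtd` and `dtd_is_in_place` are hypothetical, and it assumes the `<!DOCTYPE ... SYSTEM "...">` declaration appears within the first few kilobytes of the file.

```python
import os
import re

# Matches e.g. <!DOCTYPE dblp SYSTEM "some-name.dtd"> and captures the DTD file name.
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+\S+\s+SYSTEM\s+"([^"]+)"')


def declared_dtd(xml_path, head_bytes=4096):
    """Return the DTD file name declared in the DOCTYPE, or None if absent."""
    # Only the head of the file is read, so this stays cheap on a multi-GB dump.
    with open(xml_path, 'rb') as f:
        head = f.read(head_bytes).decode('ascii', errors='replace')
    match = DOCTYPE_RE.search(head)
    return match.group(1) if match else None


def dtd_is_in_place(xml_path):
    """True if the declared DTD file sits in the same directory as the XML file."""
    dtd_name = declared_dtd(xml_path)
    if dtd_name is None:
        return False
    xml_dir = os.path.dirname(os.path.abspath(xml_path))
    return os.path.isfile(os.path.join(xml_dir, dtd_name))
```

Calling `dtd_is_in_place('dataset/dblp.xml')` before parsing gives an early, explicit failure instead of an error partway through the load.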

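Sample usage of the parser:

```python
import sys

# Assumes context_iter, log_msg and parse_article are in scope,
# e.g. defined in or imported from the repository's parser module.


def main():
    dblp_path = 'dataset/dblp.xml'
    save_path = 'article.json'
    try:
        # Check that the XML file (and its DTD) can be opened before parsing.
        context_iter(dblp_path)
        log_msg("LOG: Successfully loaded \"{}\".".format(dblp_path))
    except IOError:
        log_msg("ERROR: Failed to load file \"{}\". Please check your XML and DTD files.".format(dblp_path))
        sys.exit(1)
    parse_article(dblp_path, save_path, save_to_csv=False)  # saves as JSON by default


if __name__ == '__main__':
    main()
```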

Some extracted results:

**Number of publications of each type**:

![general](/img/general.png)

**Number of occurrences of each attribute across all publications**:

![all_feature](/img/all_feature.png)

**Counts of five different features of articles**:

![article_feature](/img/article_feature.png)

**Distribution of publication year of articles**:

![article_year](/img/article_year.png)
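To illustrate how a per-type count like the first chart can be computed, here is a minimal sketch using only the standard library. It is not the parser's own implementation: `count_record_types` and the sample records are hypothetical. The standard-library parser does not load the external DTD, so this sketch only works on an entity-free sample and would not work on the real dump as-is.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny stand-in for the real dump (hypothetical records, no DTD entities).
SAMPLE = b"""<dblp>
<article><author>A. Author</author><title>First.</title><year>2001</year></article>
<article><author>B. Author</author><title>Second.</title><year>2003</year></article>
<phdthesis><author>C. Author</author><title>Third.</title><year>2003</year></phdthesis>
</dblp>"""


def count_record_types(source):
    """Count the direct children of the root element by tag name."""
    counts = Counter()
    depth = 0
    # Stream the document instead of loading it whole, tracking nesting depth.
    for event, elem in ET.iterparse(source, events=('start', 'end')):
        if event == 'start':
            depth += 1
        else:
            depth -= 1
            if depth == 1:      # just closed a direct child of the root
                counts[elem.tag] += 1
                elem.clear()    # drop the record's children to limit memory use
    return counts


counts = count_record_types(io.BytesIO(SAMPLE))
```

Streaming with `iterparse` and clearing each record after it is counted keeps memory use roughly flat, which matters for a file the size of the full DBLP dump.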
