https://github.com/neizod/duppub
Detects duplicate publications
csv hacktoberfest python string-matching string-similarity
- Host: GitHub
- URL: https://github.com/neizod/duppub
- Owner: neizod
- License: mit
- Created: 2019-05-25T08:18:29.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2022-06-21T23:08:56.000Z (almost 4 years ago)
- Last Synced: 2025-04-06T12:34:22.765Z (12 months ago)
- Topics: csv, hacktoberfest, python, string-matching, string-similarity
- Language: Python
- Homepage:
- Size: 15.6 KB
- Stars: 0
- Watchers: 2
- Forks: 4
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
DupPub (Duplicate Publication)
==============================
Detect duplicate or similar publications in a database. This project aims to reduce the size of the database by reporting pairs of suspected duplicates, which makes citation easier and cleaner.
Usage
-----
Export the database as a CSV file without a header row, with these fields in order:
1. ID
2. Authors
3. Title of the article
4. Year
5. Abstract
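To make the expected layout concrete, here is a minimal sketch of reading such a headerless file with Python's standard `csv` module. The field names in `FIELDS`, the helper `read_publications`, and the sample rows are illustrative assumptions; they are not taken from `report.py` or `example_input.csv`.

```python
import csv
import io

# Column order follows the field list above; the key names are illustrative.
FIELDS = ["id", "authors", "title", "year", "abstract"]


def read_publications(handle):
    """Read headerless CSV rows into dicts keyed by FIELDS."""
    # Passing fieldnames explicitly makes DictReader treat the first row as data.
    return list(csv.DictReader(handle, fieldnames=FIELDS))


# Hypothetical sample rows. Fields that contain commas must be quoted.
sample = io.StringIO(
    'p-1,"Doe, J.; Roe, R.",A Study of Things,2018,"We study things."\n'
    'p-2,"Doe, J.; Roe, R.",A study of things,2019,"We study things, again."\n'
)
rows = read_publications(sample)
for row in rows:
    print(row["id"], row["year"], row["title"])
```

Every value is read as a string, so the year would need an explicit conversion if it is compared numerically.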
For example, if your exported CSV file is named `publications.csv`, run:

    python3 report.py publications.csv
Example Result
--------------
Running the tool on `example_input.csv` produces:
| score | id-1 | id-2 |
|---------|------------------------|------------------------|
| 100.00% | cross-publisher-2 | cross-publisher-3 |
| 100.00% | cross-publisher-1 | cross-publisher-3 |
| 100.00% | cross-publisher-1 | cross-publisher-2 |
| 100.00% | arXiv-v3 | arXiv-v4 |
| 100.00% | arXiv-v1 | arXiv-v2 |
| 80.00% | arXiv-v2 | arXiv-v4 |
| 80.00% | arXiv-v2 | arXiv-v3 |
| 80.00% | arXiv-v1 | arXiv-v4 |
| 80.00% | arXiv-v1 | arXiv-v3 |
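The README does not describe how the scores are computed. As a rough illustration of how a pairwise report of this shape could be produced, the sketch below compares titles with `difflib.SequenceMatcher` from the standard library. The metric, the 0.8 threshold, and the names `similarity`, `suspect_pairs`, and `format_row` are assumptions made for this example; DupPub's actual scoring may differ.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] (assumed metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suspect_pairs(records, threshold=0.8):
    """Score every unordered pair of (id, title) records.

    Returns (score, id1, id2) tuples at or above the threshold,
    sorted from highest to lowest score.
    """
    pairs = []
    for (id1, title1), (id2, title2) in combinations(records, 2):
        score = similarity(title1, title2)
        if score >= threshold:
            pairs.append((score, id1, id2))
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


def format_row(score, id1, id2):
    """Render one row in the same layout as the table above."""
    return f"| {score:7.2%} | {id1:<22} | {id2:<22} |"


# Hypothetical records, not taken from example_input.csv.
records = [
    ("a", "Deep Learning for Cats"),
    ("b", "deep learning for cats"),
    ("c", "Quantum Gravity Primer"),
]
for score, id1, id2 in suspect_pairs(records):
    print(format_row(score, id1, id2))
```

Comparing every pair is quadratic in the number of records, which is acceptable for small exports but may need blocking (for example, by year) on large databases.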