https://github.com/cedergrouphub/limesoup
LimeSoup is a package to parse HTML or XML papers from different publishers.
- Host: GitHub
- URL: https://github.com/cedergrouphub/limesoup
- Owner: CederGroupHub
- License: mit
- Created: 2018-02-18T09:24:31.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2021-01-04T14:54:20.000Z (about 5 years ago)
- Last Synced: 2025-09-09T23:38:55.143Z (5 months ago)
- Topics: html, journal-article, nlp, parser, python, xml
- Language: Python
- Homepage:
- Size: 15.4 MB
- Stars: 20
- Watchers: 6
- Forks: 7
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGES.md
- License: LICENSE
README
# LimeSoup
LimeSoup is a package to parse HTML or XML papers from different publishers. The parsed
output can be used to populate a database.
[Build status on Semaphore CI](https://semaphoreci.com/cedergrouphub/limesoup)
# Usage
Full usage:
```
import json

from LimeSoup import (
    ACSSoup,
    AIPSoup,
    APSSoup,
    ECSSoup,
    ElsevierSoup,
    IOPSoup,
    NatureSoup,
    RSCSoup,
    SpringerSoup,
    WileySoup,
)

# Placeholder path to a downloaded article (HTML or XML); replace with your own file.
article = 'article.html'

with open(article, 'r', encoding='utf-8') as f:
    html_str = f.read()

# Choose the parser that matches the article's publisher.
data = ECSSoup.parse(html_str)

with open('file_test.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, sort_keys=True, indent=4, ensure_ascii=False)
```
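When many articles have to be converted, the read, parse, and dump steps above can be wrapped in a small helper. The following is a minimal sketch and not part of LimeSoup: `parse_to_json` is a hypothetical name, and the only assumption about the parser is the one visible in the usage example, namely that it exposes `parse(str)` and returns JSON-serializable data.

```python
import json
from pathlib import Path


def parse_to_json(parser, src_path, dst_path):
    """Parse one article file with `parser` and write the result as JSON.

    `parser` is any object with a `parse(str)` method, e.g. ECSSoup.
    This helper is a hypothetical convenience wrapper, not a LimeSoup API.
    """
    # Read the raw HTML/XML article as text.
    text = Path(src_path).read_text(encoding='utf-8')
    # Delegate the actual parsing to the publisher-specific parser.
    data = parser.parse(text)
    # Write the parsed result with the same JSON options as the usage example.
    with open(dst_path, 'w', encoding='utf-8') as f:
        json.dump(data, f, sort_keys=True, indent=4, ensure_ascii=False)
    return data
```

For example, `parse_to_json(ECSSoup, 'paper.html', 'paper.json')` would convert a single ECS article, assuming `ECSSoup` has been imported from `LimeSoup` and `paper.html` is a placeholder file name.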
Currently, we have implemented the following parsers:
- [ECS: The Electrochemical Society](http://ecsdl.org)
- [RSC: The Royal Society of Chemistry](https://www.rsc.org)
- [Elsevier](https://www.elsevier.com/catalog)
- [Nature Publishing Group](https://www.nature.com)
- [Springer](https://www.springernature.com/gp/products/journals)
- [Wiley](https://onlinelibrary.wiley.com)
- [ACS: American Chemical Society](https://pubs.acs.org)
- [APS: American Physical Society](https://journals.aps.org)
- [IOP Publishing](https://ioppublishing.org/publications/our-journals/)
- [AIP: American Institute of Physics](https://aip.scitation.org/)
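Because every parser in the list is called the same way, the publisher can also be chosen at run time with a lookup table. This is a sketch under stated assumptions: the short keys (`'ecs'`, `'rsc'`, and so on) and the function `get_parser` are made up for illustration and are not provided by LimeSoup. The registry is passed in as an argument so that the snippet runs without LimeSoup installed.

```python
def get_parser(registry, publisher):
    """Return the parser registered for `publisher` (case-insensitive).

    `registry` maps hypothetical short keys to parser objects, e.g.
    {'ecs': ECSSoup, 'rsc': RSCSoup}. Neither the keys nor this
    function are part of LimeSoup.
    """
    key = publisher.strip().lower()
    try:
        return registry[key]
    except KeyError:
        # List the known keys so the caller can correct the publisher name.
        known = ', '.join(sorted(registry))
        raise ValueError(
            f'No parser registered for {publisher!r}; known keys: {known}'
        ) from None
```

With LimeSoup installed, the registry could be built as `{'ecs': ECSSoup, 'rsc': RSCSoup, 'elsevier': ElsevierSoup}` and extended with the remaining parsers, after which `get_parser(registry, 'ECS').parse(html_str)` selects the ECS parser.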
# Development documentation
Please refer to the [wiki pages](https://github.com/CederGroupHub/LimeSoup/wiki).
# Change logs
Please see [change logs](CHANGES.md).
# Credits
These genius people contributed to LimeSoup:
- Tiago Botari
- Ziqin Rong
- Vahe Tshitoyan
- Nicolas Mingione
- Jason Madeano
- Haoyan Huo
- Tanjin He
- Zach Jensen
- Alex van Grootel
- Edward Kim
- Haihao Liu
- Zheren Wang
If you are planning to use LimeSoup in your work, please consider citing the following paper:
* Kononova et al., "Text-mined dataset of inorganic materials synthesis recipes", Scientific Data 6 (1), 1-11 (2019). [10.1038/s41597-019-0224-1](https://www.nature.com/articles/s41597-019-0224-1)