https://github.com/sammyjava/bsharp-pubmed
Java package for fetching content from PubMed
- Host: GitHub
- URL: https://github.com/sammyjava/bsharp-pubmed
- Owner: sammyjava
- Created: 2023-04-28T17:30:57.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-05-13T03:30:33.000Z (almost 3 years ago)
- Last Synced: 2025-01-09T08:53:46.357Z (about 1 year ago)
- Language: Java
- Size: 8.84 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# org.bsharp.pubmed
Java library for searching and reading PubMed XML data.
## xmlbeans
This package uses org.apache.xmlbeans to generate Java classes from the `eutils.xsd` schema file.
JAXB is not used for this because it doesn't handle HTML markup within tags (which is common in PubMed abstracts and even titles).
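
To illustrate why HTML within tags is awkward, here is a minimal sketch that uses only the JDK's built-in DOM parser. It is not this package's API and involves neither xmlbeans nor JAXB. The class name `MixedContentDemo`, its helper methods, and the sample title string are all hypothetical. It shows how a reader that expects plain text inside an element can stop at the first inline tag, while a reader that is aware of mixed content recovers the whole string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of org.bsharp.pubmed.
public class MixedContentDemo {

    // Parse an XML string and return its root element.
    public static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Naive read: only the first text node, which ends at the first inline tag.
    public static String firstTextNode(Element e) {
        Node first = e.getFirstChild();
        return (first != null && first.getNodeType() == Node.TEXT_NODE)
                ? first.getNodeValue()
                : "";
    }

    // Mixed-content-aware read: concatenates the text of all descendant nodes.
    public static String fullText(Element e) {
        return e.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Made-up title in the style of a PubMed ArticleTitle with inline markup.
        String xml = "<ArticleTitle>Role of <i>BRCA1</i> in DNA repair.</ArticleTitle>";
        Element title = parseRoot(xml);
        System.out.println(firstTextNode(title)); // prints "Role of "
        System.out.println(fullText(title));      // prints "Role of BRCA1 in DNA repair."
    }
}
```

A binding layer that maps such an element to a plain string field faces the same choice, which is the kind of problem described above.
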
The xmlbeans tool that compiles an XSD is `scomp`, run as follows:
```
scomp -d bin -src src/main/java -out libs/eutils.jar -dl eutils.xsd
```
Unfortunately, `scomp` does not handle DTD files, which is what NCBI provides. I was able to obtain `eutils.xsd` from
a GitHub repo.
## JAXB
Converting the DTDs to XSD has been problematic, so I generated Java classes from the other DTDs using the JAXB tool `xjc`
and use JAXB methods to parse those cases.
Using both xmlbeans and JAXB is clearly not optimal, and the lack of an XSD for the PubMed schema is a real annoyance. I've
spent a lot of time trying to generate XSD from the DTDs, and there always seem to be problems.