https://github.com/abhishekshah5486/aganitha_assignment


Host: GitHub
URL: https://github.com/abhishekshah5486/aganitha_assignment
Owner: abhishekshah5486
Created: 2025-02-07T19:47:30.000Z (9 months ago)
Default Branch: main
Last Pushed: 2025-02-08T18:14:30.000Z (9 months ago)
Last Synced: 2025-04-04T07:33:47.095Z (7 months ago)
Language: Jupyter Notebook
Size: 15.6 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# aganitha_python_assignment

# PubMed Research Paper Fetcher

This Python program fetches research papers from PubMed based on a user-specified query, identifies papers with at least one author affiliated with a pharmaceutical or biotech company, and returns the results in CSV format.

## Table of Contents

- [Overview](#overview)
- [Installation](#installation)
- [Usage](#usage)
- [Command-line Options](#command-line-options)
- [Code Organization](#code-organization)
- [External Tools and Libraries](#external-tools-and-libraries)
- [How to Contribute](#how-to-contribute)
- [License](#license)

## Overview

The program queries PubMed's API for research papers matching the provided search query, then keeps only the papers with at least one non-academic author or one author affiliated with a pharmaceutical or biotech company. The results are written to a CSV file whose name the user specifies with `-f`/`--file`; without that option, they are printed to the console.
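
The README does not say which PubMed endpoints the program calls. The following is a minimal sketch, assuming the NCBI E-utilities `esearch` and `efetch` endpoints; the helper names `build_search_url` and `build_fetch_url` are hypothetical and not taken from the repository.

```python
from urllib.parse import urlencode

# Assumption: the program uses NCBI E-utilities; the README only says "PubMed's API".
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_search_url(query: str, max_results: int = 100) -> str:
    """Return an esearch URL that asks for PubMed IDs matching the query."""
    params = {"db": "pubmed", "term": query, "retmax": max_results, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_fetch_url(pubmed_ids: list[str]) -> str:
    """Return an efetch URL that asks for the XML records of the given IDs."""
    params = {"db": "pubmed", "id": ",".join(pubmed_ids), "retmode": "xml"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("machine learning"))
    print(build_fetch_url(["12345678", "23456789"]))
```

Because the query string is passed through unchanged as `term`, PubMed's own query syntax (field tags, boolean operators) would keep working under this approach.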

### Features

- Fetches research papers from PubMed using its full query syntax.
- Filters papers based on non-academic authors or authors affiliated with pharmaceutical/biotech companies.
- Returns results in CSV format with the following columns:
  - `PubmedID`: Unique identifier for the paper.
  - `Title`: Title of the paper.
  - `Publication Date`: Date the paper was published.
  - `Non-academic Author(s)`: Names of authors affiliated with non-academic institutions.
  - `Company Affiliation(s)`: Names of pharmaceutical/biotech companies.
  - `Corresponding Author Email(s)`: Corresponding author's email address.
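
The README does not describe how an author is classified as non-academic. One plausible approach, sketched here purely as an assumption, is keyword matching on the affiliation string. The keyword lists and the function name `is_non_academic` are illustrative and may differ from what the repository does.

```python
# Hypothetical keyword lists; the actual program may use different rules.
COMPANY_KEYWORDS = ("pharma", "biotech", "therapeutics", "inc.", "ltd", "gmbh")
ACADEMIC_KEYWORDS = ("university", "college", "institute", "school", "hospital")


def is_non_academic(affiliation: str) -> bool:
    """Guess whether an affiliation string points to a company.

    Company keywords are checked first; anything that matches only
    academic keywords, or matches nothing, is treated as academic.
    """
    text = affiliation.lower()
    if any(word in text for word in COMPANY_KEYWORDS):
        return True
    if any(word in text for word in ACADEMIC_KEYWORDS):
        return False
    return False  # unknown affiliations are conservatively left out


if __name__ == "__main__":
    print(is_non_academic("XYZ Biotech, Boston, MA"))
    print(is_non_academic("Department of Biology, Stanford University"))
```

A keyword heuristic like this is easy to audit but will misclassify some affiliations, so the lists would need tuning against real PubMed data.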

## Installation

To install the required dependencies and set up the project, follow these steps:

1. **Clone the repository:**

```bash
git clone https://github.com/abhishekshah5486/aganitha_assignment.git
cd aganitha_assignment
```

2. **Install Poetry**

If you don't have Poetry installed, you can follow the installation guide [here](https://python-poetry.org/).

To install Poetry, run:

```bash
curl -sSL https://install.python-poetry.org | python3 -
```

3. **Install dependencies using Poetry**

Inside the project directory, run:

```bash
poetry install
```

This will set up all the required dependencies specified in `pyproject.toml`.

## Usage

### Command-line Options

The program supports the following command-line options:

* `-h` or `--help`: Display usage instructions.
* `-d` or `--debug`: Enable debug mode, printing additional information about the execution process.
* `-f` or `--file`: Specify the filename where the results should be saved. If this option is not provided, the output will be printed to the console.
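
The options above could be declared as in the following sketch. It assumes the standard-library `argparse` module, which the README does not confirm, and the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (-h is added by argparse)."""
    parser = argparse.ArgumentParser(
        prog="get-papers-list",
        description="Fetch PubMed papers with pharma/biotech-affiliated authors.",
    )
    parser.add_argument("query", help="PubMed search query")
    parser.add_argument(
        "-d", "--debug", action="store_true",
        help="print additional information during execution",
    )
    parser.add_argument(
        "-f", "--file", default=None,
        help="CSV file to write; results go to the console if omitted",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.query, args.debug, args.file)
```

Leaving `--file` as `None` by default lets the main script decide between writing a file and printing to the console with a single check.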

### Example Usage

1. Fetch papers based on a query and print the results to the console:

```bash
poetry run get-papers-list "machine learning"
```

2. Fetch papers and save the results to a CSV file:

```bash
poetry run get-papers-list "machine learning" -f results.csv
```

3. Enable debug mode to print additional information:

```bash
poetry run get-papers-list "machine learning" -d
```

### Example Output

When the program runs, it will either save the results to a CSV file or print the following structure to the console:

```
Retrieved Paper Details
['PubMed ID', 'Title', 'Publication Date', 'Non-academic Author(s)', 'Company Affiliation(s)', 'Corresponding Author Email(s)']
['12345678', 'Machine Learning in Healthcare', '2025-02-01', 'Dr. John Doe', 'XYZ Biotech', 'john.doe@xyz.com']
...
```

## Code Organization

The code is organized into the following components:

1. **`PubMedFetcher` Class:** Responsible for fetching PubMed IDs based on a search query and retrieving the corresponding paper details in batches.
2. **`PubMedParser` Class:** Responsible for parsing the retrieved XML data and extracting relevant paper metadata (such as authors, publication date, and company affiliations).
3. **`CSVWriter` Class:** Responsible for saving the parsed paper metadata into a CSV file.
4. **Main Script:** The script that handles command-line arguments and ties everything together.
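
To illustrate the parsing step, here is a minimal sketch that extracts IDs, titles, authors, and affiliations with `xml.etree.ElementTree`. The sample XML is a trimmed, hand-written record that follows the element names of PubMed's `efetch` output as an assumption, and `parse_articles` is a hypothetical function, not the repository's `PubMedParser`.

```python
import xml.etree.ElementTree as ET

# Hand-written sample reusing the values from the example output; not real PubMed data.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>12345678</PMID>
      <Article>
        <ArticleTitle>Machine Learning in Healthcare</ArticleTitle>
        <AuthorList>
          <Author>
            <LastName>Doe</LastName>
            <ForeName>John</ForeName>
            <AffiliationInfo>
              <Affiliation>XYZ Biotech</Affiliation>
            </AffiliationInfo>
          </Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def parse_articles(xml_text: str) -> list[dict]:
    """Return one dict per PubmedArticle with its ID, title, and authors."""
    root = ET.fromstring(xml_text)
    papers = []
    for article in root.iter("PubmedArticle"):
        authors = []
        for author in article.iter("Author"):
            # Join whichever name parts are present.
            parts = (author.findtext("ForeName"), author.findtext("LastName"))
            name = " ".join(part for part in parts if part)
            affiliations = [node.text or "" for node in author.iter("Affiliation")]
            authors.append({"name": name, "affiliations": affiliations})
        papers.append({
            "pmid": article.findtext("MedlineCitation/PMID"),
            "title": article.findtext("MedlineCitation/Article/ArticleTitle"),
            "authors": authors,
        })
    return papers


if __name__ == "__main__":
    for paper in parse_articles(SAMPLE_XML):
        print(paper)
```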

## External Tools and Libraries

* **Requests:** Used for making HTTP requests to the PubMed API. [Requests Documentation](https://docs.python-requests.org/en/latest/)
* **ElementTree (`xml.etree.ElementTree`):** Used for parsing XML data returned by the PubMed API.
* **Poetry:** Used for dependency management and packaging. [Poetry Documentation](https://python-poetry.org/)
* **CSV Module:** Used to write the results to a CSV file.
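
As an illustration of the CSV step, the sketch below writes rows with the standard-library `csv` module. The column names come from the Features list; the function name `write_rows` and the sample row (taken from the example output) are illustrative only.

```python
import csv
import io

COLUMNS = [
    "PubmedID",
    "Title",
    "Publication Date",
    "Non-academic Author(s)",
    "Company Affiliation(s)",
    "Corresponding Author Email(s)",
]


def write_rows(rows: list[dict], stream) -> None:
    """Write a header line followed by one CSV line per paper."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Illustrative row reusing the values from the example output.
sample_row = dict(zip(COLUMNS, [
    "12345678", "Machine Learning in Healthcare", "2025-02-01",
    "Dr. John Doe", "XYZ Biotech", "john.doe@xyz.com",
]))

buffer = io.StringIO()
write_rows([sample_row], buffer)

if __name__ == "__main__":
    print(buffer.getvalue())
```

Taking any writable stream means the same function can target `sys.stdout` for console output or a file opened with `open(path, "w", newline="")` for the `-f` case.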

## How to Contribute

Contributions to this project are welcome! To contribute:

1. Fork the repository.
2. Create a new branch for your feature or bug fix.
3. Make your changes.
4. Open a pull request with a detailed description of your changes.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
