https://github.com/ejw-data/web-scraping-proteins

Scrapes PubMed publication data and displays it on a single web page with multiple Plotly charts. The basic structure of the site is updated through an Excel spreadsheet, so people who do not code can maintain it.
beautifulsoup excel html-css-javascript pandas plotly python splinter

Host: GitHub
URL: https://github.com/ejw-data/web-scraping-proteins
Owner: ejw-data
Created: 2022-07-22T23:35:08.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2022-07-26T06:39:24.000Z (almost 3 years ago)
Last Synced: 2025-01-22T06:47:17.406Z (6 months ago)
Topics: beautifulsoup, excel, html-css-javascript, pandas, plotly, python, splinter
Language: Jupyter Notebook
Homepage:
Size: 1.91 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# web-scraping-proteins

Author: Erin James Wills, [email protected]

![NIH web scrape banner](./static/images/protein-webscrapte.png)
Photo by National Cancer Institute on Unsplash

## Overview

> The content of this repo generated the following webpage: http://nrtdp.northwestern.edu/targets/ (as of July 2022)

## Technologies
* Python
* Pandas
* Splinter
* BeautifulSoup
* Plotly
* HTML/CSS/JS

## Data Source

The dataset is generated by scraping the PubMed search results for a protein name, for example:
* [Pubmed Search "p21"](https://pubmed.ncbi.nlm.nih.gov/?term=p21)
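A minimal sketch of how a search URL like the one above could be built for any protein name. It is not taken from `pubmed_scrape.ipynb`; the helper name `build_search_url`, the `page` query parameter, and the second protein name are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Base URL taken from the PubMed search link in this README.
PUBMED_BASE_URL = "https://pubmed.ncbi.nlm.nih.gov/"


def build_search_url(protein_name, page=1):
    """Return a PubMed search URL for one protein name (hypothetical helper).

    The `page` parameter is an assumption about how result pages are
    addressed; it is only added to the URL for pages after the first.
    """
    params = {"term": protein_name}
    if page > 1:
        params["page"] = page
    # urlencode escapes spaces and other special characters in the name.
    return PUBMED_BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    for name in ["p21", "cyclin D1"]:
        print(build_search_url(name))
```

Each URL produced this way can then be opened in the Splinter browser and the page HTML passed to BeautifulSoup for parsing.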

## Setup and Installation
1. The environment needs the following:
    * Python 3.6+
    * pandas
    * webdriver_manager (for `webdriver_manager.chrome`)
    * splinter
    * BeautifulSoup
    * time and json (both part of the Python standard library)
1. Activate your environment.
1. Clone the repo to your local machine.
1. Start Jupyter Notebook within the environment, from the repo directory.
1. To run and/or troubleshoot the scraping, run `pubmed_scrape.ipynb`.
1. To view the index page, I suggest using the VS Code extension "Live Server" to open the `index.html` file.
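Before opening the notebook, it can help to confirm that the environment has the required packages. The following is a rough sketch, not part of the repo: `missing_modules` is a hypothetical helper, and the import names (`bs4` for BeautifulSoup, `webdriver_manager` for the Chrome driver manager) are assumed to be the usual ones for those packages.

```python
import importlib.util

# Top-level import names assumed for the packages listed above.
REQUIRED_MODULES = ["pandas", "splinter", "bs4", "webdriver_manager", "time", "json"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

Only top-level names are checked, because looking up a submodule such as `webdriver_manager.chrome` raises an error when its parent package is absent.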

## Images

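![UniProtKB](./webscrape/images/uniprotkb.jpg)

![PubMed protein search](./webscrape/images/pubmed_protein_search.jpg)

![JavaScript table](./webscrape/images/js-table.jpg)

![JavaScript table, filtered](./webscrape/images/js-table-filtered.jpg)

![Plotly chart with a target selected](./webscrape/images/js-plotly-target-selected.jpg)

![Plotly chart with the menu closed](./webscrape/images/js-plotly-menu-closed.jpg)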
