https://github.com/mumtaz4118/scraping-medium-and-data-analytics

The file DataExtraction.py extracts information from the JSON files scraped by medium_scrapper_post.py. To extract information from the JSON files scraped by medium_scrapper_tag_archive.py (the tag-archive scraper), use Data_Extraction_Archive_Tags.py instead.

Host: GitHub
URL: https://github.com/mumtaz4118/scraping-medium-and-data-analytics
Owner: mumtaz4118
Created: 2023-02-08T00:14:35.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2023-02-08T00:23:20.000Z (about 2 years ago)
Last Synced: 2024-11-07T16:31:10.398Z (3 months ago)
Topics: data, data-analysis, data-analytics, data-extraction, data-preprocessing, data-science, data-scraping, deep-learning, machine-learning, python
Language: Jupyter Notebook
Homepage:
Size: 1 MB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Medium Web Scraper

To run the scrapers, first install the Scrapy library:

pip install scrapy

There are two scrapers:
1. medium_scrapper_post.py
This scraper searches Medium for articles matching a user-supplied search string.

To run the scraper, use:

scrapy runspider -a searchString=searchTerm medium_scrapper_post.py
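
If the search scraper is launched from another Python script, the command above can be assembled as an argument list, which avoids shell-quoting problems when the search term contains spaces. This is a minimal sketch and not part of the repository: the helper names `build_search_command` and `run_search` are hypothetical, and the sketch assumes that `scrapy` is on the PATH and that the script runs from the directory containing medium_scrapper_post.py.

```python
import subprocess


def build_search_command(search_string, spider_file="medium_scrapper_post.py"):
    """Return the argument list for the search scraper (hypothetical helper)."""
    if not search_string.strip():
        raise ValueError("search_string must not be empty")
    # Each "-a NAME=VALUE" pair is handed to the spider as an argument.
    return ["scrapy", "runspider", "-a", f"searchString={search_string}", spider_file]


def run_search(search_string):
    """Run the scraper; raises CalledProcessError if Scrapy exits with an error."""
    subprocess.run(build_search_command(search_string), check=True)
```

Calling `run_search("data science")` would then start the scraper with a two-word search term, with no extra quoting needed.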

2. medium_scrapper_tag_archive.py
This scraper gets all articles for a particular tag slug within a given date range.

Note: pass the tag in slug form in the tagSlug parameter. For example, for the tag Data Science, pass 'data-science'.
To run the scraper, use:

scrapy runspider -a tagSlug='tagSlug' -a start_date=YYYYmmdd -a end_date=YYYYmmdd medium_scrapper_tag_archive.py
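
The slug and date formats above are easy to get wrong by hand, so the arguments can be built programmatically. The sketch below is an illustration under stated assumptions, not code from the repository: `to_tag_slug` and `build_archive_command` are hypothetical helpers, and the slug rule (lowercase ASCII letters and digits joined by hyphens) is inferred from the single 'data-science' example, so it may not match Medium's slugs for tags with other characters.

```python
import re
from datetime import date


def to_tag_slug(tag_name):
    """Lowercase a tag name and join its words with hyphens (assumed slug rule)."""
    words = re.findall(r"[a-z0-9]+", tag_name.lower())
    if not words:
        raise ValueError("tag_name contains no ASCII letters or digits")
    return "-".join(words)


def build_archive_command(tag_name, start, end,
                          spider_file="medium_scrapper_tag_archive.py"):
    """Return the argument list for the tag-archive scraper (hypothetical helper)."""
    if start > end:
        raise ValueError("start date must not be after end date")
    return [
        "scrapy", "runspider",
        "-a", f"tagSlug={to_tag_slug(tag_name)}",
        # %Y%m%d produces the YYYYmmdd form used in the command above.
        "-a", f"start_date={start:%Y%m%d}",
        "-a", f"end_date={end:%Y%m%d}",
        spider_file,
    ]
```

For example, `build_archive_command("Data Science", date(2023, 1, 1), date(2023, 1, 31))` yields an argument list that can be passed to `subprocess.run`.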

# Medium Posts Data Extraction

The file DataExtraction.py extracts information from the JSON files scraped by medium_scrapper_post.py.
To extract information from the JSON files scraped by medium_scrapper_tag_archive.py (the tag-archive scraper), use Data_Extraction_Archive_Tags.py instead.
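
The README does not describe the layout of the scraped JSON, so the following is only a sketch of the first step any such extraction has to perform: reading every JSON file in an output folder. The function name `load_json_files` and the folder layout are assumptions, and each file is assumed to hold one complete JSON document. Consult DataExtraction.py for the fields the project actually extracts.

```python
import json
from pathlib import Path


def load_json_files(directory):
    """Parse every *.json file in `directory` into {file name: parsed data}."""
    loaded = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            loaded[path.name] = json.load(handle)
    return loaded
```

Files with other extensions are skipped, and a malformed file raises `json.JSONDecodeError` rather than being silently ignored.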
# Scraping-Medium-and-Data-Analytics