https://github.com/kernel-loophole/scrapy

Web scraping and crawling with Scrapy
beautifulsoup4 scrapy webscraping

Last synced: 4 months ago

Host: GitHub
URL: https://github.com/kernel-loophole/scrapy
Owner: kernel-loophole
Created: 2022-02-19T21:10:08.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2023-07-12T05:11:36.000Z (almost 3 years ago)
Last Synced: 2025-10-18T21:56:25.039Z (8 months ago)
Topics: beautifulsoup4, scrapy, webscraping
Language: HTML
Homepage:
Size: 1.16 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: news scrapping/second/__pycache__/__init__.cpython-38.pyc

README

# scrapy

Web scraping and crawling with Scrapy

# scraping Wikipedia title pages

![demo_scraping_gif](https://user-images.githubusercontent.com/76658396/168424063-6069a8e9-5638-4bd2-9913-a1a2c63df714.gif)

# simple news scraper

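```python
import logging

from scrapy import Spider
from stringcolor import cs  # assumption: cs() comes from the string-color package


class SecondSpider(Spider):
    name = 'second'
    start_urls = ['https://www.thenews.com.pk/']

    def make(self, response):
        # Unused helper: collects link text from a result list.
        # Scrapy callbacks must yield dicts, items or requests, so the list is
        # wrapped in a dict (the 'titles' key is an illustrative name).
        x = response.xpath('//div[@id="content_left"]/div[@class="result c-container "]/h3/a/text()').extract()
        yield {'titles': x}

    def parse(self, response):
        # Stop Scrapy's log records from propagating so the coloured output stays readable.
        logging.getLogger('scrapy').propagate = False
        m = '.heading-cat'
        counter = 0
        for test in response.css(m):
            Name_SELECTOR = 'h2::text'
            counter += 1
            headline = test.css(Name_SELECTOR).extract_first()
            # Emit one item per headline, keyed by its position on the page.
            yield {counter: headline}
            print(cs('Head Line #====', "red"), counter)
            print(cs(headline, "yellow"))
        print(cs("Total", "red"), counter)
```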
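The headline selection used by the spider (every `h2` inside an element with class `heading-cat`) can be tried offline, without Scrapy or a network connection. The following is a minimal sketch using only Python's standard-library `html.parser`. It is a stand-in for the `.heading-cat` / `h2::text` selectors, not the spider's actual code. The names `HeadlineParser`, `extract_headlines` and `SAMPLE_HTML` are hypothetical, and the sample markup is an assumption about the page structure. Unlike `h2::text` with `extract_first()`, it joins all text found inside each `h2`.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text of every <h2> nested inside an element with class 'heading-cat'."""

    # Void elements have no closing tag, so they must not affect depth tracking.
    VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.headlines = []
        self._depth = 0       # nesting depth inside the current .heading-cat element (0 = outside)
        self._in_h2 = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth > 0:
            # Already inside a .heading-cat element: track nesting and watch for <h2>.
            self._depth += 1
            if tag == "h2":
                self._in_h2 = True
                self._buffer = []
        elif "heading-cat" in classes:
            self._depth = 1

    def handle_endtag(self, tag):
        if self._depth == 0 or tag in self.VOID_TAGS:
            return
        if tag == "h2" and self._in_h2:
            self.headlines.append("".join(self._buffer).strip())
            self._in_h2 = False
        self._depth -= 1

    def handle_data(self, data):
        if self._in_h2:
            self._buffer.append(data)


def extract_headlines(html):
    """Return the headline strings found in the given HTML text."""
    parser = HeadlineParser()
    parser.feed(html)
    return parser.headlines


# Assumed sample markup; the real page layout may differ.
SAMPLE_HTML = """
<div class="heading-cat"><a href="#"><h2>First headline</h2></a></div>
<div class="other"><h2>Ignored</h2></div>
<div class="heading-cat"><h2>Second headline</h2></div>
"""

for number, headline in enumerate(extract_headlines(SAMPLE_HTML), start=1):
    print(number, headline)
```

A parser like this is only useful for quick checks on well-formed markup; for real crawling, Scrapy's selectors handle malformed HTML and richer CSS/XPath queries.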
