https://github.com/kernel-loophole/scrapy
Web scraping and crawling with Scrapy
- Host: GitHub
- URL: https://github.com/kernel-loophole/scrapy
- Owner: kernel-loophole
- Created: 2022-02-19T21:10:08.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2023-07-12T05:11:36.000Z (almost 3 years ago)
- Last Synced: 2025-10-18T21:56:25.039Z (8 months ago)
- Topics: beautifulsoup4, scrapy, webscraping
- Language: HTML
- Homepage:
- Size: 1.16 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: news scrapping/second/__pycache__/__init__.cpython-38.pyc
README
# scrapy
Web scraping and crawling with Scrapy
# scraping Wikipedia title pages

# simple news scraper
```python
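import logging

from scrapy import Spider
from stringcolor import cs  # assumption: cs() is the colored-text helper from the string-color package


class SecondSpider(Spider):
    name = 'second'
    start_urls = ['https://www.thenews.com.pk/']

    def make(self, response):
        # Helper that is not wired to any request: yields the link texts matched by a fixed XPath.
        x = response.xpath('//div[@id="content_left"]/div[@class="result c-container "]/h3/a/text()').extract()
        yield x

    def parse(self, response):
        # Keep Scrapy's log records from propagating to the root logger.
        logging.getLogger('scrapy').propagate = False
        m = '.heading-cat'
        Name_SELECTOR = 'h2::text'
        counter = 0
        for test in response.css(m):
            counter += 1
            headline = test.css(Name_SELECTOR).extract_first()
            # Emit one item per headline block, keyed by its running number.
            yield {counter: headline}
            print(cs('Head Line #====', "red"), counter)
            print(cs(headline, "yellow"))
        # Print the number of headline blocks found on the page.
        print(cs("Total", "red"), counter)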
```
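
The selection logic in `parse` (take the `h2` text inside each `.heading-cat` block) can be tried offline without installing Scrapy or fetching the site. The sketch below is a stand-in, not the spider's own code: it swaps Scrapy's CSS selectors for the standard library's `html.parser`, and `SAMPLE_HTML`, `HeadlineParser`, and `extract_headlines` are hypothetical names made up for illustration. The sample markup is an assumption; the real page structure may differ. Unlike `extract_first()`, it collects every matching `h2` rather than only the first one per block.

```python
from html.parser import HTMLParser

# Hypothetical sample markup, assumed for illustration only.
SAMPLE_HTML = """
<div class="heading-cat"><h2>First headline</h2></div>
<div class="other"><h2>Not a headline</h2></div>
<div class="heading-cat"><h2>Second headline</h2></div>
"""

# Void elements have no closing tag, so they are kept off the open-tag stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HeadlineParser(HTMLParser):
    """Collect the text of <h2> elements that have a class="heading-cat" ancestor."""

    def __init__(self):
        super().__init__()
        self.stack = []       # one bool per open tag: does it carry heading-cat?
        self.in_h2 = False    # True while inside a matching <h2>
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Check ancestors before pushing the current tag onto the stack.
        inside_block = any(self.stack)
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append("heading-cat" in classes)
        if tag == "h2" and inside_block:
            self.in_h2 = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag == "h2":
            self.in_h2 = False
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.in_h2:
            self.headlines[-1] += data


def extract_headlines(html):
    """Return the stripped text of every matching <h2>, in document order."""
    parser = HeadlineParser()
    parser.feed(html)
    return [h.strip() for h in parser.headlines]


if __name__ == "__main__":
    for number, headline in enumerate(extract_headlines(SAMPLE_HTML), start=1):
        print(number, headline)
```

This sketch assumes well-nested markup. Inside the Scrapy project itself, a spider whose `name` is `'second'` would normally be run with `scrapy crawl second`.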