https://github.com/sergioburdisso/solidscraper

Easy to use JQuery-Like API for Web Scraping/Crawling.

crawler crawling crawling-python jquery python scraper scraping tweets twitter web web-crawler web-scraping webscraping

Host: GitHub
URL: https://github.com/sergioburdisso/solidscraper
Owner: sergioburdisso
License: mit
Created: 2017-10-10T12:11:24.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2020-07-02T12:45:03.000Z (almost 5 years ago)
Last Synced: 2024-11-11T14:19:26.109Z (6 months ago)
Topics: crawler, crawling, crawling-python, jquery, python, scraper, scraping, tweets, twitter, web, web-crawler, web-scraping, webscraping
Language: Python
Homepage:
Size: 31.3 KB
Stars: 3
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.txt

# Solid Scraper

An easy-to-use, jQuery-like API for web scraping and crawling. It also supports cookies and custom user agents. Solidscraper is compatible with **Python 2 and 3**.

---

## 1. Installation

````

pip install solidscraper

````

**Note:** if you have already installed it and want the latest version, use the following command to update `solidscraper`:

````

pip install --upgrade solidscraper

````

---

## 2. "Hello World" Examples

Getting the URLs of all links:

````python

import solidscraper as ss

doc = ss.load("https://www.example.com/the/path")

# print the list of URLs from all <a> elements

print(doc.select("a").getAttribute("href"))

````
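
To check offline what kind of result to expect, the idea can be reproduced with the standard library's `html.parser` on an inline HTML string. This is only a sketch: it does not use `solidscraper` and is not how the library is implemented. The class name `HrefCollector` and the sample HTML are made up for illustration, and the code targets Python 3.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: gathers the href of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current start tag
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


# Made-up sample page, used instead of a real network request
SAMPLE_HTML = """
<html><body>
  <a href="/first">First</a>
  <p>No link here</p>
  <a href="https://www.example.com/second">Second</a>
</body></html>
"""

collector = HrefCollector()
collector.feed(SAMPLE_HTML)
print(collector.hrefs)  # ['/first', 'https://www.example.com/second']
```

The result is a plain list of attribute values, one entry per matching element, in document order.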


Getting the URLs of all links inside `<div>` elements whose id is 'links':

````python

import solidscraper as ss

doc = ss.load("https://www.example.com/the/path")

# print the list of URLs from all <a> elements inside <div> elements whose id is 'links'

print(doc.select("div #links").then("a").getAttribute("href"))

````

Getting the text of all `<span>` elements inside `<p>` elements whose class is 'info':

````python

import solidscraper as ss

doc = ss.load("https://www.example.com/the/path")

# print the text of all <span> elements inside <p> elements whose class is 'info'

print(doc.select("p .info").then("span").text())

````
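
The two-step pattern above (pick the outer elements, then look inside them) can also be sketched with the standard library alone, which may help when reasoning about what a scoped selection should match. This does not use `solidscraper`, and this README does not specify the exact return shape of `text()`, so the list returned here is an assumption. `ScopedTextCollector` and the sample HTML are hypothetical, and the code targets Python 3.

```python
from html.parser import HTMLParser


class ScopedTextCollector(HTMLParser):
    """Hypothetical helper: text of <span> elements nested in <p class="info">."""

    def __init__(self):
        super().__init__()
        self.info_depth = 0  # number of currently open <p> tags with class "info"
        self.p_flags = []    # one bool per open <p>: did it have class "info"?
        self.buffer = None   # text pieces of the <span> being read, if any
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            classes = (dict(attrs).get("class") or "").split()
            is_info = "info" in classes
            self.p_flags.append(is_info)
            self.info_depth += is_info
        elif tag == "span" and self.info_depth > 0 and self.buffer is None:
            self.buffer = []  # start capturing text for this <span>

    def handle_endtag(self, tag):
        if tag == "p" and self.p_flags:
            self.info_depth -= self.p_flags.pop()
        elif tag == "span" and self.buffer is not None:
            self.texts.append("".join(self.buffer).strip())
            self.buffer = None

    def handle_data(self, data):
        if self.buffer is not None:
            self.buffer.append(data)


# Made-up sample page, used instead of a real network request
SAMPLE_HTML = """
<p class="info">Name: <span>Ada</span></p>
<p>Other: <span>ignored</span></p>
<p class="info extra">Role: <span>Engineer</span></p>
"""

parser = ScopedTextCollector()
parser.feed(SAMPLE_HTML)
print(parser.texts)  # ['Ada', 'Engineer']
```

Only the `<span>` elements inside a paragraph carrying the `info` class contribute text; the one in the unclassed paragraph is skipped.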

**Note:** these examples use the Python 3 `print()` function. To run them with Python 2, either replace each `print()` call with the Python 2 `print` statement or add the following import as the first statement of your code: `from __future__ import print_function`.

---

## 3. "Real World" Examples

The [`examples` folder](https://github.com/sergioburdisso/solidscraper/tree/master/examples/) contains two fully functional examples: one downloads tweets by hashtag, and the other downloads complete user timelines (tweets and images). Both scripts were built entirely with `solidscraper`.
