
https://github.com/sergioburdisso/solidscraper

Easy to use JQuery-Like API for Web Scraping/Crawling.

crawler crawling crawling-python jquery python scraper scraping tweets twitter web web-crawler web-scraping webscraping

# Solid Scraper

An easy-to-use, jQuery-like API for web scraping/crawling. It also supports cookies and custom user agents. Solidscraper is compatible with **Python 2 and 3**.

---
## 1. Installation

````
pip install solidscraper
````

**Note:** if you have already installed it and want the latest version, update `solidscraper` with the following command:

````
pip install --upgrade solidscraper
````

---
## 2. "Hello World" Examples

Getting the URLs of all links:

````python
import solidscraper as ss

doc = ss.load("https://www.example.com/the/path")

# print the list of URLs from all <a> elements
print(doc.select("a").getAttribute("href"))
````
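
For comparison, the same idea can be sketched with Python 3's standard-library `html.parser` on an in-memory HTML string. This is an illustrative analogue only, not how `solidscraper` works internally; the `LinkCollector` class and the sample HTML are hypothetical.

````python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; keep only <a> tags with an href
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


html = '<p><a href="/one">1</a> <a href="/two">2</a> <a>no link</a></p>'
parser = LinkCollector()
parser.feed(html)
parser.close()
print(parser.hrefs)  # ['/one', '/two']
````

Unlike `ss.load()`, this sketch does not fetch anything; you would have to download the HTML yourself and pass it to `feed()`.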

Getting the URLs of all links inside `<div>`s whose id is 'links':

````python
import solidscraper as ss

doc = ss.load("https://www.example.com/the/path")

# print the list of URLs from all <a> elements inside <div>s whose id is 'links'
print(doc.select("div #links").then("a").getAttribute("href"))
````
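
A rough standard-library analogue of this nested selection, written for Python 3. It assumes, as described above, that `"div #links"` matches `<div>` elements whose id is `links`, and it further assumes that `.then("a")` narrows the result to `<a>` elements nested anywhere inside them. The `NestedLinkCollector` class and the sample HTML are hypothetical; this is not `solidscraper`'s implementation.

````python
from html.parser import HTMLParser


class NestedLinkCollector(HTMLParser):
    """Hypothetical helper: hrefs of <a> tags nested inside <div id="links">."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.div_stack = []  # one bool per open <div>: True if its id is "links"

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            self.div_stack.append(attrs.get("id") == "links")
        elif tag == "a" and any(self.div_stack) and "href" in attrs:
            # we are somewhere inside at least one <div id="links">
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()


html = ('<div id="links"><a href="/a">A</a><span><a href="/b">B</a></span></div>'
        '<div id="other"><a href="/c">C</a></div>'
        '<a href="/d">D</a>')
parser = NestedLinkCollector()
parser.feed(html)
parser.close()
print(parser.hrefs)  # ['/a', '/b']
````

Keeping one boolean per open `<div>` means links inside deeper `<div>`s nested within `<div id="links">` are still counted, while links in other `<div>`s or outside any `<div>` are skipped.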

Getting the text of all `<span>` elements inside `<p>`s whose class is 'info':

````python
import solidscraper as ss

doc = ss.load("https://www.example.com/the/path")

# print the text of all <span> elements inside <p>s whose class is 'info'
print(doc.select("p .info").then("span").text())
````
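
As with the previous examples, here is a hedged Python 3 standard-library analogue, assuming `"p .info"` matches `<p>` elements with class `info` and `.then("span").text()` returns the text of the `<span>`s nested inside them. The `InfoSpanText` class and the sample HTML are hypothetical, and the sketch does not reflect how `solidscraper` is built.

````python
from html.parser import HTMLParser


class InfoSpanText(HTMLParser):
    """Hypothetical helper: text of <span> tags nested inside <p class="info">."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self.p_stack = []    # one bool per open <p>: True if it has class "info"
        self.span_depth = 0  # number of open <span>s inside a matching <p>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "p":
            classes = (attrs.get("class") or "").split()
            self.p_stack.append("info" in classes)
        elif tag == "span" and any(self.p_stack):
            if self.span_depth == 0:
                self.texts.append("")  # start a new entry for an outermost <span>
            self.span_depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.p_stack:
            self.p_stack.pop()
        elif tag == "span" and self.span_depth:
            self.span_depth -= 1

    def handle_data(self, data):
        # only text inside a matching <span> is collected
        if self.span_depth:
            self.texts[-1] += data


html = ('<p class="info">Name: <span>Ada</span>, born <span>1815</span></p>'
        '<p>Ignored <span>skip</span></p>')
parser = InfoSpanText()
parser.feed(html)
parser.close()
print(parser.texts)  # ['Ada', '1815']
````

As a simplification, text from `<span>`s nested inside another matching `<span>` is merged into the outermost one's entry.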

**Note:** these examples use the Python 3 `print()` function. To run them with Python 2, either replace the `print()` calls with the Python 2 `print` statement or add `from __future__ import print_function` as the first statement of your code.

---
## 3. "Real World" Examples

The [`examples` folder](https://github.com/sergioburdisso/solidscraper/tree/master/examples/) contains two fully functional examples: one that downloads tweets by hashtag and another that downloads a user's complete timeline (tweets and images). Both scripts were built entirely with `solidscraper`.