https://github.com/kenjyco/parse-helper
Helpers to fetch & parse text on pages with requests, lxml, & beautifulsoup4
- Host: GitHub
- URL: https://github.com/kenjyco/parse-helper
- Owner: kenjyco
- License: other
- Created: 2017-02-24T01:12:40.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2022-08-08T11:00:32.000Z (over 2 years ago)
- Last Synced: 2024-04-23T00:43:53.317Z (8 months ago)
- Topics: beautifulsoup, cli, duckduckgo, kenjyco, lxml, parse, python, requests
- Language: Python
- Homepage:
- Size: 47.9 KB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
## Install
Install system requirements for `lxml`
```
% sudo apt-get install -y libxml2 libxslt1.1 libxml2-dev libxslt1-dev zlib1g-dev
```

or

```
% brew install libxml2
```

Install with `pip`
```
% pip3 install parse-helper
```

> Optionally install ipython with `pip3 install ipython` to enable the
> `ph-soup-explore` command.

## Usage
The `ph-ddg`, `ph-download-files`, `ph-download-file-as`, and
`ph-soup-explore` scripts are provided:

```
$ venv/bin/ph-ddg --help
Usage: ph-ddg [OPTIONS] [QUERY]

  Pass a search query to duckduckgo api

Options:
  --help  Show this message and exit.

$ venv/bin/ph-download-files --help
Usage: ph-download-files [OPTIONS] [ARGS]...

  Download all links to local files

  - args: urls or filenames containing urls

Options:
  --help  Show this message and exit.

$ venv/bin/ph-download-file-as --help
Usage: ph-download-file-as [OPTIONS] URL [LOCALFILE]

  Download link to local file

  - url: a string
  - localfile: a string

Options:
  --help  Show this message and exit.

$ venv/bin/ph-soup-explore --help
Usage: ph-soup-explore [OPTIONS] [URL_OR_FILE]

  Create a soup object from a url or file and explore with ipython

Options:
  --help  Show this message and exit.
```

```python
In [1]: import parse_helper as ph

In [2]: ph.USER_AGENT
Out[2]: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/58.0.3029.110 Chrome/58.0.3029.110 Safari/537.36'

In [3]: ph.duckduckgo_api('adventure time')
2019-08-27 06:21:05,303: Fetching JSON from https://api.duckduckgo.com?q=adventure+time&format=json
Out[3]:
[{'text': 'Adventure Time An American animated television series created by Pendleton Ward for Cartoon Network.',
'thumbnail': 'https://duckduckgo.com/i/fb8f17fd.png',
'link': 'https://duckduckgo.com/Adventure_Time'},
{'text': '"Adventure Time" (pilot) An animated short created by Pendleton Ward, as well as the pilot to the Cartoon Network series...',
'thumbnail': 'https://duckduckgo.com/i/aa9b49e0.png',
'link': 'https://duckduckgo.com/Adventure_Time_(pilot)'},
{'text': "Adventure Time (1959 TV series) A local children's television show on WTAE-TV 4 in Pittsburgh, Pennsylvania, from 1959 to 1975.",
'thumbnail': '',
'link': 'https://duckduckgo.com/Adventure_Time_(1959_TV_series)'},
{'text': "Adventure Time (1967 TV series) A Canadian children's adventure television series which aired on CBC Television in 1967 and 1968.",
'thumbnail': '',
'link': 'https://duckduckgo.com/Adventure_Time_(1967_TV_series)'},
{'text': 'Adventure Time (album) The second album for the rock/pop trio The Elvis Brothers.',
'thumbnail': '',
'link': 'https://duckduckgo.com/Adventure_Time_(album)'}]
```
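The log line in the session shows the query being encoded into `https://api.duckduckgo.com?q=adventure+time&format=json`, and each result comes back as a dict with `text`, `thumbnail`, and `link` keys. The following is a minimal standard-library sketch of that flow, not parse-helper's actual implementation. It assumes the DuckDuckGo Instant Answer API returns a `RelatedTopics` list whose entries carry `Text`, `FirstURL`, and `Icon.URL` fields (with grouped entries nesting a further `Topics` list), and that relative icon paths live under `https://duckduckgo.com`. The names `build_ddg_url` and `simplify_related_topics` are hypothetical.

```python
from urllib.parse import urlencode

API_ROOT = 'https://api.duckduckgo.com'
ICON_ROOT = 'https://duckduckgo.com'  # assumed base for relative icon paths


def build_ddg_url(query):
    """Return the Instant Answer API URL for a query (hypothetical helper)."""
    # urlencode quotes with quote_plus by default, so spaces become '+',
    # matching the URL shown in the log line above
    return API_ROOT + '?' + urlencode({'q': query, 'format': 'json'})


def simplify_related_topics(data):
    """Flatten an Instant Answer JSON payload into text/thumbnail/link dicts.

    Assumes the payload shape described in the lead-in; not parse-helper's code.
    """
    results = []

    def walk(topics):
        for topic in topics:
            if 'Topics' in topic:
                # Grouped entries (assumed shape) nest another list of topics
                walk(topic['Topics'])
                continue
            icon = (topic.get('Icon') or {}).get('URL', '')
            if icon.startswith('/'):
                # Turn a relative icon path into an absolute URL
                icon = ICON_ROOT + icon
            results.append({
                'text': topic.get('Text', ''),
                'thumbnail': icon,
                'link': topic.get('FirstURL', ''),
            })

    walk(data.get('RelatedTopics', []))
    return results
```

In practice the payload would come from fetching `build_ddg_url(query)`, for example with `requests.get(url).json()`, and then be passed to `simplify_related_topics`. Keeping the URL building and the result flattening as separate pure functions lets both be checked without network access.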