Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/scrapinghub/portia
Visual scraping for Scrapy
Last synced: 5 days ago
- Host: GitHub
- URL: https://github.com/scrapinghub/portia
- Owner: scrapinghub
- License: bsd-3-clause
- Created: 2014-03-21T14:24:31.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2023-09-06T14:08:43.000Z (about 1 year ago)
- Last Synced: 2024-04-14T13:12:26.092Z (7 months ago)
- Language: Python
- Size: 24.3 MB
- Stars: 9,160
- Watchers: 503
- Forks: 1,408
- Open Issues: 126
Metadata Files:
- Readme: README.md
- Changelog: CHANGES
- License: LICENSE
Awesome Lists containing this project
- my-awesome-starred - portia - Visual scraping for Scrapy (JavaScript)
- my-awesome-github-stars - scrapinghub/portia - Visual scraping for Scrapy (Python)
- awesome-python-resources - GitHub - 24% open · ⏱️ 10.07.2019): (HTML processing)
- awesome-scrapy - Portia
- starred-awesome - portia - Visual scraping for Scrapy (Python)
README
Portia
======

Portia is a tool that lets you scrape websites visually, without any programming knowledge. With Portia you annotate a web page to identify the data you wish to extract, and Portia uses those annotations to work out how to scrape data from similar pages.
# Running Portia
The easiest way to run Portia is with [Docker]. Using the official Portia image, run:
    docker run -v ~/portia_projects:/app/data/projects:rw -p 9001:9001 scrapinghub/portia
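The flags in that command can be read off a small helper. The following is only a sketch: `portia_cmd` and its two arguments are hypothetical names introduced here for illustration, not part of Portia. It prints the same `docker run` line with the host directory and host port made explicit, and does not start anything itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of Portia): print the docker command, run nothing.
#   $1  host directory for project data (default: ~/portia_projects)
#   $2  host port to publish            (default: 9001)
portia_cmd() {
    projects_dir="${1:-$HOME/portia_projects}"
    host_port="${2:-9001}"
    # -v mounts the host directory read-write at /app/data/projects in the container.
    # -p publishes the container's port 9001 on the chosen host port.
    printf 'docker run -v %s:/app/data/projects:rw -p %s:9001 scrapinghub/portia\n' \
        "$projects_dir" "$host_port"
}

# Print the command with the same values as the README example.
portia_cmd "$HOME/portia_projects" 9001
```

Changing the second argument only changes the host side of the port mapping; the container side stays at 9001, as in the original command. The helper does not quote its output, so a directory path containing spaces would need quoting before the printed line is run.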
You can also set up a local instance with [Docker-compose] by cloning this repo and running the following from the repository root:

    docker-compose up
For more detailed instructions, and alternatives to using Docker, see the [Installation] docs.
# Documentation
Documentation is available on [Read the docs]. The source files are in the ``docs`` directory.
[Docker]: https://www.docker.com/
[Docker-compose]: https://docs.docker.com/compose
[Installation]: http://portia.readthedocs.org/en/latest/installation.html
[Read the docs]: http://portia.readthedocs.org/en/latest/index.html
[Scrapinghub]: https://portia.scrapinghub.com/