https://github.com/scrapinghub/portia
Visual scraping for Scrapy
Last synced: 6 months ago
- Host: GitHub
- URL: https://github.com/scrapinghub/portia
- Owner: scrapinghub
- License: bsd-3-clause
- Created: 2014-03-21T14:24:31.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2024-06-26T19:43:46.000Z (over 1 year ago)
- Last Synced: 2024-10-29T10:55:05.325Z (about 1 year ago)
- Language: Python
- Size: 24.4 MB
- Stars: 9,296
- Watchers: 504
- Forks: 1,408
- Open Issues: 130
Metadata Files:
- Readme: README.md
- Changelog: CHANGES
- License: LICENSE
Awesome Lists containing this project
- my-awesome-starred - portia - Visual scraping for Scrapy (JavaScript)
- awesome-crawler-cn - portia - A visual data-scraping framework based on Scrapy. (Python)
- awesome-python - portia - Visual scraping for Scrapy. (Web Crawling)
- awesome-scrapy - Portia
- starred-awesome - portia - Visual scraping for Scrapy (Python)
- fucking_awesome_python - portia - Visual scraping for Scrapy. (Web Crawling)
- my-awesome-github-stars - scrapinghub/portia - Visual scraping for Scrapy (Python)
- python-awesome - portia - Visual scraping for Scrapy. (Web Crawling)
- awesome-python-resources - GitHub - 24% open · ⏱️ 10.07.2019): (HTML processing)
- awesome-python - portia - Visual scraping for Scrapy (Web Crawling)
- awesome-python - portia - Visual scraping for Scrapy. (Web Crawling & Web Scraping)
- fucking-awesome-python - :octocat: portia - :star: 8934 :fork_and_knife: 1415 - Visual scraping for Scrapy. (Web Crawling)
- awesome-python-cn - portia
- awesome-crawler - portia - Visual scraping for Scrapy. (Python)
- Awesome-Python - portia - Visual scraping for Scrapy. (Web Crawling & Web Scraping)
README
Portia
======
Portia is a tool for visually scraping websites without any programming knowledge. You annotate a web page to mark the data you want to extract, and Portia uses those annotations to work out how to scrape data from similar pages.
# Running Portia
The easiest way to run Portia is with [Docker], using the official Portia image:
    docker run -v ~/portia_projects:/app/data/projects:rw -p 9001:9001 scrapinghub/portia
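To make the parts of that command easier to see, here is a minimal Python sketch that assembles the same `docker run` invocation from its pieces. The function name `build_portia_command` and its parameters are hypothetical and only for illustration; the flags, container path, container port, and image name are taken from the command above.

```python
import os
import shlex


def build_portia_command(projects_dir="~/portia_projects", host_port=9001):
    """Return the docker run argument list for the Portia image.

    projects_dir and host_port are illustrative parameters (assumptions);
    the container path and container port come from the README command.
    """
    host_dir = os.path.expanduser(projects_dir)
    return [
        "docker", "run",
        # Mount the host projects directory read-write inside the container.
        "-v", f"{host_dir}:/app/data/projects:rw",
        # Publish container port 9001 on the chosen host port.
        "-p", f"{host_port}:9001",
        "scrapinghub/portia",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_portia_command()))
```

The list form can be passed to `subprocess.run` as-is, which avoids shell quoting problems if the projects directory contains spaces.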
You can also set up a local instance with [Docker-compose] by cloning this repo and running the following from the repository root:
    docker-compose up
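Once a container is up, a quick way to confirm that something is listening is to try a TCP connection to the published port. The sketch below assumes port 9001 on localhost, as in the `docker run` command; the port used by the Compose setup may differ, so check the repository's compose file. The helper name `is_port_open` is hypothetical.

```python
import socket


def is_port_open(host="localhost", port=9001, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "reachable" if is_port_open() else "not reachable"
    print(f"Port 9001 on localhost is {state}")
```

A successful connection only shows that the port is open; it does not verify that the Portia UI itself has finished starting.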
For more detailed instructions, and alternatives to using Docker, see the [Installation] docs.
# Documentation
Documentation is available on [Read the docs]. Its source files are in the ``docs`` directory.
[Docker]: https://www.docker.com/
[Docker-compose]: https://docs.docker.com/compose
[Installation]: http://portia.readthedocs.org/en/latest/installation.html
[Read the docs]: http://portia.readthedocs.org/en/latest/index.html
[Scrapinghub]: https://portia.scrapinghub.com/