https://github.com/vprusso/us_patent_scraper
A scraper for the US patents website (http://www.uspto.gov/)
- Host: GitHub
- URL: https://github.com/vprusso/us_patent_scraper
- Owner: vprusso
- License: gpl-2.0
- Created: 2015-05-10T22:36:24.000Z (about 11 years ago)
- Default Branch: master
- Last Pushed: 2015-05-11T14:34:38.000Z (about 11 years ago)
- Last Synced: 2025-03-31T17:55:11.056Z (about 1 year ago)
- Language: Python
- Size: 160 KB
- Stars: 8
- Watchers: 3
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# US Patent Scraper
A scraper for the US patents website (http://www.uspto.gov/)
This script requires the open source Python scraping library, Scrapy:
http://scrapy.org/
## Goal:
To obtain the following list of information from a user-specified search of US patents:
1. Patent Number
2. US Patent Class Number
3. International Patent Class Number
4. Inventor Country Code
5. Document Identifier
6. Abstract
7. Patent File Date
8. Patent Granted Date
9. Inventor Names
10. Patent Name
The script reads search URLs from "us_patent_urls.txt" and scrapes each search result. The data extracted for each patent is then collected into a .json file.
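The input and output handling described above can be sketched with the standard library alone. This is a minimal illustration, not the repository's actual code: the `PatentRecord` class, its field names, and the helpers `read_urls` and `write_records` are all hypothetical, chosen to mirror the ten items in the list. The scraping step itself (done with Scrapy in this project) is left out.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PatentRecord:
    """One scraped patent. Field names are hypothetical and mirror the Goal list."""
    patent_number: str
    us_class_number: str
    international_class_number: str
    inventor_country_code: str
    document_identifier: str
    abstract: str
    file_date: str
    granted_date: str
    inventor_names: list
    patent_name: str


def read_urls(path):
    """Return the non-empty lines of a URL list file, stripped of whitespace."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def write_records(records, path):
    """Write a list of PatentRecord objects to `path` as a JSON array."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump([asdict(record) for record in records], handle, indent=2)
```

Under these assumptions, `read_urls("us_patent_urls.txt")` would yield the list of search URLs to crawl, and `write_records(records, "patents.json")` would produce one JSON object per patent. The output file name here is only a placeholder.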
## Instructions:
Run the main.py file located at patent_spider/patent_spider/main.py.