https://github.com/cmoussa1/cs242-project
a repository to hold the contents for our group CS 242 project
- Host: GitHub
- URL: https://github.com/cmoussa1/cs242-project
- Owner: cmoussa1
- Created: 2024-01-16T04:39:49.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2024-03-09T01:51:01.000Z (11 months ago)
- Last Synced: 2024-11-11T21:46:11.105Z (3 months ago)
- Language: Jupyter Notebook
- Size: 21.5 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Scrapy Notes
## installation requirements/prerequisites
First, check to see that you have Python (>= 3.6) and Scrapy installed on your
system. To install Scrapy:

```console
pip3 install scrapy
```

## instructions on how to set up a Scrapy project
Inside of the directory of your choice, run:
```console
scrapy startproject my_project_name
```

This creates a number of folders and files:

* `scrapy.cfg`: the project configuration file.
* `my_project_name/`: this directory contains your project's Python modules.
* `my_project_name/items.py`: define the data structure for scraped data here.
* `my_project_name/pipelines.py`: process the scraped data (e.g., cleaning, storing to a database).
* `my_project_name/settings.py`: configure settings like user agent, concurrent requests, etc.
* `my_project_name/spiders/`: this directory will contain your spiders.

From inside the project directory, you can create a spider file in the `spiders/` directory using the
`scrapy genspider` command:

```console
scrapy genspider my_new_spider www.baseball-reference.com
```
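The `pipelines.py` file from the project layout listed above is where scraped items get post-processed. In Scrapy, an item pipeline is a plain class with a `process_item(self, item, spider)` method that is called once per scraped item. Here is a minimal sketch, assuming a hypothetical `CleanTextPipeline` that strips surrounding whitespace from string fields:

```python
class CleanTextPipeline:
    """Hypothetical pipeline: strips leading/trailing whitespace from string fields."""

    def process_item(self, item, spider):
        # Iterate over a snapshot of the fields so values can be reassigned safely.
        for key, value in list(item.items()):
            if isinstance(value, str):
                item[key] = value.strip()
        # Returning the item hands it on to the next enabled pipeline.
        return item
```

Pipelines only run once they are enabled in `settings.py` through the `ITEM_PIPELINES` setting, which maps each class's dotted path to a priority between 0 and 1000 (lower values run first). For example: `ITEM_PIPELINES = {"my_project_name.pipelines.CleanTextPipeline": 300}`.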