https://github.com/cmoussa1/cs242-project

a repository to hold the contents for our group CS 242 project

# Scrapy Notes

## installation requirements/prerequisites

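First, check that Python (>= 3.6) and Scrapy are installed on your system.
Recent Scrapy releases require a newer Python version than 3.6, so upgrade
Python if `pip3` refuses to install Scrapy. To install Scrapy: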

```console
pip3 install scrapy
```
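
To confirm both prerequisites before going further, a small script along these lines can help. This is a minimal sketch using only the standard library; the function name `check_prerequisites` is hypothetical, and the minimum version is simply the one stated in these notes.

```python
import importlib.util
import sys

# Minimum Python version stated in these notes (newer Scrapy may need more).
MIN_PYTHON = (3, 6)


def check_prerequisites(min_python=MIN_PYTHON):
    """Report whether the running Python is new enough and Scrapy is importable."""
    return {
        "python_ok": sys.version_info >= min_python,
        # find_spec returns None when the package cannot be found.
        "scrapy_installed": importlib.util.find_spec("scrapy") is not None,
    }


if __name__ == "__main__":
    status = check_prerequisites()
    # str.format is used instead of f-strings so the script also parses on old Pythons.
    print("Python version ok: {}".format(status["python_ok"]))
    print("Scrapy installed: {}".format(status["scrapy_installed"]))
```

If `scrapy_installed` comes back `False`, run the `pip3 install scrapy` command above using the same Python interpreter that ran the check.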

## instructions on how to set up a Scrapy project

Inside of the directory of your choice, run:

```console
scrapy startproject my_project_name
```

This creates a project directory containing the following files and folders:

* `scrapy.cfg`: the project configuration file.
* `my_project_name/`: this directory contains your project’s Python modules.
* `my_project_name/items.py`: define the data structure for scraped data here.
* `my_project_name/pipelines.py`: process the scraped data (e.g., cleaning, storing to a database).
* `my_project_name/settings.py`: configure settings like user agent, concurrent requests, etc.
* `my_project_name/spiders/`: this directory will contain your spiders.
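
To make the roles of `items.py` and `pipelines.py` more concrete, here is a minimal sketch of what the two files might contain. It assumes Python 3.7+ (for `dataclasses`) and a Scrapy version that accepts dataclass objects as items. The names `PlayerItem` and `CleanPlayerPipeline` and all of the fields are hypothetical and do not come from this project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlayerItem:
    """Would live in items.py: one scraped record (field names are hypothetical)."""

    name: str
    team: Optional[str] = None
    batting_average: Optional[float] = None


class CleanPlayerPipeline:
    """Would live in pipelines.py: tidy each item before it is stored."""

    def process_item(self, item, spider):
        # Scrapy calls process_item once per scraped item, passing the spider too.
        item.name = item.name.strip()
        if item.team is not None:
            item.team = item.team.strip().upper()
        # Returning the item hands it on to the next enabled pipeline.
        return item


# settings.py would enable the pipeline; the number sets the run order:
# ITEM_PIPELINES = {"my_project_name.pipelines.CleanPlayerPipeline": 300}

if __name__ == "__main__":
    raw = PlayerItem(name="  Example Player ", team=" abc ")
    print(CleanPlayerPipeline().process_item(raw, spider=None))
```

The pipeline is a plain class, so it can be exercised without starting a crawl, which is handy for checking the cleaning logic on its own.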

Inside the `spiders/` directory, you can create a spider file using the
`scrapy genspider` command:

```console
scrapy genspider my_new_spider www.baseball-reference.com
```