https://github.com/liao961120/ptt-terms
Scrapy project for PTT 鄉民百科
- Host: GitHub
- URL: https://github.com/liao961120/ptt-terms
- Owner: liao961120
- Created: 2018-09-04T05:40:28.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2021-05-26T09:23:43.000Z (over 4 years ago)
- Last Synced: 2025-03-11T05:32:13.249Z (7 months ago)
- Topics: ptt, python3, scrapy
- Language: Julia
- Homepage: https://yongfu.name/ptt-terms/
- Size: 25.5 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## Modification
To modify the behavior of the spider, edit the files marked with `#` in the directory tree below.

Directory structure of `PTTdict/`:
```
.
├── run.sh                  # scrapy crawl parameters
├── view.json               # Auto-generated (for viewing)
├── scrapy.cfg
├── setup.py
│
├── PTTdict
│   ├── __init__.py
│   ├── items.py            # Define item fields
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── postprocess
│   │   ├── __pycache__/
│   │   └── tidyup.py       # Process items before output
│   ├── __pycache__/
│   ├── settings.py         # Settings for item pipelines
│   └── spiders
│       ├── dict.py         # Spider for scraping PTT wiki
│       ├── __init__.py
│       └── __pycache__/
└── data
    ├── dict_constr.R       # Filter & convert to data frame
    ├── index.Rmd           # Build Web Site
    ├── _site.yml
    └── style.css
```
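
As a rough illustration of how `pipelines.py` and `settings.py` relate in a typical Scrapy project, here is a minimal sketch. The class name `StripWhitespacePipeline`, the item fields `term` and `views`, and the dotted path in `ITEM_PIPELINES` are hypothetical and not taken from this repository; only the `process_item(item, spider)` hook and the `ITEM_PIPELINES` setting are standard Scrapy conventions.

```python
# Hypothetical sketch: class and field names are illustrative only,
# not copied from the PTTdict project.


class StripWhitespacePipeline:
    """Trim surrounding whitespace from every string field of an item."""

    def process_item(self, item, spider):
        # Scrapy calls process_item() once per scraped item. Plain dict
        # items are supported, so this sketch needs no Scrapy import.
        for key, value in item.items():
            if isinstance(value, str):
                item[key] = value.strip()
        return item


# In settings.py, pipelines are enabled through ITEM_PIPELINES, which maps
# a class path to an integer (conventionally 0-1000); lower values run first.
ITEM_PIPELINES = {
    "PTTdict.pipelines.StripWhitespacePipeline": 300,  # hypothetical path
}

# Stand-alone usage example with a dict item and no running spider.
cleaned = StripWhitespacePipeline().process_item(
    {"term": "  鄉民 ", "views": 42}, spider=None
)
print(cleaned)
```

The project's actual post-processing lives in `postprocess/tidyup.py` and may work quite differently; the sketch only shows where a per-item cleanup step would usually plug in.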