Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/c-bata/pysearch
Web crawler and Search engine in Python.
- Host: GitHub
- URL: https://github.com/c-bata/pysearch
- Owner: c-bata
- Archived: true
- Created: 2014-11-10T03:56:14.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2016-05-23T13:54:53.000Z (over 8 years ago)
- Last Synced: 2024-08-02T13:28:11.982Z (3 months ago)
- Language: Python
- Homepage: http://nwpct1.hatenablog.com/entry/python-search-engine
- Size: 19.5 KB
- Stars: 54
- Watchers: 7
- Forks: 19
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Search Engine and Web Crawler in Python
![Screenshot](https://qiita-image-store.s3.amazonaws.com/0/29989/786c36ad-4de7-43a7-75a0-98c82e412fa3.png "Screenshot")
- Implement a web crawler
- Japanese morphological analysis using [janome](https://github.com/mocobeta/janome)
- Implement search engine
- Store in MongoDB
- Web frontend using [Flask](http://flask.pocoo.org/)

More details are available from [My Tech Blog (Japanese)](http://nwpct1.hatenablog.com/entry/python-search-engine).
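To make the "search engine" item concrete, here is a minimal sketch of an inverted index with AND search. It is not taken from this repository: the function names (`tokenize`, `build_index`, `search`) and the sample pages are hypothetical, a plain whitespace tokenizer stands in for janome, and an in-memory dict stands in for MongoDB.

```python
from collections import defaultdict


def tokenize(text):
    """Stand-in tokenizer (assumption): lowercase and split on whitespace.

    Japanese text has no spaces between words, which is why the README
    lists a morphological analyzer (janome) for this step.
    """
    return text.lower().split()


def build_index(pages):
    """Map each token to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # .get avoids creating empty entries in the defaultdict for unknown tokens.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return sorted(result)


# Hypothetical crawled pages: URL -> extracted text.
PAGES = {
    "http://example.com/a": "Python web crawler",
    "http://example.com/b": "Python search engine",
    "http://example.com/c": "MongoDB storage",
}

index = build_index(PAGES)
print(search(index, "python search"))
```

In a setup like the one the README describes, the token-to-URL mapping would presumably be persisted in MongoDB rather than held in memory, so the crawler and the web frontend can run as separate processes.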
## Requirements
- Python 3.5
## Setup
1. Clone repository
```
$ git clone git@github.com:mejiro/SearchEngine.git
```
2. Install Python packages
```
$ cd SearchEngine
$ pip install -r requirements.txt -c constraints.txt
```
3. MongoDB settings
4. Run
```
$ python manage.py crawler # build an index
$ python manage.py webpage # then open http://127.0.0.1:5000
```
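
For readers wondering how a single `manage.py` can expose both `crawler` and `webpage`, the following is one possible sketch using `argparse` from the standard library. The repository's real `manage.py` is not shown here and may be organized differently; `run_crawler`, `run_webpage`, and `COMMANDS` are hypothetical placeholders.

```python
import argparse


def run_crawler():
    """Placeholder (assumption): crawl pages and build the index."""
    return "crawler finished"


def run_webpage():
    """Placeholder (assumption): start the web frontend."""
    return "webpage started"


# Hypothetical mapping from subcommand name to the function it runs.
COMMANDS = {"crawler": run_crawler, "webpage": run_webpage}


def main(argv=None):
    """Parse one positional subcommand and call the matching function."""
    parser = argparse.ArgumentParser(description="Sketch of a manage.py entry point")
    parser.add_argument("command", choices=sorted(COMMANDS))
    args = parser.parse_args(argv)
    return COMMANDS[args.command]()


# Equivalent to running `python manage.py crawler`.
print(main(["crawler"]))
```

Passing `argv` explicitly keeps the dispatch easy to exercise without touching `sys.argv`; calling `main()` with no arguments falls back to the real command line.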