https://github.com/dotcomboom/Gophew

Gopher crawler and search engine

gopher gopher-crawler gopher-server pituophis

Host: GitHub
URL: https://github.com/dotcomboom/Gophew
Owner: dotcomboom
License: unlicense
Created: 2019-03-03T20:45:42.000Z (almost 6 years ago)
Default Branch: master
Last Pushed: 2023-02-14T23:50:32.000Z (almost 2 years ago)
Last Synced: 2024-08-02T05:11:56.347Z (5 months ago)
Topics: gopher, gopher-crawler, gopher-server, pituophis
Language: Python
Size: 25.4 KB
Stars: 3
Watchers: 3
Forks: 1
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Gophew
Gopher crawler and search-enabled server powered by Pituophis.

## crawler.py
This script crawls a Gopher server, builds the searchable database, and writes it out as JSON. Settings are edited directly inside the script.
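A crawler like this works by reading Gopher menus. As background, here is a minimal sketch (not Gophew's actual code) of parsing a menu response in the RFC 1436 format: one item per line, a type character followed by tab-separated display text, selector, host, and port, with a lone `.` ending the menu. `MenuItem`, `parse_menu_line`, and `parse_menu` are hypothetical names. The `i` info type is a widely used extension rather than part of RFC 1436.

```python
from typing import List, NamedTuple, Optional


class MenuItem(NamedTuple):
    """One line of a Gopher menu (RFC 1436)."""
    itype: str     # one-character item type: '0' text file, '1' directory, 'i' info, ...
    text: str      # display string shown to the user
    selector: str  # selector (path) sent to the server to fetch the item
    host: str      # server the item lives on
    port: int      # port on that server


def parse_menu_line(line: str) -> Optional[MenuItem]:
    """Parse one menu line, returning None if it is malformed."""
    line = line.rstrip("\r\n")
    if not line:
        return None
    fields = line[1:].split("\t")  # the type character is followed by tab-separated fields
    if len(fields) < 4:
        return None
    text, selector, host, port = fields[:4]
    try:
        return MenuItem(line[0], text, selector, host, int(port))
    except ValueError:  # non-numeric port
        return None


def parse_menu(body: str) -> List[MenuItem]:
    """Parse a full menu response, stopping at the '.' terminator line."""
    items = []
    for line in body.splitlines():
        if line == ".":
            break
        item = parse_menu_line(line)
        if item is not None:
            items.append(item)
    return items
```

Malformed lines are skipped rather than raising, which is a forgiving choice for crawling servers of varying quality.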

```python
###########################

settings = {
    'limit_host': 'your.live.host',  # Host to limit crawling to (indexing a single server is highly recommended)
    'only_record_host': True,  # Whether to record only items hosted on limit_host
    'path_must_start_with': '/',  # Prefix that the path/selector must start with
    'db_filename': 'db.json',  # Filename to use for the database
    'delay': 2,  # Delay in seconds between fetching files; please be courteous to servers you don't own!
    'crawl_url': 'gopher://your.live.host/1/',  # URL to crawl after the existing index has been updated
    'cooldown': 86400,  # Required cooldown in seconds (86400 = one day) before crawling a URL again
    'ignore_types': ['i', '3'],  # Item types to skip and not record ('i' = info lines, '3' = errors)
}

###########################
```
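The settings above suggest how a crawl is scoped. The following is a minimal sketch under stated assumptions, not `crawler.py` itself. The tuple item shape, the database layout (`type`, `title`, `last_crawled` per URL), the reading of `cooldown` as seconds, and all helper names are hypothetical. `fetch_menu` stands in for a real network-backed Gopher client.

```python
import json
import time
from collections import deque
from typing import Callable, Dict, List, Tuple

# Hypothetical item shape: (type, title, selector, host, port)
Item = Tuple[str, str, str, str, int]

SETTINGS = {
    'limit_host': 'your.live.host',
    'only_record_host': True,
    'path_must_start_with': '/',
    'ignore_types': ['i', '3'],
    'delay': 2,         # seconds between requests
    'cooldown': 86400,  # seconds before a menu may be fetched again (unit assumed)
}


def item_url(item: Item) -> str:
    """Build a gopher:// URL (RFC 4266 style: type character, then selector)."""
    itype, _title, selector, host, port = item
    return f"gopher://{host}:{port}/{itype}{selector}"


def should_record(item: Item, s: dict) -> bool:
    """Apply ignore_types, only_record_host and path_must_start_with."""
    itype, _title, selector, host, _port = item
    if itype in s['ignore_types']:
        return False
    if s['only_record_host'] and host != s['limit_host']:
        return False
    return selector.startswith(s['path_must_start_with'])


def should_follow(item: Item, s: dict) -> bool:
    """Descend only into directories (type '1') on limit_host."""
    itype, _title, selector, host, _port = item
    return (itype == '1' and host == s['limit_host']
            and selector.startswith(s['path_must_start_with']))


def needs_recrawl(db: Dict[str, dict], url: str, now: float, cooldown: float) -> bool:
    """True if the menu was never fetched or its cooldown has elapsed."""
    last = db.get(url, {}).get('last_crawled')
    return last is None or now - last >= cooldown


def crawl(start: Item, fetch_menu: Callable[[Item], List[Item]], s: dict,
          db: Dict[str, dict], sleep: Callable[[float], None] = time.sleep,
          clock: Callable[[], float] = time.time) -> Dict[str, dict]:
    """Breadth-first crawl from `start`, updating `db` in place."""
    queue = deque([start])
    while queue:
        menu = queue.popleft()
        url = item_url(menu)
        if not needs_recrawl(db, url, clock(), s['cooldown']):
            continue
        db.setdefault(url, {})['last_crawled'] = clock()
        for item in fetch_menu(menu):  # hypothetical network call, injected by the caller
            if should_record(item, s):
                db.setdefault(item_url(item), {}).update(type=item[0], title=item[1])
            if should_follow(item, s):
                queue.append(item)
        sleep(s['delay'])  # be courteous between requests
    return db


def save_db(db: Dict[str, dict], filename: str) -> None:
    """Write the index out as JSON."""
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(db, f, indent=2)
```

Injecting `fetch_menu`, `sleep`, and `clock` keeps the traversal testable offline. A real run would pass a network-backed fetcher and then call `save_db(db, settings['db_filename'])`.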

## gophew.py
The frontend Gopher server. It uses [Pituophis](https://github.com/dotcomboom/Pituophis) with an alternate handler that answers search requests.

```python
###########################

settings = {
    # Pituophis server options
    'host': 'your.live.host',  # Hostname of this server
    'port': 70,  # Port to serve on (70 is the standard Gopher port)
    'pub_dir': 'pub/',  # Directory of files to serve

    # Gophew
    'index': 'db.json',  # Index to use (generated by crawler.py)
    'alternate_titles': True,  # Whether to display alternate titles
    'referrers': True,  # Whether to display referring URLs
    'search_path': '/search',  # Path prefix that triggers a search (no file should exist here, since the alternate handler only runs when no file is found)
    'typestrings': True,  # Allow filtering searches by item type, e.g. /search01 for text files and directories
    'root_path': '/',  # Path that the results page links back to
    'allow_empty_queries': False,  # Whether to allow empty search queries

    # The strings below can be disabled by setting them to None
    'root_text': 'Back to root',
    'new_search_text': 'Try another search',
    'new_search_text_same_filter': 'Try another search with the same criteria',
    'results_caption': 'Results for {} (out of {} items)',
    'types_caption': 'Filtering types: {}',
    'empty_queries_not_allowed_msg': 'Empty search queries are not allowed on this server.'
}

###########################
```
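On the server side, the `search_path`, `typestrings`, caption, and empty-query settings could drive a search handler roughly like the sketch below. It is hypothetical and is not Gophew's handler. The index layout (`type`, `title`, `selector`, `host`, `port` per entry), word-in-title matching, reading the caption's second `{}` as the index size, and all function names are assumptions. Output follows the RFC 1436 menu format and ends with a lone `.` line.

```python
from typing import Dict, List, Optional, Set

# Hypothetical index layout: URL -> {'type', 'title', 'selector', 'host', 'port'}
Index = Dict[str, dict]


def parse_type_filter(selector: str, search_path: str = '/search') -> Optional[Set[str]]:
    """'/search01' -> {'0', '1'}; '/search' -> set() (no filter); other paths -> None."""
    if not selector.startswith(search_path):
        return None
    return set(selector[len(search_path):])


def search(index: Index, query: str, types: Set[str]) -> List[dict]:
    """Entries whose title contains every query word (case-insensitive)."""
    words = query.lower().split()
    hits = []
    for entry in index.values():
        if types and entry['type'] not in types:
            continue
        title = entry['title'].lower()
        if all(word in title for word in words):
            hits.append(entry)
    return hits


def menu_line(itype: str, text: str, selector: str = '',
              host: str = 'error.host', port: int = 1) -> str:
    """One RFC 1436 menu line; the defaults are the usual dummies for info/error lines."""
    return f"{itype}{text}\t{selector}\t{host}\t{port}\r\n"


def render_results(index: Index, query: str, types: Set[str],
                   allow_empty_queries: bool = False,
                   results_caption: Optional[str] = 'Results for {} (out of {} items)',
                   types_caption: Optional[str] = 'Filtering types: {}',
                   empty_msg: str = 'Empty search queries are not allowed on this server.') -> str:
    """Render search results as a Gopher menu."""
    if not query.strip() and not allow_empty_queries:
        return menu_line('3', empty_msg) + ".\r\n"  # type '3' marks an error
    lines = []
    if results_caption is not None:  # None disables the caption
        lines.append(menu_line('i', results_caption.format(query, len(index))))
    if types and types_caption is not None:
        lines.append(menu_line('i', types_caption.format(''.join(sorted(types)))))
    for e in search(index, query, types):
        lines.append(menu_line(e['type'], e['title'], e['selector'], e['host'], e['port']))
    lines.append(".\r\n")  # a lone period terminates the menu
    return "".join(lines)
```

A handler would compute `types = parse_type_filter(selector, settings['search_path'])` and fall through to normal file serving when it returns `None`.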