https://github.com/techjacker/sitemapgenerator
Creates an XML sitemap of a domain
- Host: GitHub
- URL: https://github.com/techjacker/sitemapgenerator
- Owner: techjacker
- License: mit
- Created: 2016-06-27T21:22:22.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2021-04-20T17:09:51.000Z (over 3 years ago)
- Last Synced: 2024-10-04T09:07:26.877Z (about 1 month ago)
- Language: Python
- Homepage:
- Size: 24.4 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
# sitemapgenerator
Creates an XML sitemap of a domain.
Requires Python 3 or later.
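
For readers unfamiliar with the format, here is a minimal sketch of what building an XML sitemap can look like with Python's standard library. It assumes the sitemaps.org 0.9 schema; the `build_sitemap` function and the example URLs are hypothetical, and this is not the project's own code or necessarily its exact output.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (assumed to apply here).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap(["http://example.com/", "http://example.com/about"]))
```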
## Install
```
pip install sitemapgenerator
```

## Usage
```Shell
usage: sitemapgenerator [-h] [-f FILE] [-t THROTTLE] [-l LIMIT] [-q] domain

Generate an XML sitemap for a domain

positional arguments:
  domain                domain to crawl

optional arguments:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  write the xml to a file
  -t THROTTLE, --throttle THROTTLE
                        max time in secs to wait between requesting URLs
  -l LIMIT, --limit LIMIT
                        max number of URLs to crawl
  -q, --quiet
```

## Example Usage
```Shell
$ sitemapgenerator -f site.xml -l 1 devopsreactions.tumblr.com
crawling homepage
crawling /post/146054449345/ops-report-three-out-of-five-app-servers#notes
crawled 2 URLs
wrote sitemap to /tmp/site.xml
```

-----------------------------------------------------------
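
The help text under Usage has the shape of `argparse` output. As an illustration only, a parser exposing the same flags might be declared as follows; the argument types, the defaults, and the `build_parser` name are assumptions and are not taken from the project's source.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented command-line flags."""
    parser = argparse.ArgumentParser(
        prog="sitemapgenerator",
        description="Generate an XML sitemap for a domain",
    )
    parser.add_argument("domain", help="domain to crawl")
    parser.add_argument("-f", "--file", help="write the xml to a file")
    # Assumed type: the help text only says "secs", so float is a guess.
    parser.add_argument("-t", "--throttle", type=float,
                        help="max time in secs to wait between requesting URLs")
    # Assumed type: a URL count is treated as an integer.
    parser.add_argument("-l", "--limit", type=int,
                        help="max number of URLs to crawl")
    parser.add_argument("-q", "--quiet", action="store_true")
    return parser


if __name__ == "__main__":
    # Same arguments as the shell example in Example Usage.
    args = build_parser().parse_args(
        ["-f", "site.xml", "-l", "1", "devopsreactions.tumblr.com"]
    )
    print(args)
```

Passing the argument list explicitly to `parse_args` keeps the sketch testable without touching `sys.argv`.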
## Development
### Setup
#### Set up virtualenv
```
pyenv install 3.5.0
pyenv local 3.5.0
pyvenv env
source env/bin/activate
```

#### Install requirements
```
pip install -r requirements.txt
```

#### Update requirements
```
pip install -r requirements-to-freeze.txt --upgrade
pip freeze > requirements.txt
```

-----------------------------------------------------------
## Tests
```
py.test tests -q
```

#### Anaconda Settings
Add the following to your project settings
```
"settings":
{
    "test_virtualenv": "~/path_to_project/env",
    "test_command": "py.test"
}
```

-----------------------------------------------------------
## TODO
- add more tests
- normalize URLs to remove dupes
  - hashes from end of URLs (e.g. /some/url/#respond)
  - trailing slashes on URLs
- add option to create sitemap of:
  - external URLs
  - non-HTML URLs on same domain
- refactor code
  - create separate data class which crawler inherits from/accesses
  - create single getter method for `Crawler` class links and remove extra get_* methods
- add concurrency (eventlet/gevent)
- add progress bar to CLI
- add support for Python 2
- add tox tests for different python versions
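
The URL normalization item in the list above could be approached along these lines. This is a sketch using only the standard library, not code from the project; `normalize_url` is a hypothetical helper, and treating `/path/` and `/path` as the same page is an assumption that does not hold for every server.

```python
from urllib.parse import urldefrag, urlsplit, urlunsplit


def normalize_url(url):
    """Drop the #fragment and any trailing slash so duplicates compare equal."""
    url, _fragment = urldefrag(url)
    parts = urlsplit(url)
    # Keep a bare "/" for the site root; strip trailing slashes elsewhere.
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


if __name__ == "__main__":
    # Hypothetical URLs: all three collapse to a single entry in the set.
    seen = {normalize_url(u) for u in [
        "http://example.com/some/url/#respond",
        "http://example.com/some/url/",
        "http://example.com/some/url",
    ]}
    print(seen)
```

Storing normalized URLs in a set, as in the usage lines, is one simple way to skip pages that were already crawled.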