https://github.com/sphinxknight/spider_l10n
Crawler to check if pages in a site have been translated or not
- Host: GitHub
- URL: https://github.com/sphinxknight/spider_l10n
- Owner: SphinxKnight
- Created: 2013-04-25T18:35:55.000Z (about 13 years ago)
- Default Branch: master
- Last Pushed: 2014-03-08T13:42:29.000Z (about 12 years ago)
- Last Synced: 2026-01-01T02:34:55.187Z (5 months ago)
- Language: Python
- Size: 217 KB
- Stars: 0
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
spider_l10n
===========
This crawler can be used to check whether the pages of a site have
been translated.
It does not use any external libraries and works with Python 3.
The syntax is as follows:
`python script_crawler.py url_of_site target_language crawl_delay`
(Please follow the directives of the site's robots.txt, for example when choosing crawl_delay.)
Or, if a previous crawl ran over more than 100 pages, resume it with:
`python script_crawler.py resume`
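One way to pick a `crawl_delay` that respects robots.txt is to read the site's `Crawl-delay` directive with the standard library before launching the crawler. The sketch below is not part of this repository: the helper name `suggested_crawl_delay`, the fallback of 1 second, and the sample robots.txt content are all assumptions made for illustration.

```python
from urllib import robotparser


def suggested_crawl_delay(robots_lines, user_agent="*", default=1.0):
    """Return the Crawl-delay declared for user_agent, or default if none.

    Hypothetical helper: robots_lines is the robots.txt content,
    already split into lines.
    """
    parser = robotparser.RobotFileParser()
    parser.parse(robots_lines)
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


# Sample robots.txt content, invented for this example.
example = """User-agent: *
Crawl-delay: 5
Disallow: /private/
""".splitlines()

print(suggested_crawl_delay(example))
```

The returned value could then be passed as the third argument of `script_crawler.py`. For a live site, `RobotFileParser.set_url()` followed by `read()` fetches the file instead of parsing local lines.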
Users may want to customize the crawler by modifying, at a minimum,
the following methods of the function_crawler.py file:
- filteredLink (for the links that should not be used)
- hasBeenTranslated (to design the relevant test)
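
As a rough illustration of what such customizations might look like, here is a minimal sketch. The signatures, the return conventions (`True` meaning "skip this link" and "page is translated"), the `site_host` parameter, and the `lang`-attribute test are all assumptions; the actual definitions in function_crawler.py may differ and should be checked before adapting this.

```python
import re
from urllib.parse import urlparse

# Hypothetical list of file types that are not worth crawling.
SKIPPED_SUFFIXES = (".png", ".jpg", ".pdf", ".zip")


def filteredLink(link, site_host="example.org"):
    """Return True when the link should NOT be used (assumed contract).

    Assumes absolute URLs; relative links have no scheme and are filtered.
    """
    parsed = urlparse(link)
    if parsed.scheme not in ("http", "https"):
        return True  # mailto:, javascript:, relative links, ...
    if parsed.netloc != site_host:
        return True  # link leaves the site being checked
    return parsed.path.lower().endswith(SKIPPED_SUFFIXES)


def hasBeenTranslated(page_html, target_language="fr"):
    """Assumed test: the <html> tag declares the target language."""
    match = re.search(
        r'<html[^>]*\blang=["\']?([A-Za-z-]+)', page_html, re.IGNORECASE
    )
    return bool(match) and match.group(1).lower().startswith(
        target_language.lower()
    )
```

A `lang` attribute is only one possible signal. Depending on the site, a test based on a URL prefix such as `/fr/` or on the presence of a "not yet translated" banner may be more reliable.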