{"id":21035646,"url":"https://github.com/archiveteam/terroroftinytown","last_synced_at":"2025-10-28T12:31:00.366Z","repository":{"id":14879949,"uuid":"17603513","full_name":"ArchiveTeam/terroroftinytown","owner":"ArchiveTeam","description":"URLTeam's second generation of URL shortener archiving tools","archived":false,"fork":false,"pushed_at":"2024-08-14T18:19:09.000Z","size":885,"stargazers_count":71,"open_issues_count":17,"forks_count":15,"subscribers_count":19,"default_branch":"develop","last_synced_at":"2024-10-30T00:55:53.473Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://urlte.am","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ArchiveTeam.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2014-03-10T18:23:56.000Z","updated_at":"2024-08-20T10:57:47.000Z","dependencies_parsed_at":"2022-08-07T08:00:42.578Z","dependency_job_id":"7d20ecbc-910d-409c-a139-711f80982035","html_url":"https://github.com/ArchiveTeam/terroroftinytown","commit_stats":{"total_commits":437,"total_committers":13,"mean_commits":33.61538461538461,"dds":0.1967963386727689,"last_synced_commit":"1ca86f9acaab1a899b1aaf38275163393fb4deef"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ArchiveTeam%2Fterroroftinytown","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ArchiveTeam%2Fterroroftinytown/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ArchiveTeam%2Fterroroft
inytown/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ArchiveTeam%2Fterroroftinytown/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ArchiveTeam","download_url":"https://codeload.github.com/ArchiveTeam/terroroftinytown/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238651159,"owners_count":19507730,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-19T13:15:42.188Z","updated_at":"2025-10-28T12:30:59.952Z","avatar_url":"https://github.com/ArchiveTeam.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"terroroftinytown\n================\n\n[URLTeam](http://urlte.am)'s second generation of URL shortener archiving tools.\n\nFor details, please see the [wiki page](http://archiveteam.org/index.php?title=URLTeam).\n\n\nRunning\n=======\n\nTracker\n-------\n\nYou will need Python 3 and Redis.\n\nHow to run the tracker:\n\n        pip3 install -r requirements-tracker.txt\n        python3 -m terroroftinytown.tracker THE_CONFIG_FILE.conf\n\nUse `--debug` when developing. Use `--xheaders` when running behind a web server reverse proxy.\n\n\nExport\n-------\n\n        python3 -m terroroftinytown.tracker.export THE_CONFIG_FILE.conf output_dir\n\nThe output directory will be created if it does not exist. Specify `--format urlteam` to export in the old URLTeam format (no BEACON headers). 
You will need GNU Sort installed.\n\nAn automatic script, to be run from cron, drains the results, compresses them, and uploads them to the Internet Archive:\n\n        python3 -m terroroftinytown.release.supervisor config.conf \\\n        EXPORT_WORKING_DIRECTORY/ --verbose --batch-size 5000000\n\n\nTest\n----\n\n![CI Tests](https://github.com/archiveteam/terroroftinytown/actions/workflows/test.yml/badge.svg)\n\nTo run the tests, including the web interface tests:\n\n1. Install Firefox 48+\n2. Install Selenium for Python from PyPI\n3. Download geckodriver and place it in a directory listed in the `PATH` environment variable\n4. Run the nose test runner\n\nFor example:\n\n        apt-get install firefox\n        pip3 install selenium\n        wget https://github.com/mozilla/geckodriver/releases/download/v0.11.1/geckodriver-v0.11.1-OS_VERSION_HERE.tar.gz\n        nosetests3\n\nAlternative examples:\n\n        NO_LIVE_SERVICE_TEST=1 NO_SELENIUM_TEST=1 python -m unittest discover terroroftinytown -p '*test.py'\n        NO_LIVE_SERVICE_TEST=1 RUN_CHROMEDRIVER=1 python -m unittest discover terroroftinytown -p '*test.py'\n\nClient\n------\n\nThe client should work in Python 3.9+. Please be mindful of this when writing the client code.\n\nSee [terroroftinytown-client-grab](https://github.com/ArchiveTeam/terroroftinytown-client-grab) for details on how to run the scraper as part of the Warrior project.\n\n\nStructure\n=========\n\nThe project is split into two main components: client and tracker.\n\nThe client component contains the library needed for performing requests to the shortener. It uses generic shortener parameters such as the alphabet and sequence numbers. The client is responsible for converting the sequence numbers into shortcodes and then fetching them.\n\nThe client also contains shortener-specific code, called services, that customizes the generic behavior. 
Custom behavior may be needed to extract the URLs from the HTML itself.\n\nOnce the client has finished scraping, it uploads the shortcodes and URLs to the tracker.\n\nThe tracker component manages items and projects. Items represent the shortener tasks while projects represent the shortener parameters. Each item contains a range of sequence numbers. Items that are checked out are called claims. The tracker supports automatically generating more items.\n\nTo avoid bans, the tracker will attempt to distribute items across projects so that a client does not work on more than one shortener per IP address.\n\nThere are two version numbers: the library version in `terroroftinytown.client.__init__` and the pipeline version in `terroroftinytown-client-grab/pipeline.py`.\n\n\nNotes\n=====\n\nWhen dealing with non-ASCII characters, one cannot simply treat them as UTF-8 since the originating URL may come from other character sets such as Shift JIS. As such, it is ideal to handle the URLs in raw bytes as much as possible. Therefore, the files should be treated as bytes. If that is not possible, use a \"lossless\" encoding suitable for your environment. For Python, latin-1 should be used instead of UTF-8. Avoid percent-encoding as much as possible since some servers do not handle percent-encoding well.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farchiveteam%2Fterroroftinytown","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farchiveteam%2Fterroroftinytown","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farchiveteam%2Fterroroftinytown/lists"}