{"id":21036013,"url":"https://github.com/archiveteam/archivebot","last_synced_at":"2025-04-09T05:10:50.722Z","repository":{"id":10463312,"uuid":"12636383","full_name":"ArchiveTeam/ArchiveBot","owner":"ArchiveTeam","description":"ArchiveBot, an IRC bot for archiving websites","archived":false,"fork":false,"pushed_at":"2024-09-23T04:18:10.000Z","size":2866,"stargazers_count":357,"open_issues_count":172,"forks_count":71,"subscribers_count":27,"default_branch":"master","last_synced_at":"2024-10-30T00:55:21.716Z","etag":null,"topics":["archiving","haxe","irc","javascript","python","ruby"],"latest_commit_sha":null,"homepage":"http://www.archiveteam.org/index.php?title=ArchiveBot","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ArchiveTeam.png","metadata":{"files":{"readme":"README","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2013-09-06T05:13:57.000Z","updated_at":"2024-10-15T14:19:21.000Z","dependencies_parsed_at":"2024-11-06T10:42:14.473Z","dependency_job_id":"adebf731-36ba-4405-bac8-bd2d5de05979","html_url":"https://github.com/ArchiveTeam/ArchiveBot","commit_stats":{"total_commits":1947,"total_committers":37,"mean_commits":52.62162162162162,"dds":0.6271186440677966,"last_synced_commit":"203d40a01bfe7dc9d79dab107106931a97cb37d3"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ArchiveTeam%2FArchiveBot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ArchiveTeam%2FArchiveBot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ArchiveTeam%2FArchiveBot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ArchiveTeam%2FArchiveBot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ArchiveTeam","download_url":"https://codeload.github.com/ArchiveTeam/ArchiveBot/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247980837,"owners_count":21027808,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["archiving","haxe","irc","javascript","python","ruby"],"created_at":"2024-11-19T13:17:27.611Z","updated_at":"2025-04-09T05:10:50.697Z","avatar_url":"https://github.com/ArchiveTeam.png","language":"Python","readme":"1. ArchiveBot\n\n    \u003cSketchCow\u003e Coders, I have a question.\n    \u003cSketchCow\u003e Or, a request, etc.\n    \u003cSketchCow\u003e I spent some time with xmc discussing something we could\n                do to make things easier around here.\n    \u003cSketchCow\u003e What we came up with is a trigger for a bot, which can\n                be triggered by people with ops.\n    \u003cSketchCow\u003e You tell it a website. It crawls it. WARC. Uploads it to\n                archive.org. Boom.\n    \u003cSketchCow\u003e I can supply machine as needed.\n    \u003cSketchCow\u003e Obviously there's some sanitation issues, and it is root\n                all the way down or nothing.\n    \u003cSketchCow\u003e I think that would help a lot for smaller sites\n    \u003cSketchCow\u003e Sites where it's 100 pages or 1000 pages even, pretty\n                simple.\n    \u003cSketchCow\u003e And just being able to go \"bot, get a sanity dump\"\n\n2. More info\n\nArchiveBot has two major backend components: the control node, which\nruns the IRC interface and bookkeeping programs, and the crawlers, which\ndo all the Web crawling.  ArchiveBot users communicate with ArchiveBot\nby issuing commands in an IRC channel.\n\nUser's guide: http://archivebot.readthedocs.org/en/latest/\nControl node installation guide: INSTALL.backend\nCrawler installation guide: INSTALL.pipeline\n\n3. Local use\n\nArchiveBot was originally written as a set of separate programs for\ndeployment on a server.  This means it has a poor distribution story.\nHowever, Ivan Kozik (@ivan) has taken the ArchiveBot pipeline,\ndashboard, ignores, and control system and created a package intended for\npersonal use.  You can find it at https://github.com/ArchiveTeam/grab-site.\n\n4. License\n\nCopyright 2013 David Yip; made available under the MIT license.  See\nLICENSE for details.\n\n5. Acknowledgments\n\nThanks to Alard (@alard), who added WARC generation and Lua scripting to\nGNU Wget.  Wget+lua was the first web crawler used by ArchiveBot.\n\nThanks to Christopher Foo (@chfoo) for wpull, ArchiveBot's current web\ncrawler.\n\nThanks to Ivan Kozik (@ivan) for maintaining ignore patterns and\ntracking down performance problems at scale.\n\nOther thanks go to the following projects:\n\n* Celluloid \u003chttp://celluloid.io/\u003e\n* Cinch \u003chttps://github.com/cinchrb/cinch/\u003e\n* CouchDB \u003chttp://couchdb.apache.org/\u003e\n* Ember.js \u003chttp://emberjs.com/\u003e\n* Redis \u003chttp://redis.io/\u003e\n* Seesaw \u003chttps://github.com/ArchiveTeam/seesaw-kit\u003e\n\n6. Special thanks\n\nDragonette, Barnaby Bright, Vienna Teng, NONONO.\n\nThe memory hole of the Web has gone too far.\nDon't look down, never look away; ArchiveBot's like the wind.\n\n vim:ts=2:sw=2:tw=72:et\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farchiveteam%2Farchivebot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farchiveteam%2Farchivebot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farchiveteam%2Farchivebot/lists"}