{"id":24149150,"url":"https://github.com/hkattt/gopher-web-crawler","last_synced_at":"2025-10-05T22:20:00.981Z","repository":{"id":243635322,"uuid":"785088646","full_name":"hkattt/gopher-web-crawler","owner":"hkattt","description":"Rust Gopher web crawler","archived":false,"fork":false,"pushed_at":"2024-06-10T07:56:52.000Z","size":272,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-12T08:36:28.336Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hkattt.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-11T07:08:23.000Z","updated_at":"2024-06-10T08:00:57.000Z","dependencies_parsed_at":"2024-06-10T11:04:32.677Z","dependency_job_id":null,"html_url":"https://github.com/hkattt/gopher-web-crawler","commit_stats":null,"previous_names":["hkattt/gopher-web-crawler"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hkattt%2Fgopher-web-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hkattt%2Fgopher-web-crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hkattt%2Fgopher-web-crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hkattt%2Fgopher-web-crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hkattt","download_url":"https://codeload.github.com/hkattt/gop
her-web-crawler/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241398156,"owners_count":19956684,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-12T08:35:59.490Z","updated_at":"2025-10-05T22:19:55.930Z","avatar_url":"https://github.com/hkattt.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Gopher Web Crawler\n\nGopher web crawler implemented in Rust for COMP3310 Computer Networks at the ANU.\n\n## Requirements\nSee the [Installation section](https://doc.rust-lang.org/book/ch01-01-installation.html) of The Rust Programming Language book for installation steps. \n\nThe project uses the following crates:\n* `chrono`: For date-time functionality.\n* `debug_print`: For print functions which only trigger in debug mode.\n\nAll networking functionality was done using standard library imports.\n\nThe Gopher crawler has been successfully tested on Linux and Windows.\n\n## Usage\n\nThe usage for the program is:\n```\ngopher [-n \u003cserver_name\u003e] [-p \u003cserver_port\u003e] [-d]\n```\nWhere\n* `-n` specifies the name of the server to crawl\n* `-p` specifies the port of the server to crawl \n* `-d` flags that the output directory `out` should **not** be deleted\n\nwith default values `server_name=comp3310.ddns.net` and `server_port=70`.\n\nTo run the program in debug mode use\n```\ncargo run -- [-n \u003cserver_name\u003e] [-p \u003cserver_port\u003e] [-d]\n```\nin the root directory. 
In debug mode, the program will print additional information and error messages. To run the program in release mode use \n```\ncargo run --release -- [-n \u003cserver_name\u003e] [-p \u003cserver_port\u003e] [-d] \n```\nThis will only print request information and the final crawl report. \n\n## Project Structure\n```\n├── Cargo.lock\n├── Cargo.toml\n├── imgs\n│   └── wireshark-convo.png\n├── README.md\n└── src\n    ├── crawler.rs\n    ├── gopher\n    │   ├── request.rs\n    │   └── response.rs\n    ├── gopher.rs\n    └── main.rs\n```\n\n## External Servers\nAn external server is any referenced server that is on a different host or port to the default server. `comp3310.ddns.net:70` references two external servers. Further details can be found in the crawler report. \n\n## Invalid References\nFiles only contribute to the file count and file statistics if the Gopher transaction was completed successfully. Responses that timed out are not deemed successful transactions. \n\nAs per RFC 1436, text file and directory item types should be terminated with the last line `'.'CR-LF`. If the last line is missing, the transaction is not counted as successful.\n\nThe crawler identified 5 problematic internal references which had to be dealt with explicitly. The full details can be found in the crawler report.\n\n## Crawler Report \nThe final crawler report for `comp3310.ddns.net:70` is shown below. 
\n```\nSTART CRAWLER REPORT\n\n\tNumber of Gopher directories: 41\n\t\tcomp3310.ddns.net:70: \n\t\tcomp3310.ddns.net:70: /acme\n\t\tcomp3310.ddns.net:70: /acme/products\n\t\tcomp3310.ddns.net:70: /acme/products/traps\n\t\tcomp3310.ddns.net:70: /maze/17\n\t\tcomp3310.ddns.net:70: /maze/18\n\t\tcomp3310.ddns.net:70: /maze/19\n\t\tcomp3310.ddns.net:70: /maze/20\n\t\tcomp3310.ddns.net:70: /maze/21\n\t\tcomp3310.ddns.net:70: /maze/22\n\t\tcomp3310.ddns.net:70: /maze/23\n\t\tcomp3310.ddns.net:70: /misc\n\t\tcomp3310.ddns.net:70: /misc/empty\n\t\tcomp3310.ddns.net:70: /misc/malformed1\n\t\tcomp3310.ddns.net:70: /misc/more\n\t\tcomp3310.ddns.net:70: /misc/nesta\n\t\tcomp3310.ddns.net:70: /misc/nestb\n\t\tcomp3310.ddns.net:70: /misc/nestc\n\t\tcomp3310.ddns.net:70: /misc/nestd\n\t\tcomp3310.ddns.net:70: /misc/neste\n\t\tcomp3310.ddns.net:70: /misc/nestf\n\t\tcomp3310.ddns.net:70: /misc/nestg\n\t\tcomp3310.ddns.net:70: /misc/nesth\n\t\tcomp3310.ddns.net:70: /misc/nesti\n\t\tcomp3310.ddns.net:70: /misc/nestj\n\t\tcomp3310.ddns.net:70: /misc/nestk\n\t\tcomp3310.ddns.net:70: /misc/nestl\n\t\tcomp3310.ddns.net:70: /misc/nestm\n\t\tcomp3310.ddns.net:70: /misc/nestn\n\t\tcomp3310.ddns.net:70: /misc/nesto\n\t\tcomp3310.ddns.net:70: /misc/nestp\n\t\tcomp3310.ddns.net:70: /misc/nestq\n\t\tcomp3310.ddns.net:70: /misc/nestr\n\t\tcomp3310.ddns.net:70: /misc/nests\n\t\tcomp3310.ddns.net:70: /misc/nestt\n\t\tcomp3310.ddns.net:70: /misc/nestu\n\t\tcomp3310.ddns.net:70: /misc/nestv\n\t\tcomp3310.ddns.net:70: /misc/nestw\n\t\tcomp3310.ddns.net:70: /misc/nestx\n\t\tcomp3310.ddns.net:70: /misc/nesty\n\t\tcomp3310.ddns.net:70: /misc/nonexistent\n\n\tNumber of simple text files: 11\n\t\tcomp3310.ddns.net:70: /acme/about\n\t\tcomp3310.ddns.net:70: /acme/contact\n\t\tcomp3310.ddns.net:70: /acme/products/anvils\n\t\tcomp3310.ddns.net:70: /acme/products/paint\n\t\tcomp3310.ddns.net:70: /acme/products/pianos\n\t\tcomp3310.ddns.net:70: /maze/floppy\n\t\tcomp3310.ddns.net:70: 
/maze/statuette\n\t\tcomp3310.ddns.net:70: /misc/empty.txt\n\t\tcomp3310.ddns.net:70: /misc/loooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong\n\t\tcomp3310.ddns.net:70: /misc/nestz\n\t\tcomp3310.ddns.net:70: /rfc1436.txt\n\n\tNumber of binary files: 2\n\t\tcomp3310.ddns.net:70: /misc/binary\n\t\tcomp3310.ddns.net:70: /misc/encabulator.jpeg\n\n\tSmallest text file: comp3310.ddns.net:70: /misc/empty.txt\n\t\tSize: 0 bytes\n\t\tContents: \n\n\tSize of the largest text file: 37393 bytes\n\t\tcomp3310.ddns.net:70: /rfc1436.txt\n\n\tSize of the smallest binary file: 253 bytes\n\t\tcomp3310.ddns.net:70: /misc/binary\n\n\tSize of the largest binary file: 45584 bytes\n\t\tcomp3310.ddns.net:70: /misc/encabulator.jpeg\n\n\tThe number of unique invalid references (error types): 2\n\n\tList of external servers:\n\t\tcomp3310.ddns.net:71 did not connect\n\t\tgopher.floodgap.com:70 connected successfully\n\n\tReferences that have issues/errors:\n\t\tConnection timed out comp3310.ddns.net:70 /misc/godot\n\t\tConnection timed out comp3310.ddns.net:70 /misc/tarpit\n\t\tFile too long comp3310.ddns.net:70 /misc/firehose\n\t\tMalformed response line 1Some menu - but on what host???\t/misc/malformed1/file\t\n\t\tMissing end-line comp3310.ddns.net:70 /misc/malformed2\n\nEND CRAWLER REPORT\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhkattt%2Fgopher-web-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhkattt%2Fgopher-web-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhkattt%2Fgopher-web-crawler/lists"}