{"id":20512169,"url":"https://github.com/kinson/wikimini","last_synced_at":"2026-06-06T19:31:06.275Z","repository":{"id":84509350,"uuid":"56536075","full_name":"kinson/WikiMini","owner":"kinson","description":"A project for my Algorithms course with the intent of refining the way we use wikipedia and compress the the pages to minimize the data transferred during requests","archived":false,"fork":false,"pushed_at":"2016-04-19T00:12:44.000Z","size":65,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-16T09:08:50.725Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Swift","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kinson.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-04-18T19:31:45.000Z","updated_at":"2016-04-18T19:34:43.000Z","dependencies_parsed_at":"2023-03-02T20:45:45.943Z","dependency_job_id":null,"html_url":"https://github.com/kinson/WikiMini","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kinson%2FWikiMini","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kinson%2FWikiMini/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kinson%2FWikiMini/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kinson%2FWikiMini/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kinson","download
_url":"https://codeload.github.com/kinson/WikiMini/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242117655,"owners_count":20074435,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-15T20:40:14.720Z","updated_at":"2026-06-06T19:31:06.263Z","avatar_url":"https://github.com/kinson.png","language":"Swift","funding_links":[],"categories":[],"sub_categories":[],"readme":"#WikiMini\n##About\nWikiMini is a project I began for my Algorithms course with the intent of refining the way we use Wikipedia and compressing the pages to minimize the data transferred across requests. Wikipedia pages contain a lot of information about any particular subject, including much extraneous and unnecessary data, all of which is sent to the user's device whenever they want to know something about that subject. Most use cases involve looking for a small piece of information within the sea of encyclopedic knowledge about a particular subject. In this case, the user needs less than 10% of the content on the page.\n\nThe solution to this problem is twofold: the first part (1) is refining the process by which you get information from Wikipedia, and the second (2) is more technical and involves reducing the actual data transmission size using Huffman coding compression.\n\n##(1) Refining the Lookup Process\n\nIn order to refine the process of looking up information on Wikipedia, I split it into three steps. 
First, the user searches Wikipedia for the page that contains the information they're looking for. Second, the user gets a list of sections from the page they selected in the previous step. Third, the user selects a particular section from that page and receives a stripped version of the text. This is where the compression using Huffman coding comes into play, minimizing data transfer sizes.\n\n###The API\nTo implement this new lookup process I created an API with an endpoint for each step.\n\n#### POST /api/search requires ['termstring' : string]\n\n#### POST /api/getsections requires ['pagename' : string]\n\n#### POST /api/getsection requires ['pagename' : string, 'sectionindex' : string]\n\n\n##(2) Reducing the size of the request\n\nIn order to reduce the amount of data sent to the client, specifically at step three in the aforementioned process, I used Huffman coding compression on the text retrieved from Wikipedia. This requires building a Huffman encoding scheme (or tree structure) for each section processed and then decoding it on the client using a similar process to retrieve the original string. The compression returns a string of binary data which is transmitted from server to client. However, unless that string is encoded with something other than US ASCII (the standard for HTTP 1.1), this only simulates a reduction in data size rather than achieving one. To see tangible results, the 1s and 0s must be sent as bits rather than as ASCII symbols, which take up about a byte each.\n\nBecause creating a custom encoding is unreasonable for the scope of this project, I decided to try to fit as much data into each ASCII symbol as possible. Each ASCII symbol has a numerical representation (e.g. A is decimal 65) which can be translated into bits, so 1000001 is the equivalent of A. 
Thus, by splitting up the Huffman-encoded strings into 7-bit chunks, each chunk can be aliased as a character from the US ASCII set. This effectively encodes the binary string while still using the standard US ASCII encoding.\n\nThe compressed data, along with a dictionary associating the bit patterns with each character, is sent to the client, effectively minimizing the amount of data needed per request.\n\n## Future Works\nIf I continued working on this project I would implement local caching on devices so clients could look up the same data again without transferring it a second time. I would also make the Huffman coding compression two way, minimizing data going to and from the client instead of just to the client from the server.\n\n###Resources\n\n####wikipedia api sources\nhttps://en.wikipedia.org/w/api.php\nhttps://en.wikipedia.org/w/api.php?action=help\u0026modules=query\nhttps://en.wikipedia.org/w/api.php?action=help\u0026modules=parse\nhttp://stackoverflow.com/questions/13517901/how-to-use-mediawikis-api-to-get-the-first-section-of-an-article\n\n\n####cheerio\nhttps://github.com/cheeriojs/cheerio\n\n\n####general js sources\nhttps://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/sort#Description\nhttps://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/splice\nhttps://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/log\n\n\n####general ios source\nhttp://stackoverflow.com/questions/25879837/how-to-display-html-formatted-text-in-ios-label\nhttp://stackoverflow.com/questions/24092884/get-nth-character-of-a-string-in-swift-programming-language\nhttp://stackoverflow.com/questions/25921204/convert-swift-string-to-array\nhttp://stackoverflow.com/questions/24102044/how-can-i-get-the-unicode-code-points-of-a-character\nhttp://stackoverflow.com/questions/26181221/how-to-convert-a-decimal-number-to-binary-in-swift\n\n\n\n####research 
links\nhttp://stackoverflow.com/questions/818122/which-encoding-is-used-by-the-http-protocol\nhttp://stackoverflow.com/questions/19212306/whats-the-difference-between-ascii-and-unicode\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkinson%2Fwikimini","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkinson%2Fwikimini","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkinson%2Fwikimini/lists"}