{"id":15646520,"url":"https://github.com/manfred/ensure-encoding","last_synced_at":"2025-04-30T12:29:00.921Z","repository":{"id":4371948,"uuid":"443935","full_name":"Manfred/Ensure-encoding","owner":"Manfred","description":"Experimental project to find the best way to ensure a preferred encoding in Strings coming from untrusted sources.","archived":false,"fork":false,"pushed_at":"2022-01-28T08:06:50.000Z","size":33,"stargazers_count":47,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-30T12:28:48.147Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Manfred.png","metadata":{"files":{"readme":"README.rdoc","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2009-12-20T18:39:21.000Z","updated_at":"2023-12-20T13:42:21.000Z","dependencies_parsed_at":"2022-08-06T16:15:25.994Z","dependency_job_id":null,"html_url":"https://github.com/Manfred/Ensure-encoding","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Manfred%2FEnsure-encoding","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Manfred%2FEnsure-encoding/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Manfred%2FEnsure-encoding/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Manfred%2FEnsure-encoding/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Manfred","download_url":"https://codeload.github.com/Manfred/Ensure-encoding/tar.gz/refs/heads/main"
,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251701694,"owners_count":21629882,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-03T12:13:10.448Z","updated_at":"2025-04-30T12:29:00.884Z","avatar_url":"https://github.com/Manfred.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"= Ensure Encoding\n\nExperimental project to find the best way to ensure a preferred encoding in\nStrings coming from untrusted sources.\n\n== Algorithms\n\nThe most sane way of dealing with character data is to choose an internal\nrepresentation for your application and convert all incoming and outgoing data\nwhen necessary.\n\nTo ensure that our internal encoding is always at least valid we can choose to\ndo one of two things.\n\n1. Throw an exception when we receive invalid character data (e.g. our internal\n   encoding is UTF-8 and we receive Latin-1 data or invalid UTF-8).\n2. Accept whatever we get and try to mold it in such a way that it becomes\n   usable in our application.\n\nThe second option is generally preferred. Most of the time the end user has\nno way of solving these problems because a vendor made a mistake.\nIt's not very nice to shut them out.\n\nThere are a number of techniques for molding the character data to our needs.\nSniffing the encoding, transcoding, and dropping invalid characters are just a\nfew examples. 
We implement a number of these techniques so you can easily protect\nyour application from bad data.\n\n== Ensure encoding\n\nWe've crammed a lot of functionality into the ensure_encoding method because\nwe want to keep the number of new methods on String to a minimum. We'll walk\nthrough an example to show how it works.\n\n  example = 'Café'\n  example.encoding # =\u003e #\u003cEncoding:ISO-8859-1\u003e\n  \nAfter ensuring the encoding of a string you can at least assume that you can\nconcatenate the string to another string with the same encoding. In other\nwords, it contains data valid for the specified encoding.\n\n  example.ensure_encoding('UTF-8')\n  example.encoding # =\u003e #\u003cEncoding:UTF-8\u003e\n\nBeyond this you can specify a number of options to perform more operations\nto make sure the data in the string didn't become unreadable garbage. Let's\nlook at a number of situations.\n\n=== Untrusted source with known encoding\n\nFor instance, when you're accepting data from browsers you can be pretty sure\nthe character data is properly encoded. When someone does send bad data it's\nprobably a hacker and you can discard the request.\n\n  example.ensure_encoding('UTF-8',\n    :external_encoding  =\u003e 'UTF-8',\n    :invalid_characters =\u003e :raise\n  )\n\n=== Friendly source with known encoding\n\nIn this scenario you're connecting to a web API through a ReST library and you\nknow the encoding of the source data because it's in the headers. However,\nyou're not sure the encoding of the strings is valid.\n\n  example.ensure_encoding(Encoding::UTF_8,\n    :external_encoding  =\u003e Encoding::UTF_8,\n    :invalid_characters =\u003e :drop\n  )\n\n=== Untrusted source with variable encoding\n\nAssume we have a legacy database and some of the fields contain Shift JIS,\nwhile some of the newer fields contain UTF-8 because someone screwed up the\nserver configuration. 
You're not even sure the encoding property on the\nstrings you got makes any sense because your database adapter is confused too.\n\n  example.ensure_encoding('UTF-8',\n    :external_encoding  =\u003e [Encoding::Shift_JIS, Encoding::UTF_8],\n    :invalid_characters =\u003e :transcode\n  )\n\n=== Untrusted source with unknown encoding\n\nAs a last resort you're trying to read some random files from disk and you\nhave no idea what the external encoding is. You've just read them as binary\nand are hoping to make some sense of the data.\n\n  example.ensure_encoding('UTF-8',\n    :external_encoding  =\u003e :sniff,\n    :invalid_characters =\u003e :transcode\n  )\n\nNote that the encoding sniffer is currently very naive and might not always be\nof any help.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmanfred%2Fensure-encoding","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmanfred%2Fensure-encoding","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmanfred%2Fensure-encoding/lists"}