{"id":21064746,"url":"https://github.com/zmuhls/ling78100-mp4","last_synced_at":"2025-06-11T02:05:24.358Z","repository":{"id":167761729,"uuid":"232762469","full_name":"zmuhls/LING78100-MP4","owner":"zmuhls","description":"Machine programming assignment on regular expressions","archived":false,"fork":false,"pushed_at":"2020-01-09T08:51:59.000Z","size":106,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-14T01:34:26.179Z","etag":null,"topics":["computational-linguistics","python","regex"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zmuhls.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-01-09T08:42:45.000Z","updated_at":"2020-01-09T08:54:06.000Z","dependencies_parsed_at":"2023-09-17T10:15:44.768Z","dependency_job_id":null,"html_url":"https://github.com/zmuhls/LING78100-MP4","commit_stats":null,"previous_names":["zmuhls/ling78100-mp4"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zmuhls%2FLING78100-MP4","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zmuhls%2FLING78100-MP4/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zmuhls%2FLING78100-MP4/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zmuhls%2FLING78100-MP4/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zmuhls","download_url":"
https://codeload.github.com/zmuhls/LING78100-MP4/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zmuhls%2FLING78100-MP4/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259184739,"owners_count":22818267,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computational-linguistics","python","regex"],"created_at":"2024-11-19T17:51:32.069Z","updated_at":"2025-06-11T02:05:24.331Z","avatar_url":"https://github.com/zmuhls.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"MP4: Regular expressions\n========================\n\nThis MP consists of two parts, both using Python's `re` module.\n\nMatching\n--------\n\nDownload the attached [`wordlist.txt`](wordlist.txt) file. Each line contains an\nEnglish word. Open the file in Python, and read the words line by line. For\neach word, if the word contains any \"doubled\" (i.e., repeated and adjacent)\nletters, print it out; otherwise, do not print it. 
For instance, you would\nprint the word *annoy* because it contains the substring *nn*, but not\n*suds* because the two _s_ characters are not adjacent.\n\n### What to turn in\n\n-   A code snippet which performs the requested operation\n\n### Hints\n\n-   There is more than one way to solve this.\n-   Don't forget to strip your lines; don't print any blank lines.\n-   You will want to use `re.search` rather than `re.match` (ask yourself why).\n-   You may wish to consult [the documentation for\n    `re.search`](https://docs.python.org/3/library/re.html#re.search).\n-   You do not need to write a class or a function here, though a simple\n    function might be a helpful way to \"encapsulate\" your code.\n\n### Test support\n\nThere are 197 words (or 195 unique words) with doubled letters in the wordlist;\nthe first is *bottler* and the last is *volunteers*. You may want to keep\ntrack of the number of words with doubled letters to test your solution.\n\nSubstitution\n------------\n\nOften when dealing with corpus data, we wish to remove\n[personally identifying information (PII)](https://en.wikipedia.org/wiki/Personal_data).\nUsing the following `corpus` string, write regular expressions that find all\ninstances of complete 7-, 11-, and 13-digit phone numbers and rewrite them so that\nevery digit is a 0. 
For example, _345-6789_ is a 7-digit phone number, and\n_044-113-496-0000_ is a 13-digit number.\n\n```python\ncorpus = \"\"\"In the US, some phone numbers are reserved for fictitious purposes.\nFor instance, 555-0198 and 1-206-5555-0100 are example fictitious numbers.\nThere are similar ranges of numbers in the UK, Ireland, and Australia.\nIn the UK, 044-113-496-1234 is a fictitious number in the Leeds area.\nIn Ireland, the number 353-020-913-1234 is fictitious.\nAnd in Australia, 061-900-654-321 is a fictitious toll-free number.\n911 is a joke.\"\"\"\n```\n\n### What to turn in\n\n-   A code snippet which performs the requested substitution\n-   An assertion test confirming the correctness of your code\n\n### Hints\n\n-   There is more than one way to solve this.\n-   You will **not** receive full credit if you instead use functions like\n    `str.replace`.\n-   You do not need to write a class or a function here, though a simple\n    function might be a helpful way to \"encapsulate\" your code.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzmuhls%2Fling78100-mp4","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzmuhls%2Fling78100-mp4","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzmuhls%2Fling78100-mp4/lists"}