{"id":16672868,"url":"https://github.com/bgutter/sylvia","last_synced_at":"2025-03-21T17:33:06.465Z","repository":{"id":57472829,"uuid":"86208048","full_name":"bgutter/sylvia","owner":"bgutter","description":"Use phoneme-based regular expressions to find words in the Carnegie-Mellon Pronouncing Dictionary.","archived":false,"fork":false,"pushed_at":"2023-12-14T02:38:12.000Z","size":7773,"stargazers_count":33,"open_issues_count":1,"forks_count":2,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-10-13T12:07:34.823Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bgutter.png","metadata":{"files":{"readme":"README.org","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-03-26T04:43:56.000Z","updated_at":"2024-09-08T18:43:56.000Z","dependencies_parsed_at":"2022-09-05T07:10:44.980Z","dependency_job_id":null,"html_url":"https://github.com/bgutter/sylvia","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bgutter%2Fsylvia","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bgutter%2Fsylvia/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bgutter%2Fsylvia/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bgutter%2Fsylvia/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bgutter","download_url":"https://codeload.github.com/bgutter/sylvia/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","reposit
ories_count":221817422,"owners_count":16885532,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-12T12:07:33.076Z","updated_at":"2024-10-28T10:33:17.376Z","avatar_url":"https://github.com/bgutter.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"#+TITLE: Sylvia\n\nSearch pronunciations in the CMU Pronouncing Dictionary using a regular-expression-like syntax. Input-format regular expressions are lightly preprocessed into Python-format regular expressions, and then mapped over an encoded version of cmudict. Results are sorted by popularity using Peter Norvig's list of word popularities derived from Google's N-Gram dataset.\n\n* Here for the Emacs library?\nYou can skip this and jump directly into the [[./sylvia-emacs/README.org][sylvia-mode README]]!\n\n* Installation\n\n#+BEGIN_SRC sh\nbrandon@brandon-babypad-linux ~\u003e pip2 install sylvia\n#+END_SRC\n\n* Usage\n\nInteractive Sylvia prompt:\n\n#+BEGIN_SRC\nbrandon@brandon-babypad-linux ~\u003e python2 -m sylvia\n\n    Type 'help' for options, press enter to quit.\n\nsylvia\u003e \n#+END_SRC\n\nRun a one-off command:\n\n#+BEGIN_SRC\nbrandon@brandon-babypad-linux ~\u003e python2 -m sylvia -c \"regex G #* AE #* IH #* %\"\nGravity             Graphical           Grandchildren       Garrison            Graphically         \nGallegos            Gravitate           Garretson           Gastineau           Gallimard           \nGalligan            Grandison           Gallivan            Glatfelter          Garibay             \nGarelick            Garrigan            
Garriga             Gravitates          Galipeau            \nGavigan             Gamelin             Gateley             Grandillo           Galipault           \nGarringer           Gradison            Grandchildren's     Glastetter          Garity              \nGalliher            Gantenbein\n#+END_SRC\n\n* Commands\n\nSylvia's functionality is broken down into various subcommands. These commands can be run from the interactive prompt, or as one-off commands directly from your system shell.\n\n** regex\n\nThis is the most powerful feature of Sylvia. It allows searches of cmudict based on phoneme patterns.\n\nSylvia's query format is nearly identical to traditional Python 2 regular expressions, with the exception that it is intended not to match against patterns of characters, but rather patterns of phonemes. To construct a regular expression query for Sylvia, remember the following rules:\n\n1. Whitespace must be used to delimit consecutive phoneme literals. It may also be used anywhere else in the regular expression, as whitespace is meaningless in the context of a phoneme sequence, and will be stripped during preprocessing.\n1. `#` is a shortcut for \"any consonant sound\"\n1. `@` is a shortcut for \"any vowel sound\"\n1. `%` is a shortcut for \"any syllable\", and is equivalent to `#*@#*`\n1. Otherwise, whatever flies with Python's regular expression format will work in Sylvia. 
Just use some common sense, as some things (such as character classes) will be wholly inapplicable to searches in phoneme-space.\n\nUse of this command is as follows:\n\n#+BEGIN_SRC\nsylvia\u003e regex {regex tokens}\n#+END_SRC\n\n[[http://www.speech.cs.cmu.edu/cgi-bin/cmudict][Consult Carnegie Mellon's cmudict documentation]] to learn more about the phoneme set.\n\n[[https://docs.python.org/2/library/re.html][Consult the Python docs]] to learn more about Python's regex format.\n\n*** Examples\n\nFind words starting with zero or more consonant sounds, followed by the \"long E\" sound (phoneme IY), followed by zero or more consonant sounds, followed by the \"ed\" sound (the phoneme sequence EH D):\n\n#+BEGIN_SRC\nsylvia\u003e regex #* IY #* EH D\nSteelhead     Seabed        Beachhead     Retread       Behead \n#+END_SRC\n\nFind all six-syllable words where the first syllable uses the \"short i\" sound (phoneme IH), and which end in either the D or the P phoneme.\n\n#+BEGIN_SRC\nsylvia\u003e regex #*IH%%%%%(D|P)\nDifferentiated        Individualized        Deteriorated          Institutionalized     \nIncapacitated         Internationalized     Interrelationship     Misappropriated       \nDisassociated         Discombobulated       Insubstantiated       \n#+END_SRC\n\nNote here that only five % symbols are needed, as a single vowel sound constitutes a single syllable, and we explicitly call out the first vowel sound via IH.\n\nFind all words that start with the R sound, followed by some vowel, followed by the D sound, followed by another vowel, followed by the NG phoneme:\n\n#+BEGIN_SRC\nsylvia\u003e regex R@D@NG\nReading     Riding      Redding     Raiding     Ridding     Reding      Rodding     Ruding      Rawding\n#+END_SRC\n\n** lookup\n\nIf you just want to look up the pronunciations for a word, you can do that too. This can be a good way to quickly learn the phonemes for a particular sound when constructing queries. 
Due to cultural and geographic variations in pronunciation, this command can return multiple sequences.\n\nUse of this command is as follows:\n\n#+BEGIN_SRC\nsylvia\u003e lookup {word}\n#+END_SRC\n\n*** Examples\n\n#+BEGIN_SRC\nsylvia\u003e lookup turkmenistan\nT ER K M EH N IH S T AE N     \n#+END_SRC\n\n#+BEGIN_SRC\nsylvia\u003e lookup capture\nK AE P CH ER     \n#+END_SRC\n\n#+BEGIN_SRC\nsylvia\u003e lookup tomato\nT AH M EY T OW     T AH M AA T OW     \n#+END_SRC\n\n** rhyme\n\nSylvia can act as a rhyming dictionary, returning words which rhyme with a given word. There are three \"rhyme levels\", which define how rhymes are determined.\n\n1. *perfect* lists all words which contain the same sequence of phonemes as the given word, including and following the first vowel in the given pronunciation. Before that vowel, the matched words can contain any sounds.\n2. *default* is the same as perfect, except that additional consonant sounds can be interspersed between the matched sequence phonemes.\n3. *loose* is similar, except it ignores consonant sounds entirely.\n\nUse of this command is as follows:\n\n#+BEGIN_SRC\nsylvia\u003e rhyme {rhyme-level} {word}\n#+END_SRC\n\nrhyme-level can be omitted if default behavior is desired.\n\nThere are plans to improve these models by matching phonemes based on their vocal characteristics. For example, all nasal phonemes may be considered matches by default, or all plosive sounds, etc. 
The behavior documented above is subject to change at any time.\n\n*** Examples\n\nList words which rhyme with \"chatter\", using the perfect algorithm.\n\n#+BEGIN_SRC\nsylvia\u003e rhyme perfect chatter\nMatter             Latter             Batter             Mater              Platter            \nScatter            Flatter            Shatter            Hatter             Splatter           \nFatter             Patter             Antimatter         Clatter            Spatter            \nSchlatter          Blatter            Natter             Sater              Satter             \nSlatter            Tatter             Mcphatter          Chitterchatter     Smatter            \nVanatter           Vannater           Vatter             Vannatter          Mcfatter           \nWildcatter         \n#+END_SRC\n\n...using the default algorithm...\n\n#+BEGIN_SRC\nsylvia\u003e rhyme chatter        \nAfter                  Chapter                Matter                 Master                 \nFactors                Factor                 Pattern                Faster                 \nMatters                Webmaster              Patterns               Adapter                \nContractor             Contractors            Disaster               Actor                  \nMasters                Latter                 Chapters               Actors                 \nAdapters               Lancaster              Saturn                 Adaptor                \nPastor                 Thereafter             Tractor                Scattered              \nDisasters              Ticketmaster           Napster                Laughter               \nReactor                Adaptors               Baxter                 Stratford              \nBlaster                Lantern                Bastard                Maxtor                 \nTractors               Shattered              Plaster                Hereafter              \nSubchapter             Batter                 Broadcasters 
          Antwerp                \nRaptor                 Mater                  Platter                Scatter                \nHamster                Raster                 Subcontractor          Reactors               \nPastors                Subcontractors         Broadcaster            Mastered\n... many more...\n#+END_SRC\n\n...and using the loose algorithm.\n\n#+BEGIN_SRC\nsylvia\u003e rhyme loose chatter\nAfter                  Standard               Password               Chapter                \nStandards              Rather                 Matter                 Cancer                 \nAnswer                 Master                 Transfer               Answers                \nFactors                Factor                 Pattern                Faster                 \nMatters                Manner                 Webmaster              Patterns               \nHampshire              Adapter                Contractor             Banner                 \nContractors            Alexander              Capture                Disaster               \nActor                  Masters                Traveler               Latter                 \nAlbert                 Chapters               Packard                Answered               \nScanner                Bachelor               Actors                 Transfers              \nAdverse                Amber                  Tracker                Transferred            \nPlanner                Hacker                 Commander              Adapters               \nScanners               Manufactured           Stanford               Manufacture            \nAnchor                 Gathered               Travelers              Captured               \nGrammar                Hazard                 Anger                  Gather                 \nLancaster              Hammer                 Manor                  Programmer             \nHazards                Bradford               Madagascar             Saturn    
             \nBanners                Passwords              Adaptor                Pastor                 \nHamburg                Ladder                 Flashers               Programmers            \nPlanners               Thereafter             Chancellor             Frankfurt              \nTractor                Wagner                 Hackers                Scattered              \nBallard                Disasters              Handler                Chandler               \nSanders                Ticketmaster           Napster                Banker                 \nDancer                 Dancers                Jasper                 Laughter               \nBackward               Panthers               Captures               Bladder                \nSampler                Panther                Reactor                Stafford               \nBackwards              Adaptors               Manufactures           Glamour                \nBaxter                 Stratford              Blackburn              Amherst                \nBlaster                Tavern                 Lambert                Fracture \n...many, many more...\n#+END_SRC\n\n** infer\n\nSylvia can infer the pronunciation of unknown words using its own rule-based text-to-phoneme engine. Don't expect great performance though -- written English is only ostensibly phonetic, and rule-based approaches are not fantastic. 
Any deep-learning-based solution to this problem is likely to beat the snot out of Sylvia's engine.\n\nUse of this command is as follows:\n\n#+BEGIN_SRC\nsylvia\u003e infer {word}\n#+END_SRC\n\n*** Examples\n\nInfer a pronunciation for the word \"rooster\", then compare to the value from lookup.\n\n#+BEGIN_SRC\nsylvia\u003e infer rooster\nR UW S T ER     \n\nsylvia\u003e lookup rooster\nR UW S T ER \n#+END_SRC\n\nInfer pronunciations for some made-up words.\n\n#+BEGIN_SRC\nsylvia\u003e infer rafloy\nR AE F L OY     \n\nsylvia\u003e infer rabbilt\nR AE B IH L T     \n\nsylvia\u003e infer fliberdoodle\nF L IH B ER D UW D AH L   \n#+END_SRC\n\n** lregex\n\nSylvia can look up words based on normal regular expressions. This command doesn't touch on anything phonetic, but may be useful in the same use-cases as Sylvia itself.\n\nUse of this command is as follows:\n\n#+BEGIN_SRC\nsylvia\u003e lregex {regex tokens}\n#+END_SRC\n\n*** Examples\n\nFind all words /which are spelled/ with a C at the start, a P at the end, and which contain either a T or a D.\n\n#+BEGIN_SRC\nsylvia\u003e lregex c.*(t|d).*p\nCitizenship         Craftsmanship       Countertop          Courtship           Catnip              \nCiticorp            Conservatorship     Catsup              Crudup              Catchup             \nColstrip            Catnap              Cutlip              Coltharp            \n#+END_SRC\n\n** popularity\n\nYou can ask Sylvia for the popularity of a word. This value depends on the data-source used when compiling the dictionary, but by default, it is the value in Peter Norvig's word popularity list. 
Larger values indicate higher popularity (think occurrences, not rank).\n\nUse of this command is as follows:\n\n#+BEGIN_SRC\nsylvia\u003e popularity {word}\n#+END_SRC\n\n*** Examples\n\nFind the popularity of a popular, typical, and rare word.\n\n#+BEGIN_SRC\nsylvia\u003e popularity I\n3086225277\n\nsylvia\u003e popularity green\n108287905\n\nsylvia\u003e popularity teutonic\n301907\n#+END_SRC\n\n* Contributing and Other General Notes\n\nFor a list of known issues, feature ideas, and links to relevant research and documentation, [[./NOTES.org][check out the development notes!]]\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbgutter%2Fsylvia","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbgutter%2Fsylvia","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbgutter%2Fsylvia/lists"}