{"id":13680191,"url":"https://github.com/comtravo/ctparse","last_synced_at":"2025-04-29T23:30:48.080Z","repository":{"id":30897082,"uuid":"126318088","full_name":"comtravo/ctparse","owner":"comtravo","description":"Parse natural language time expressions in python","archived":true,"fork":false,"pushed_at":"2022-11-28T09:08:57.000Z","size":3088,"stargazers_count":130,"open_issues_count":6,"forks_count":25,"subscribers_count":28,"default_branch":"master","last_synced_at":"2025-04-16T04:27:56.443Z","etag":null,"topics":["machine-learning","nlp","python","python-library","regular-expression","time-parsing"],"latest_commit_sha":null,"homepage":"https://www.comtravo.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/comtravo.png","metadata":{"files":{"readme":"README.rst","changelog":"HISTORY.rst","contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-03-22T10:24:50.000Z","updated_at":"2025-04-09T05:27:18.000Z","dependencies_parsed_at":"2023-01-14T17:54:40.495Z","dependency_job_id":null,"html_url":"https://github.com/comtravo/ctparse","commit_stats":null,"previous_names":[],"tags_count":58,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/comtravo%2Fctparse","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/comtravo%2Fctparse/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/comtravo%2Fctparse/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/comtravo%2Fctparse/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/comtravo","download_url":"https://codeload.github.com/comtravo/ctpa
rse/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251599768,"owners_count":21615577,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","nlp","python","python-library","regular-expression","time-parsing"],"created_at":"2024-08-02T13:01:14.098Z","updated_at":"2025-04-29T23:30:46.829Z","avatar_url":"https://github.com/comtravo.png","language":"Python","readme":"===========================================================\nctparse - Parse natural language time expressions in python\n===========================================================\n\n.. image:: https://img.shields.io/pypi/v/ctparse.svg\n     :target: https://pypi.python.org/pypi/ctparse\n     :alt: PyPi\n\n.. image:: https://readthedocs.org/projects/ctparse/badge/?version=latest\n     :target: https://ctparse.readthedocs.io/en/latest/?badge=latest\n     :alt: Documentation Status\n\n.. image:: https://img.shields.io/badge/code%20style-black-000000.svg\n    :target: https://github.com/psf/black\n\n\n* Free software: MIT license\n* Documentation: https://ctparse.readthedocs.io.\n\n\nBackground\n----------\n\nThe package ``ctparse`` is a pure python package to parse time\nexpressions from natural language (i.e. strings). In many ways it builds\non similar concepts as Facebook’s ``duckling`` package\n(https://github.com/facebook/duckling). 
However, for the time being it\nonly targets times and only German and English text.\n\nIn principle ``ctparse`` can be used to **detect** time expressions in a\ntext, however its main use case is the semantic interpretation of such\nexpressions. Detecting time expressions in the first place can - in our\nexperience - be done more efficiently (and precisely) using e.g. CRFs or\nother models targeted at this specific task.\n\n``ctparse`` is designed with the use case in mind where interpretation\nof time expressions is done under the following assumptions:\n\n-  All expressions are relative to some pre-defined reference time\n-  Unless explicitly specified in the time expression, valid resolutions\n   are in the future relative to the reference time (e.g. ``12.5.`` will\n   be the next 12th of May, but ``12.5.2012`` should correctly resolve\n   to the 12th of May 2012).\n-  If in doubt, resolutions in the near future are more likely than\n   resolutions in the far future (not implemented yet, but any\n   resolution more than e.g. 3 months in the future is extremely\n   unlikely).\n\nThe specific comtravo use-case is resolving time expressions in booking\nrequests which almost always refer to some point in time within the next\n4-8 weeks.\n\n``ctparse`` is currently language agnostic and supports German and\nEnglish expressions. This might get an extension in the future. The main\nreason is that in real world communication more often than not people\nwrite in one language (their business language) but use constructs to\nexpress times that are based on their mother tongue and/or what they\nbelieve to be the way to express dates in the target language. This\nleads to text in German with English time expressions and vice-versa.\nUsing a language detection upfront on the complete original text is, for\nobvious reasons, no solution - rather it would make the problem worse.\n\nExample\n-------\n\n.. 
code:: python\n\n   from ctparse import ctparse\n   from datetime import datetime\n\n   # Set reference time\n   ts = datetime(2018, 3, 12, 14, 30)\n   ctparse('May 5th 2:30 in the afternoon', ts=ts)\n\nThis should return a ``Time`` object represented as\n``Time[0-29]{2018-05-05 14:30 (X/X)}``, indicating that characters\n``0-29`` were used in the resolution, that the resolved date time is the\n5th of May 2018 at 14:30 and that this resolution is neither based on a\nday of week (first ``X``) nor a part of day (second ``X``).\n\n\nLatent time\n~~~~~~~~~~~\n\nNormally, ``ctparse`` will anchor time expressions to the reference time.\nFor example, when parsing the time expression ``8:00 pm``, ``ctparse`` will\nresolve the expression to 8 pm after the reference time as follows:\n\n.. code:: python\n\n   parse = ctparse(\"8:00 pm\", ts=datetime(2020, 1, 1, 7, 0), latent_time=True) # default\n   # parse.resolution -\u003e Time(2020, 1, 1, 20, 00)\n\nThis behavior can be customized using the option ``latent_time=False``, which will\nreturn a time resolution not anchored to a particular date:\n\n.. code:: python\n\n   parse = ctparse(\"8:00 pm\", ts=datetime(2020, 1, 1, 7, 0), latent_time=False)\n   # parse.resolution -\u003e Time(None, None, None, 20, 00)\n\nImplementation\n--------------\n\n``ctparse`` - like ``duckling`` - is a mixture of a rule and regular\nexpression based system and some probabilistic modeling. In this sense it\nresembles a PCFG.\n\nRules\n~~~~~\n\nAt the core ``ctparse`` is a collection of production rules over\nsequences of regular expressions and (intermediate) productions.\n\nProductions are either of type ``Time``, ``Interval`` or ``Duration`` and can\nhave certain predicates (e.g. whether a ``Time`` is a part of day like\n``'afternoon'``).\n\nA typical rule then looks like this:\n\n.. code:: python\n\n   @rule(predicate('isDate'), dimension(Interval))\n\nI.e. 
this rule is applicable when the intermediate production resulted\nin something that has a date, followed by something that is an interval\n(as in ``'May 5th 9-10'``).\n\nThe actual production is a python function with the following signature:\n\n.. code:: python\n\n   @rule(predicate('isDate'), dimension(Interval))\n   def ruleDateInterval(ts, d, i):\n     \"\"\"\n     param ts: datetime - the current reference time\n     param d: Time - a time that contains at least a full date\n     param i: Interval - some Interval\n     \"\"\"\n     if not (i.t_from.isTOD and i.t_to.isTOD):\n       return None\n     return Interval(\n       t_from=Time(year=d.year, month=d.month, day=d.day,\n                   hour=i.t_from.hour, minute=i.t_from.minute),\n       t_to=Time(year=d.year, month=d.month, day=d.day,\n                 hour=i.t_to.hour, minute=i.t_to.minute))\n\nThis production will return a new interval at the date of\n``predicate('isDate')`` spanning the time coded in\n``dimension(Interval)``. If the latter encodes something other than\na time of day (TOD), no production is returned, i.e. the rule matched\nbut failed.\n\n\nTechnical Background\n~~~~~~~~~~~~~~~~~~~~\n\nSome observations on the problem:\n\n-  Each rule is a combination of regular expressions and productions.\n-  Consequently, each production must originate in a sequence of regular\n   expressions that must have matched (parts of) the text.\n-  Hence, only subsequences of **all** regular expressions in **all**\n   rules can lead to a successful production.\n\nTo this end the algorithm proceeds as follows:\n\n1. Input a string and a reference time\n2. Find all matches of all regular expressions from all rules in the\n   input string. Each regular expression is assigned an identifier.\n3. Find all distinct sequences of these matches where two matches neither\n   overlap nor have a gap in between\n4. 
To each such subsequence apply all rules at all possible positions\n   until no further rules can be applied - in which case one solution is\n   produced\n\nObviously, not all sequences of matching expressions and not all\nsequences of rules applied on top lead to meaningful results. Here the\n**P**\\ CFG kicks in:\n\n-  Based on example data (``corpus.py``) a model is calibrated to\n   predict how likely a production is to lead to a/the correct result.\n   Instead of doing a breadth first search, the most promising\n   productions are applied first.\n-  Resolutions are produced until there are no more resolutions or a\n   timeout is hit.\n-  Based on the same model from all resolutions the highest scoring is\n   returned.\n\n\nCredits\n-------\n\nThis package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.\n\n.. _Cookiecutter: https://github.com/audreyr/cookiecutter\n.. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcomtravo%2Fctparse","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcomtravo%2Fctparse","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcomtravo%2Fctparse/lists"}