{"id":25324087,"url":"https://github.com/nielstron/quantulum3","last_synced_at":"2025-04-13T00:47:59.195Z","repository":{"id":44988398,"uuid":"145027650","full_name":"nielstron/quantulum3","owner":"nielstron","description":"Library for unit extraction - fork of quantulum for python3","archived":false,"fork":false,"pushed_at":"2024-06-25T14:18:47.000Z","size":143128,"stargazers_count":137,"open_issues_count":50,"forks_count":67,"subscribers_count":5,"default_branch":"dev","last_synced_at":"2025-04-04T03:12:50.289Z","etag":null,"topics":["artificial-intelligence","hacktoberfest","machine-learning","natural-language-processing","nlp","python","quantities","units-of-measure"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nielstron.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-08-16T18:56:46.000Z","updated_at":"2025-03-19T05:47:26.000Z","dependencies_parsed_at":"2024-06-18T15:19:18.823Z","dependency_job_id":"c1e00ec6-f914-4240-b8a9-27b83f1f6b29","html_url":"https://github.com/nielstron/quantulum3","commit_stats":{"total_commits":495,"total_committers":21,"mean_commits":"23.571428571428573","dds":0.4181818181818182,"last_synced_commit":"9dafd76d3586aa5ea1b96164d86c73037e827294"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nielstron%2Fquantulum3","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nielstron%2Fquantulum3/tags","release
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nielstron%2Fquantulum3/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nielstron%2Fquantulum3/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nielstron","download_url":"https://codeload.github.com/nielstron/quantulum3/tar.gz/refs/heads/dev","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248650435,"owners_count":21139672,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","hacktoberfest","machine-learning","natural-language-processing","nlp","python","quantities","units-of-measure"],"created_at":"2025-02-14T00:56:34.802Z","updated_at":"2025-04-13T00:47:59.174Z","avatar_url":"https://github.com/nielstron.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# quantulum3\n\n [![Travis master build state](https://app.travis-ci.com/nielstron/quantulum3.svg?branch=master \"Travis master build state\")](https://app.travis-ci.com/nielstron/quantulum3)\n [![Coverage Status](https://coveralls.io/repos/github/nielstron/quantulum3/badge.svg?branch=master)](https://coveralls.io/github/nielstron/quantulum3?branch=master)\n [![PyPI version](https://badge.fury.io/py/quantulum3.svg)](https://pypi.org/project/quantulum3/)\n ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/quantulum3.svg)\n [![PyPI - Status](https://img.shields.io/pypi/status/quantulum3.svg)](https://pypi.org/project/quantulum3/)\n\nPython library for information extraction of 
quantities, measurements\nand their units from unstructured text. It is able to disambiguate between similar\nlooking units based on their *k-nearest neighbours* in their [GloVe](https://nlp.stanford.edu/projects/glove/) vector representation\nand their [Wikipedia](https://en.wikipedia.org/) page.\n\nThis is the Python 3 compatible fork of [recastrodiaz's\nfork](https://github.com/recastrodiaz/quantulum) of [grhawk's\nfork](https://github.com/grhawk/quantulum) of [the original by Marco\nLagi](https://github.com/marcolagi/quantulum).\nThe compatibility with the newest version of sklearn is based on\nthe fork of [sohrabtowfighi](https://github.com/sohrabtowfighi/quantulum).\n\n## User Guide\n\n### Installation\n\n```bash\npip install quantulum3\n```\n\nTo install dependencies for using or training the disambiguation classifier, use\n\n```bash\npip install quantulum3[classifier]\n```\n\nThe disambiguation classifier is used when the parser finds two or more units that match the text.\n\n### Usage\n\n```pycon\n\u003e\u003e\u003e from quantulum3 import parser\n\u003e\u003e\u003e quants = parser.parse('I want 2 liters of wine')\n\u003e\u003e\u003e quants\n[Quantity(2, 'litre')]\n```\n\nThe *Quantity* class stores the surface of the original text it was\nextracted from, as well as the (start, end) positions of the match:\n\n```pycon\n\u003e\u003e\u003e quants[0].surface\n'2 liters'\n\u003e\u003e\u003e quants[0].span\n(7, 15)\n```\n\nThe *value* attribute provides the parsed numeric value and the *unit.name*\nattribute provides the name of the parsed unit:\n\n```pycon\n\u003e\u003e\u003e quants[0].value\n2.0\n\u003e\u003e\u003e quants[0].unit.name\n'litre'\n```\n\nAn inline parser that embeds the parsed quantities in the text is also\navailable (especially useful for debugging):\n\n```pycon\n\u003e\u003e\u003e print(parser.inline_parse('I want 2 liters of wine'))\nI want 2 liters {Quantity(2, \"litre\")} of wine\n```\n\nAs the parser is also able to parse 
dimensionless numbers,\nthis library can also be used for simple number extraction.\n\n```pycon\n\u003e\u003e\u003e print(parser.parse('I want two'))\n[Quantity(2, 'dimensionless')]\n```\n\n### Units and entities\n\nAll units (e.g. *litre*) and the entities they are associated with (e.g.\n*volume*) are reconciled against Wikipedia:\n\n```pycon\n\u003e\u003e\u003e quants[0].unit\nUnit(name=\"litre\", entity=Entity(\"volume\"), uri=https://en.wikipedia.org/wiki/Litre)\n\n\u003e\u003e\u003e quants[0].unit.entity\nEntity(name=\"volume\", uri=https://en.wikipedia.org/wiki/Volume)\n```\n\nThis library includes more than 290 units and 75 entities. It also\nparses spelled-out numbers, ranges and uncertainties:\n\n```pycon\n\u003e\u003e\u003e parser.parse('I want a gallon of beer')\n[Quantity(1, 'gallon')]\n\n\u003e\u003e\u003e parser.parse('The LHC smashes proton beams at 12.8–13.0 TeV')\n[Quantity(12.8, \"teraelectronvolt\"), Quantity(13, \"teraelectronvolt\")]\n\n\u003e\u003e\u003e quant = parser.parse('The LHC smashes proton beams at 12.9±0.1 TeV')\n\u003e\u003e\u003e quant[0].uncertainty\n0.1\n```\n\nNon-standard units usually don't have a Wikipedia page. 
The parser will\nstill try to guess their underlying entity based on their\ndimensionality:\n\n```pycon\n\u003e\u003e\u003e parser.parse('Sound travels at 0.34 km/s')[0].unit\nUnit(name=\"kilometre per second\", entity=Entity(\"speed\"), uri=None)\n```\n\n### Export/Import\n\nEntities, Units and Quantities can be exported to dictionaries and JSON strings:\n\n```pycon\n\u003e\u003e\u003e quant = parser.parse('I want 2 liters of wine')\n\u003e\u003e\u003e quant[0].to_dict()\n{'value': 2.0, 'unit': 'litre', 'entity': 'volume', 'surface': '2 liters', 'span': (7, 15), 'uncertainty': None, 'lang': 'en_US'}\n\u003e\u003e\u003e quant[0].to_json()\n'{\"value\": 2.0, \"unit\": \"litre\", \"entity\": \"volume\", \"surface\": \"2 liters\", \"span\": [7, 15], \"uncertainty\": null, \"lang\": \"en_US\"}'\n```\n\nBy default, only the unit and entity names are included in the exported dictionary, but the full unit and entity dictionaries can be included as well:\n\n```pycon\n\u003e\u003e\u003e quant = parser.parse('I want 2 liters of wine')\n\u003e\u003e\u003e quant[0].to_dict(include_unit_dict=True, include_entity_dict=True)  # same args apply to .to_json()\n{'value': 2.0, 'unit': {'name': 'litre', 'surfaces': ['cubic decimetre', 'cubic decimeter', 'litre', 'liter'], 'entity': {'name': 'volume', 'dimensions': [{'base': 'length', 'power': 3}], 'uri': 'Volume'}, 'uri': 'Litre', 'symbols': ['l', 'L', 'ltr', 'ℓ'], 'dimensions': [{'base': 'decimetre', 'power': 3}], 'original_dimensions': [{'base': 'litre', 'power': 1, 'surface': 'liters'}], 'currency_code': None, 'lang': 'en_US'}, 'entity': 'volume', 'surface': '2 liters', 'span': (7, 15), 'uncertainty': None, 'lang': 'en_US'}\n```\n\nSimilar export syntax applies to exporting Unit and Entity objects.\n\nYou can import Entity, Unit and Quantity objects from dictionaries and JSON. 
This requires that the object was exported with `include_unit_dict=True` and `include_entity_dict=True` (as appropriate):\n\n```pycon\n\u003e\u003e\u003e from quantulum3.classes import Entity, Quantity\n\u003e\u003e\u003e quant_dict = quant[0].to_dict(include_unit_dict=True, include_entity_dict=True)\n\u003e\u003e\u003e quant = Quantity.from_dict(quant_dict)\n\u003e\u003e\u003e ent_json = '{\"name\": \"volume\", \"dimensions\": [{\"base\": \"length\", \"power\": 3}], \"uri\": \"Volume\"}'\n\u003e\u003e\u003e ent = Entity.from_json(ent_json)\n```\n\n### Disambiguation\n\nIf the parser detects an ambiguity, a classifier based on the Wikipedia\npages of the ambiguous units or entities tries to guess the right one:\n\n```pycon\n\u003e\u003e\u003e parser.parse('I spent 20 pounds on this!')\n[Quantity(20, \"pound sterling\")]\n\n\u003e\u003e\u003e parser.parse('It weighs no more than 20 pounds')\n[Quantity(20, \"pound-mass\")]\n```\n\nor:\n\n```pycon\n\u003e\u003e\u003e text = 'The average density of the Earth is about 5.5x10-3 kg/cm³'\n\u003e\u003e\u003e parser.parse(text)[0].unit.entity\nEntity(name=\"density\", uri=https://en.wikipedia.org/wiki/Density)\n\n\u003e\u003e\u003e text = 'The amount of O₂ is 2.98e-4 kg per liter of atmosphere'\n\u003e\u003e\u003e parser.parse(text)[0].unit.entity\nEntity(name=\"concentration\", uri=https://en.wikipedia.org/wiki/Concentration)\n```\n\nIn addition to that, the classifier is trained on the words most similar to\nall of the units' surfaces, according to their distance in [GloVe](https://nlp.stanford.edu/projects/glove/)\nvector representation.\n\n### Spoken version\n\nQuantulum classes include methods to convert them to a speakable form.\n\n```pycon\n\u003e\u003e\u003e parser.parse(\"Gimme 10e9 GW now!\")[0].to_spoken()\nten billion gigawatts\n\u003e\u003e\u003e parser.inline_parse_and_expand(\"Gimme $1e10 now and also 1 TW and 0.5 J!\")\nGimme ten billion dollars now and also one terawatt and zero point five joules!\n```\n\n\n\n### Manipulation\n\nWhile quantities cannot be manipulated within this library, 
there are\nmany great options out there:\n\n- [pint](https://pint.readthedocs.org/en/latest/)\n- [natu](http://kdavies4.github.io/natu/)\n- [quantities](http://python-quantities.readthedocs.org/en/latest/)\n\n## Extension\n\n### Training the classifier\n\nIf you want to train the classifier yourself, you will need the dependencies for the classifier (see installation).\n\nUse `quantulum3-training` on the command line, the script `quantulum3/scripts/train.py` or the method `train_classifier` in `quantulum3.classifier` to train the classifier.\n\n``` bash\nquantulum3-training --lang \u003clanguage\u003e --data \u003cpath/to/training/file.json\u003e --output \u003cpath/to/output/file.joblib\u003e\n```\n\nYou can pass multiple training files to the training command. The output is in joblib format.\n\nTo use your custom model, pass the path to the trained model file to the\nparser:\n\n```python\nquants = parser.parse(\u003ctext\u003e, classifier_path=\"path/to/model.joblib\")\n```\n\nExample training files can be found in `quantulum3/_lang/\u003clanguage\u003e/train`.\n\nIf you want to create a new or different `similars.json`, install `pymagnitude`.\n\nFor the extraction of nearest neighbours from a vector word representation file,\nuse `scripts/extract_vere.py`. It automatically extracts the `k` nearest neighbours\nin vector space of the vector representation for each of the possible surfaces\nof the ambiguous units. The resulting neighbours are stored in `quantulum3/similars.json`\nand automatically included for training.\n\nThe file provided should be in `.magnitude` format, as other formats are first\nconverted to a `.magnitude` file on the fly. 
Check out the\n[pre-converted Magnitude word embeddings](https://github.com/plasticityai/magnitude#pre-converted-magnitude-formats-of-popular-embeddings-models)\nand [Magnitude](https://github.com/plasticityai/magnitude) for more information.\n\n### Additional units\n\nIt is possible to add additional entities and units to be parsed by quantulum. These will be added to the default units and entities. See the code below for an example invocation:\n\n```pycon\n\u003e\u003e\u003e from quantulum3.load import add_custom_unit, remove_custom_unit\n\u003e\u003e\u003e add_custom_unit(name=\"schlurp\", surfaces=[\"slp\"], entity=\"dimensionless\")\n\u003e\u003e\u003e parser.parse(\"This extremely sharp tool is precise up to 0.5 slp\")\n[Quantity(0.5, \"Unit(name=\"schlurp\", entity=Entity(\"dimensionless\"), uri=None)\")]\n```\n\nThe keyword arguments to the function `add_custom_unit` are directly translated\nto the properties of the unit to be created.\n\n### Custom Units and Entities\n\nIt is possible to load a completely custom set of units and entities. This can be done by passing a list of file paths to the `load_custom_units` and `load_custom_entities` functions. 
Loading custom units and entities will replace the default units and entities that are normally loaded.\n\nThe recommended way to load quantities is via a context manager:\n\n```pycon\n\u003e\u003e\u003e from quantulum3 import load, parser\n\u003e\u003e\u003e with load.CustomQuantities([\"path/to/units.json\"], [\"path/to/entities.json\"]):\n...     parser.parse(\"This extremely sharp tool is precise up to 0.5 slp\")\n...\n[Quantity(0.5, \"Unit(name=\"schlurp\", entity=Entity(\"dimensionless\"), uri=None)\")]\n\n\u003e\u003e\u003e # default units and entities are loaded again\n```\n\nBut it is also possible to load custom units and entities manually:\n\n```pycon\n\u003e\u003e\u003e from quantulum3 import load, parser\n\n\u003e\u003e\u003e load.load_custom_units([\"path/to/units.json\"])\n\u003e\u003e\u003e load.load_custom_entities([\"path/to/entities.json\"])\n\u003e\u003e\u003e parser.parse(\"This extremely sharp tool is precise up to 0.5 slp\")\n\n[Quantity(0.5, \"Unit(name=\"schlurp\", entity=Entity(\"dimensionless\"), uri=None)\")]\n\n\u003e\u003e\u003e # remove custom units and entities and load default units and entities\n\u003e\u003e\u003e load.reset_quantities()\n```\n\nSee the Developer Guide below for more information about the format of units and entities files.\n\n## Developer Guide\n\n### Adding Units and Entities\n\nSee *units.json* for the complete list of units and *entities.json* for\nthe complete list of entities. The criteria for adding units have been:\n\n- the unit has (or is redirected to) a Wikipedia page\n- the unit is in common use (e.g. 
not the [premetric Swedish units of\n    measurement](https://en.wikipedia.org/wiki/Swedish_units_of_measurement#Length)).\n\nIt's easy to extend these two files to the units/entities of interest.\nHere is an example of an entry in *entities.json*:\n\n```json\n\"speed\": {\n    \"dimensions\": [{\"base\": \"length\", \"power\": 1}, {\"base\": \"time\", \"power\": -1}],\n    \"URI\": \"https://en.wikipedia.org/wiki/Speed\"\n}\n```\n\n- The *name* of an entity is its key. Names are required to be unique.\n- *URI* is the name of the Wikipedia page of the entity (e.g. `https://en.wikipedia.org/wiki/Speed` =\u003e `Speed`).\n- *dimensions* is the dimensionality, a list of dictionaries each\n    having a *base* (the name of another entity) and a *power* (an\n    integer, can be negative).\n\nHere is an example of an entry in *units.json*:\n\n```json\n\"metre per second\": {\n    \"surfaces\": [\"metre per second\", \"meter per second\"],\n    \"entity\": \"speed\",\n    \"URI\": \"Metre_per_second\",\n    \"dimensions\": [{\"base\": \"metre\", \"power\": 1}, {\"base\": \"second\", \"power\": -1}],\n    \"symbols\": [\"mps\"]\n},\n\"year\": {\n    \"surfaces\": [ \"year\", \"annum\" ],\n    \"entity\": \"time\",\n    \"URI\": \"Year\",\n    \"dimensions\": [],\n    \"symbols\": [ \"a\", \"y\", \"yr\" ],\n    \"prefixes\": [ \"k\", \"M\", \"G\", \"T\", \"P\", \"E\" ]\n}\n```\n\n- The *name* of a unit is its key. Names are required to be unique.\n- *URI* follows the same scheme as in *entities.json*.\n- *surfaces* is a list of strings that refer to that unit. The library\n    takes care of plurals, no need to specify them.\n- *entity* is the name of an entity in *entities.json*.\n- *dimensions* follows the same schema as in *entities.json*, but the\n    *base* is the name of another unit, not of another entity.\n- *symbols* is a list of possible symbols and abbreviations for that\n    unit.\n- *prefixes* is an optional list. 
It can contain [Metric](https://en.wikipedia.org/wiki/Metric_prefix) and [Binary prefixes](https://en.wikipedia.org/wiki/Binary_prefix), and the\n    corresponding prefixed units are generated automatically. If you want to\n    add specifics (like different surfaces), you need to create a separate entry for that\n    prefixed version.\n\nAll fields are case sensitive.\n\n### Contributing\n\n`dev` build:\n\n[![Travis dev build state](https://travis-ci.com/nielstron/quantulum3.svg?branch=dev \"Travis dev build state\")](https://travis-ci.com/nielstron/quantulum3)\n[![Coverage Status](https://coveralls.io/repos/github/nielstron/quantulum3/badge.svg?branch=dev)](https://coveralls.io/github/nielstron/quantulum3?branch=dev)\n\nIf you'd like to contribute, follow these steps:\n1. Clone a fork of this project into your workspace\n2. Run `pip install -e .` at the root of your development folder.\n3. `pip install pipenv` and `pipenv shell`\n4. Inside the project folder run `pipenv install --dev`\n5. Make your changes\n6. Run `scripts/format.sh` and `scripts/build.py` from the package root directory.\n7. Test your changes with `python3 setup.py test`\n(Optional, will be done automatically after pushing)\n8. 
Create a Pull Request once you have committed and pushed your changes\n\n### Language support\n\n[![Travis dev build state](https://travis-ci.com/nielstron/quantulum3.svg?branch=language_support \"Travis dev build state\")](https://travis-ci.com/nielstron/quantulum3)\n[![Coverage Status](https://coveralls.io/repos/github/nielstron/quantulum3/badge.svg?branch=language_support)](https://coveralls.io/github/nielstron/quantulum3?branch=dev)\n\nThere is a branch for language support, namely `language_support`.\nFrom inspecting the `README` file in the `_lang` subdirectory and\nthe functions and values given in the new `_lang.en_US` submodule,\none should be able to create one's own language submodules.\nNew language modules should automatically be invoked and be available,\nboth through the `lang=` keyword argument in the parser functions as well\nas in the automatic unittests.\n\nNo changes outside your own language submodule folder (e.g. `_lang.de_DE`) should\nbe necessary. If there are problems implementing a new language, don't hesitate to open an issue.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnielstron%2Fquantulum3","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnielstron%2Fquantulum3","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnielstron%2Fquantulum3/lists"}