{"id":27005674,"url":"https://github.com/relevanceai/python-doc-utils","last_synced_at":"2025-04-04T07:17:06.280Z","repository":{"id":40397982,"uuid":"387691122","full_name":"RelevanceAI/python-doc-utils","owner":"RelevanceAI","description":"Utilies for documents including accessing, writing and bulk editing in Python","archived":false,"fork":false,"pushed_at":"2022-05-11T01:24:19.000Z","size":72,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2023-03-09T20:16:37.994Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RelevanceAI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-07-20T06:11:06.000Z","updated_at":"2021-12-17T05:44:46.000Z","dependencies_parsed_at":"2022-08-09T19:20:16.097Z","dependency_job_id":null,"html_url":"https://github.com/RelevanceAI/python-doc-utils","commit_stats":null,"previous_names":[],"tags_count":null,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RelevanceAI%2Fpython-doc-utils","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RelevanceAI%2Fpython-doc-utils/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RelevanceAI%2Fpython-doc-utils/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RelevanceAI%2Fpython-doc-utils/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RelevanceAI","download_url":"https://codeload.github.com/RelevanceAI/python-doc-utils/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247135126,"owners_count":20889421,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-04-04T07:17:05.229Z","updated_at":"2025-04-04T07:17:06.272Z","avatar_url":"https://github.com/RelevanceAI.png","language":"Python","readme":"# python-doc-utils\nUtilities for documents, including accessing, writing and bulk editing, in Python.\n\n### Installation\n\nTo install this utility, run the following:\n\n```\npip install document-utils\n```\n\n### To use\n\n```python\nfrom doc_utils import DocUtils\n\nclass Encoder(DocUtils):\n    \"\"\"Any class that needs document navigation\n    can inherit from DocUtils.\n    \"\"\"\n```\n\n## Package Usage\n\nWhen we want to access field values, we often do this:\n\n`d[\"field1\"][\"field2\"]`\n\nHowever, this can cause a number of problems. For example, if `field2` is missing from `field1`, it raises a `KeyError`.\n\nThis package allows you to access nested fields using dot notation. For example,\n\n`get_field(\"field1.field2\", d)` is equivalent to the above.\n\nAlternatively:\n\n`d[\"field1.field2\"]`\n\n`get_field(d, \"field1.field2\")`\n\nWe want this because, when we write field-independent functions, we want to be able to loop through each field, nested or not. 
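A minimal sketch of the dot-notation lookup described above, written in plain Python rather than with doc_utils. `get_field_sketch` and `add_suffix_to_fields` are hypothetical names, and the sketch assumes every path segment is a dictionary key (no list indices, no special handling of missing fields). It is not the package's implementation.

```python
from functools import reduce
from typing import Any, Dict, List


def get_field_sketch(field: str, doc: Dict[str, Any]) -> Any:
    # Hypothetical helper, not the doc_utils implementation.
    # Split a path such as 'field1.field2' on dots and index one level per segment.
    # A missing segment raises KeyError, just like chained indexing.
    return reduce(lambda value, key: value[key], field.split('.'), doc)


def add_suffix_to_fields(doc: Dict[str, Any], fields: List[str]) -> Dict[str, str]:
    # Field-independent loop: flat and nested paths go through the same code.
    return {field: get_field_sketch(field, doc) + '-xyz' for field in fields}


doc = {'field1': {'field2': 'value'}, 'title': 'hello'}
print(get_field_sketch('field1.field2', doc))  # value
print(add_suffix_to_fields(doc, ['title', 'field1.field2']))
# {'title': 'hello-xyz', 'field1.field2': 'value-xyz'}
```

Because a path is just a string, the caller can pass `title` and `field1.field2` through one loop, which is the field independence the paragraph above is after.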
\n\nFor example:\n\n```python\ndef add_field_suffix(document, field):\n    \"\"\"Append '-xyz' to the value of a top-level field.\"\"\"\n    return document[field] + '-xyz'\n```\n\nThis does not work for nested fields: with a plain dictionary, `document[\"field1.field2\"]` looks for a literal key named `field1.field2` instead of walking into `field1`.\n\nHowever, if you use `get_field` instead:\n\n```python\nfrom doc_utils import DocUtils\n\nclass Encoder(DocUtils):\n    def add_field_suffix(self, document, field):\n        \"\"\"Append '-xyz' to the value of a field, which may be nested.\"\"\"\n        return self.get_field(field, document) + '-xyz'\n```\n\nthe same function now works across `field1.field2` as well.\n\nTo subset documents conveniently, use the `subset_documents` method. It is a quick way to iterate over multiple fields and multiple documents at once.\n\nFor example:\n\n```python\nfrom doc_utils import DocUtils\n\ndocs = [\n    {\"doc0\": {\"field0\": \"value1\", \"field1\": \"value2\"}},\n    {\"doc1\": {\"field0\": \"value3\", \"field1\": \"value4\"}},\n]\nfields = [\"doc0.field0\"]\n\nsubset = DocUtils().subset_documents(fields, docs)\n# subset would be\n# [\n#     {\"doc0.field0\": \"value1\"},\n#     {\"doc0.field0\": \"\"},\n# ]\n```\n\n### TODO\n\n- Enable more versatile functionality for document navigation\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frelevanceai%2Fpython-doc-utils","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frelevanceai%2Fpython-doc-utils","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frelevanceai%2Fpython-doc-utils/lists"}