https://github.com/relevanceai/python-doc-utils

Utilities for documents, including accessing, writing and bulk editing, in Python

Host: GitHub
URL: https://github.com/relevanceai/python-doc-utils
Owner: RelevanceAI
License: apache-2.0
Created: 2021-07-20T06:11:06.000Z (almost 5 years ago)
Default Branch: main
Last Pushed: 2022-05-11T01:24:19.000Z (almost 4 years ago)
Last Synced: 2023-03-09T20:16:37.994Z (about 3 years ago)
Language: Python
Size: 70.3 KB
Stars: 0
Watchers: 3
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

# python-doc-utils

Utilities for documents, including accessing, writing and bulk editing, in Python

### Installation 

To install this utility, run the following: 

```
pip install document-utils
```

### To use

```python
from doc_utils import DocUtils

class Encoder(DocUtils):
    """Any class that needs document navigation can inherit from DocUtils."""
```

## Package Usage 

When we want to access field values, we often do this:

`d["field1"]["field2"]`

However, this can cause a number of problems. For example, if `field2` is missing from `field1`, the lookup raises a `KeyError`.
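To make the failure concrete, here is a minimal sketch using only plain dictionaries. The document `d` is made up for illustration, and the chained `.get()` call is a generic Python workaround, not something this package provides.

```python
d = {"field1": {}}  # illustrative document: field2 is missing from field1

try:
    value = d["field1"]["field2"]
except KeyError as err:
    missing_key = err.args[0]
    print(f"missing key: {missing_key}")  # missing key: field2

# Plain-dict workaround: chain .get() calls, supplying a default at each level.
value = d.get("field1", {}).get("field2", "")
print(repr(value))  # ''
```

The chained `.get()` form works, but it grows with every level of nesting and cannot be driven by a single field name passed in as a string.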

This package lets you access nested fields using dot notation. For example,

`get_field("field1.field2", d)` is the equivalent of `d["field1"]["field2"]`.

Alternatively: 

`d["field1.field2"]`

`get_field("field1.field2", d)`
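As a rough illustration of how a dot-notation lookup like this can work, here is a minimal sketch. The function `get_nested_field` and its `default` parameter are hypothetical names chosen for this example; this is not the package's actual implementation, and the real `get_field` may handle missing keys differently.

```python
def get_nested_field(field, doc, default=""):
    """Hypothetical helper: walk a dict along a dot-separated path.

    Returns `default` if any key along the path is missing.
    """
    value = doc
    for key in field.split("."):
        # Stop as soon as the path cannot be followed any further.
        if not isinstance(value, dict) or key not in value:
            return default
        value = value[key]
    return value


d = {"field1": {"field2": "hello"}}
print(get_nested_field("field1.field2", d))         # hello
print(repr(get_nested_field("field1.missing", d)))  # ''
```

Returning a default instead of raising is one possible design choice; raising a `KeyError` would be just as reasonable where a missing field should be treated as a bug.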

This matters when we write field-independent functions, where we want to be able to loop through each field.

For example: 

```{python}
def add_field_suffix(document, field):
    """Add '-xyz' to the value of a top-level field."""
    return document[field] + "-xyz"
```

This fails if the field is nested, because indexing with `"field1.field2"` looks for a single key literally named `field1.field2`.

However, if you use `get_field` instead:

```{python}
from doc_utils import DocUtils

class Encoder(DocUtils):
    def add_field_suffix(self, document, field):
        """Add '-xyz' to the value of a field, nested or not."""
        return self.get_field(field, document) + "-xyz"
```

Based on the above function, you can now run it across `field1.field2` as well!
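The sketch below shows what looping a field-independent function over several fields might look like. To keep it runnable without the package installed, it uses a hypothetical stand-in, `get_nested_field`, in place of `get_field`. The sample document and field names are also made up for illustration.

```python
def get_nested_field(field, doc):
    """Hypothetical stand-in for a dot-notation getter.

    Raises KeyError if a key along the path is missing.
    """
    value = doc
    for key in field.split("."):
        value = value[key]
    return value


def add_field_suffix(document, field, suffix="-xyz"):
    """Append a suffix to the value of a field, nested or not."""
    return get_nested_field(field, document) + suffix


document = {"title": "intro", "field1": {"field2": "nested"}}

# The same function handles top-level and nested fields alike.
suffixed = {
    field: add_field_suffix(document, field)
    for field in ["title", "field1.field2"]
}
print(suffixed)  # {'title': 'intro-xyz', 'field1.field2': 'nested-xyz'}
```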

To conveniently subset documents, use the `subset_documents` method. It is a quick way to iterate over multiple fields and multiple documents.

For example:

```{python}
from doc_utils import DocUtils

docs = [
    {"doc0": {"field0": "value1", "field1": "value2"}},
    {"doc1": {"field0": "value3", "field1": "value4"}},
]
fields = ["doc0.field0"]
subset = DocUtils.subset_documents(fields, docs)
# subset would be
# [
#      {"doc0.field0": "value1"},
#      {"doc0.field0": ""},  # the second document has no doc0.field0
# ]
```
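For readers who want to see the behaviour spelled out, the following is a minimal sketch that reproduces the output shown above with plain Python. The function `subset_documents_sketch` and its `missing` parameter are hypothetical names; this is an approximation inferred from the example output, not the package's implementation.

```python
def subset_documents_sketch(fields, docs, missing=""):
    """Hypothetical sketch: keep only `fields` from each document.

    Each result is a flat dict keyed by the dot-notation field name.
    Fields that cannot be found are filled with `missing`.
    """
    def lookup(field, doc):
        value = doc
        for key in field.split("."):
            if not isinstance(value, dict) or key not in value:
                return missing
            value = value[key]
        return value

    return [{field: lookup(field, doc) for field in fields} for doc in docs]


docs = [
    {"doc0": {"field0": "value1", "field1": "value2"}},
    {"doc1": {"field0": "value3", "field1": "value4"}},
]
subset = subset_documents_sketch(["doc0.field0"], docs)
print(subset)  # [{'doc0.field0': 'value1'}, {'doc0.field0': ''}]
```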

### TODO

- Enable more versatile functionality for document navigation
