{"id":38845284,"url":"https://github.com/growgraph/pelinker","last_synced_at":"2026-01-17T14:04:33.462Z","repository":{"id":252861531,"uuid":"815037501","full_name":"growgraph/pelinker","owner":"growgraph","description":"bio / sci property entity linker","archived":false,"fork":false,"pushed_at":"2025-12-11T00:40:45.000Z","size":110299,"stargazers_count":3,"open_issues_count":5,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-12-11T10:21:16.625Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/growgraph.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-06-14T08:08:45.000Z","updated_at":"2025-11-12T23:31:05.000Z","dependencies_parsed_at":"2024-08-13T01:44:16.571Z","dependency_job_id":"0c9f7976-c18b-4ff3-ac14-394437f72a4e","html_url":"https://github.com/growgraph/pelinker","commit_stats":null,"previous_names":["growgraph/pelinker"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/growgraph/pelinker","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/growgraph%2Fpelinker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/growgraph%2Fpelinker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/growgraph%2Fpelinker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/growgrap
h%2Fpelinker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/growgraph","download_url":"https://codeload.github.com/growgraph/pelinker/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/growgraph%2Fpelinker/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28509861,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-17T13:38:16.342Z","status":"ssl_error","status_checked_at":"2026-01-17T13:37:44.060Z","response_time":85,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-17T14:04:33.394Z","updated_at":"2026-01-17T14:04:33.452Z","avatar_url":"https://github.com/growgraph.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PELinker\n\nA service for entity linking of properties.\n\n## Developer notes\n\n1. Make sure a Python version compatible with the one specified in `pyproject.toml` is available, for example installed using pyenv.\n2. Install `uv`: `curl -LsSf https://astral.sh/uv/install.sh | sh`\n3. Run `uv sync --all-groups` to create a local environment with the project dependencies specified in `uv.lock`.\n4. Add a spaCy language model: `uv run spacy download en_core_web_trf`\n5. Set up `pre-commit` hooks: `uv run pre-commit install`\n6. 
To run `pre-commit` independently from `git commit`, run `uv run pre-commit run --all-files`\n7. To run tests, run `pytest test`\n\n\nNB.\n1. To run Python scripts, prefix the command with `uv run`, e.g. `uv run python script.py`\n2. Prefix `git commit` with `uv run` as well, e.g. `uv run git commit -m \"first commit\"`, so that the `pre-commit` hooks run from the correct Python environment.\n\n\n## Data Preparation\n- `run/preprocessing/extract_properties_ro`\n- `run/preprocessing/extract_properties_go`\n\n### Merge properties/relations into a table\n\nNormalize and trim data coming from different sources\n\n- `run/preprocessing/merge_properties`\n\n## Testing against ground truth\n\nThe ground truth dataset is stored in `data/ground_truth`. Run the following to write the model's accuracy report to `./reports`:\n\n```commandline\npython run/testing/run_pel_test.py --text-path ./data/ground_truth/sample.0.gt.json --model-type biobert-stsb --layers-spec sent --extra-context\n```\n\n## Serialize Model\n\n\"Train\" a model on a corpus\n\n\n- `uv run python run/save_model.py`\n\n### Run server\n\n- `poetry run python run/serve`\n\n## Container\n1. Build image: `docker buildx build -t gg/pelinker:\u003ccurrent_version\u003e --ssh default=$SSH_AUTH_SOCK . 2\u003e\u00261 | tee build.log`\n2. 
Run container: `docker run --name pelinker --env THR_SCORE=0.5 gg/pelinker:latest`\n\n\n\n### Algo flow\n\n```mermaid\nflowchart TD\n    A[\"[text]\"] --\u003e|\"split_text_into_batches\"| B[\"[[batched text]]\"]\n    subgraph S1[\"elementary tensors and word bounds\"]\n        direction LR\n        C ~~~ D\n    end\n\n    B --\u003e|\"get_word_boundaries\"| C[\"[[word group bounds]]\"]\n    B --\u003e|\"process_text\"| D[\"token tensor\"]\n    subgraph S2[\"tensors ~ words of interest\"]\n        direction LR\n        E[\"ll_tt_stacked\"] ~~~ F[\"mapping_table\"]\n    end\n\n    S1 --\u003e|\"render_elementary_tensor_table\"| S2\n```\n\n[![](https://mermaid.ink/img/pako:eNqNUj1vwyAU_CuIORnS0UOlJmmndmk6NY4QNs82CgYEz0qjJP7tfdjOx9ChTPi4O-6eOfHSKeAZr4w7lI0MyL7WuWW0XrY53yL84C7nOzafP59zHr3RKBIotEUnCollAzHnZ7ZM9O0IKDboknD0il1RB-kbtlkQDQy0YFGGI_FsdCEyaRU7uKBY4Tqr4k2YltIBStTOsvfPO7pifd-zKStYldtxu5yi1oAiOYrBUQY9xlwNMYer6uA6P134kPVq4IMrIcahbVKuSYluD3YK_Ue5p0SZGvVDn8hcxWhUECDiP1q9koMxAlFElOUeVJp9KvpGB630XttaoCwM3Mweum8WU_ZAIARxH7QYY12lZ8rKZ7yF0Eqt6PefkgGFb0iQ84y2CirZGcqc2wtRZYduc7QlzzB0MOOdVxJhrSVVb3lWSRMJ9dJ-O3f_BqXRhY_xiQ0vbcZp6nUzMS6_rEDWAA?type=png)](https://mermaid.live/edit#pako:eNqNUj1vwyAU_CuIORnS0UOlJmmndmk6NY4QNs82CgYEz0qjJP7tfdjOx9ChTPi4O-6eOfHSKeAZr4w7lI0MyL7WuWW0XrY53yL84C7nOzafP59zHr3RKBIotEUnCollAzHnZ7ZM9O0IKDboknD0il1RB-kbtlkQDQy0YFGGI_FsdCEyaRU7uKBY4Tqr4k2YltIBStTOsvfPO7pifd-zKStYldtxu5yi1oAiOYrBUQY9xlwNMYer6uA6P134kPVq4IMrIcahbVKuSYluD3YK_Ue5p0SZGvVDn8hcxWhUECDiP1q9koMxAlFElOUeVJp9KvpGB630XttaoCwM3Mweum8WU_ZAIARxH7QYY12lZ8rKZ7yF0Eqt6PefkgGFb0iQ84y2CirZGcqc2wtRZYduc7QlzzB0MOOdVxJhrSVVb3lWSRMJ9dJ-O3f_BqXRhY_xiQ0vbcZp6nUzMS6_rEDWAA)\n\n\n## Analysis\n\nAn essential part of analysis is to identify patterns in text and study their embeddings vectors.\n\nTo run pattern matching over different models and patterns, and plot them to `figs` folder, where the texts are taken from a csv file with a column named `abstract`:\n```shell\ncd run\n./test.pat.align.sh 
--pattern pat_a --pattern pat_b --plot-path figs --input-path data/test/sample.csv.gz\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgrowgraph%2Fpelinker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgrowgraph%2Fpelinker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgrowgraph%2Fpelinker/lists"}