{"id":18499514,"url":"https://github.com/morteza/cogtext","last_synced_at":"2025-05-14T05:35:43.561Z","repository":{"id":44872988,"uuid":"491080135","full_name":"morteza/CogText","owner":"morteza","description":"Linking Theories and Methods in Cognitive Sciences via Joint Embedding of the Scientific Literature: The Example of Cognitive Control (2021)","archived":false,"fork":false,"pushed_at":"2023-11-06T11:00:14.000Z","size":107287,"stargazers_count":2,"open_issues_count":5,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-02-17T00:49:43.251Z","etag":null,"topics":["graph-embedding","knowledge-graph","nlp"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2203.11016","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/morteza.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-05-11T11:25:34.000Z","updated_at":"2025-01-01T21:41:57.000Z","dependencies_parsed_at":"2024-11-06T13:58:40.673Z","dependency_job_id":null,"html_url":"https://github.com/morteza/CogText","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/morteza%2FCogText","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/morteza%2FCogText/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/morteza%2FCogText/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/morteza%2FCogText/manifests","owner_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/owners/morteza","download_url":"https://codeload.github.com/morteza/CogText/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254077099,"owners_count":22010663,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["graph-embedding","knowledge-graph","nlp"],"created_at":"2024-11-06T13:46:21.784Z","updated_at":"2025-05-14T05:35:43.535Z","avatar_url":"https://github.com/morteza.png","language":"Jupyter Notebook","readme":"# Linking Theories and Methods of Cognitive Control\n\n\u003e This is the official repository for the paper [Linking Theories and Methods in Cognitive Sciences via Joint Embedding of the Scientific Literature: The Example of Cognitive Control](https://arxiv.org/abs/2203.11016).\n\nWe performed automated text analyses on a large body of scientific texts (385705 scientific abstracts) and created a joint representation of cognitive control tasks and constructs.\n\nAbstracts were first mapped into an embedding space using GPT-3 and Top2Vec models. Document embeddings were then used to identify a task-construct graph embedding that grounds constructs on tasks and supports nuanced meaning of the constructs by taking advantage of constrained random walks in the graph.\n\n\n## Setup\n\nWe recommend [Conda/Mamba](https://mamba.readthedocs.io/en/latest/) and [DVC](https://dvc.org) to set up a clean environment and download the data. 
You can create and activate the `cogtext` environment and automatically download the required data from the [CogText dataset on HuggingFace](https://huggingface.co/datasets/morteza/cogtext) by running:\n\n\n```bash\nmamba env create --file environment.yml  # or use `conda`\nmamba activate cogtext                   # activate the environment\ndvc pull                                 # download the data\n```\n\n## Notebooks\n\nThe main entry point of the project is the `notebooks/` folder.\n\nNote that the Jupyter notebooks use relative paths and should be run from the root of the project.\n\n\n- **[1 Data Collection (2023)](notebooks/1%20Data%20Collection%20(2023).ipynb)** uses the [EFO ontology](https://huggingface.co/datasets/morteza/cogtext/blob/main/ontologies/efo.owl) to search PubMed, aggregates abstracts into a single dataset, and stores the results in a compressed CSV file. If you have already downloaded the [CogText dataset](https://huggingface.co/datasets/morteza/cogtext/blob/main/pubmed/abstracts_2023.csv.gz), you can skip this step. Simply copy your downloaded file to `data/pubmed/abstracts_2023.csv.gz`.\n\n- **[2 Descriptive Statistics](notebooks/2%20Descriptive%20Statistics.ipynb)** computes some basic statistics such as the number of tasks and constructs, co-occurrences, articles per task or construct, etc. This notebook requires the `data/pubmed/abstracts_2023.csv.gz` file.\n\n- **[3 Document Embedding](notebooks/3%20Document%20Embedding.ipynb)** uses the GPT-3 Embedding API (Ada) to transform the raw abstracts into vector embeddings.\n\n- **[4 Topic Embedding](notebooks/4%20Topic%20Embedding.ipynb)** projects embeddings into a more interpretable topic space. 
The topic embedding uses UMAP and HDBSCAN to calculate the topic weights (as in Top2Vec).\n\n- **[5 Hypernomy](notebooks/5%20Hypernomy.ipynb)** visualizes *construct hypernomy*: inconsistent definitions of cognitive constructs across cognitive fields.\n\n- **[6 Hypergraph Visualization](notebooks/6%20Hypergraph%20Visualization.ipynb)** plots the task-construct hypergraph.\n\n- **[7 Link Prediction](notebooks/7%20Link%20Prediction.ipynb)** predicts the edges of the task-construct hypergraph and learns a Metapath2vec embedding of the graph nodes.\n\n\n## Acknowledgements\n\nThis research was supported by the Luxembourg National Research Fund (ATTRACT/2016/ID/11242114/DIGILEARN\nand INTER Mobility/2017-2/ID/11765868/ULALA).\n\n## Citation\n\nTo cite the paper, use the following entry:\n\n```\n@misc{cogtext2022,\n  author = {Morteza Ansarinia and\n            Paul Schrater and\n            Pedro Cardoso-Leite},\n  title = {Linking Theories and Methods in Cognitive Sciences via Joint Embedding of the Scientific Literature: The Example of Cognitive Control},\n  year = {2022},\n  url = {https://arxiv.org/abs/2203.11016}\n}\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmorteza%2Fcogtext","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmorteza%2Fcogtext","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmorteza%2Fcogtext/lists"}