{"id":19262486,"url":"https://github.com/cdeck3r/nlppaperanalysis","last_synced_at":"2026-05-18T07:04:43.206Z","repository":{"id":114500864,"uuid":"268287621","full_name":"cdeck3r/NLPPaperAnalysis","owner":"cdeck3r","description":"Make the evolution of a research area and its topics visible by applying NLP on its publications.","archived":false,"fork":false,"pushed_at":"2020-06-01T08:45:18.000Z","size":2797,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-23T18:47:00.121Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cdeck3r.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-31T13:37:55.000Z","updated_at":"2020-11-09T09:34:44.000Z","dependencies_parsed_at":"2023-05-17T09:15:39.300Z","dependency_job_id":null,"html_url":"https://github.com/cdeck3r/NLPPaperAnalysis","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cdeck3r/NLPPaperAnalysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cdeck3r%2FNLPPaperAnalysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cdeck3r%2FNLPPaperAnalysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cdeck3r%2FNLPPaperAnalysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cdeck3r%2FNLPPaperAnalysis/manifests","owner_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cdeck3r","download_url":"https://codeload.github.com/cdeck3r/NLPPaperAnalysis/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cdeck3r%2FNLPPaperAnalysis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33168910,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-18T05:43:36.989Z","status":"ssl_error","status_checked_at":"2026-05-18T05:43:19.133Z","response_time":71,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T19:31:55.884Z","updated_at":"2026-05-18T07:04:43.190Z","avatar_url":"https://github.com/cdeck3r.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Quantitative Analysis and Trends of IWSSS Topics\n\nThe [International Workshop on Smart Sensing Systems (IWSSS) 2019](https://iwsss19.github.io/) is the fourth in the series since 2016. This work makes the evolution of the IWSSS research area and its topics visible by applying NLP to its publications.\n\n**Takeaway:** Provide a repeatable approach to follow up on future IWSSS editions or to apply the methods to other fields and their conferences' publications.\n\nYou can find the full analysis in the notebook [IWSSSAnalysis.ipynb](notebooks/IWSSSAnalysis.ipynb). 
There is a [blog post](https://cdeck3r.com/2020-05-31-NLPPaperAnalysis/) showing some selected results. \n\n## Research Questions \n\n* How to explore the papers' context to achieve a general understanding?\n* Which strong relations connect all documents with each other?\n* Which papers are relevant to read?\n* What are the topics and how do papers correspond to these topics?\n* Topic evolution: How much are past topics still present in IWSSS?\n\n## Contribution\n\nWhy may you find this work interesting?\n\nAs a user you get:\n\n* Important paper reading list\n* Topic distribution over years\n* Papers from dominating topics\n\nAs a data scientist you get:\n\n* Graph visualization and exploration for NLP: wordclouds, bi- and trigrams, word pairs, word correlation analysis \n* Algorithm to select an appropriate correlation coefficient threshold for a pairwise word correlation graph\n* tf-idf, LDA topic modeling use cases \n\nYou do not get:\n\n* the latest NLP techniques based on word embeddings and neural networks. Nevertheless, this is an interesting area for future extensions.\n\n## Approach\n\nThis analysis only utilizes titles and abstracts of paper publications. There are good reasons to focus on these two inputs. First, titles and abstracts are available even when the paper is behind a paywall. Second, they often come in formats that are easy to scrape and parse, e.g. from a website. PDF content, by contrast, can be very hard to parse automatically because of tables, formulas and images. \n\nUsing papers' titles and abstracts only, we are able to create a database for our analysis that is as complete as possible. We store the data in the MS Excel format so that the database is easy to edit manually.\n\n![Analysis approach](approach.png)\n\n## Quickstart: Run your own Analysis \n\nClone this repository. 
It becomes the project root.\n\n```bash\ngit clone https://github.com/cdeck3r/NLPPaperAnalysis.git\n```\n\n### Preps\n\nCreate a `.env` file in the project's root that specifies global environment variables.\n```\n# In the container, this is the directory where the code is found\nAPP_ROOT=/NLPPaperAnalysis\n\n# the HOST directory containing the project's root.\n# e.g. /home/username/NLPPaperAnalysis\nVOL_DIR=\u003cproject root\u003e\n```\n\n### Container \n\nStart in the project's root directory. Build the Docker image:\n```bash\ndocker-compose build rnlp \n```\n\nSpin up the container:\n```bash\ndocker-compose up -d rnlp \n```\n\nThen point your browser to http://localhost:8888\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcdeck3r%2Fnlppaperanalysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcdeck3r%2Fnlppaperanalysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcdeck3r%2Fnlppaperanalysis/lists"}