{"id":28861951,"url":"https://github.com/vansh-py04/data-extraction-and-text-analysis","last_synced_at":"2026-04-24T22:34:38.661Z","repository":{"id":221996068,"uuid":"755977644","full_name":"vansh-py04/Data-Extraction-and-Text-Analysis","owner":"vansh-py04","description":"The objective of this assignment is to extract textual data articles from the given URL and perform text analysis to compute variables that are explained","archived":false,"fork":false,"pushed_at":"2025-06-19T08:11:31.000Z","size":110,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-19T09:26:18.076Z","etag":null,"topics":["data-analysis","data-extraction","data-science","nlp","nlp-machine-learning","python","textanalysis","webscraping"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vansh-py04.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-02-11T16:31:42.000Z","updated_at":"2025-06-19T08:11:35.000Z","dependencies_parsed_at":"2025-06-19T09:27:44.920Z","dependency_job_id":"48c0b087-4fc9-4697-b7b6-e00683ebec71","html_url":"https://github.com/vansh-py04/Data-Extraction-and-Text-Analysis","commit_stats":null,"previous_names":["vansh-py04/blackcoffer-data-extraction-and-text-analysis-","vansh-py04/data-extraction-and-text-analysis"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/vansh-py04/Data-Extraction-and-Text-Analysis","repository_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/vansh-py04%2FData-Extraction-and-Text-Analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vansh-py04%2FData-Extraction-and-Text-Analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vansh-py04%2FData-Extraction-and-Text-Analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vansh-py04%2FData-Extraction-and-Text-Analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vansh-py04","download_url":"https://codeload.github.com/vansh-py04/Data-Extraction-and-Text-Analysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vansh-py04%2FData-Extraction-and-Text-Analysis/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260891184,"owners_count":23077915,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","data-extraction","data-science","nlp","nlp-machine-learning","python","textanalysis","webscraping"],"created_at":"2025-06-20T06:08:29.999Z","updated_at":"2026-04-24T22:34:38.655Z","avatar_url":"https://github.com/vansh-py04.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Extraction and NLP Text Analysis\n\nI will first extract textual data articles from the provided URL and save the extracted article text in separate text files, with the URL_ID as the file name. 
During extraction, I'll include only the article title and text, excluding any website header, footer, or other irrelevant content.\n\nOnce the extraction is complete, I'll proceed with the text analysis using the variables defined in the \"Text Analysis.docx\" file. I'll compute each variable for every extracted article text and organize the results according to the structure specified in the \"Output Data Structure.xlsx\" file.\n\nMy process will involve text processing techniques, including tokenization, sentence segmentation, part-of-speech tagging, named entity recognition, and sentiment analysis, among others, to accurately compute the required variables.\n\nAfter analyzing each article text and computing the variables, I'll save the output in the specified format, with the variables presented in the exact order specified in the output structure file.\n\nThroughout the process, I'll maintain accuracy, efficiency, and adherence to the provided instructions to produce the desired output.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvansh-py04%2Fdata-extraction-and-text-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvansh-py04%2Fdata-extraction-and-text-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvansh-py04%2Fdata-extraction-and-text-analysis/lists"}