{"id":18401048,"url":"https://github.com/elaaatif/nlp-project","last_synced_at":"2025-06-20T10:39:34.326Z","repository":{"id":228466510,"uuid":"770612137","full_name":"elaaatif/NLP-PROJECT","owner":"elaaatif","description":"An NLP Project using BERT model for Tweeter similarity analysis ","archived":false,"fork":false,"pushed_at":"2024-03-18T22:51:47.000Z","size":913,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-12T17:44:44.772Z","etag":null,"topics":["bert-embeddings","bert-model","nlp","nlp-machine-learning"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/elaaatif.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-03-11T21:07:06.000Z","updated_at":"2025-02-19T20:12:26.000Z","dependencies_parsed_at":"2024-03-18T23:46:01.134Z","dependency_job_id":"0aa46ead-8b82-4d4b-a777-421d99dfacc1","html_url":"https://github.com/elaaatif/NLP-PROJECT","commit_stats":null,"previous_names":["elaaatif/nlp-project"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/elaaatif/NLP-PROJECT","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elaaatif%2FNLP-PROJECT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elaaatif%2FNLP-PROJECT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elaaatif%2FNLP-PROJECT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elaaatif%2FNLP-PROJECT/manifests","owner_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/elaaatif","download_url":"https://codeload.github.com/elaaatif/NLP-PROJECT/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elaaatif%2FNLP-PROJECT/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260927844,"owners_count":23084106,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert-embeddings","bert-model","nlp","nlp-machine-learning"],"created_at":"2024-11-06T02:37:34.997Z","updated_at":"2025-06-20T10:39:29.312Z","avatar_url":"https://github.com/elaaatif.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# NLP-PROJECT\r\n# Tweet Similarity Analysis with Transformer Embeddings\r\n## Overview\r\nThis project aims to develop a model for analyzing the semantic similarity between pairs of tweets and providing a similarity score indicating the likelihood that they originated from the same user. 
Leveraging transformer-based architectures for text representation, the model utilizes advanced natural language processing (NLP) techniques to discern subtle semantic nuances present in tweet content.\r\n\r\n## Project Structure\r\nThe project is organized into the following components:\r\n\r\n+ Data Preparation: Random sampling of tweet pairs and labeling them based on user similarity.\r\n+ Data Preprocessing: Cleaning and preprocessing tweet text for model input.\r\n+ Model Architecture: Utilization of a pre-trained transformer model for sequence classification.\r\n+ Evaluation: Assessment of model performance using standard evaluation metrics.\r\n\r\n## Requirements\r\n+ Python 3.x\r\n+ Pandas\r\n+ NumPy\r\n+ Transformers library (Hugging Face)\r\n+ TensorFlow\r\n+ scikit-learn\r\n+ nltk\r\n+ BeautifulSoup (for HTML tag removal)\r\n\r\n## Usage\r\n### Data Preparation\r\nEnsure the availability of tweet data in the required format (e.g., Excel files) and execute the create_tweet_pairs function to generate tweet pairs with appropriate labels.\r\n### Data Preprocessing\r\nClean and preprocess tweet text using the provided functions for lowercase conversion, punctuation removal, and HTML tag removal.\r\n### Model Training\r\nTrain the model using the provided script, specifying the desired transformer model and training parameters.\r\n### Evaluation\r\nEvaluate the trained model using standard evaluation metrics such as accuracy, precision, recall, and F1 score.\r\n### Model Deployment\r\nFor deploying the trained model:\r\n\r\n+ Save the trained model using appropriate serialization techniques (e.g., Trainer.save_model() or joblib.dump()).\r\n+ Consider deployment strategies such as REST API integration or model serving platforms for real-time inference.\r\n\r\n### Additional Notes\r\nExperiment with different transformer architectures and hyperparameters to optimize model performance.\r\nExplore techniques for model optimization and compression to reduce inference 
latency and resource consumption.\r\nDocument any insights, challenges, and future directions in the project report.\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felaaatif%2Fnlp-project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Felaaatif%2Fnlp-project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felaaatif%2Fnlp-project/lists"}