{"id":18734461,"url":"https://github.com/rohra-mehak/sciencesync","last_synced_at":"2026-05-05T14:32:30.892Z","repository":{"id":243560731,"uuid":"805751424","full_name":"rohra-mehak/ScienceSync","owner":"rohra-mehak","description":"System for Personalized Google Scholar Alerts Processing and Data Management, and provision of ML based  clustering analysis","archived":false,"fork":false,"pushed_at":"2024-06-19T17:48:40.000Z","size":10331,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-06-21T12:37:27.508Z","etag":null,"topics":["agglomerative-clustering","clustering","crossref-api","customtkinter","google-api","google-scholar","graph-api","machine-learning","numpy","pandas","python3","scientific-article-analysis","scikit-learn","sqlite3"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rohra-mehak.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-05-25T11:04:29.000Z","updated_at":"2024-06-19T17:48:42.000Z","dependencies_parsed_at":"2024-06-20T05:30:15.850Z","dependency_job_id":"dcc57bc4-7269-4644-bd5e-46e33e7b7779","html_url":"https://github.com/rohra-mehak/ScienceSync","commit_stats":null,"previous_names":["rohra-mehak/sciencesync"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/rohra-mehak/ScienceSync","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rohra-mehak%2FScienceSync","tags_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/rohra-mehak%2FScienceSync/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rohra-mehak%2FScienceSync/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rohra-mehak%2FScienceSync/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rohra-mehak","download_url":"https://codeload.github.com/rohra-mehak/ScienceSync/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rohra-mehak%2FScienceSync/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32653535,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-05T11:29:49.557Z","status":"ssl_error","status_checked_at":"2026-05-05T11:29:48.587Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agglomerative-clustering","clustering","crossref-api","customtkinter","google-api","google-scholar","graph-api","machine-learning","numpy","pandas","python3","scientific-article-analysis","scikit-learn","sqlite3"],"created_at":"2024-11-07T15:13:27.396Z","updated_at":"2026-05-05T14:32:30.877Z","avatar_url":"https://github.com/rohra-mehak.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Science Sync\nThis system is designed to enhance the data 
management, access and analysis of Google Scholar Alerts sent by email by integrating several automated processes.\nThe core functionalities are described below.\n\n## Overview \n\n#### Email Integration:\n\nGmail and Outlook Connectivity: The system connects to your Gmail or Outlook accounts to retrieve Google Scholar alerts.\n\n #### Alert Retrieval and Parsing\n  Google Scholar alerts are automatically fetched and parsed using \u003ci\u003eBeautiful Soup\u003c/i\u003e and regular expressions (\u003ci\u003eregex\u003c/i\u003e) to extract relevant information such as titles, authors, publication dates, and links.\n\n#### Data Storage: \n\nThe parsed information is stored in an in-memory database using `Sqlite3` for easy access and further processing. \n\n\n#### Machine Learning-Based Analysis: \n\n#### Clustering: \nThe system employs clustering algorithms (such as KMeans, KMedoids, and Agglomerative Clustering) to group similar articles. \n\n#### Similarity Metrics: \nJaccard similarity or Euclidean distance can be used to measure the similarity between different articles based on their references.\n\n#### Interfaces: \nInterfaces built using `customtkinter` facilitate communication with the system.\n\nResults Display: The system provides intuitive visualization tools to display clustering results and other analytical insights.\n\n## Pre-Requisites\n1. Python 3.10 or higher\n\n    Official website: https://www.python.org/doc/\n\n\n2. pip (for installing all related dependencies)\n    \n    pip installation guide: https://pip.pypa.io/en/stable/installation/\n\n\n3. Your preferred IDE: Visual Studio Code or others.\n    \n    VS Code download page: https://code.visualstudio.com/Download\n   \n    (recommended) Python extension in VS Code Marketplace: https://code.visualstudio.com/docs/editor/extension-marketplace\n\n\n\n## How To Run\n\n1. ### a. 
Clone the repository: ([git](https://git-scm.com/downloads) is required)\n\n    ```bash\n    git clone https://github.com/rohra-mehak/ScienceSync.git\n    ```\n    ```bash\n    cd ScienceSync\n    ```\n\n    \n    ### b. Alternatively, download the code:\n\n      Navigate to: https://github.com/rohra-mehak/ScienceSync\n\n      Click the `Code` button. \n\n      Select `Download ZIP`.\n\n      Extract the ZIP file to your desired location.\n\n2. ### Navigate to the root folder directory:\n   ```bash\n   cd yourpath/to/ScienceSync\n   ```\n   \n   On Linux, macOS or Windows, create the `secrets` and `database` directories\n   with the `mkdir` command in the terminal of your IDE.\n   \n   ```bash\n   mkdir secrets\n   ```\n   \n   ```bash\n   mkdir database\n   ```\n   \n  \n3. ### Configure a virtual environment\n   In your IDE, make sure you are in the `ScienceSync` directory.\n   Go to the terminal window and run the following commands.\n\n   Example for VS Code:\n   \n   Create a virtual environment directory called `venv` in the root `ScienceSync` directory:\n   \n   ```bash\n   python -m venv venv\n   ```\n   \n   The following execution policy command applies only to Windows PowerShell; skip it on other operating systems.\n   ```bash\n   Set-ExecutionPolicy Unrestricted -Scope Process\n   ```\n   Activate the environment:\n\n   Windows \n   ```bash\n   .\\venv\\Scripts\\activate\n   ```\n   macOS / Linux\n   ```bash\n   source venv/bin/activate\n   ```\n   \n   Once it is activated, you should see the `(venv)` prefix on your command-line prompt.\n\n4. ### Install all dependencies\n   \n   Run the following command and wait for all dependencies to finish\n   installing.\n\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n5. ### Configure the IDE to use the Virtual Environment\n\n   To ensure your IDE uses the correct Python interpreter from your virtual environment, you generally need to configure the IDE to recognize and use the virtual environment. 
\n   Here’s a general approach for VS Code:\n\n   ### Visual Studio Code (VS Code)\n   \n   1. *Open Command Palette:*\n      - Press Cmd+Shift+P (macOS) or Ctrl+Shift+P (Windows/Linux) to open the command palette.\n   \n   2. *Select Interpreter:*\n      - Type `Python: Select Interpreter` -\u003e Enter Interpreter Path -\u003e Find Interpreter.\n   \n   3. *Choose Virtual Environment:*\n      - Select the interpreter located in your virtual environment `(venv)` directory. It will typically look like `./venv/bin/python` on macOS/Linux or `.\\venv\\Scripts\\python.exe` on Windows.\n  \n\n\n\n6. ### Configuring Credentials (GoogleAPI or GraphAPI)\n\nTo access your email account, you'll need to obtain your own client ID and client secret tokens. Depending on your email service (Outlook or Gmail), follow the appropriate steps below:\n\n#### a. Accessing Outlook (using MS Graph)\n\n1. **Register Your Application:**\n   Follow the process outlined in the Microsoft documentation to register your application and obtain the necessary tokens: [Register an app](https://learn.microsoft.com/en-us/entra/identity-platform/quickstart-register-app). Grant the application the Mail.Read, Mail.ReadWrite, and User.Read API permissions.\n\n2. **Save Credentials:**\n   Once you have your application ID and client secret, save them in a file named `credentials_msgraph.json` in the `ScienceSync/secrets` directory. The file should have the following format:\n\n   ```json\n   {\n     \"application_id\": \"your_app_id\",\n     \"client_secret\": \"your_client_secret\"\n   }\n   ```\n\n#### b. Accessing Gmail (using Google API)\n\n1. **Set Up Your Environment:**\n   Follow the steps in the Google documentation (the \"Set up your environment\" section only) to register your application and obtain the necessary tokens: [Set up your environment](https://developers.google.com/gmail/api/quickstart/python#set_up_your_environment).\n\n2. 
**Download and Save Credentials:**\n   After registering, download the JSON file containing your credentials. Save this file as `credentials.json` in the `ScienceSync/secrets` directory.\n\nAdditional resources and information on working with Google APIs can be found here: [Getting started with Google APIs](https://developers.google.com/workspace/guides/get-started#5_steps_to_get_started).\n\n---\n\nBy following the above instructions, you will successfully configure your credentials for accessing your email account using either MS Graph or Google API.\n\n\n7. ### Navigate to `ScienceSync/app.py`\n\n   Various parameters can be set before running the program. \n   However, it is recommended to leave the default values as they are.\n\n* `days_ago` (number of days to look back while going through the mailbox)\n* `table_name` (the name of the table in your article database, which will be created and referenced by the system)\n* `n_clusters` (number of groups, or clusters, to divide the articles into)\n* `method` (the clustering methodology -\u003e KMedoids / KMedoids++ /  Agglomerative (average linkage) / Agglomerative (complete linkage))\n* `metric` (the similarity metric to use -\u003e dice / jaccard / sokal and sneath)\n\n\n8. ### Running the main file \n  **After all steps are completed and all dependencies have been installed, \n    make sure you are in the root `ScienceSync` directory.\n    To start the program, run the following command in your terminal:**\n\n```bash\n   python app.py\n   ```\n## Sample Snapshots\n\nArrows are simply illustrative indicators.\n\n### Initial Screen\n\nThis screen may differ depending on the service chosen and whether the program could locate credentials. 
\n\n![Initial Screen](https://github.com/rohra-mehak/ScienceSync/blob/master/static/media/step1.png?raw=true)\n\n### Redirection to Authorisation\n\n![Redirection Message Screen](https://github.com/rohra-mehak/ScienceSync/blob/master/static/media/step2.png?raw=true)\n\nThe authorisation continues in your browser; the exact flow depends on the service you chose.\nThe initial screen keeps the user updated on the system's progress and on any errors encountered.\n\nLogs can be used to identify any problem encountered. They provide the exact line, method and file where an exception or error occurred.\n\nWait for the process to finish executing and for the results interface to load.\n\n### After finishing up the process -\u003e Click on All Data to view an itemised list of all extracted articles \n![Initial Results Screen](https://github.com/rohra-mehak/ScienceSync/blob/master/static/media/all_data_view.png?raw=true)\n\n### Viewing the scrollable itemised list of articles. Click on a single article to view more information\n![Click article Screen](https://github.com/rohra-mehak/ScienceSync/blob/master/static/media/view_article.png?raw=true)\n\n### Article Information on the Right Hand Tab. This includes additional functionalities to Navigate to Article, Save on Google Scholar.\n### Additional Export Options below and also Display settings for UI scaling and Themes \n![Article Information Screen](https://github.com/rohra-mehak/ScienceSync/blob/master/static/media/article%20info.png?raw=true)\n\n### Similarly, one can go on to see the article groups and view related articles.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frohra-mehak%2Fsciencesync","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frohra-mehak%2Fsciencesync","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frohra-mehak%2Fsciencesync/lists"}