{"id":28301628,"url":"https://github.com/brian-hepler-phd/mathresearchcompass","last_synced_at":"2026-03-08T15:35:29.760Z","repository":{"id":292069344,"uuid":"975775448","full_name":"brian-hepler-phd/MathResearchCompass","owner":"brian-hepler-phd","description":"An interactive dashboard for exploring mathematical research trends on arXiv","archived":false,"fork":false,"pushed_at":"2025-06-10T01:57:38.000Z","size":78829,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-10T02:38:41.676Z","etag":null,"topics":["arxiv","bertopic","dashboard","data-science","data-visualization","knowledge-discovery","llm-applications","machine-learning","mathematics","nlp","plotly","python","research-trends","shiny-python","topic-modeling"],"latest_commit_sha":null,"homepage":"https://bhepler.com","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/brian-hepler-phd.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-30T22:02:46.000Z","updated_at":"2025-06-10T01:57:43.000Z","dependencies_parsed_at":"2025-05-08T02:24:42.068Z","dependency_job_id":"e92bb29c-94dd-4c27-ba4b-02eac3d7eec2","html_url":"https://github.com/brian-hepler-phd/MathResearchCompass","commit_stats":null,"previous_names":["brian-hepler-phd/mathresearchcompass"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/brian-hepler-phd/MathResearchCompass","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brian-hepler-phd%2FMathResearchCompass","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brian-hepler-phd%2FMathResearchCompass/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brian-hepler-phd%2FMathResearchCompass/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brian-hepler-phd%2FMathResearchCompass/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/brian-hepler-phd","download_url":"https://codeload.github.com/brian-hepler-phd/MathResearchCompass/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brian-hepler-phd%2FMathResearchCompass/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260579420,"owners_count":23031172,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arxiv","bertopic","dashboard","data-science","data-visualization","knowledge-discovery","llm-applications","machine-learning","mathematics","nlp","plotly","python","research-trends","shiny-python","topic-modeling"],"created_at":"2025-05-23T20:11:49.367Z","updated_at":"2026-03-08T15:35:29.755Z","avatar_url":"https://github.com/brian-hepler-phd.png","language":"Jupyter Notebook","readme":"# Math Research Compass\n\n![Math Research](https://img.shields.io/badge/Research-Mathematics-blue)\n![Topic Modeling](https://img.shields.io/badge/NLP-Topic%20Modeling-green)\n![Shiny App](https://img.shields.io/badge/App-Shiny-red)\n\n## Overview\n\nMath Research Compass analyzes arXiv preprints to identify trending research topics across mathematical subfields. This interactive dashboard visualizes topic modeling results from over 121,000 recent mathematics papers, helping researchers and students discover emerging areas and popular research directions.\n\nThe application uses advanced natural language processing to cluster semantically related papers and identify coherent research themes. Recent optimizations have improved performance dramatically, reducing loading times from 15-20 seconds to under 5 seconds through database architecture improvements.\n\n**Live Dashboard**: [Math Research Compass](https://brian-hepler-phd.shinyapps.io/mathresearchcompass1/)\n\n## Project Structure\n\n### Core Applications\n- `app_v2.py` - Optimized Shiny dashboard with database integration\n- `optimized_data_manager.py` - High-performance data layer with caching and connection pooling\n- `create_database.py` - Database migration script for converting CSV data to SQLite\n\n### Data Processing Pipeline\n- `topic_trends_analyzer.py` - Performs topic modeling analysis on arXiv papers using BERTopic\n- `topic_labeling.py` - Enhances topic labels using Claude AI for better readability\n- `category_distribution.py` - Analyzes distribution of arXiv categories across topics\n- `combined_network_analysis.py` - Collaboration network analysis (in development)\n\n### Configuration Files\n- `Procfile` - Heroku deployment configuration\n- `requirements.txt` - Minimal production dependencies\n- `runtime.txt` - Python version specification\n\n## Data Processing Workflow\n\n### 1. Data Collection and Filtering\n\nThe project uses data from the [Kaggle ArXiv dataset](https://www.kaggle.com/datasets/Cornell-University/arxiv), containing approximately 2.7 million arXiv papers. We filter this to focus on mathematics papers from 2020-2025, resulting in 121,391 papers across 31 mathematical subfields.\n\nThe dataset includes standard arXiv metadata: paper IDs, titles, abstracts, author lists, publication dates, and category classifications.\n\n### 2. Topic Modeling with BERTopic\n\nThe topic modeling pipeline combines several state-of-the-art techniques:\n\n1. Text preprocessing combines paper titles and abstracts\n2. Sentence-BERT generates semantic embeddings \n3. UMAP reduces dimensionality for efficient clustering\n4. HDBSCAN performs density-based clustering to discover topics\n5. TF-IDF extraction identifies representative keywords\n\nThis process discovered 1,938 distinct topics across the mathematics corpus, with each paper assigned to its most relevant topic.\n\n### 3. AI-Enhanced Topic Labeling\n\nRaw topic keywords are processed through Claude AI to generate human-readable topic descriptions. For example, a topic with keywords like \"homotopy\", \"spectral\", \"cohomology\" becomes \"Algebraic Topology - Homotopy Theory and Spectral Sequences\".\n\n### 4. Database Architecture\n\nThe application migrated from CSV file processing to an optimized SQLite database. Key tables include:\n\n- `topics` - Topic metadata with counts and category classifications\n- `papers` - Paper information with pre-processed author formatting\n- `topic_keywords` - Ranked keywords for each topic\n- `topic_category_distribution` - Category breakdowns within topics\n- `topic_top_authors` - Author rankings by paper count per topic\n\nThis migration reduced initial loading time by 4-5x and optimized memory usage significantly.\n\n### 5. Category Distribution Analysis\n\nEach topic is analyzed to determine its primary mathematical subfield by calculating the frequency of arXiv categories within that topic's papers. This enables filtering and visualization by mathematical area.\n\n## Dashboard Features\n\n### Overview Page\n\nThe main dashboard provides a high-level view of mathematical research topics:\n\n- Summary statistics showing total papers and topics\n- Category filtering across 31 math subfields  \n- Interactive bar chart of top research topics\n- Dynamic content that updates based on selected category\n\n### Topic Explorer\n\nThe explorer page offers detailed analysis of individual topics:\n\n- Topic selection filtered by mathematical category\n- Author rankings showing most prolific contributors\n- Category distribution charts showing topic spread across subfields\n- Representative paper samples with metadata and arXiv links\n\nAll interactions are optimized for sub-second response times through database indexing and intelligent caching.\n\n## Performance Optimizations\n\nThe application implements several performance improvements:\n\n- **Database queries** replace CSV file loading, reducing response times to under 0.1 seconds\n- **LRU caching** stores frequently accessed data in memory\n- **Connection pooling** manages database connections efficiently\n- **Lazy loading** only retrieves data when needed by users\n- **Indexed queries** on frequently filtered columns\n\nThese optimizations support 50+ concurrent users while using less than 1GB of memory.\n\n## Installation and Usage\n\n### Quick Start\n\n```bash\ngit clone https://github.com/brian-hepler-phd/MathResearchCompass.git\ncd MathResearchCompass\npip install -r requirements.txt\npython app_v3.py\n```\n\n### Database Setup\n\nTo recreate the database from raw data:\n\n```bash\npython src/create_database.py\npython src/optimized_data_manager.py  # Test performance\n```\n\n\n## Deployment\n\nThe application currently runs on shinyapps.io with plans to migrate to Heroku for improved performance and reliability. The Heroku deployment will provide:\n\n- Professional hosting with 99.95% uptime\n- SSL certificates and custom domain support\n- Auto-scaling for traffic spikes\n- Continuous deployment from GitHub\n\nMigration files are included (`Procfile`, `runtime.txt`) for straightforward deployment.\n\n## Future Development\n\n### Collaboration Network Analysis\n\nDevelopment is underway for comprehensive author collaboration analysis:\n\n- Network graphs showing research partnerships within topics\n- Temporal analysis of how collaborations evolve over time\n- Author influence metrics and centrality calculations\n- Cross-topic collaboration discovery\n\nThis will analyze collaboration patterns across all 1,938 topics, providing insights into mathematical research communities.\n\n### Additional Planned Features\n\n- Predictive modeling to forecast emerging research areas\n- Citation analysis integration for impact metrics\n- Geographic mapping of research activity\n- Real-time updates as new papers are published\n- API access for programmatic data retrieval\n\n## Technologies\n\nThe application is built with:\n\n- **Python 3.11** with optimized dependencies\n- **Shiny for Python** for the interactive web interface\n- **SQLite** for high-performance data storage\n- **BERTopic** for advanced topic modeling\n- **Sentence-BERT** for semantic text embeddings\n- **UMAP and HDBSCAN** for dimensionality reduction and clustering\n- **Plotly** for interactive visualizations\n- **NetworkX** for upcoming collaboration analysis\n\n## Performance Metrics\n\n| Metric | Before Optimization | After Optimization | Improvement |\n|--------|-------------------|-------------------|-------------|\n| Initial Load Time | 15-20 seconds | 2-5 seconds | 4-5x faster |\n| Memory Usage | Reduced | \u003c1 GB | Optimized |\n| Query Response | N/A | \u003c0.1 seconds | New capability |\n| Concurrent Users | 1-2 | 50+ | 25x increase |\n\n## Research Applications\n\nThe dashboard serves multiple research use cases:\n\n- **Trend Discovery**: Identify emerging areas within mathematical subfields\n- **Literature Review**: Find representative papers and related topics\n- **Collaboration Planning**: Discover active researchers in specific areas\n- **Academic Planning**: Understand research landscapes for students and early-career researchers\n- **Institutional Strategy**: Inform hiring and resource allocation decisions\n\n## License\n\nThis project is licensed under the MIT License - see the LICENSE file for details.\n\n## Acknowledgments\n\n- ArXiv for providing access to research paper metadata\n- Kaggle for hosting the ArXiv dataset\n- Anthropic for the Claude API used in topic labeling\n\n## Links\n\n- **Live Dashboard**: [Math Research Compass](https://brian-hepler-phd.shinyapps.io/mathresearchcompass/)\n- **Creator's Website**: [bhepler.com](https://bhepler.com)\n- **GitHub Repository**: [MathResearchCompass](https://github.com/brian-hepler-phd/MathResearchCompass)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrian-hepler-phd%2Fmathresearchcompass","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbrian-hepler-phd%2Fmathresearchcompass","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrian-hepler-phd%2Fmathresearchcompass/lists"}