{"id":39250658,"url":"https://github.com/sccn/nemar-citations","last_synced_at":"2026-01-18T00:02:56.246Z","repository":{"id":311676703,"uuid":"1036901860","full_name":"sccn/nemar-citations","owner":"sccn","description":"Insights on how NEMAR datasets are being used in the academic context","archived":false,"fork":false,"pushed_at":"2026-01-15T02:03:06.000Z","size":126208,"stargazers_count":2,"open_issues_count":4,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-15T08:00:17.680Z","etag":null,"topics":["bids","fair-data","open-data"],"latest_commit_sha":null,"homepage":"https://neuromechanist.github.io/dataset_citations_dashboard.html","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sccn.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"citations/citations_011024.csv","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-12T18:50:13.000Z","updated_at":"2026-01-15T02:03:12.000Z","dependencies_parsed_at":"2025-08-25T23:46:40.742Z","dependency_job_id":null,"html_url":"https://github.com/sccn/nemar-citations","commit_stats":null,"previous_names":["sccn/nemar-citations"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/sccn/nemar-citations","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sccn%2Fnemar-citations","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sccn%2Fnemar-citations/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/sccn%2Fnemar-citations/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sccn%2Fnemar-citations/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sccn","download_url":"https://codeload.github.com/sccn/nemar-citations/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sccn%2Fnemar-citations/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28523047,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-17T23:53:28.710Z","status":"ssl_error","status_checked_at":"2026-01-17T23:52:20.131Z","response_time":85,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bids","fair-data","open-data"],"created_at":"2026-01-18T00:02:55.389Z","updated_at":"2026-01-18T00:02:56.207Z","avatar_url":"https://github.com/sccn.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# NEMAR Citations\n\n[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)\n[![License: CC BY-NC-SA 4.0](https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc-sa/4.0/)\n[![Tests](https://github.com/sccn/nemar-citations/actions/workflows/test.yml/badge.svg)](https://github.com/sccn/nemar-citations/actions)\n\nAutomated BIDS dataset 
citation tracking system with AI-powered confidence scoring for 300+ neuroscience datasets.\n\n## Overview\n\nTrack and analyze citations for OpenNeuro datasets with a complete pipeline from discovery to interactive dashboards. Features Google Scholar integration, semantic similarity scoring, network analysis, and automated monthly updates via GitHub Actions.\n\n**Key Features**: Dataset discovery • Citation tracking • AI confidence scoring • Network analysis • Interactive dashboards • JSON/CSV export • GitHub Actions automation\n\n## Installation\n\n```bash\ngit clone https://github.com/sccn/nemar-citations.git\ncd nemar-citations\npip install -e \".[dev,test]\"\n```\n\n**Requirements**: Python 3.11+ • ScraperAPI key • GitHub token (optional)\n\n## Quick Start\n\n```bash\n# 1. Set up the environment (choose .env or .secrets)\n# Option A: Using .env file\necho \"SCRAPERAPI_KEY=your_key_here\" \u003e .env\necho \"GITHUB_TOKEN=your_token_here\" \u003e\u003e .env\n\n# Option B: Using .secrets file (auto-loaded by the workflow script)\necho \"SCRAPERAPI_KEY=your_key_here\" \u003e .secrets\necho \"GITHUB_TOKEN=your_token_here\" \u003e\u003e .secrets\n\n# 2. 
Run complete pipeline\nchmod +x run_end_to_end_workflow.sh\n./run_end_to_end_workflow.sh test              # Test mode (no API calls)\n./run_end_to_end_workflow.sh full              # Full pipeline (recommended)\n./run_end_to_end_workflow.sh local-ci-test     # Test CI/CD test workflow locally\n./run_end_to_end_workflow.sh local-ci-update   # Test CI/CD update workflow locally\n```\n\n## Shell Scripts\n\nThe repository includes several shell scripts for different workflows:\n\n| Script | Purpose | Runtime | When to Use |\n|--------|---------|---------|-------------|\n| `run_end_to_end_workflow.sh` | Complete pipeline from discovery to dashboard | 1-3 hours | Production updates, full analysis |\n| `run_full_analysis.sh` | Analysis and dashboard generation only | 10-30 min | When citations already exist |\n| `migrate_to_json.sh` | Convert pickle files to JSON format | 1-2 min | One-time migration |\n\n## Pipeline Workflow\n\n### Running the Complete Pipeline\n\nThe `run_end_to_end_workflow.sh` script automates the entire workflow:\n\n| Mode | Description | Runtime | API Calls | Steps Executed | Branch/PR |\n|------|-------------|---------|-----------|----------------|-----------|\n| `test` | Controlled test data (3-8 citations) | ~1 min | None | 4-5 only (Analyze, Generate) | No |\n| `full` | **Recommended**: Direct pipeline execution | 1-3 hours | Google Scholar, GitHub | 1-5 (All steps) | Yes (auto) |\n| `local-ci-test` | Test GitHub Actions test workflow via Docker | ~5-10 min | None | Runs test suite | No |\n| `local-ci-update` | Test GitHub Actions update workflow via Docker | ~10-30 min | Real API calls | 1-5 (All steps) | Yes (auto) |\n\n**Workflow Steps**:\n1. **Discover** → Find BIDS datasets (EEG/MEG/iEEG)\n2. **Collect** → Fetch citations from Google Scholar\n3. **Enhance** → Add metadata \u0026 AI confidence scores\n4. **Analyze** → Network, temporal, theme analysis\n5. 
**Generate** → Interactive HTML dashboard\n\n**Mode Selection Guide**:\n- Use `test` for quick validation during development\n- Use `full` for actual citation updates (runs natively, faster)\n- Use `local-ci-test` to test/debug GitHub Actions test workflow issues\n- Use `local-ci-update` to test/debug GitHub Actions update workflow issues\n\n**Branch Protection**: Both `full` and `local-ci-update` modes automatically create a feature branch and pull request to protect the main branch from direct commits.\n\n### Automated Updates (Cron)\n\n```bash\n# 1. Create update script\ncat \u003e ~/update_citations.sh \u003c\u003c 'EOF'\n#!/bin/bash\ncd /path/to/nemar-citations || exit 1\nsource ~/miniconda3/etc/profile.d/conda.sh\nconda activate dataset-citations\n./run_end_to_end_workflow.sh full\nEOF\nchmod +x ~/update_citations.sh\n\n# 2. Open your crontab (crontab -e), then add ONE of these schedule lines\ncrontab -e\n0 2 1 * * ~/update_citations.sh \u003e\u003e ~/citations.log 2\u003e\u00261  # Monthly (1st of the month, 02:00)\n0 3 * * 0 ~/update_citations.sh \u003e\u003e ~/citations.log 2\u003e\u00261  # Weekly (Sundays, 03:00)\n0 4 * * * ~/update_citations.sh \u003e\u003e ~/citations.log 2\u003e\u00261  # Daily (04:00)\n\n# 3. 
Monitor\ntail -f ~/citations.log\n```\n\n## Python API\n\n```python\nfrom dataset_citations.core import citation_utils\nfrom dataset_citations.quality.confidence_scoring import CitationConfidenceScorer\nfrom dataset_citations.quality.dataset_metadata import DatasetMetadataRetriever\n\n# Convert pickle to JSON\njson_path = citation_utils.migrate_pickle_to_json(\n    'citations/pickle/ds002718.pkl',\n    'citations/json',\n    'ds002718'\n)\n\n# Load citation data\ncitations = citation_utils.load_citation_json(json_path)\nprint(f\"Dataset {citations['dataset_id']} has {citations['num_citations']} citations\")\n\n# Retrieve dataset metadata first; the confidence scorer takes it as input\nretriever = DatasetMetadataRetriever()\ndataset_metadata = retriever.get_dataset_metadata('ds002718')\n\n# Calculate confidence scores\nscorer = CitationConfidenceScorer()\nconfidence_scores = scorer.score_citations_for_dataset('ds002718', citations, dataset_metadata)\n```\n\n## Key Commands\n\n```bash\n# Discovery \u0026 Updates\ndataset-citations-discover                    # Find datasets\ndataset-citations-update                      # Fetch citations\ndataset-citations-migrate                     # Pickle→JSON\n\n# Quality \u0026 Analysis\ndataset-citations-retrieve-metadata           # Get GitHub data\ndataset-citations-score-confidence            # AI scoring\ndataset-citations-analyze-temporal            # Trends\ndataset-citations-analyze-networks            # Networks\n\n# Dashboards\ndataset-citations-create-interactive-reports  # Generate HTML\n\n# All commands support --help for detailed usage\n```\n\n## Data Formats\n\n### JSON Output\n```json\n{\n  \"dataset_id\": \"ds002718\",\n  \"num_citations\": 13,\n  \"citation_details\": [{\n    \"title\": \"Paper title\",\n    \"author\": \"Authors\",\n    \"year\": 2021,\n    \"confidence_score\": 0.82\n  }]\n}\n```\n\n### Confidence Scoring\nThe `confidence_score` field is an AI-powered relevance score (0.0-1.0) that uses sentence transformers to compare dataset metadata with citation 
abstracts. Helps filter high-confidence citations and identify misattributions.\n\n## Development\n\n```bash\n# Setup\ngit clone https://github.com/sccn/nemar-citations.git\ncd nemar-citations\nconda create -n dataset-citations python=3.11\nconda activate dataset-citations\npip install -e \".[dev,test]\"\n\n# Testing\npytest tests/ -v                    # Fast tests\npytest --cov=dataset_citations      # With coverage\n\n# Code quality\nblack src/ tests/                   # Format\nruff check --fix src/ tests/        # Lint\n```\n\n## Architecture\n\n**Core Components**:\n- **Discovery**: Find BIDS datasets via GitHub API\n- **Collection**: Google Scholar citation fetching with proxy rotation\n- **Processing**: Parallel processing, format conversion, validation\n- **Analysis**: Network graphs, temporal trends, theme clustering\n- **Dashboard**: Interactive HTML with D3.js visualizations\n\n**Data Flow**: Discovery → Fetching → Processing → Analysis → Dashboard\n\n\n## Troubleshooting\n\n| Issue | Solution |\n|-------|----------|\n| ScraperAPI key not found | Add `SCRAPERAPI_KEY` to `.env` |\n| Google Scholar rate limit | Wait for proxy rotation |\n| GitHub API rate limit | Add `GITHUB_TOKEN` to `.env` |\n| MPS memory error (macOS) | Use `--device cpu` |\n| Import errors | Reinstall: `pip install -e \".[dev,test]\"` |\n\n**Debug**: Add `--verbose` flag to any command\n\n**Support**: [GitHub Issues](https://github.com/sccn/nemar-citations/issues)\n\n## Contributing\n\n1. Fork \u0026 create feature branch\n2. Make changes with tests\n3. Run `pytest` and `black`\n4. 
Submit PR with issue reference\n\n**Guidelines**: Type hints • Docstrings • Tests • No mocks\n\n## License\n\n[CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) - Attribution, NonCommercial, ShareAlike\n\n## Citation\n\nIf you use this software in your research, please cite:\n\n```bibtex\n@software{shirazi2025nemarcitations,\n  title={NEMAR Citations: Automated BIDS Dataset Citation Tracking System},\n  author={Shirazi, Seyed Yahya},\n  year={2025},\n  url={https://github.com/sccn/nemar-citations},\n  organization={Swartz Center for Computational Neuroscience (SCCN)}\n}\n```\n\n## Acknowledgments\n\n- **Author**: [Seyed Yahya Shirazi](https://github.com/neuromechanist)\n- **Organization**: [Swartz Center for Computational Neuroscience (SCCN)](https://sccn.ucsd.edu/)\n- **Project**: [NEMAR - NeuroElectroMagnetic Archive](https://nemar.org/)\n- **GitHub**: [@neuromechanist](https://github.com/neuromechanist)\n\nBuilt with ❤️ for NEMAR and the neuroscience open science community.\n\n---\n\n*Last updated: September 19, 2025*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsccn%2Fnemar-citations","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsccn%2Fnemar-citations","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsccn%2Fnemar-citations/lists"}