{"id":29598801,"url":"https://github.com/zapabob/multi-target-pic50-predictor","last_synced_at":"2026-05-29T01:32:00.693Z","repository":{"id":262422083,"uuid":"887186288","full_name":"zapabob/multi-target-pIC50-predictor","owner":"zapabob","description":"Advanced multi-target pIC50 prediction platform for psychoactive and drug-like compounds. Supports DAT, 5HT2A, CB1, CB2, and opioid receptors. Features Transformer regression, RDKit descriptors, SMARTS scaffolds, Optuna optimization, robust session recovery, and both GUI/CLI interfaces. Designed for research, drug discovery, and cheminformatics.","archived":false,"fork":false,"pushed_at":"2026-05-26T08:27:38.000Z","size":9098,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-05-26T10:21:55.031Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zapabob.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-11-12T10:11:09.000Z","updated_at":"2026-05-26T08:24:28.000Z","dependencies_parsed_at":"2024-11-12T11:20:56.518Z","dependency_job_id":"5641fefb-6595-4a46-b567-a3b5944d9515","html_url":"https://github.com/zapabob/multi-target-pIC50-predictor","commit_stats":null,"previous_names":["zapabob/hdatpic50prediction"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zapabob/multi-target-pIC50-predictor","repository_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zapabob%2Fmulti-target-pIC50-predictor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zapabob%2Fmulti-target-pIC50-predictor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zapabob%2Fmulti-target-pIC50-predictor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zapabob%2Fmulti-target-pIC50-predictor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zapabob","download_url":"https://codeload.github.com/zapabob/multi-target-pIC50-predictor/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zapabob%2Fmulti-target-pIC50-predictor/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33633468,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-28T02:00:06.440Z","response_time":99,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-20T10:37:08.930Z","updated_at":"2026-05-29T01:32:00.663Z","avatar_url":"https://github.com/zapabob.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# hDAT pIC50 Prediction System\n\n---\n\n## Overview\nThis repository predicts pIC50 values for seven molecular targets (DAT, 5HT2A, CB1, CB2, 
and the μ/δ/κ-opioid receptors) from molecular structures (SMILES) using a Transformer-based deep learning model.\n\n- **Molecular features**: RDKit descriptors, ECFP4/MACCS fingerprints, psychedelic SMARTS features, agonist scaffolds\n- **Model**: Transformer regression built with PyTorch Lightning\n- **CLI/GUI**: both a Typer CLI and a PySide6 GUI\n- **Power-loss protection**: autosave, caching, and checkpoints\n- **CUDA support** (e.g., RTX 3080)\n\n---\n\n## Background and Purpose\n\nIn drug discovery and chemistry, predicting the biological activity of molecules (pIC50 and similar measures) is central to designing and screening new compounds. For multiple targets in particular (DAT, 5HT2A, CB1, CB2, opioid receptors), this project aims for more accurate predictions than conventional methods by combining:\n- **Diverse molecular descriptors**\n- **Representation learning with deep learning**\n- **Representative scaffolds for each target**\n\n---\n\n## Feature Design\n\n- **RDKit molecular descriptors**: MolWt, LogP, TPSA, NumHDonors, NumHAcceptors, RotatableBonds, AromaticRings, FractionCSP3, LabuteASA, BalabanJ, BertzCT, HeavyAtomCount, and others\n- **Fingerprints**: ECFP4 (1024 bits), MACCS (167 bits)\n- **Psychedelic SMARTS features**: indole ring, tryptamine, phenethylamine, methoxy group count, halogen count, N,N-dimethylamine group\n- **Agonist scaffolds**: representative SMARTS patterns assigned per target\n- **Correlation filtering**: highly correlated features are removed automatically\n\n---\n\n## Model Architecture\n\n- **Input layer**: molecular feature vector\n- **Embedding layer**: Linear(input_dim→256)\n- **Transformer Encoder**: 2 layers, 4 heads, 256 dimensions (tunable with Optuna)\n- **Global pooling**: mean\n- **Output layer**: Linear(256→1)\n- **Loss function**: MSELoss\n- **Optimization**: Adam, learning-rate scheduler, early stopping\n- **Cross-validation and Optuna optimization supported**\n\n---\n\n## Directory Structure\n```\n├── src/                # Main modules (train.py, predict.py, etc.)\n├── models/             # Trained models\n├── tests/              # Test code\n├── _docs/              # Implementation logs and requirements\n├── cli.py              # CLI entry point\n├── main.py             # GUI entry point\n├── dat_predictor.py    # Core logic\n├── requirements.txt    # Dependencies\n└── README.md           # This file\n```\n\n---\n\n## Dependencies\n- Python 3.10+\n- torch==2.3\n- pytorch-lightning==2.0\n- rdkit==2024.03\n- optuna==3.6\n- PySide6==6.5\n- tqdm\n- seaborn, matplotlib, pandas, scikit-learn\n\nInstall everything at once with `pip install -r requirements.txt`.\n\n---\n\n## Usage\n\n### 1. 
CLI\n#### Train a model\n```sh\npy -3 cli.py train --target CHEMBL238 --output models/dat_transformer_model.pt\n```\n- `--target`: ChEMBL target ID (e.g., CHEMBL238=DAT, CHEMBL224=5HT2A, ...)\n- `--optimize`: hyperparameter optimization with Optuna\n\n#### Predict\n```sh\npy -3 cli.py predict --model models/dat_transformer_model.pt --smiles \"CC(CC1=CC=CC=C1)NC\"\n```\n- `--input`: a SMILES file (one molecule per line) is also accepted\n\n### 2. GUI\n```sh\npy -3 main.py\n```\n- Training, prediction, batch prediction, feature-importance plots, distribution visualization, and more\n\n---\n\n## FAQ and Troubleshooting\n\n- **Q. CUDA is not used, or runs slowly**\n  - A. Check the versions of torch, pytorch-lightning, the CUDA toolkit, and the GPU driver.\n- **Q. RDKit descriptor errors**\n  - A. Check the RDKit version and the import names.\n- **Q. Data cannot be fetched from ChEMBL**\n  - A. Check the chembl_webresource_client API limits and your network connection.\n- **Q. The GUI does not start**\n  - A. Check the PySide6 version and any conflict with PyQt6.\n\n---\n\n## Development Guidelines and Extension Examples\n- Adding a target: add its `REFERENCE_COMPOUNDS`/SMARTS/ChEMBL ID entries\n- Adding features: add a descriptor function to `MolecularDescriptorCalculator`\n- Improving the model: ensembling, deeper stacks, attention visualization, etc.\n- Adding tests: pytest unit tests under `tests/`\n- Implementation logs: record them in `_docs/`, named by date + feature name\n\n---\n\n## References and Links\n- [ChEMBL](https://www.ebi.ac.uk/chembl/)\n- [RDKit](https://www.rdkit.org/)\n- [PyTorch](https://pytorch.org/)\n- [Optuna](https://optuna.org/)\n- [PySide6](https://doc.qt.io/qtforpython/)\n\n---\n\n## License\nMIT License\n\n---\n\n## Contributing\n- Issues and PRs are welcome, including new targets, features, model improvements, and bug fixes.\n- Coding conventions: PEP 8, type hints, and an implementation log are required\n\n---\n\n## Changelog\n- 2024-06-09: Repository cleanup and README overhaul\n- 2024-06-01: GUI feature expansion, Optuna optimization added\n- 2024-05-20: Multi-target support, SMARTS features added\n- ...\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzapabob%2Fmulti-target-pic50-predictor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzapabob%2Fmulti-target-pic50-predictor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzapabob%2Fmulti-target-pic50-predictor/lists"}