{"id":46563226,"url":"https://github.com/eddmpython/dartlab","last_synced_at":"2026-04-25T02:12:03.051Z","repository":{"id":342559760,"uuid":"1174537176","full_name":"eddmpython/dartlab","owner":"eddmpython","description":"Turn DART \u0026 EDGAR filings into one structured company map — financials, text, reports aligned across every period. 전자공시 분석 Python 라이브러리","archived":false,"fork":false,"pushed_at":"2026-04-06T00:39:08.000Z","size":186296,"stargazers_count":29,"open_issues_count":1,"forks_count":10,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-04-06T01:32:17.731Z","etag":null,"topics":["10-k","ai-analysis","annual-reports","dart","dart-api","dartlab","disclosure","edgar","finance","financial-analysis","financial-data","financial-statements","korea","korean-stock","mcp","open-data","polars","python","sec","xbrl"],"latest_commit_sha":null,"homepage":"https://eddmpython.github.io/dartlab","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eddmpython.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":".github/CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"eddmpython","buy_me_a_coffee":"eddmpython"}},"created_at":"2026-03-06T14:58:20.000Z","updated_at":"2026-04-06T00:39:16.000Z","dependencies_parsed_at":"2026-03-11T09:01:32.319Z","dependency_job_id":null,"html_url":"https://github.com/eddmpython/dartlab","commit_stats":null,"previous_names":["eddmpython/dartlab"],"tags_c
ount":87,"template":false,"template_full_name":null,"purl":"pkg:github/eddmpython/dartlab","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eddmpython%2Fdartlab","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eddmpython%2Fdartlab/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eddmpython%2Fdartlab/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eddmpython%2Fdartlab/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/eddmpython","download_url":"https://codeload.github.com/eddmpython/dartlab/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eddmpython%2Fdartlab/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31581890,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-08T14:31:17.711Z","status":"online","status_checked_at":"2026-04-09T02:00:06.848Z","response_time":112,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["10-k","ai-analysis","annual-reports","dart","dart-api","dartlab","disclosure","edgar","finance","financial-analysis","financial-data","financial-statements","korea","korean-stock","mcp","open-data","polars","python","sec","xbrl"],"created_at":"2026-03-07T06:15:01.509Z","updated_at":"2026-04-13T01:43:12.485Z","avatar_url":"https://github.com/eddmpython.png","language":"Python"
,"funding_links":["https://github.com/sponsors/eddmpython","https://buymeacoffee.com/eddmpython"],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n\u003cbr\u003e\n\n\u003cimg alt=\"DartLab\" src=\".github/assets/logo.png\" width=\"180\"\u003e\n\n\u003ch3\u003eDartLab\u003c/h3\u003e\n\n\u003cp\u003e\u003cb\u003eBeyond the numbers\u003c/b\u003e — Extract both financials and text from DART filings\u003c/p\u003e\n\n\u003cp\u003e\n\u003ca href=\"https://pypi.org/project/dartlab/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/v/dartlab?style=for-the-badge\u0026color=ea4647\u0026labelColor=050811\u0026logo=pypi\u0026logoColor=white\" alt=\"PyPI\"\u003e\u003c/a\u003e\n\u003ca href=\"https://pypi.org/project/dartlab/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/pyversions/dartlab?style=for-the-badge\u0026color=c83232\u0026labelColor=050811\u0026logo=python\u0026logoColor=white\" alt=\"Python\"\u003e\u003c/a\u003e\n\u003ca href=\"LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/badge/License-MIT-94a3b8?style=for-the-badge\u0026labelColor=050811\" alt=\"License\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp\u003e\n\u003ca href=\"https://eddmpython.github.io/dartlab/\"\u003eDocs\u003c/a\u003e · \u003ca href=\"README_KR.md\"\u003e한국어\u003c/a\u003e · \u003ca href=\"https://buymeacoffee.com/eddmpython\"\u003eSponsor\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp\u003e\n\u003ca href=\"https://github.com/eddmpython/dartlab/releases/tag/data-docs\"\u003e\u003cimg src=\"https://img.shields.io/badge/Docs-260%2B_Companies-f87171?style=for-the-badge\u0026labelColor=050811\u0026logo=databricks\u0026logoColor=white\" alt=\"Docs Data\"\u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/eddmpython/dartlab/releases/tag/data-finance-1\"\u003e\u003cimg src=\"https://img.shields.io/badge/Finance-2,700%2B_Companies-818cf8?style=for-the-badge\u0026labelColor=050811\u0026logo=databricks\u0026logoColor=white\" alt=\"Finance 
Data\"\u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/eddmpython/dartlab/releases/tag/data-report-1\"\u003e\u003cimg src=\"https://img.shields.io/badge/Report-2,700%2B_Companies-34d399?style=for-the-badge\u0026labelColor=050811\u0026logo=databricks\u0026logoColor=white\" alt=\"Report Data\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003c/div\u003e\n\n## What is DartLab?\n\nDartLab is a Python library for parsing and analyzing [DART (Data Analysis, Retrieval and Transfer System)](https://dart.fss.or.kr/) — Korea's official electronic disclosure system. It extracts **both financial numbers and narrative text** from corporate filings.\n\nAll data is accessed through simple properties on a `Company` object, following the yfinance-style API.\n\n## Installation\n\n```bash\npip install dartlab\n```\n\n```bash\nuv add dartlab\n```\n\n## Quick Start\n\n```python\nfrom dartlab import Company\n\nc = Company(\"005930\")       # by stock code\nc = Company(\"삼성전자\")      # by company name (Korean)\nc.corpName                   # \"삼성전자\"\n```\n\nCreating a `Company` object prints a usage guide. 
For the full guide, call `c.guide()`.\n\nData is auto-downloaded from GitHub Releases when not found locally.\n\n```python\nfrom dartlab.core.dataLoader import downloadAll\n\ndownloadAll(\"docs\")                        # 260+ companies — disclosure documents\ndownloadAll(\"finance\")                     # 2,700+ companies — financial numbers\ndownloadAll(\"report\")                      # 2,700+ companies — periodic reports\ndownloadAll(\"finance\", forceUpdate=True)   # re-download if remote is newer\n```\n\n---\n\n## Features\n\n### Financial Statements\n\n```python\nc.BS    # Balance Sheet (DataFrame)\nc.IS    # Income Statement (DataFrame)\nc.CF    # Cash Flow Statement (DataFrame)\n```\n\n### Cross-Company Comparable Time Series (financeEngine)\n\nOpenDART financial data is mapped to standardized accounts, enabling **cross-company quarterly time series**.\n\n```python\nseries, periods = c.timeseries\n# periods = [\"2016_Q1\", \"2016_Q2\", ..., \"2024_Q4\"]\n# series[\"IS\"][\"revenue\"]            # quarterly revenue\n# series[\"BS\"][\"total_assets\"]       # quarterly total assets\n# series[\"CF\"][\"operating_cashflow\"] # quarterly operating cash flow\n\nr = c.ratios\nr.roe               # 8.29 (%)\nr.operatingMargin   # 9.51 (%)\nr.debtRatio         # 27.4 (%)\nr.fcf               # Free Cash Flow (KRW)\n```\n\n2,700+ listed companies are normalized to the same snakeId schema, making any pair of companies directly comparable.\n\n### Summary Financials with Bridge Matching\n\nExtracts summary financial time series, automatically tracking accounts even when names change due to K-IFRS revisions.\n\n```python\nresult = c.fsSummary()\n\nresult.FS          # Full financial time series (Polars DataFrame)\nresult.BS          # Balance Sheet\nresult.IS          # Income Statement\nresult.allRate     # Overall match rate (e.g. 
0.97)\nresult.breakpoints # List of detected breakpoints\n```\n\n### K-IFRS Notes (12 items)\n\n```python\nc.notes.inventory          # Inventories\nc.notes[\"재고자산\"]         # Korean key also works\nc.notes.receivables        # Trade receivables\nc.notes.tangibleAsset      # Property, plant \u0026 equipment\nc.notes.intangibleAsset    # Intangible assets\nc.notes.investmentProperty # Investment property\nc.notes.affiliates         # Associates\nc.notes.borrowings         # Borrowings\nc.notes.provisions         # Provisions\nc.notes.eps                # Earnings per share\nc.notes.lease              # Leases\nc.notes.segments           # Operating segments\nc.notes.costByNature       # Expenses by nature\n```\n\n### Dividends\n\n```python\nc.dividend\n# ┌──────┬───────────┬───────┬──────────────┬─────────────┬──────────────┬──────┐\n# │ year ┆ netIncome ┆ eps   ┆ totalDividend┆ payoutRatio ┆ dividendYield┆ dps  │\n# └──────┴───────────┴───────┴──────────────┴─────────────┴──────────────┴──────┘\n```\n\n### Major Shareholders\n\n```python\nc.majorHolder    # Largest shareholder + related parties ownership (time series)\n```\n\nFor the full Result object: `c.get(\"majorHolder\")`\n\n```python\nresult = c.get(\"majorHolder\")\nresult.majorHolder   # \"이재용\"\nresult.majorRatio    # 20.76\nresult.timeSeries    # Ownership ratio time series\n```\n\n### Employees\n\n```python\nc.employee    # year, totalEmployees, avgSalary, avgTenure, ...\n```\n\n### Audit Opinion\n\n```python\nc.audit    # year, auditor, opinion, keyAuditMatters\n```\n\n### Executives\n\n```python\nc.executive      # year, totalRegistered, insideDirectors, outsideDirectors, ...\nc.executivePay   # year, category, headcount, totalPay, avgPay\n```\n\n### Shares / Capital\n\n```python\nc.shareCapital     # Issued, treasury, outstanding shares\nc.capitalChange    # Capital changes\nc.fundraising      # Capital increases/decreases\n```\n\n### Subsidiaries / Associates\n\n```python\nc.subsidiary           # 
Investments in other corporations\nc.affiliateGroup       # Affiliate group companies\nc.investmentInOther    # Investee, ownership ratio, book value\n```\n\n### Board / Governance\n\n```python\nc.boardOfDirectors     # Board composition, attendance\nc.shareholderMeeting   # Shareholder meeting agendas, resolutions\nc.auditSystem          # Audit committee, audit activities\nc.internalControl      # Internal control assessment\n```\n\n### Risk / Legal\n\n```python\nc.contingentLiability  # Contingent liabilities, lawsuits\nc.relatedPartyTx       # Related party transactions\nc.sanction             # Sanctions, penalties\nc.riskDerivative       # FX sensitivity, derivatives\n```\n\n### Other Financials\n\n```python\nc.bond                 # Debt securities\nc.rnd                  # R\u0026D expenses\nc.otherFinance         # Allowance for bad debt, etc.\nc.productService       # Major products/services\nc.salesOrder           # Sales performance, order backlog\nc.articlesOfIncorporation  # Articles of incorporation amendments\n```\n\n### Company Info\n\n```python\nc.companyHistory         # Corporate history\nc.companyOverviewDetail  # Incorporation date, listing date, CEO, address\n```\n\n### Disclosure Narratives\n\n```python\nc.business       # Business overview (sections + change detection)\nc.overview       # Company overview (incorporation, address, credit rating)\nc.mdna           # Management Discussion \u0026 Analysis\nc.rawMaterial    # Raw materials, tangible assets, capex\n```\n\n### Raw Data Access\n\n```python\nc.rawDocs        # Original docs parquet (unprocessed)\nc.rawFinance     # Original finance parquet (unprocessed)\nc.rawReport      # Original periodic report parquet (unprocessed)\n```\n\n---\n\n## AI Analysis (dartlab ai)\n\nChat with an LLM over DartLab's structured data to analyze companies interactively.\n\n```bash\npip install dartlab[ui]\ndartlab ai\n# → http://localhost:8400\n```\n\nProvides all extracted data — financial statements, 
notes, dividends, executives, governance — as context for natural-language Q\u0026A with streaming responses.\n\n\u003e **Currently supported LLM: Ollama (local)**\n\u003e\n\u003e The current version supports **Ollama** for local LLM inference. No API key needed, and your data stays on your machine.\n\u003e\n\u003e - Install [Ollama](https://ollama.com/download), then `ollama pull gemma3` to download a model\n\u003e - Select and download models in the UI settings\n\u003e - GPU (NVIDIA/AMD) is auto-detected for acceleration\n\u003e\n\u003e **Coming soon**: Cloud LLM providers (OpenAI, Anthropic, etc.)\n\n---\n\n## Bulk Extraction\n\n```python\nd = c.all()    # All module data as dict (with progress bar)\n# {\"BS\": df, \"IS\": df, \"CF\": df, \"dividend\": df, \"notes\": {...},\n#  \"timeseries\": (series, periods), \"ratios\": RatioResult, ...}\n```\n\n```python\nimport dartlab\ndartlab.verbose = False    # Suppress progress output\n\nd = c.all()    # Silent extraction\n```\n\n---\n\n## Result Object\n\nProperties return the primary DataFrame. 
For the full Result object, use `c.get()`.\n\n```python\n# property: returns DataFrame directly\nc.audit          # opinionDf (audit opinion DataFrame)\n\n# get(): returns full Result object\nresult = c.get(\"audit\")\nresult.opinionDf   # Audit opinion\nresult.feeDf       # Audit fees\n```\n\n---\n\n## Company Search\n\n```python\nfrom dartlab import Company\n\nCompany.search(\"삼성\")\n# ┌──────────────┬──────────┬────────────────┐\n# │ 회사명       ┆ 종목코드 ┆ 업종           │\n# └──────────────┴──────────┴────────────────┘\n\nCompany.listing()   # Full KRX listed companies\nCompany.status()    # Local data index\nc.docs()            # Filing list + DART viewer links\n```\n\n---\n\n## Core Technology\n\n### Horizontal Alignment of Filings\n\nDART filings cover different periods depending on report type:\n\n```\n                           Q1         Q2         Q3         Q4\n                          ┌──────┐\n Q1 Report                │  Q1  │\n                          └──────┘\n                          ┌──────────────┐\n Semi-Annual              │   Q1 + Q2    │\n                          └──────────────┘\n                          ┌─────────────────────┐\n Q3 Report                │    Q1 + Q2 + Q3     │\n                          └─────────────────────┘\n                          ┌──────────────────────────────┐\n Annual Report            │       Q1 + Q2 + Q3 + Q4      │\n                          └──────────────────────────────┘\n```\n\nQ1 reports contain only Q1, semi-annual reports contain cumulative Q1+Q2, Q3 reports contain cumulative Q1+Q2+Q3, and annual reports contain the full year. DartLab reverse-engineers standalone quarterly figures from these cumulative structures (for a flow item such as revenue, standalone Q2 equals the semi-annual cumulative figure minus Q1), and tracks accounts even when names change between filings.\n\n### Bridge Matching\n\nK-IFRS revisions and internal restructuring frequently cause **account name changes within the same company**. 
Bridge Matching combines amount matching and name similarity across adjacent years to automatically link identical accounts.\n\n```\n             2022              2023              2024\n             ──────            ──────            ──────\n 매출액 ────────────── 매출액 ────────────── 수익(매출액)\n                              ↑ name change              ↑ name change\n 영업이익 ──────────── 영업이익 ──────────── 영업이익\n 당기순이익 ────────── 당기순이익 ────────── 당기순이익(손실)\n```\n\nFour-stage matching process:\n\n1. **Exact match** — identical amounts\n2. **Restatement match** — within 0.5 tolerance\n3. **Name change match** — amount error \u003c 5% AND name similarity \u003e 60%\n4. **Special item match** — decimal-unit items like EPS\n\nWhen match rate drops below 85%, a breakpoint is detected and the segment is split.\n\n---\n\n## Data\n\n### Sources and Integrity\n\nAll data originates from **[OpenDART](https://opendart.fss.or.kr/)** and **[DART](https://dart.fss.or.kr/)**, Korea's official electronic disclosure system. The developer has **not modified a single number** — only metadata columns (stock code, year, report type, etc.) 
have been added for structural organization.\n\nIf you want to verify, you can cross-check any value against the original filings using the package's built-in DART viewer links (`c.docs()`).\n\nEach Parquet file contains all filings for a single company:\n\n- **Metadata**: stock code, company name, report type, filing date, business year\n- **Quantitative**: summary financials, financial statement body, notes\n- **Narrative**: business description, audit opinion, risk management, executive/shareholder status\n\n### Data Releases\n\n| Category | Release Tags | Description | Count |\n|----------|-------------|-------------|-------|\n| Disclosure | [`data-docs`](https://github.com/eddmpython/dartlab/releases/tag/data-docs) | Parsed annual report sections | 260+ |\n| Finance | [`data-finance-1`](https://github.com/eddmpython/dartlab/releases/tag/data-finance-1) [`2`](https://github.com/eddmpython/dartlab/releases/tag/data-finance-2) [`3`](https://github.com/eddmpython/dartlab/releases/tag/data-finance-3) [`4`](https://github.com/eddmpython/dartlab/releases/tag/data-finance-4) | XBRL financial statement numbers | 2,700+ |\n| Report | [`data-report-1`](https://github.com/eddmpython/dartlab/releases/tag/data-report-1) [`2`](https://github.com/eddmpython/dartlab/releases/tag/data-report-2) [`3`](https://github.com/eddmpython/dartlab/releases/tag/data-report-3) [`4`](https://github.com/eddmpython/dartlab/releases/tag/data-report-4) | Periodic report data | 2,700+ |\n\nFinance and Report data are split into 4 tags by stock code range (GitHub's 1000-asset-per-release limit). `loadData()` and `downloadAll()` handle this automatically.\n\n### Bring Your Own Data\n\nIf you structure your own Parquet files to match DartLab's schema, all existing features work out of the box. Place files as `data/{category}/{stockCode}.parquet` and every property, extraction module, and analysis tool will function normally.\n\n### Disclaimer\n\nThis project is licensed under MIT. 
While the data faithfully mirrors OpenDART public disclosures, **no guarantee of commercial reliability is provided**. Always verify against official sources for investment or compliance decisions.\n\n\u003e **Update frequency**\n\u003e\n\u003e Data is collected directly without paid proxies, so updates may be slow. Adding new companies or reflecting the latest filings may take time.\n\n---\n\n## Why DartLab?\n\nDART filings contain far more than financial numbers — business descriptions, risk factors, audit opinions, litigation status, and governance changes are all embedded in the text. Most tools only extract the numbers. The rest is discarded.\n\nDartLab extracts both. It aligns quarterly, semi-annual, and annual reports on a single time axis, and automatically tracks accounts even when K-IFRS revisions or restructuring changes their names.\n\n\u003e **Current scope**\n\u003e\n\u003e Bridge Matching tracks account name changes **within a single company** across years. financeEngine enables **cross-company comparison** by mapping XBRL accounts to standardized snakeIds. 
2,700+ listed companies are normalized to the same structure.\n\u003e\n\u003e Text analysis capabilities are being developed in a **separate project** and will be integrated into DartLab.\n\u003e\n\u003e The ultimate goal is a tool that can analyze the **entire market** at once, not just one company.\n\n## Roadmap\n\n- [x] Summary financial time series (Bridge Matching)\n- [x] Consolidated BS, IS, CF\n- [x] Segment revenue, associates, dividends, employees, shareholders, subsidiaries\n- [x] Debt securities, expenses by nature, raw materials/capex\n- [x] Audit opinion, executive status, executive compensation\n- [x] PPE movement, note details (23 keywords)\n- [x] Board of directors, capital changes, contingent liabilities, related party tx, sanctions, R\u0026D, internal control\n- [x] Affiliate groups, capital raises, sales/orders, products, risk management/derivatives\n- [x] MD\u0026A, business description, company overview\n- [x] Company property API + Notes integration + all()\n- [x] Rich terminal output (avatar + usage guide)\n- [x] Account standardization engine (financeEngine) — 2,700+ companies cross-comparable\n- [x] Quarterly time series + financial ratios (c.timeseries, c.ratios)\n- [x] AI analysis web interface (dartlab ai) — Ollama local LLM\n- [ ] Cloud LLM providers (OpenAI, Anthropic, etc.)\n- [ ] Text analysis module integration (from separate project)\n- [ ] Quantitative + qualitative cross-validation\n- [ ] Visualization\n\n## Sponsor\n\n\u003ca href=\"https://buymeacoffee.com/eddmpython\"\u003e\n  \u003cimg src=\"https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png\" alt=\"Buy Me A Coffee\" width=\"180\"/\u003e\n\u003c/a\u003e\n\n## License\n\nMIT License\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feddmpython%2Fdartlab","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feddmpython%2Fdartlab","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feddmpython%2Fdartlab/lists"}