https://github.com/ozefe/yoktez
Typed Python client for searching, fetching metadata, and downloading theses from the National Thesis Center of Turkey (YÖK Ulusal Tez Merkezi)
- Host: GitHub
- URL: https://github.com/ozefe/yoktez
- Owner: ozefe
- License: mit
- Created: 2026-03-25T16:23:07.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2026-05-14T03:09:41.000Z (23 days ago)
- Last Synced: 2026-05-14T03:25:38.095Z (23 days ago)
- Topics: academic-project, api-client, api-wrapper, httpx-client, thesis, ulusal-tez-merkezi, web-scraping
- Language: Python
- Homepage: https://github.com/ozefe/yoktez
- Size: 597 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
- Security: .github/SECURITY.md
README
# yoktez

Typed Python client for the [National Thesis Center of Turkey](https://tez.yok.gov.tr/UlusalTezMerkezi/).
`yoktez` wraps the YOK NTC JSP/AJAX surface behind a single synchronous `Client` with frozen-dataclass return types, a deterministic exception hierarchy, and bilingual-aware fields. Built for application and CLI developers who need a typed surface and a small install footprint without writing bespoke scraping code for each project.
## Installation
```bash
pip install yoktez
```
Requires Python 3.14+.
## Quickstart
```python
"""End-to-end yoktez quickstart: search -> metadata -> assets.

Demonstrates the typical three-call flow without writing files to disk.
Run with: `python examples/quickstart.py`
"""

from yoktez import AssetStatus, Client

_QUERY = "yapay zeka"

with Client() as client:
    results = client.search.simple(_QUERY)
    print(f"{results.total} matches for {_QUERY!r}")

    thesis = results[0]
    print(f" title: {thesis.title}")
    print(f" author: {thesis.author}")
    print(f" year: {thesis.year}")
    print(f" keys: {thesis.registration_no} / {thesis.thesis_no}")

    metadata = client.metadata.get(thesis)
    print(f" advisor: {metadata.supervisor}")
    if metadata.affiliation is not None:
        print(f" uni: {metadata.affiliation.university}")
    if metadata.keywords is not None:
        print(f" tags: {len(metadata.keywords)} keywords")

    assets = client.assets.get(thesis)
    print(f" status: {assets.status.name}")
    if assets.status is AssetStatus.AVAILABLE:
        print(f" pdf_key: {assets.pdf_key}")
```
Sample output:
```text
6841 matches for 'yapay zeka'
title: Kimya eğitiminde yapay zekâ araştırmalarına ilişkin bir meta-sentez çalışması
author: MURAT EBUBEKİR YAYLA
year: 2026
keys: nslbSyAODG1_FIruL8qUAA / THvIvDpZXvJIiHZpuqpKVw
advisor: PROF. DR. MUSA ÜCE
uni: MARMARA ÜNİVERSİTESİ
tags: 5 keywords
status: AVAILABLE
pdf_key: 5T1_CZ5-UGb9QCmoURec4AbpuuyvqUeed_1PcCh_6DVZ4b1fbX7Gcu-DQFLIcE11
```
## Features
- **Four search modes:** `simple`, `advanced`, `detail`, and `recent` from a single `client.search` namespace, all returning a sliceable `SearchResults` carrying the database-wide match total alongside the result window.
- **Structured metadata:** `client.metadata.get(thesis)` returns a typed `ThesisMetadata` with bilingual keywords (`Bilingual(raw, tr, en)`), a tiered `Affiliation`, and pre-formatted citation strings (APA / IEEE / MLA / Chicago / Harvard).
- **Two-step asset download:** `client.assets.get(thesis)` resolves to one of `AVAILABLE` / `UNDER_EMBARGO` / `NO_PERMIT` / `PREPARING` before any bytes move; the available branch exposes a `pdf_key` (and optional `appendix_key`) to feed `download_pdf` / `download_appendix`.
- **Catalog lookups:** `client.lookups` covers universities (TR / INT), institutes, divisions, subjects, departments, sections, and keywords, with per-instance memoization and an explicit `refresh()`.
- **Typed value objects:** every returned record is a `@dataclass(frozen=True, slots=True)`; values are immutable, hashable where field types allow, and ship with `py.typed` for downstream type checkers.
- **Sync-only, thread-friendly:** no `async`/`await` surface; the recommended concurrency pattern is one `Client` per thread.
- **Small dependency surface:** `httpx`, `beautifulsoup4`, and `lxml`. No Rust core, no auth, no hidden state.
## Usage
All snippets assume `with Client() as client:` for deterministic cleanup of the underlying HTTP connection pool.
### Search
Simple search by free text, optionally narrowed to a single field:
```python
from yoktez import Client, SearchField

with Client() as client:
    results = client.search.simple("yapay zeka", field=SearchField.ABSTRACT)
    print(f"{results.total} matches")
    for thesis in results[:5]:
        print(thesis.year, thesis.title)
```
Advanced search joins up to three terms with boolean operators:
```python
from yoktez import AdvancedOperator, Client, MatchType

with Client() as client:
    results = client.search.advanced(
        "sosyal",
        term2="medya",
        op1=AdvancedOperator.AND,
        match=MatchType.INCLUDES,
    )
```
Detail search accepts the full filter surface; enum-shaped parameters also accept the member name as a string or the raw int code:
```python
from yoktez import Client, ThesisType

with Client() as client:
    unis = client.lookups.universities()
    results = client.search.detail(
        university=unis[0],
        year_min=2020,
        year_max=2025,
        degree_type=ThesisType.MASTER,  # also accepts "MASTER" or 1
    )
```
Recently added theses (server-fixed 15-day window):
```python
from yoktez import Client

with Client() as client:
    results = client.search.recent()
```
### Metadata
```python
from yoktez import Client

with Client() as client:
    thesis = client.search.simple("makine öğrenmesi")[0]
    metadata = client.metadata.get(thesis)
    if metadata.affiliation is not None:
        print(metadata.affiliation.university)
    if metadata.keywords:
        print(metadata.keywords[0].tr, "=", metadata.keywords[0].en)
    if metadata.references is not None:
        print(metadata.references.apa)
```
### Assets (two-step download)
```python
from yoktez import AssetStatus, Client

with Client() as client:
    thesis = client.search.simple("yapay zeka")[0]
    assets = client.assets.get(thesis)
    if assets.status is AssetStatus.AVAILABLE and assets.pdf_key is not None:
        client.assets.download_pdf(assets.pdf_key, "thesis.pdf")
        if assets.appendix_key is not None:
            client.assets.download_appendix(assets.appendix_key, "thesis-ek.rar")
```
`download_pdf` and `download_appendix` accept a filesystem path (`Path` or `str`, opened and closed for you) or a pre-opened binary file-like (written to but not closed — ownership stays with the caller).
### Lookups
```python
from yoktez import Client, UniversitySource

with Client() as client:
    unis = client.lookups.universities(UniversitySource.TR)
    institutes = client.lookups.institutes(unis[0])
    divisions = client.lookups.divisions(unis[0], institutes[0])
    # Bulk catalogs; keywords() also accepts group / language / first_letter / search.
    keywords = client.lookups.all_keywords()
```
Every `client.lookups.*` call is memoized on the `Client` instance. Call `client.lookups.refresh()` to clear the cache if YOKSIS IDs are suspected to have rotated.
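
As a rough mental model of per-instance memoization with an explicit reset, consider the sketch below. The `LookupCache` class and `fake_fetch` function are hypothetical stand-ins that run offline; they are not the library's internals.

```python
from collections.abc import Callable, Hashable


class LookupCache:
    """Per-instance memo: each key is fetched once until refresh() is called."""

    def __init__(self) -> None:
        self._cache: dict[Hashable, object] = {}

    def get(self, key: Hashable, fetch: Callable[[], object]) -> object:
        if key not in self._cache:
            self._cache[key] = fetch()
        return self._cache[key]

    def refresh(self) -> None:
        self._cache.clear()


fetch_log: list[str] = []


def fake_fetch() -> list[str]:
    # Stand-in for a network round trip.
    fetch_log.append("fetched")
    return ["MARMARA ÜNİVERSİTESİ"]


cache = LookupCache()
cache.get("universities", fake_fetch)
cache.get("universities", fake_fetch)  # served from the cache
cache.refresh()
cache.get("universities", fake_fetch)  # fetched again after refresh()
```

Because the cache lives on the instance, two separate objects never share entries, and discarding the object discards its cache.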
### HTTP client configuration
`Client` accepts keyword-only overrides for the underlying `httpx.Client`:
```python
from yoktez import Client

with Client(timeout=60, retries=5, user_agent="my-app/1.0") as client:
    ...
```
For full control, inject a pre-built `httpx.Client` via `http_client=`. Ownership stays with the caller; `Client.close()` is a no-op for an injected client:
```python
import httpx

from yoktez import Client

http = httpx.Client(timeout=30.0, follow_redirects=True)
try:
    with Client(http_client=http) as client:
        ...
finally:
    http.close()
```
## Concurrency
`yoktez.Client` is single-threaded by design: create one per thread and never share an instance across threads. The library ships no concurrency primitives; threading strategy is the caller's choice.
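
One common way to get one instance per thread is `threading.local` combined with a thread pool. The sketch below is an assumption-laden illustration, not the repository's own example: `per_thread`, `work`, and `FakeClient` are hypothetical names, and `FakeClient` stands in for `yoktez.Client` so the snippet runs offline.

```python
import threading
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()


def per_thread(factory: Callable[[], object]) -> object:
    """Return the calling thread's instance, creating it on first use."""
    if not hasattr(_local, "instance"):
        _local.instance = factory()
    return _local.instance


class FakeClient:
    """Offline stand-in; real code would pass yoktez.Client as the factory."""


def work(_: int) -> tuple[int, int]:
    client = per_thread(FakeClient)
    return threading.get_ident(), id(client)


with ThreadPoolExecutor(max_workers=4) as pool:
    # Each pair is (thread id, id of the client object used by that thread).
    seen = set(pool.map(work, range(32)))
```

Every worker thread reuses the single instance it created, so no instance is touched by two threads. The sketch leaves out cleanup; real code would also need to close each thread's client when the pool shuts down.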
## Design principles
- **Synchronous-only API:** Sync is sufficient for YOK NTC's IO patterns; an async surface would double the API and complicate testing for no proven benefit. Concurrency strategy belongs to the caller, and `examples/multithreaded_pool.py` demonstrates the one-`Client`-per-thread pattern.
- **Frozen-dataclass value objects:** Every returned record is `@dataclass(frozen=True, slots=True)`: stdlib-only, immutable, hashable where field types allow, and lightweight thanks to `__slots__`.
- **Coerce-on-input enum handling:** Enum-shaped parameters accept the matching `Enum` member, its name (e.g., `"MASTER"`), or its raw int code; the raw-`int` passthrough tolerates new YOK NTC codes the library hasn't yet enumerated, so wire-side additions don't gate a release.
- **Two-step download flow:** `client.assets.get(...)` resolves status first; `download_pdf` and `download_appendix` run only on the available branch. Honest to the underlying YOK NTC flow, and lets callers inspect embargo dates and appendix availability before committing to a second request.
- **Hierarchical logger naming:** Every sub-package logs under its own `yoktez.`-prefixed logger (`yoktez.http`, `yoktez.search`, `yoktez.lookups`, `yoktez.assets`). Operators can silence the high-volume HTTP DEBUG channel while preserving the rarer parser WARNING channels; a single `logging.getLogger("yoktez").setLevel(...)` still applies to every child that has no level of its own, because child loggers inherit the parent's level.
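
The logger hierarchy can be tuned with the standard `logging` module alone. The sketch below uses only the logger names listed above; the chosen levels are an example, not a recommendation from the project.

```python
import logging

# One call on the parent sets the default level for every yoktez.* child.
logging.getLogger("yoktez").setLevel(logging.DEBUG)

# Raise the threshold for the HTTP channel only; its DEBUG records are dropped.
logging.getLogger("yoktez.http").setLevel(logging.WARNING)

http_log = logging.getLogger("yoktez.http")
search_log = logging.getLogger("yoktez.search")

http_debug_enabled = http_log.isEnabledFor(logging.DEBUG)
search_debug_enabled = search_log.isEnabledFor(logging.DEBUG)
```

A child logger with no explicit level takes its effective level from the nearest ancestor that has one, which is why `yoktez.search` follows the parent while `yoktez.http` keeps its own setting.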
## Limitations
`yoktez` is intentionally narrow. The following are out of scope and will not land in this package:
- **No async API:** Synchronous code throughout; no `async def`, no asyncio surface.
- **No multi-threaded helper functions:** Concurrency strategy is the caller's choice.
- **No authentication or login flows (e-Devlet):** Anonymous public-data access only; features requiring login (favorites, history) are excluded.
- **No bypassing access restrictions:** Embargoed and no-permit theses surface their state via `AssetStatus` and the matching exception types; the library does not attempt to circumvent these.
- **No data hosting or mirroring:** The library fetches on demand; no bundled snapshots of the YOK NTC database.
- **No CLI shipped from this package:** A separate package may add one later — out of scope here.
## License
MIT — see [`LICENSE`](LICENSE).