https://github.com/husnusensoy/tt-bootcamp
Turk Telekom Data Bootcamp Repository
data-engineering data-governance data-quality data-science mlops python spark sql
- Host: GitHub
- URL: https://github.com/husnusensoy/tt-bootcamp
- Owner: husnusensoy
- Created: 2025-01-27T21:14:32.000Z
- Default Branch: main
- Last Pushed: 2025-02-05T20:59:54.000Z
- Last Synced: 2025-02-05T21:41:27.748Z
- Topics: data-engineering, data-governance, data-quality, data-science, mlops, python, spark, sql
- Language: Jupyter Notebook
- Size: 31.3 KB
- Stars: 4
- Watchers: 2
- Forks: 18
- Open Issues: 13
Metadata Files:
- Readme: README.md
README
# File Formats
## JSON
The most popular human-readable format, with a flexible schema.
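
A minimal sketch of what schema flexibility looks like in practice, assuming newline-delimited JSON and made-up records (the field names `id`, `name`, `address`, and `phones` are illustrative and not taken from the bootcamp material). Each line is parsed on its own, and records are free to carry different keys or nested values.

```python
import json

# Hypothetical records: every line is self-describing and may use different keys.
lines = [
    '{"id": 1, "name": "Ada", "address": {"city": "Ankara"}}',
    '{"id": 2, "name": "Bora", "phones": ["555-0101", "555-0102"]}',
    '{"id": 3}',
]

records = [json.loads(line) for line in lines]

# Row-based access: one parse yields the whole record, nested values included.
second = records[1]

# Key-based access: every record has to be parsed, and missing keys handled.
names = [record.get("name") for record in records]

# Key names are stored again in every record (top-level keys only counted here).
key_bytes = sum(len(key) for record in records for key in record)
total_bytes = sum(len(line) for line in lines)
```

Reading one record needs a single parse, while collecting one key across the data set means parsing every record.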
### Advantages
* Each record carries its own metadata (self-describing).
* Schema flexibility.
* Sparse representation.
* Nested/complex types are allowed.
* Human readable.
* Takes less space than XML.
* Fast for row-based queries.

### Disadvantages
* Key/column names are repeated in every record, so storage is inefficient.
* Poor performance for attribute/key-based access.
* Text based, so parsing is inefficient (?!?).
* No type enforcement.

## Fixed Length
The most popular format in older-generation mainframe systems.
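
A minimal sketch of a fixed length layout, assuming a hypothetical record of three text fields (`id` 4 characters, `name` 10, `city` 8). Because every record has the same length, a field can be located with arithmetic alone, without scanning for separators.

```python
# Hypothetical layout: field name -> (start, end) character offsets in a record.
LAYOUT = {"id": (0, 4), "name": (4, 14), "city": (14, 22)}
RECORD_LEN = 22


def encode(rec_id, name, city):
    """Pad each value to its fixed width (values are assumed to fit)."""
    return f"{rec_id:>4}{name:<10}{city:<8}"


rows = [(1, "Ada", "Ankara"), (2, "Bora", "Izmir"), (3, "Cem", "Bursa")]
data = "".join(encode(*row) for row in rows)


def read_field(data, row, field):
    """Jump straight to a field using row * RECORD_LEN plus the field offset."""
    start, end = LAYOUT[field]
    base = row * RECORD_LEN
    return data[base + start : base + end].strip()


name_of_second = read_field(data, 1, "name")
id_of_third = read_field(data, 2, "id")  # comes back as text, not as an int
```

The padding is what makes the offsets predictable, and it is also where the extra space goes. Widening `name` beyond 10 characters would shift every later offset in every record.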
### Advantages
* Fast offset-based parsing.
* Prevents memory/disk fragmentation (SLAB allocators).
* Human readable.
* Fast for row-based queries.

### Disadvantages
* Redundant space usage.
* No type enforcement.
* Text based, so parsing is inefficient (?!?).
* No metadata information.
* Nested/complex types are not allowed.
* Slow for column-based queries.
* Schema evolution (type resizing) is very hard.
* New fields can only be appended at the end.

## Delimited File System
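
A minimal sketch of reading a delimited file, assuming comma-separated text without a header row and using Python's standard `csv` module. The sample rows are made up for illustration.

```python
import csv
import io

# Hypothetical comma-delimited data; quoting protects the comma inside a value.
raw = '1,Ada,Ankara\n2,Bora,"Izmir, TR"\n'

rows = list(csv.reader(io.StringIO(raw)))

# Every value arrives as a string; any typing has to be applied by the reader.
ids = [int(row[0]) for row in rows]

# A column query still walks every row and picks the field by position.
cities = [row[2] for row in rows]
```

Nothing in the file says that the third field is a city or that the first is an integer; that knowledge lives outside the data.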
### Advantages
* Lots of tool support.
* Relatively compact compared to fixed length.
* Human readable.
* Fast for row-based queries.

### Disadvantages
* Less readable than JSON and fixed length.
* No type enforcement.
* Text based, so parsing is inefficient (?!?).
* No metadata information.
* Only partial support for nested/complex types.
* Slow for column-based queries.
* Schema evolution (type resizing) is hard.
* New fields can only be appended at the end.

## Custom Columnar Format
### Advantages
* Relatively compact compared to fixed length.
* Human readable.
* Much faster column-based queries.
* Better compression.
* Filter performance (?)

### Disadvantages
* Slow for row-based queries.
* Deletion (?)
* Less readable than JSON and fixed length.
* No type enforcement.
* Text based, so parsing is inefficient (?!?).
* No metadata information.
* Only partial support for nested/complex types.
* Schema evolution (type resizing) is hard.
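
The row versus column trade-off can be sketched as follows. The README does not specify the custom format itself, so this assumes the simplest possible columnar store, one Python list per column, with made-up data; the run-length encoding is only one illustrative way that adjacent similar values can compress well.

```python
from itertools import groupby

COLUMNS = ("id", "name", "city")
rows = [
    (1, "Ada", "Ankara"),
    (2, "Bora", "Izmir"),
    (3, "Cem", "Ankara"),
]

# Pivot the rows into one list per column (assumed layout, not the author's).
columnar = {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}

# Column query: only the "city" list is touched.
ankara_count = columnar["city"].count("Ankara")


def get_row(store, index):
    """Row query: every column has to be visited to rebuild one record."""
    return tuple(store[name][index] for name in COLUMNS)


def run_length(values):
    """Collapse runs of equal neighbouring values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


compressed_city = run_length(sorted(columnar["city"]))
```

Deleting a record in this layout means removing the same index from every column, which hints at why deletion is a harder question here than in a row-oriented file.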