Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/astrolabsoftware/spark-fits
FITS data source for Spark SQL and DataFrames
apache-spark fits fitsio hdfs pyspark scala spark-sql
- Host: GitHub
- URL: https://github.com/astrolabsoftware/spark-fits
- Owner: astrolabsoftware
- License: apache-2.0
- Created: 2018-01-31T20:24:35.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2023-04-12T06:06:50.000Z (almost 2 years ago)
- Last Synced: 2024-09-29T04:42:04.872Z (4 months ago)
- Topics: apache-spark, fits, fitsio, hdfs, pyspark, scala, spark-sql
- Language: Scala
- Homepage: https://astrolabsoftware.github.io/spark-fits/
- Size: 8.97 MB
- Stars: 20
- Watchers: 7
- Forks: 7
- Open Issues: 21
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.rst
- License: LICENSE
README
# FITS Data Source for Apache Spark
[![Build Status](https://travis-ci.org/astrolabsoftware/spark-fits.svg?branch=master)](https://travis-ci.org/astrolabsoftware/spark-fits)
[![codecov](https://codecov.io/gh/astrolabsoftware/spark-fits/branch/master/graph/badge.svg?style=platic)](https://codecov.io/gh/astrolabsoftware/spark-fits)
[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.github.astrolabsoftware/spark-fits_2.11/badge.svg?style=flat)](https://maven-badges.herokuapp.com/maven-central/com.github.astrolabsoftware/spark-fits_2.11)
[![Arxiv](http://img.shields.io/badge/arXiv-1804.07501-yellow.svg?style=platic)](https://arxiv.org/abs/1804.07501)

## Latest news

- [01/2018] **Launch**: project starts!
- [03/2018] **Release**: version 0.3.0
- [04/2018] **Paper**: [![Arxiv](http://img.shields.io/badge/arXiv-1804.07501-yellow.svg?style=platic)](https://arxiv.org/abs/1804.07501)
- [05/2018] **Release**: version 0.4.0
- [06/2018] **New location**: spark-fits is an official project of [AstroLab](https://astrolabsoftware.github.io/)!
- [07/2018] **Release**: version 0.5.0, 0.6.0
- [10/2018] **Release**: version 0.7.0, 0.7.1
- [12/2018] **Release**: version 0.7.2
- [03/2019] **Release**: version 0.7.3
- [05/2019] **Release**: version 0.8.0, 0.8.1, 0.8.2
- [06/2019] **Release**: version 0.8.3
- [05/2020] **Release**: version 0.8.4
- [07/2020] **Release**: version 0.9.0
- [04/2021] **Release**: version 1.0.0

## spark-fits

This library provides two different tools to manipulate
[FITS](https://fits.gsfc.nasa.gov/fits_home.html) data with [Apache
Spark](http://spark.apache.org/):

- A Spark connector for FITS files.
- A Scala library to manipulate FITS files.

The user interface is designed to match that of other built-in Spark
data sources (CSV, JSON, Avro, Parquet, etc.). Note that spark-fits follows Apache Spark Data Source V1 ([plan](https://github.com/astrolabsoftware/spark-fits/issues/50) to migrate to V2). See our [website](https://astrolabsoftware.github.io/spark-fits/) for more information. To include spark-fits in your job:

```bash
# Scala 2.11
spark-submit --packages "com.github.astrolabsoftware:spark-fits_2.11:1.0.0" <...>

# Scala 2.12
spark-submit --packages "com.github.astrolabsoftware:spark-fits_2.12:1.0.0" <...>
```

or you can link against this library in your program using the following coordinates in your build.sbt:

```scala
// Scala 2.11
libraryDependencies += "com.github.astrolabsoftware" % "spark-fits_2.11" % "1.0.0"

// Scala 2.12
libraryDependencies += "com.github.astrolabsoftware" % "spark-fits_2.12" % "1.0.0"
```

Currently available:

- Read FITS files and organize the HDU data into DataFrames.
- Automatically distribute bintable rows over machines.
- Automatically distribute image rows over machines.
- Automatically infer the DataFrame schema from the HDU header.

## Header Challenge!

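For readers unfamiliar with FITS headers, here is a minimal sketch of what inferring a schema from a binary table header can involve. It is not the spark-fits implementation. It assumes the header keywords have already been parsed into a `Map` of keyword to value. The object `HeaderSchemaSketch`, the `Column` case class, and the choice of Spark type names are hypothetical and used only for illustration. The `TFIELDS`, `TTYPEn`, and `TFORMn` keywords and the type letters come from the FITS standard.

```scala
// Illustrative sketch only: NOT the spark-fits implementation.
object HeaderSchemaSketch {

  /** A column: its name plus the name of a Spark SQL type (hypothetical representation). */
  final case class Column(name: String, sparkType: String)

  // FITS binary table TFORM type letters (from the FITS standard), mapped to
  // the Spark SQL type a reader might plausibly pick. The mapping is an assumption.
  private val typeCodes: Map[Char, String] = Map(
    'L' -> "BooleanType", // logical
    'B' -> "ByteType",    // unsigned byte in FITS, whereas Spark's ByteType is signed
    'I' -> "ShortType",   // 16-bit integer
    'J' -> "IntegerType", // 32-bit integer
    'K' -> "LongType",    // 64-bit integer
    'A' -> "StringType",  // character
    'E' -> "FloatType",   // 32-bit floating point
    'D' -> "DoubleType"   // 64-bit floating point
  )

  /** Extracts the type letter from a TFORM value such as "1J", "E" or "20A". */
  private def typeCode(tform: String): Option[Char] =
    tform.trim.dropWhile(_.isDigit).headOption

  /**
   * Builds the column list from parsed header keywords (keyword -> value,
   * string quotes already stripped). Repeat counts such as the 3 in "3E"
   * (a 3-element vector) are ignored in this sketch.
   */
  def inferSchema(header: Map[String, String]): Seq[Column] = {
    val nFields = header.get("TFIELDS").map(_.trim.toInt).getOrElse(0)
    (1 to nFields).map { i =>
      val name = header.getOrElse(s"TTYPE$i", s"col$i").trim
      val tform = header.getOrElse(s"TFORM$i", "")
      val sparkType = typeCode(tform).flatMap(typeCodes.get).getOrElse(
        throw new IllegalArgumentException(s"Unsupported TFORM$i: '$tform'"))
      Column(name, sparkType)
    }
  }

  def main(args: Array[String]): Unit = {
    // A made-up three-column header, for demonstration only.
    val header = Map(
      "TFIELDS" -> "3",
      "TTYPE1" -> "target", "TFORM1" -> "10A",
      "TTYPE2" -> "RA",     "TFORM2" -> "E",
      "TTYPE3" -> "index",  "TFORM3" -> "1K"
    )
    inferSchema(header).foreach(println)
  }
}
```

Real headers can be far less regular than this (vector columns, scaled values, variable-length arrays), which is the kind of case this section is asking users to report.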
The headers tested so far are very simple and not very exotic. Over
time, we plan to add many new features based on complex examples (see
[here](https://github.com/astrolabsoftware/spark-fits/tree/master/src/test/resources/toTest)).
If you use spark-fits and encounter errors while reading a header, tell
us (issues or PR) so that we can fix the problem asap!

## TODO list

- Define custom Hadoop InputFile.
- Migrate to Spark DataSource V2

## Support