https://github.com/ehsanmok/spark-lp

Distributed Linear Programming Solver on top of Apache Spark
https://github.com/ehsanmok/spark-lp

distributed-computing distributed-optimization high-performance linear-programming scala spark

Last synced: 6 months ago
JSON representation

Distributed Linear Programming Solver on top of Apache Spark

Host: GitHub
URL: https://github.com/ehsanmok/spark-lp
Owner: ehsanmok
License: apache-2.0
Created: 2016-11-14T22:15:14.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2021-01-04T19:07:27.000Z (over 4 years ago)
Last Synced: 2025-01-06T17:14:31.719Z (6 months ago)
Topics: distributed-computing, distributed-optimization, high-performance, linear-programming, scala, spark
Language: Scala
Homepage: https://open.library.ubc.ca/cIRcle/collections/ubctheses/24/items/1.0340337
Size: 38.1 KB
Stars: 75
Watchers: 7
Forks: 23
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        # spark-lp

![Project unmaintained](https://img.shields.io/badge/project-unmaintained-red.svg)

This package offers an implementation of [Mehrohra's predictor-corrector interior point algorithm](https://en.wikipedia.org/wiki/Mehrotra_predictor%E2%80%93corrector_method), described in my thesis [Distributed linear programming with Apache Spark](https://open.library.ubc.ca/cIRcle/collections/ubctheses/24/items/1.0340337), to solve **large-scale** [linear programming](https://en.wikipedia.org/wiki/Linear_programming) problems at the *lowest cost* using Apache Spark.

Linear programming has the following standard form: 

	minimize c^T x 

	subject to Ax=b and x >= 0

where `c, b` are given vectors ((.)^T is the traspose operation), `A` is a given `m` by `n` matrix and `x` is the objective vector. We assume that in `A` the number of rows (equations) is

at most equal to the number of columns (unknowns) (`m <= n`) and `A` has full row rank, thus `AA^T` is invertible.

## Example

The following is an example of using spark-lp *locally* to solve a linear programming problem in parallel with 2 cores and 2 partitions:

	import org.apache.spark.{SparkConf, SparkContext}

	import org.apache.spark.mllib.linalg.{DenseVector, Vector, Vectors}

	import org.apache.spark.mllib.optimization.lp.VectorSpace._

	import org.apache.spark.mllib.optimization.lp.vs.dvector.DVectorSpace

	import org.apache.spark.mllib.optimization.lp.vs.vector.DenseVectorSpace

	import org.apache.spark.mllib.optimization.lp.LP

	val sparkConf = new SparkConf().setMaster("local[2]").setAppName("TestLPSolver")

	val sc = new SparkContext(sparkConf)

	val numPartitions = 2

	val cArray = Array(2.0, 1.5, 0.0, 0.0, 0.0, 0.0, 0.0)

	val BArray = Array(

    	Array(12.0, 16.0, 30.0, 1.0, 0.0),

    	Array(24.0, 16.0, 12.0, 0.0, 1.0),

    	Array(-1.0, 0.0, 0.0, 0.0, 0.0),

    	Array(0.0, -1.0, 0.0, 0.0, 0.0),

    	Array(0.0, 0.0, -1.0, 0.0, 0.0),

    	Array(0.0, 0.0, 0.0, 1.0, 0.0),

    	Array(0.0, 0.0, 0.0, 0.0, 1.0))

	val bArray = Array(120.0, 120.0, 120.0, 15.0, 15.0)

	val c: DVector = sc.parallelize(cArray, numPartitions).glom.map(new DenseVector(_))

	val rows: DMatrix = sc.parallelize(BArray, numPartitions).map(Vectors.dense(_))

	val b: DenseVector = new DenseVector(bArray)

	val (v, x): (Double, DVector) = LP.solve(c, rows, b, sc=sc)

	val xx = Vectors.dense(x.flatMap(_.toArray).collect())

	println(s"optimial vector is $xx")

	println("optimal min value: " + v)

## Software Architecture Overview

Detailed descriptions of our design is described in chapter 4 of the [thesis](https://open.library.ubc.ca/cIRcle/collections/ubctheses/24/items/1.0340337).

## Advantages

* spark-lp is unique because it is **open-source** and it can solve large-scale LP problems in a distributed way with **fault-tolerance** over **commodity clusters** of machines. Thus, it provides the *lowest cost* opportunity for such applications. See page 42 for cluster results [here](https://open.library.ubc.ca/cIRcle/collections/ubctheses/24/items/1.0340337).

* spark-lp is at least ~10X *faster* and more accurate than spark-tfocs for solving large-scale LP problems. See page 38 for local results [here](https://open.library.ubc.ca/cIRcle/collections/ubctheses/24/items/1.0340337). Our benchmark shows that spark-tfocs is *not* suitable even for small LP problems.

## Future plans:

* Add preprocessing to capture more general LP formats.

* Add infeasibility detection.

* Extend to QP solver.

* Add GPU support, as described in page 47 [here](https://open.library.ubc.ca/cIRcle/collections/ubctheses/24/items/1.0340337), using INDArray provided in [ND4J](http://nd4j.org/) library.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/ehsanmok/spark-lp

Awesome Lists containing this project

README