https://github.com/asjadnaqvi/stata-sankey
A Stata package for Sankey diagrams
- License: mit
[Installation](#Installation) | [Syntax](#Syntax) | [Citation guidelines](#Citation-guidelines) | [Examples](#Examples) | [Feedback](#Feedback) | [Change log](#Change-log)
---

# sankey v1.9
(24 Jun 2025)
This package allows users to draw Sankey plots in Stata. It is based on the [Sankey Guide](https://medium.com/the-stata-guide/stata-graphs-sankey-diagram-ecddd112aca1) published on [the Stata Guide](https://medium.com/the-stata-guide) on Medium in October 2021.
## Installation
The package can be installed via SSC or GitHub. The GitHub version *might* be more recent due to bug fixes, feature updates, etc., and *may* contain syntax improvements and changes in *default* values. See version numbers below. The GitHub version is eventually published on SSC.
SSC (**v1.9**):
```
ssc install sankey, replace
```
GitHub (**v1.9**):
```
net install sankey, from("https://raw.githubusercontent.com/asjadnaqvi/stata-sankey/main/installation/") replace
```
The `palettes`, `colrspace`, and `graphfunctions` packages are required to run this command:
```
ssc install palettes, replace
ssc install colrspace, replace
ssc install graphfunctions, replace
```
Even if you have these packages installed, please check for updates: `ado update, update`.
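To see which version is currently installed, Stata's built-in `which` command can help. This is a minimal sketch; it assumes the package's ado-file carries the usual `*!` version header, which is not confirmed here:

```stata
* Show the path of the installed ado-file and any "*!" header lines it contains
which sankey
```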
If you want to make a clean figure, then it is advisable to load a clean scheme. There are several available, and I personally use the following:
```
ssc install schemepack, replace
set scheme white_tableau
```
You can also push the scheme directly into the graph using the `scheme(schemename)` option. See the help file for details or the example below.
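As a sketch of the `scheme()` option, assuming the example data from the Examples section is already loaded and `schemepack` is installed:

```stata
* Apply the scheme to this graph only, without changing the session default
sankey value, from(source) to(destination) by(layer) scheme(white_tableau)
```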
I also prefer narrow fonts in figures with long labels. You can change this as follows:
```
graph set window fontface "Arial Narrow"
```
## Syntax
The syntax for the latest version is as follows:
```stata
sankey value [if] [in] [weight], from(var) to(var)
[ by(var) palette(str) colorby(layer|level) colorvar(var) stock stock2 colorvarmiss(str) colorboxmiss(str)
smooth(1-8) gap(num) recenter(mid|bot|top) ctitles(list) ctgap(num) ctsize(num) ctposition(bot|top)
ctcolor(str) ctwrap(num) labangle(str) labsize(str) labposition(str) labgap(str) showtotal labprop labscale(num)
valsize(str) valcondition(num) format(str) valgap(str) novalues valprop valscale(num)
novalright novalleft nolabels sort1(value|name[, reverse]) sort2(value|order[, reverse]) align fill
lwidth(str) lcolor(str) alpha(num) offset(num) boxwidth(str) percent wrap(num) * ]
```
See the help file `help sankey` for details.
The most basic use is as follows:
```
sankey value, from(var1) to(var2) [by(level)]
```
where `var1` and `var2` are source and destination variables respectively against which the `value` variable is plotted. The `by()` variable defines the levels and is optional since v1.72.
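To make the expected data layout concrete, here is a minimal sketch with a made-up three-row dataset. The node names and values are hypothetical and not part of the package's example data. Each row is one flow between two nodes:

```stata
clear
* One row per flow: source node, destination node, flow size, and layer
input str8 source str8 destination value layer
"A" "C" 10 0
"B" "C" 20 0
"C" "D" 30 1
end

* A and B flow into C in layer 0, then C flows into D in layer 1
sankey value, from(source) to(destination) by(layer)
```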
## Citation guidelines
Software packages take countless hours of programming, testing, and bug fixing. If you use this package, then a citation would be highly appreciated.
The [SSC citation](https://ideas.repec.org/c/boc/bocode/s459154.html) is recommended. Please note that the GitHub version might be newer than the SSC version.
## Examples
Get the example data from GitHub:
```stata
import excel using "https://github.com/asjadnaqvi/stata-sankey/blob/main/data/sankey_example2.xlsx?raw=true", clear first
```
Let's test the `sankey` command:
```stata
sankey value, from(source) to(destination) by(layer)
```

### Smooth
```
sankey value, from(source) to(destination) by(layer) smooth(2)
```

```
sankey value, from(source) to(destination) by(layer) smooth(8)
```

### Re-center
```
sankey value, from(source) to(destination) by(layer) recenter(bot)
```

```
sankey value, from(source) to(destination) by(layer) recenter(top)
```

### Gaps
```
sankey value, from(source) to(destination) by(layer) gap(0)
```

```
sankey value, from(source) to(destination) by(layer) gap(20)
```

### Values
```
sankey value, from(source) to(destination) by(layer) noval showtot
```

### Sort (v1.6)
```
sankey value, from(source) to(destination) by(layer) sort1(name)
```

```
sankey value, from(source) to(destination) by(layer) sort1(value)
```

```
sankey value, from(source) to(destination) by(layer) sort1(value) sort2(value)
```

```
sankey value, from(source) to(destination) by(layer) sort1(name, reverse) sort2(value)
```

```
sankey value, from(source) to(destination) by(layer) sort1(name, reverse) sort2(value, reverse)
```

```
sankey value, from(source) to(destination) by(layer) sort1(name, reverse) sort2(order)
```

```
sankey value, from(source) to(destination) by(layer) sort1(name, reverse) sort2(order, reverse)
```

Custom sorting on a value:
```stata
gen source2 = .
gen destination2 = .
foreach x in source destination {
    replace `x'2 = 1 if `x'=="Blog"
    replace `x'2 = 2 if `x'=="LinkedIn"
    replace `x'2 = 3 if `x'=="Twitter"
    replace `x'2 = 4 if `x'=="Direct"
    replace `x'2 = 5 if `x'=="App"
    replace `x'2 = 6 if `x'=="Medium"
    replace `x'2 = 7 if `x'=="Website"
    replace `x'2 = 8 if `x'=="Homepage"
    replace `x'2 = 9 if `x'=="Total"
    replace `x'2 = 10 if `x'=="Google"
    replace `x'2 = 11 if `x'=="Facebook"
}
lab de labels 1 "Blog" 2 "LinkedIn" 3 "Twitter" 4 "Direct" 5 "App" 6 "Medium" 7 "Website" 8 "Homepage" 9 "Total" 10 "Google" 11 "Facebook", replace
lab val source2 labels
lab val destination2 labels
sankey value, from(source2) to(destination2) by(layer)
```
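The same ordering can probably be set up more compactly with Stata's `encode`, which reuses an existing value label when one is supplied. This is a sketch rather than the author's method. The variable names `source3` and `destination3` are hypothetical, and it assumes the `labels` value label defined above already exists:

```stata
* Map the strings to the codes already defined in the value label "labels";
* noextend makes encode stop with an error if a string is not in the label
encode source,      gen(source3)      label(labels) noextend
encode destination, gen(destination3) label(labels) noextend

sankey value, from(source3) to(destination3) by(layer)
```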

### boxwidth
```
sankey value, from(source) to(destination) by(layer) boxwid(5)
```

### valcond
```
sankey value, from(source) to(destination) by(layer) valcond(200)
```

```
sankey value, from(source) to(destination) by(layer) valcond(300)
```

### Palettes
```
sankey value, from(source) to(destination) by(layer) palette(CET C6)
```

```
sankey value, from(source) to(destination) by(layer) colorby(level)
```

### color by variable (v1.4)
```
gen trace1 = 1 if source=="App"
sankey value, from(source) to(destination) by(layer) colorvar(trace1)
```

```
cap drop trace2
gen trace2 = .
replace trace2 = 1 if source=="App" & destination=="App" & layer==0
replace trace2 = 2 if source=="App" & destination=="App" & layer==1
replace trace2 = 3 if source=="App" & destination=="App" & layer==2
replace trace2 = 4 if source=="App" & destination=="Total" & layer==3
sankey value, from(source) to(destination) by(layer) colorvar(trace2)
```

```
sankey value, from(source) to(destination) by(layer) colorvar(trace2) palette(Oranges)
```

```
sankey value, from(source) to(destination) by(layer) colorvar(trace2) palette(Blues) ///
colorvarmiss(gs13) colorboxmiss(gs13)
```

```
sankey value, from(source) to(destination) by(layer) colorvar(trace2) ///
palette(blue*0.1 blue*0.3 blue*0.5 blue*0.7) colorvarmiss(gs13) colorboxmiss(gs13)
```

### column titles (v1.4)
```
sankey value, from(source) to(destination) by(layer) ctitles(Cat1 Cat2 Cat3 Cat4 Cat5)
```

```
sankey value, from(source) to(destination) by(layer) ctitles(Cat1 Cat2 Cat3 Cat4 Cat5) ctg(-100)
```

```
sankey value, from(source) to(destination) by(layer) ctitles("Cat 1" "Cat 2" "Cat 3" "Cat 4" "Cat 5") ctg(-100)
```

```
sankey value, from(source) to(destination) by(layer) ctitles("Cat 1" "Cat 2" "Cat 3" "Cat 4" "Cat 5") ctpos(top) ctg(100) recenter(top)
```

### label rotation and offset
```
sankey value, from(source) to(destination) by(layer) noval showtot palette(CET C6) ///
laba(0) labpos(3) labg(-1) offset(10)
```

### hide values and labels (v1.5)
```
sankey value, from(source) to(destination) by(layer) novalleft
```

```
sankey value, from(source) to(destination) by(layer) novalright
```

```
sankey value, from(source) to(destination) by(layer) noval
```

```
sankey value, from(source) to(destination) by(layer) nolabels
```

### proportional values and labels (v1.5)
```
sankey value, from(source) to(destination) by(layer) valprop vals(2)
```

```
sankey value, from(source) to(destination) by(layer) labprop labs(2)
```


### All together
```
sankey value, from(source) to(destination) by(layer) palette(CET C6) alpha(60) ///
labs(2.5) laba(0) labpos(3) labg(-1) offset(5) noval showtot ///
ctitles("Cat 1" "Cat 2" "Cat 3" "Cat 4" "Cat 5") ctg(-100) cts(3) ///
title("My sankey plot", size(6)) note("Made with the #sankey package.", size(2.2)) ///
xsize(2) ysize(1)
```

### stocks (v1.6+)
```stata
import excel using "https://github.com/asjadnaqvi/stata-sankey/blob/main/data/sankey_stocks.xlsx?raw=true", clear first
```
```
sankey value, from(source) to(destination) by(layer) xsize(2) ysize(1) showtotal
sankey value, from(source) to(destination) by(layer) xsize(2) ysize(1) showtotal stock
sankey value, from(source) to(destination) by(layer) xsize(2) ysize(1) showtotal stock2
```
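According to the change log, `stock` collapses own flows (rows where the source equals the destination) into box heights on the outgoing side, while `stock2` collapses stocks on the incoming side and removes own flows. A quick way to see which rows are affected, assuming the stock example data above is loaded:

```stata
* List own flows, i.e. rows where a node flows into itself
list source destination value layer if source == destination
```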

### v1.9
Load trade data by regions:
```stata
use "https://github.com/asjadnaqvi/stata-sankey/blob/main/data/trade_sankey_example.dta?raw=true", clear
```
Generate the default Sankey:
```stata
sankey value, from(ex_region) to(im_region)
```

Add better styling using the new options in v1.9:
```stata
sankey value, from(ex_region) to(im_region) ///
format(%15.1fc) labprop smooth(8) palette(HCL intense) sort1(value) sort2(value) ///
labs(2.4) laba(0) labpos(9 3) labg(2) gap(5) noval showtot lw(none) ///
title("{fontface Merriweather Bold:Global trade in 2022 (USD millions)}", size(4)) ///
note("Source: COMTRADE BACI HS07 2022.", size(2)) ///
plotregion(margin(l+16 r+16 b+5)) ///
ctitle("{bf:Exporting region}" "{bf:Importing region}") ctwrap(8) ctgap(5) ///
xsize(2) ysize(1)
```

## Feedback
Please open an [issue](https://github.com/asjadnaqvi/stata-sankey/issues) to report errors, suggest feature enhancements, or make other requests.
## Change log
**v1.9 (24 Jun 2025)**
- Option `ctwrap()` added to wrap title labels.
- Option `ctgap()` now takes on values based on percentage of total height. This makes it easier to relatively displace the title labels.
- Option `labpos()` now accepts lists of positions for each layer.
- X-axis was sometimes adding additional space due to some internal tolerance limit. This has been fixed.
- Minor bug fixes.
**v1.81 (16 Oct 2024)**
- Weights are now allowed. It is still advisable to prepare the data beforehand.
- `wrap()` now requires [graphfunctions](https://github.com/asjadnaqvi/stata-graphfunctions) for label wrapping that respects word boundaries.
- Option `stock2` added that collapses stocks on the right (incoming) and removes own flows. In contrast, `stock` collapses stocks on the left (out-going).
- Various code fixes should remove additional small bugs.
**v1.8 (22 Sep 2024)**
- Added option `align` to align flows. Works only if there is just one parent (still beta).
- Added option `fill` to extrapolate missing flows. Works only if there is just one parent (still beta).
- Added option `n()` to allow users to increase the number of points for generating the arcs. Default is 30.
- Quite a large code cleanup, so the command should run a bit faster.
**v1.74 (11 Jun 2024)**
- Added `wrap()` option for wrapping labels.
- Minor code cleanups.
**v1.73 (16 Mar 2024)**
- If the `from()` and `to()` variables have value labels, then the order of the value labels is respected. This allows the users to have full control of the order of the drawing of the layers through value labels (requested by Katie Naylor + others).
- The command now throws an error if `from()` and `to()` have different format types. Both have to be either string or numeric variables. This was necessary in order to implement the above change.
- Minor code cleanups.
**v1.72 (12 Feb 2024)**
- Fixed `labprop` wrongly calculating the label sizes.
- `valcond()` now passes on to box labels. Was removed but has been put back in.
- `by()` changed to optional. Assumes one layer if not specified. This is mostly a quality of life improvement. A warning message is displayed to ensure that `by()` is not left out by mistake.
- `ctsize()` converted to string to allow size names.
- `ctcolor()` added.
- Help file improved.
- Minor code cleanups.
**v1.71 (15 Jan 2024)**
- Fixed a bug where numerical `from()` and `to()` variables with value labels were messing up the labels in the final figure (reported by Ian White).
**v1.7 (06 Nov 2023)**
- Fixed `valcond()` dropping bar values.
- Fixed `ctitles()` getting random colors. It now defaults to black.
- Added `ctpos()` option to change column title position.
- Added `percent` option (still beta), which converts flows to percent values.
**v1.61 (22 Jul 2023)**
- `saving()` option added (requested by Anirban Basu).
- Minor fixes.
**v1.6 (11 Jun 2023)**
- Complete rewrite of the base routines. The code is 30% smaller but several times faster.
- The option `sortby()` split into `sort1()` and `sort2()` for clarity.
- Added support for numerical variables with value labels.
- Option `stock` added to collapse own flows (source = destination) to box heights (requested by Oras Alabas).
- Several code optimizations and minor bug fixes.
**v1.51 (25 May 2023)**
- Added background checks for the `from()` and `to()` variables. This ensures that the code runs regardless of the variable types. Ideally both should be strings.
**v1.5 (30 Apr 2023)**
- Added `labprop`, `titleprop`, and `labscale()` for scaling values and labels.
- Added `novalright`, `novalleft`, `nolabels` options.
- Added `sortby(., reverse)` option.
- Help file improved in its layout.
**v1.4 (23 Apr 2023)**
- Fixed major bugs with unbalanced panels.
- Added column title options.
- Added option to draw colors by variables.
- Several bug fixes and improvements to the code.
**v1.31 (04 Apr 2023)**
- Fixed the color of categories. Previous version was resulting in wrong color assignments.
**v1.3 (26 Feb 2023)**
- Node bundling added, which aligns nodes in front of each other. This looks better, especially if flows are passing through certain nodes.
- Option `sortby()` added that allows alphabetical sorting (`sortby(name)`) or numerical sorting (`sortby(value)`). Thanks to Fabian Unterlass for detailed feedback.
- Option `boxwidth()` added to allow adjusting the width of node boxes.
**v1.21 (15 Feb 2023)**
- `valcond()` fixed.
- Error in gaps fixed.
**v1.2 (02 Feb 2023)**
- Unbalanced Sankeys are now allowed. This means that incoming and outgoing layers do not necessarily have to be equal. Outgoing can be larger than incoming.
- A category can now also start in the middle.
- Various bug fixes.
**v1.1 (13 Dec 2022)**
- Option `valformat()` renamed to just `format()`. This aligns it with standard Stata usage.
- A new option `offset()` added to displace x-axis on the right-hand side. Offset is given in percentage share of x-axis range. This allows rotated labels to be displaced properly.
- Checks for missing bilateral flow combinations. Hitting a non-flow combo was causing the code to crash.
**v1.0 (08 Dec 2022)**
- Public release.