https://github.com/stared/tag-graph-map-of-stackexchange
Generates map in form of a graph from tags on StackExchange sites, e.g. StackOverflow.
https://github.com/stared/tag-graph-map-of-stackexchange
Last synced: 6 months ago
JSON representation
Generates map in form of a graph from tags on StackExchange sites, e.g. StackOverflow.
- Host: GitHub
- URL: https://github.com/stared/tag-graph-map-of-stackexchange
- Owner: stared
- Created: 2012-11-01T11:48:25.000Z (over 12 years ago)
- Default Branch: master
- Last Pushed: 2014-11-02T17:00:37.000Z (over 10 years ago)
- Last Synced: 2025-01-11T05:33:57.339Z (6 months ago)
- Language: Python
- Size: 43 MB
- Stars: 54
- Watchers: 9
- Forks: 7
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
tag-graph-map-of-stackexchange
==============================**[Click here too see graph visualizations of StackExchange](https://github.com/stared/tag-graph-map-of-stackexchange/wiki)**.
See also: **[TagOverflow](http://stared.github.io/tagoverflow/)**.
# Development
I wrote scripts generating a map of topics from [StackExchange sites](http://stackexchange.com/sites) (e.g. [StackOverflow](http://stackoverflow.com)),
in form of a graph of tags. Started as [an entry for StackExchange visualization competition at Kaggle](https://www.kaggle.com/c/predict-closed-questions-on-stack-overflow/prospector#211).If you like pictures, visit [wiki for this GitHub project](https://github.com/stared/tag-graph-map-of-stackexchange/wiki).
However, if you want to read the documentation - read below.To do:
* interactive d3js graphs
* plots for Area51
* automated plotsCurrent state:
* with queries from [SE Data Explorer](http://data.stackexchange.com) (but it works for any other csv tables for any other tags, as long as it is in the same form)
* with API scrapers to get tags from beta sites and to make a map of the StackExchange network
* **further development moved to [TagOverflow](http://stared.github.io/tagoverflow/)** - an interactive tag visualization in d3.js==============================
# Usage
## Mature SE sites
data.stackexchange.com -> csv -> oetable2graphml.py -> graphml -> gephi -> pdf
- To get data, use [a data.SE query](http://data.stackexchange.com/stackoverflow/query/83415/)
to obtain table of tag [co-occurrences](http://stats.stackexchange.com/questions/40977/is-there-a-term-for-pa-cap-b-papb);
[my other queries](http://data.stackexchange.com/users/8877/piotr-migdal)- Run oetable2graphml.py to convert it to graphml file (requires [NetworkX](http://networkx.lanl.gov/)), e.g.
python oetable2graphml.py input.csv output.graphml
- Use [Gephi](http://gephi.org) to import graphml file and process it to your taste.
E.g. (on Gephi 0.8.1 beta):* Overview tab:
* Ranking -> Nodes -> Size -> weight -> Min:15, Max:30, Spline:3 -> Run
(optimal options may vary)
* Layout -> ARF -> Run
OR: Layout -> Force Atlas -> Run; Layout -> Fruchterman Reingold -> Run
(and you may like to experiment with parameters or other methods)
* Layout -> Noverlap -> Run
* Statistics -> Modularity -> Run
* Partition -> Nodes -> Refresh -> Modularity Class -> Apply
(and optionally choosing colors to your taste)
* Font size: 26pt, Node size, Show node labels
* Layout -> Label Adjust -> Run
* Preview tab:
* Nodes -> Border Color: #A0A0A0
* Node Labels -> Show Labels: True, Font: 4pt
* Edges -> Opacity: 40.0
* Refresh; Export## Beta sites and other tags
First, obtain tag bundles with SE API, e.g. [se-api-py](https://github.com/stared/se-api-py), e.g. doing:
x = se.fetch("questions", site="biology", filter="!nR5-WLw0-5") # filter says that we ask only for the 'tags' field
t = [y['tags'] for y in x]You need to have list of list with tags per post, e.g.
t = [["plants", "flowers"], ["plants", "carnivorous", "big-list"], ["carnivorous", "fish", "piranha"]]
Then process it e.g. in that way:
import tag_bundle_processing as tbp
bun = tbp.Bundle(t)
# or: bun = tbp.Bundle(json_path="data.json")bun.filter_elements(first_n=32) # takes only 32 most frequent tags
bun.calculate_pair_weights(self, func=oe_ratio, threshold=1.5)
bun.export2graphml("path/to/file.graphml")And then proceed use Gephi as for mature SE sites.