https://github.com/stared/tag-graph-map-of-stackexchange

Generates map in form of a graph from tags on StackExchange sites, e.g. StackOverflow.

Host: GitHub
URL: https://github.com/stared/tag-graph-map-of-stackexchange
Owner: stared
Created: 2012-11-01T11:48:25.000Z (over 12 years ago)
Default Branch: master
Last Pushed: 2014-11-02T17:00:37.000Z (over 10 years ago)
Last Synced: 2025-01-11T05:33:57.339Z (6 months ago)
Language: Python
Size: 43 MB
Stars: 54
Watchers: 9
Forks: 7
Open Issues: 0
Metadata Files:
- Readme: README.md

README

tag-graph-map-of-stackexchange
==============================

**[Click here to see graph visualizations of StackExchange](https://github.com/stared/tag-graph-map-of-stackexchange/wiki)**.

See also: **[TagOverflow](http://stared.github.io/tagoverflow/)**.

# Development

I wrote scripts that generate a map of topics from [StackExchange sites](http://stackexchange.com/sites) (e.g. [StackOverflow](http://stackoverflow.com)) in the form of a graph of tags. It started as [an entry for the StackExchange visualization competition at Kaggle](https://www.kaggle.com/c/predict-closed-questions-on-stack-overflow/prospector#211).

If you like pictures, visit [the wiki for this GitHub project](https://github.com/stared/tag-graph-map-of-stackexchange/wiki).

If you want to read the documentation, read on.

To do:

* interactive d3js graphs

* plots for Area51

* automated plots

Current state:

* with queries from [SE Data Explorer](http://data.stackexchange.com) (but it works for any other csv tables with any other tags, as long as they are in the same form)

* with API scrapers to get tags from beta sites and to make a map of the StackExchange network

* **further development moved to [TagOverflow](http://stared.github.io/tagoverflow/)** - an interactive tag visualization in d3.js

==============================

# Usage

## Mature SE sites

data.stackexchange.com -> csv -> oetable2graphml.py -> graphml -> gephi -> pdf

- To get data, use [a data.SE query](http://data.stackexchange.com/stackoverflow/query/83415/) to obtain a table of tag [co-occurrences](http://stats.stackexchange.com/questions/40977/is-there-a-term-for-pa-cap-b-papb) (see also [my other queries](http://data.stackexchange.com/users/8877/piotr-migdal)).

- Run oetable2graphml.py to convert it to a graphml file (requires [NetworkX](http://networkx.lanl.gov/)), e.g.

		python oetable2graphml.py input.csv output.graphml

- Use [Gephi](http://gephi.org) to import the graphml file and process it to your taste.


E.g. (on Gephi 0.8.1 beta): 

* Overview tab:

 * Ranking -> Nodes -> Size -> weight -> Min:15, Max:30, Spline:3 -> Run (optimal options may vary)

 * Layout -> ARF -> Run, OR: Layout -> Force Atlas -> Run; Layout -> Fruchterman Reingold -> Run (you may like to experiment with parameters or other methods)

 * Layout -> Noverlap -> Run

 * Statistics -> Modularity -> Run

 * Partition -> Nodes -> Refresh -> Modularity Class -> Apply (optionally, choose colors to your taste)

 * Font size: 26pt, Node size, Show node labels

 * Layout -> Label Adjust -> Run

* Preview tab: 

 * Nodes -> Border Color: #A0A0A0

 * Node Labels -> Show Labels: True, Font: 4pt  

 * Edges -> Opacity: 40.0

 * Refresh; Export
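To give a feeling for what the csv -> graphml step of the pipeline produces, here is a minimal sketch. It is not the code of oetable2graphml.py: that script relies on NetworkX, while this version writes GraphML by hand with the Python standard library. The column names `tag1`, `tag2` and `weight` are assumptions made for illustration, so adjust them to whatever your query returns. The Gephi recipe above ranks nodes by a `weight` attribute, which this sketch does not write; it stores edge weights only.

```python
import csv
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def csv_to_graphml(csv_file, graphml_path):
    """Write an undirected, edge-weighted GraphML file from CSV rows.

    Assumed (hypothetical) CSV header: tag1,tag2,weight
    Returns (number of nodes, number of edges) written.
    """
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    # Declare the edge attribute that will hold the co-occurrence weight.
    ET.SubElement(root, "key", {
        "id": "w",
        "for": "edge",
        "attr.name": "weight",
        "attr.type": "double",
    })
    graph = ET.SubElement(root, "graph", edgedefault="undirected")

    # Read all rows first so that nodes can be written before edges.
    nodes = []
    edges = []
    for row in csv.DictReader(csv_file):
        a, b = row["tag1"], row["tag2"]
        for tag in (a, b):
            if tag not in nodes:
                nodes.append(tag)
        edges.append((a, b, float(row["weight"])))

    for tag in nodes:
        ET.SubElement(graph, "node", id=tag)
    for a, b, weight in edges:
        edge = ET.SubElement(graph, "edge", source=a, target=b)
        data = ET.SubElement(edge, "data", key="w")
        data.text = str(weight)

    ET.ElementTree(root).write(graphml_path, encoding="utf-8",
                               xml_declaration=True)
    return len(nodes), len(edges)
```

Calling it with an open file, e.g. `csv_to_graphml(open("input.csv", newline=""), "output.graphml")`, should give a file that Gephi can import like the one from the real script.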

## Beta sites and other tags

First, obtain tag bundles with the SE API, for example using [se-api-py](https://github.com/stared/se-api-py):

	x = se.fetch("questions", site="biology", filter="!nR5-WLw0-5")  # filter says that we ask only for the 'tags' field

	t = [y['tags'] for y in x]

You need a list of lists of tags, one list per post, e.g.

	t = [["plants", "flowers"], ["plants", "carnivorous", "big-list"], ["carnivorous", "fish", "piranha"]]

Then process it, for example like this:

	import tag_bundle_processing as tbp

	bun = tbp.Bundle(t) 

	# or: bun = tbp.Bundle(json_path="data.json")

	bun.filter_elements(first_n=32)  # takes only 32 most frequent tags

	bun.calculate_pair_weights(func=tbp.oe_ratio, threshold=1.5)  # assumes oe_ratio is defined at module level in tag_bundle_processing

	bun.export2graphml("path/to/file.graphml")

Then proceed with Gephi as for mature SE sites.
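If you wonder what a pair weight like the one used above means, here is a rough standalone sketch of the observed-to-expected ratio P(A∩B)/(P(A)P(B)) from the co-occurrence link in the Usage section. It does not use tag_bundle_processing and may differ from it in details; the function name `oe_ratios` and the choice to keep pairs with ratio >= threshold are my assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def oe_ratios(tag_lists, threshold=1.5):
    """Observed-to-expected co-occurrence ratio for every tag pair.

    With N posts, n_a posts carrying tag a and n_ab posts carrying both
    a and b, the ratio is P(a and b) / (P(a) * P(b)) = n_ab * N / (n_a * n_b).
    Pairs below the threshold are dropped (an assumption of this sketch).
    """
    n_posts = len(tag_lists)
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in tag_lists:
        unique = sorted(set(tags))  # count each tag once per post
        tag_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    ratios = {}
    for (a, b), n_ab in pair_counts.items():
        ratio = n_ab * n_posts / (tag_counts[a] * tag_counts[b])
        if ratio >= threshold:
            ratios[(a, b)] = ratio
    return ratios


t = [["plants", "flowers"],
     ["plants", "carnivorous", "big-list"],
     ["carnivorous", "fish", "piranha"]]
weights = oe_ratios(t)
```

A ratio above 1 means two tags appear together more often than they would if they were independent. In this tiny example `("fish", "piranha")` gets ratio 3.0, while `("carnivorous", "plants")` gets 0.75 and is filtered out.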
