https://github.com/cotrino/language_KnowledgeMap

The name "KnowledgeMap" tries to use the metaphor of a cartographic map. If we represent all the different areas of knowledge as a bidimensional map, there will be shadowy unknown areas (fog of war) representing "ignorance" and some bright zones representing "knowledge".

KnowledgeMap
===============

Motivation
----------

KnowledgeMap is a project implemented to try to answer the following questions:

1. What do I know? Get objective awareness of the subjects in which I have deeper knowledge.
This would make it possible to identify one's own area of expertise.
E.g. Do I know more about History or Science? Do I know more about Biology or Physics?

2. What don't I know? Get awareness of the subjects in which I have very little knowledge.
This would make it possible to discover further subjects to explore.
E.g. "Baseball in the US" is a large subject with tons of data and trivia of which I'm completely unaware (and intend to stay that way).

3. Do I know more about a subject than another particular person? Objectively compare the knowledge
of two different people using a randomly generated quiz.

4. Which books or sources can expand my knowledge? Whenever I read a book, I need it to be not too trivial
(if I already know most of its content) and not too technical (if I lack the basis to understand large portions).
By generating the KnowledgeMap of a given book and overlaying it on one's own KnowledgeMap, it would be
possible to determine whether the book fits within my own knowledge boundaries and may help to expand them,
without losing interest midway.

The name "KnowledgeMap" tries to use the metaphor of a cartographic map. If we represent all the different areas of knowledge
as a bidimensional map, there will be shadowy unknown areas ([fog of war](https://en.wikipedia.org/wiki/Fog_of_war))
representing "ignorance" and some bright zones representing "knowledge".

The first problem arises: what is "all knowledge"? For this purpose, we may use a simplified approach by saying: Wikipedia.

The second problem follows: knowledge is NOT bidimensional, but multidimensional! There are many
ways to classify knowledge and the same content could be classified within several disjoint categories at the same time.
Therefore we do another simplification here:
* We take "[Articles](https://simple.wikipedia.org/wiki/Category:Articles)" as the top category and everything follows
a hierarchy downwards from there.
* We only take the shortest path within Wikipedia Categories from a given article to that top category
"[Articles](https://simple.wikipedia.org/wiki/Category:Articles)".
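
The shortest-path simplification can be illustrated outside the database with a plain breadth-first search. This is only a minimal sketch, not the project's code: the class `ShortestCategoryPath`, the in-memory `parents` map and the sample category names are all hypothetical, and the actual project delegates this search to Neo4j.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; the real project runs this search inside Neo4j.
public class ShortestCategoryPath {

    /**
     * Breadth-first search from a page up through its parent categories.
     * parents maps a page or category title to the categories it belongs to.
     * Returns the titles from start to top (inclusive), or an empty list
     * if top cannot be reached.
     */
    public static List<String> shortestPath(Map<String, List<String>> parents,
                                            String start, String top) {
        // Remembers, for every visited node, the node it was reached from.
        Map<String, String> cameFrom = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        cameFrom.put(start, null);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(top)) {
                // Walk back to the start and reverse to get start -> top order.
                List<String> path = new ArrayList<>();
                for (String n = current; n != null; n = cameFrom.get(n)) {
                    path.add(n);
                }
                Collections.reverse(path);
                return path;
            }
            for (String parent : parents.getOrDefault(current, Collections.<String>emptyList())) {
                if (!cameFrom.containsKey(parent)) {
                    cameFrom.put(parent, current);
                    queue.add(parent);
                }
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        // Made-up sample data: the page sits in two categories at different depths.
        Map<String, List<String>> parents = new HashMap<>();
        parents.put("Photosynthesis", Arrays.asList("Plant physiology", "Biology"));
        parents.put("Plant physiology", Arrays.asList("Botany"));
        parents.put("Botany", Arrays.asList("Biology"));
        parents.put("Biology", Arrays.asList("Science"));
        parents.put("Science", Arrays.asList("Articles"));
        // prints [Photosynthesis, Biology, Science, Articles]
        System.out.println(shortestPath(parents, "Photosynthesis", "Articles"));
    }
}
```

Because breadth-first search visits nodes in order of distance, the longer route through "Plant physiology" and "Botany" is ignored, which is the effect the simplification is after.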

Briefly, these are the main ideas:

1. We take a Wikipedia dump and load it into a graph database.
2. We generate quiz questions from Wikipedia articles.
3. The user answers those questions, with either a positive or negative result.
4. Parent categories (following the shortest path to Wikipedia category "Articles") inherit those results.
5. The system generates a hierarchical heat-map visualization, with white areas representing known categories
and black areas representing unknown categories.
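
Steps 3 to 5 can be sketched as a small tally that credits each answer to every category on the page's path. The README does not say how inherited results are combined or how scores map to colors, so the averaging and the linear gray scale below are assumptions, and the class `CategoryScores` with its methods is hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of how categories could inherit quiz results.
public class CategoryScores {

    // Per-category tally: counts[0] = correct answers, counts[1] = all answers.
    private final Map<String, int[]> tally = new LinkedHashMap<>();

    /** Credits one quiz answer to every category on the path to the top category. */
    public void record(List<String> pathToTop, boolean correct) {
        for (String category : pathToTop) {
            int[] counts = tally.computeIfAbsent(category, k -> new int[2]);
            if (correct) {
                counts[0]++;
            }
            counts[1]++;
        }
    }

    /** Fraction of correct answers under a category, or 0.0 if nothing was asked there. */
    public double score(String category) {
        int[] counts = tally.get(category);
        return counts == null ? 0.0 : (double) counts[0] / counts[1];
    }

    /** Assumed linear mapping of a score in [0, 1] to gray: 0 = black (unknown), 255 = white (known). */
    public static int grayLevel(double score) {
        return (int) Math.round(255 * score);
    }

    public static void main(String[] args) {
        CategoryScores scores = new CategoryScores();
        // Made-up answers: one correct Biology question, one wrong Physics question.
        scores.record(Arrays.asList("Biology", "Science", "Articles"), true);
        scores.record(Arrays.asList("Physics", "Science", "Articles"), false);
        System.out.println(scores.score("Biology"));   // prints 1.0
        System.out.println(scores.score("Science"));   // prints 0.5
        System.out.println(grayLevel(scores.score("Science"))); // prints 128
    }
}
```

With this scheme a parent such as "Science" ends up in a shade between its well-known and poorly-known children, which matches the heat-map idea of bright and dark regions.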

Questions are generated by removing one of the Wikipedia links in the article, showing some sentences around that
link to provide context and asking the user to fill in the gap. The quiz interface looks like this:
![Quiz interface](https://raw.githubusercontent.com/cotrino/language_KnowledgeMap/master/images/quiz_interface.png)
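
The gap-filling idea can be sketched with a regular expression over raw wikitext. This is an assumption-laden illustration rather than the project's parser (which builds on java-wikipedia-parser): the class `GapQuestion` is hypothetical, only the simple `[[Target]]` and `[[Target|label]]` link forms are handled, and the whole passage is kept as context instead of selecting a few surrounding sentences.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of turning one wiki link into a fill-in-the-gap question.
public class GapQuestion {

    // Matches [[Target]] and [[Target|label]] wiki links.
    private static final Pattern LINK =
            Pattern.compile("\\[\\[([^\\]|]+)(?:\\|([^\\]]+))?\\]\\]");

    public final String text;   // passage with one link replaced by a gap
    public final String answer; // title of the page the removed link pointed to

    private GapQuestion(String text, String answer) {
        this.text = text;
        this.answer = answer;
    }

    /**
     * Replaces the link at linkIndex (0-based) with a gap and renders every
     * other link as plain text. Returns null if there is no such link.
     */
    public static GapQuestion fromWikitext(String wikitext, int linkIndex) {
        Matcher m = LINK.matcher(wikitext);
        StringBuffer out = new StringBuffer();
        String answer = null;
        int i = 0;
        while (m.find()) {
            String target = m.group(1);
            String label = m.group(2) != null ? m.group(2) : target;
            String replacement;
            if (i == linkIndex) {
                answer = target;
                replacement = "_____";
            } else {
                replacement = label;
            }
            // quoteReplacement keeps '$' and '\' in titles from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
            i++;
        }
        m.appendTail(out);
        return answer == null ? null : new GapQuestion(out.toString(), answer);
    }

    public static void main(String[] args) {
        String wikitext = "The [[Moon]] orbits the [[Earth|planet Earth]] once every 27 days.";
        GapQuestion q = fromWikitext(wikitext, 1);
        System.out.println(q.text);   // prints The Moon orbits the _____ once every 27 days.
        System.out.println(q.answer); // prints Earth
    }
}
```

The answer is the link target rather than its label, so a user's reply can be compared against a page title that exists in the graph.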

A KnowledgeMap looks like this:
![KnowledgeMap](https://raw.githubusercontent.com/cotrino/language_KnowledgeMap/master/images/KnowledgeMap.png)

An interactive demo visualization is available here: [interactive demo at cotrino.com](http://www.cotrino.com/2016/03/knowledgemap/)

There is also another visualization of the individual pages about which questions were asked:
![Page visualization](https://raw.githubusercontent.com/cotrino/language_KnowledgeMap/master/images/page_visualization.png)

The link between the user and the known (or unknown) categories is calculated with Neo4j using a Cypher query such as:

`MATCH (u:User)-[k:Knows]->(n:Page) WHERE id(u)=193773 WITH n,k,u MATCH path=shortestPath((a:Category)<-[r:In_Category*]-(n)) WHERE a.title='Articles' RETURN path,u LIMIT 1`

![User to Page to Articles](https://raw.githubusercontent.com/cotrino/language_KnowledgeMap/master/images/user_to_articles_path.png)

To sum up: with this approach we can now answer question 1 above ("What do I know?"), but not the others yet.

As the old saying goes, now at least [I know that I know nothing](https://en.wikipedia.org/wiki/I_know_that_I_know_nothing).

Building
--------

This is a Java project built with [Maven](http://maven.apache.org).

Fetch the libraries and build the executable JAR with `mvn package`.

This will generate a package including all dependencies under `target/KnowledgeMap.jar`.

Importing Data
--------------

A patched version of [Mirko Nasato's Graphipedia](https://github.com/mirkonasato/graphipedia) is used
to create a Neo4j database from a Wikipedia database dump.

See [Wikipedia:Database_download](http://en.wikipedia.org/wiki/Wikipedia:Database_download)
for instructions on getting a Wikipedia database dump. The current implementation has been successfully
tested with the [Simple English Wikipedia](https://dumps.wikimedia.org/simplewiki/).

1. Extract `simplewiki-latest-pages-articles.xml` to the folder `./data/`.

2. Download and extract [Neo4j](http://neo4j.com/) to `./database/`. The code has been tested with Neo4j 2.3.

3. Download and install [GraphAware NodeRank](https://github.com/graphaware/neo4j-noderank) as a plug-in
in this Neo4j installation.

4. Run KnowledgeImporter to create a Neo4j database with nodes and relationships in the `./database/data/graphipedia.db` directory.

`java -Xmx3G -classpath ./target/KnowledgeMap.jar com.cotrino.knowledgemap.KnowledgeImporter`

5. Once this is finished, you should be able to start the Neo4j server with `./database/bin/neo4j start`
and access the Neo4j web interface at http://localhost:7474/

Quiz and Visualization
----------------------

1. Start the quiz web service.

`java -classpath ./target/KnowledgeMap.jar com.cotrino.knowledgemap.KnowledgeWeb`

2. Start answering questions at http://localhost:8080/

Once you have answered a couple of questions, you can click on the respective buttons to access the visualizations.

References
----------

This project uses parts of, or is based on:

* [Wikipedia](https://simple.wikipedia.org/wiki/Main_Page)

* [Mirko Nasato's Graphipedia](https://github.com/mirkonasato/graphipedia)

* [Mike Bostock's Zoomable Circle Packing](https://bl.ocks.org/mbostock/7607535)

* [Neo4j](http://neo4j.com/)

* [GraphAware NodeRank](https://github.com/graphaware/neo4j-noderank)

* [D3.js](https://d3js.org/)

* [java-wikipedia-parser](https://github.com/RuedigerMoeller/java-wikipedia-parser)

* ...and several other technologies.