{"id":15883220,"url":"https://github.com/cboulanger/ltkg-tools","last_synced_at":"2026-01-16T00:51:52.036Z","repository":{"id":227257716,"uuid":"770897493","full_name":"cboulanger/ltkg-tools","owner":"cboulanger","description":"Legal Theory Knowledge Graph Project - Tools and Resources","archived":false,"fork":false,"pushed_at":"2024-08-08T06:30:38.000Z","size":137,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-07T20:31:15.394Z","etag":null,"topics":["bibliometrics","corpus-linguistics","graph-algorithms","listing","nlp-machine-learning"],"latest_commit_sha":null,"homepage":"https://www.lhlt.mpg.de/2514927/03-boulanger-legal-theory-graph","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cboulanger.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-12T11:00:20.000Z","updated_at":"2024-11-29T07:28:30.000Z","dependencies_parsed_at":"2024-08-08T08:27:14.260Z","dependency_job_id":null,"html_url":"https://github.com/cboulanger/ltkg-tools","commit_stats":null,"previous_names":["cboulanger/ltkg-tools"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cboulanger%2Fltkg-tools","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cboulanger%2Fltkg-tools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cboulanger%2Fltkg-tools/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cbo
ulanger%2Fltkg-tools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cboulanger","download_url":"https://codeload.github.com/cboulanger/ltkg-tools/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246763943,"owners_count":20829799,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bibliometrics","corpus-linguistics","graph-algorithms","listing","nlp-machine-learning"],"created_at":"2024-10-06T04:08:48.333Z","updated_at":"2026-01-16T00:51:52.028Z","avatar_url":"https://github.com/cboulanger.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Legal Theory Knowledge Graph Project - List of Tools and Resources\n[![DOI](https://zenodo.org/badge/770897493.svg)](https://zenodo.org/doi/10.5281/zenodo.10809341)\n\nChristian Boulanger\n\nLast major update: 2024/03/12, see commit history for individual changes since then.\n\nThe following links point to resources collected during the exploratory phase of the [Legal Theory Graph Project at the Max Planck Institute for Legal History and Legal Theory](https://www.lhlt.mpg.de/2514927/03-boulanger-legal-theory-graph). This is _not_ a comprehensive list of available resources or scholarship, but a selection based on the needs of the project and personal preferences. Most of the research for this list was done in 2021-2023, prior to the release of ChatGPT and other major language models that have revolutionized the NLP technology landscape. 
While some aspects of these newer developments have been incorporated, the list mostly reflects the pre-LLM state of the art.\n\nThe list differentiates between\n1. **Graph Technologies** in general, which refer to standards and tools for collecting, editing, and presenting (visualizing) graph data;\n2. **Knowledge Organizing Systems** (KOS), which provide standards for the dissemination and reuse of information;\n3. **Data Sources**: metadata providers and source data from which metadata for node and edge data can be imported;\n4. **Corpus Linguistic Technologies**, which serve to quantitatively analyze full texts and can generate node and edge data from text corpora;\n5. **Bibliometrics/Citation Analysis**, which specifically deals with citations as research data, often in the form of a graph.\n\n## Table of Contents\n\n- [1\\. Graph Technologies](#1-Graph-Technologies \"1. Graph Technologies\")\n  - [Scholarship](#Scholarship \"Scholarship\")\n  - [Projects/Services with similar/related goals](#ProjectsServices-with-similarrelated-goals \"Projects/Services with similar/related goals\")\n  - [Projects/Services using Graph Technologies/Linked Data](#ProjectsServices-using-Graph-TechnologiesLinked-Data \"Projects/Services using Graph Technologies/Linked Data\")\n  - [Public knowledge graph / linked data repositories, data sources \u0026 services](#Public-knowledge-graph--linked-data-repositories-data-sources-amp-services \"Public knowledge graph / linked data repositories, data sources \u0026 services\")\n  - [Tools/Software](#ToolsSoftware \"Tools/Software\")\n  - [Stores for non graph-specific metadata](#Stores-for-non-graph-specific-metadata \"Stores for non graph-specific metadata\")\n- [2\\. Knowledge Organizing Systems (KOS) / Semantic Web](#2-Knowledge-Organizing-Systems-KOS--Semantic-Web \"2. 
Knowledge Organizing Systems (KOS) / Semantic Web\")\n  - [Theory](#Theory \"Theory\")\n  - [Organizations](#Organizations \"Organizations\")\n  - [Implementation languages](#Implementation-languages \"Implementation languages\")\n  - [Standardization hubs/initiatives/tools](#Standardization-hubsinitiativestools \"Standardization hubs/initiatives/tools\")\n  - [Specialized Ontologies / Vocabularies / Taxonomies for producing semantic metadata](#Specialized-Ontologies--Vocabularies--Taxonomies-for-producing-semantic-metadata \"Specialized Ontologies / Vocabularies / Taxonomies for producing semantic metadata\")\n  - [Software tools](#Software-tools \"Software tools\")\n- [3\\. Metadata retrieval and generation](#3-Metadata-retrieval-and-generation \"3. Metadata retrieval and generation\")\n  - [Metadata providers (with Web API)](#Metadata-providers-with-Web-API \"Metadata providers (with Web API)\")\n  - [Metadata extraction](#Metadata-extraction \"Metadata extraction\")\n  - [Web scraping](#Web-scraping)\n- [4\\. Corpus Linguistics](#4-Corpus-Linguistics \"4. Corpus Linguistics\")\n  - [General issues](#General-issues \"General issues\")\n  - [Creation \u0026 Analysis of Text Corpora](#Creation-amp-Analysis-of-Text-Corpora \"Creation \u0026 Analysis of Text Corpora\")\n  - [Text mining](#Text-mining \"Text mining\")\n  - [NLP tasks/problems](#NLP-tasksproblems \"NLP tasks/problems\")\n  - [Text annotation for machine learning](#Text-annotation-for-machine-learning \"Text annotation for machine learning\")\n  - [Possibly relevant corpora](#Possibly-relevant-corpora \"Possibly relevant corpora\")\n- [5\\. Bibliometrics / Citation Analysis](#5-Bibliometrics--Citation-Analysis \"5. 
Bibliometrics / Citation Analysis\")\n  - [Initiatives](#Initiatives \"Initiatives\")\n  - [Research graphs web services](#Research-graphs-web-services \"Research graphs web services\")\n  - [Scholarship](#Scholarship1 \"Scholarship\")\n  - [Automatic article corpus aggregation](#Automatic-article-corpus-aggregation \"Automatic article corpus aggregation\")\n  - [Metadata extraction/annotation software and services](#Metadata-extractionannotation-software-and-services \"Metadata extraction/annotation software and services\")\n  - [Data exchange formats](#Data-exchange-formats \"Data exchange formats\")\n  - [Matching/normalization of citations](#Matchingnormalization-of-citations \"Matching/normalization of citations\")\n  - [Applications for data analysis](#Applications-for-data-analysis \"Applications for data analysis\")\n  - [Multi-purpose libraries](#Multi-purpose-libraries \"Multi-purpose libraries\")\n\n\nSee also:\n\n- [Frey-Endres/Simon, 2021, Digitale Werkzeuge zur textbasierten Annotation, Korpusanalyse und Netzwerkanalyse in den Geisteswissenschaften](https://tuprints.ulb.tu-darmstadt.de/17850/1/Digital_Philology__Working_Papers_in_Digital_Philology_vol002.pdf): comprehensive German-language description of available tools, much more detailed and more general-purpose\n- [Bibliography on Digital Research in Law](https://www.zotero.org/groups/4370759/computational_sociolegal_and_historical_legal_studies): A Zotero group, still under development\n- Discover research tools for studying texts: https://tapor.ca/home\n- Awesome DHTools:\n  https://dh-tech.github.io/awesome-digital-humanities/\n\n## 1. 
Graph Technologies\n\n### Scholarship\n\n- **Bibliographies**\n  - https://graphentechnologien.hypotheses.org/bibliographie\n  - https://www.zotero.org/groups/2224334/ag_graph/library\n  - http://historicalnetworkresearch.org/bibliography/#top\n\n- **Graphentechnologien in den Digitalen Geisteswissenschaften**\n  http://doi.org/10.1515/abitech-2017-0042\n\n- **ZfdG** - Zeitschrift für digitale Geisteswissenschaften\n  https://zfdg.de\n\n### Projects/Services with similar/related goals\n\nOf interest are projects that cover the humanities/social sciences and that generate citation and/or other graph data (including German-language scholarship).\n\n- **Linked Open Citation Database**: Development of a Linked Open Data database for the indexing of citations of electronic and print media\n  \u003chttps://locdb.bib.uni-mannheim.de/blog/en/\u003e\n\n- **EXCITE/OUTCITE**: aims to extract citations from social science publications and to make more citation data available to researchers\n  \u003chttps://excite.informatik.uni-stuttgart.de\u003e\n\n- **GEOCITE**: monitoring tool for science observation, based on EXCITE\n  https://geographische-netzwerkstatt.uni-passau.de/de/geocite\n\n- **Scholia**: a service that creates visual scholarly profiles for topics, people, organizations, species, chemicals, etc., using bibliographic and other information in Wikidata.\n  \u003chttps://scholia.toolforge.org\u003e\n\n- **PhiWiki**: a software application under development, intended to enable philosophers initially, and in principle other humanities scholars as well, to capture data on the ideas and concepts of their discipline semantically and to discover new connections within these data\n  https://zenodo.org/records/8386456\n\n### Public knowledge graph / linked data repositories, data sources \u0026 services\n\n- **Wikidata**\n  \u003chttps://www.wikidata.org/wiki/Wikidata:Introduction\u003e\n\n- **Open Research Knowledge Graph**\n  
\u003chttps://projects.tib.eu/orkg\u003e\n  \u003chttps://gitlab.com/TIBHannover/orkg\u003e\n\n- **OpenAIRE Research Graph**\n  \u003chttps://graph.openaire.eu/about\u003e\n\n- **Druid, the place to store, share and query Linked Data**\n  \u003chttps://druid.datalegend.net\u003e\n\n- **Histograph**: Graph-based exploration and crowd-based indexation for multimedia collections\n  \u003chttp://histograph.eu\u003e\n\n- **DBpedia**: extracts structured content from Wikimedia projects as an open knowledge graph\n  \u003chttps://www.dbpedia.org/about\u003e\n\n- **Palladio**: Visualization of complex historical data\n  \u003chttps://hdlab.stanford.edu/palladio\u003e\n\n- **Geovistory Toolbox**: Store, visualize and share historical and geographical data  \n  \u003chttps://www.geovistory.com\u003e\n\n- **FRED**: web service \u0026 API that automatically extracts rich and highly connected linked data from a text\n  \u003chttp://wit.istc.cnr.it/stlab-tools/fred/\u003e\n\n- **Open Knowledge Maps**: A visual interface to the world's scientific knowledge\n  https://openknowledgemaps.org/\n\n\n### Tools/Software\n\nFor tools for scientometric/bibliometric graph data, see [section on general bibliometric technologies](#5-Bibliometrics--Citation-Analysis)\n\n#### Overviews and comparisons\n\n- Besta et al. 
(2021) Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries\n  https://arxiv.org/abs/1910.09017\n\n- Elise Devaux (2019) List of free graph visualization applications\n  https://elise-deux.medium.com/list-of-free-graph-visualization-applications-9c4ff5c1b3cd\n\n- Elise Devaux (2019) List of graph visualization libraries\n  https://elise-deux.medium.com/the-list-of-graph-visualization-libraries-7a7b89aab6a6\n\n\n#### Graph databases, editors, stores, analysis \u0026 visualization\n\n- **Neo4j Desktop** for graph data \u0026 queries (commercial, with a community edition)\n  \u003chttps://www.neo4j.com\u003e\n  - Connection to R/RStudio \u003chttps://github.com/neo4j-rstats/neo4r\u003e\n  - Visualization libraries:\n      - https://neo4j.com/developer/tools-graph-visualization/\n      - https://github.com/neo4j-contrib/neovis.js/\n\n- **Open Native Graph Database (ONgDB)**: An Open Source fork of Neo4j\n  \u003chttps://www.graphfoundation.org/\u003e\n\n- **GraphStack**: Builds on ONgDB to provide enterprise solutions\n  \u003chttps://graphstack.io/\u003e\n\n- **Cayley Graph Database** (Open Source)\n  \u003chttps://cayley.gitbook.io\u003e\n\n- **NodeGoat**: a web-based research environment for the humanities\n  \u003chttps://nodegoat.net\u003e\n\n- **VIVO**: creates a knowledge graph of the scholarly work of an organization\n  \u003chttps://duraspace.org/vivo/\u003e\n\n#### Visualization software/platforms\n\n- **Gephi**: Open Graph Visualization\n  https://gephi.org/\n\n- **Cytoscape**: open source software platform for complex network analysis and visualization (originally designed for biological research).\n  https://cytoscape.org/\n\n- **ArcadeDB**: Open Source Graph Visualization Tool. 
Integrates with Neo4j\n  https://arcadedb.com/analytics\n\n- **Constellation**: free open source software for data visualisation \u0026 analytics\n  https://www.constellation-app.com/\n\n- **Graph Commons**: Transform your data into interactive maps (freemium)\n  https://graphcommons.com/\n\n- **GraphVis**: platform for interactive visual graph mining and relational learning.\n  https://networkrepository.com/graphvis.php\n\n- **Graphviz**: graph visualization software based on a graph representation language (DOT).\n  \u003chttps://graphviz.org/\u003e\n  - https://www.tonyballantyne.com/graphs.html\n\n- **GUESS**: exploratory data analysis and visualization tool for graphs and networks.\n  http://graphexploration.cond.org\n\n- **Apache Zeppelin**: Web-based notebook that enables data-driven,\n  interactive data analytics and collaborative documents with SQL, Scala, Python, R and more\n\n- **draw.io**: Online graph editor (also available as an offline app)\n  https://www.diagrams.net/\n  https://github.com/jgraph/drawio-desktop/\n\n#### Methodological questions\n\n##### Community detection\n\n- https://www.r-bloggers.com/2020/03/community-detection-with-louvain-and-infomap/\n\n##### Temporality\n\n- https://kateto.net/polnet2017.html (scroll down to \"7.2 Network evolution animations\")\n- https://github.com/michalgm/ndtv-d3/blob/master/README.md\n- https://cran.r-project.org/web/packages/networkDynamic/vignettes/networkDynamic.pdf\n\n#### Software libraries/APIs\n\n- **igraph**: a collection of network analysis tools for R, Python and C\n  \u003chttps://igraph.org\u003e\n\n##### Python\n\n- **NetworkX**: Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks\n  https://networkx.org/\n\n- **graph-tool**: Python package for network analysis and visualization with a C++ backend\n  \u003chttps://graph-tool.skewed.de/\u003e\n\n- **WebWeb**: tool for creating, displaying, and sharing interactive network visualizations on the 
web\n  https://webwebpage.github.io/\n\n- **openmappr**: visually browse and discover patterns in networks\n  https://www.openmappr.org\n\n- **pyvis**\n  https://pyvis.readthedocs.io\n\n  - https://www.askpython.com/python/examples/customizing-pyvis-interactive-network-graphs\n\n- **ipycytoscape**\n  https://blog.jupyter.org/interactive-graph-visualization-in-jupyter-with-ipycytoscape-a8828a54ab63\n\n##### JavaScript\n\n- **D3**: JavaScript library for manipulating documents based on data, useful for graph visualization\n  https://d3js.org/\n\n- **vis.js community edition**: A dynamic, browser-based visualization library.\n  https://visjs.org/\n  - https://visjs.github.io/vis-network/examples/\n\n- **Dracula.js**: a set of tools to display and lay out interactive connected graphs and networks, along with various related algorithms from the field of graph theory.\n  https://www.graphdracula.net/\n\n- **Cytoscape.js**: JavaScript visualization of Cytoscape data\n  https://js.cytoscape.org/\n\n- **sigma.js**: a JavaScript library aimed at visualizing graphs of thousands of nodes and edges\n  https://www.sigmajs.org/\n\n- **ccNetViz**: a lightweight, high-performance JavaScript library for visualizing large network graphs using WebGL\n  https://helikarlab.github.io/ccNetViz/\n\n##### R\n\n- https://datastorm-open.github.io/visNetwork/\n- https://briatte.github.io/ggnet/\n- https://mr.schochastics.net/material/netVizR/\n- https://r-graph-gallery.com/network.html\n- https://r-graph-gallery.com/network-interactive.html\n- https://www.r-bloggers.com/2019/06/interactive-network-visualization-with-r/\n  - https://r-graph-gallery.com/257-input-formats-for-network-charts.html\n  - https://www.statworx.com/en/content-hub/blog/interactive-network-visualization-with-r/\n- https://github.com/michalgm/ndtv-d3/blob/master/README.md\n\n\n##### Java\n\n- **Apache Jena**: A free and open source Java framework for building Semantic Web and Linked Data 
applications\n  https://jena.apache.org\n\n### Stores for non graph-specific metadata\n\nThe following list is specific to this project, not an exhaustive overview.\n\n- **Zotero**: app for managing bibliographic data, nice UI + Web API, slow, poor query features\n  https://www.zotero.org\n  - Web API/Clients\n      https://www.zotero.org/support/dev/web_api/v3/start\n  - ZotPrime: a fully packaged on-premise solution\n      \u003chttps://github.com/ZotPrime/zotprime\u003e\n  - Cita: a Wikidata add-on for Zotero with citation metadata support\n      \u003chttps://github.com/diegodlh/zotero-cita\u003e\n\n- **Endatabas**: Open Source SQL Document Database with Full History, which records changes to data over time. Still in beta.\n  https://www.endatabas.com/\n\n- **Couchbase**: JSON document database with excellent query support and many clients\n  https://docs.couchbase.com/tutorials/getting-started-ce/install-manage/tutorial_en.html\n  - Couchbase Store for @retorquere/zotero-sync\n      https://github.com/cboulanger/zotero-sync-couchbase\n\n## 2. 
Knowledge Organizing Systems (KOS) / Semantic Web\n\n### Theory\n\n- [Lists, Taxonomies, Lattices, Thesauri and Ontologies: Paving a Pathway Through a Terminological Jungle](https://www.ergon-verlag.de/isko_ko/downloads/ko_41_2014_3_d.pdf)\n- [Ontologies (as knowledge organization systems)](https://www.isko.org/cyclo/ontologies)\n\n### Organizations\n\n- Kompetenzzentrum Interoperable Metadaten (KIM)\n  https://dini.de/standards\n\n### Implementation languages\n\n- **Resource Description Framework (RDF)**\n  \u003chttps://www.w3.org/TR/rdf-concepts/\u003e\n  \u003chttps://www.w3.org/OWL/\u003e\n\n- **Terse RDF Triple Language (Turtle)**, a concrete syntax for RDF\n  \u003chttps://www.w3.org/TR/turtle/\u003e\n\n- **Web Ontology Language (OWL)**\n  \u003chttps://www.w3.org/TR/owl-ref/\u003e\n\n- **Shapes Constraint Language (SHACL)**: validation of RDF graphs against a set of conditions (shapes)\n  \u003chttps://www.w3.org/TR/shacl/\u003e\n\n- **Simple Knowledge Organization System (SKOS)**\n  https://www.w3.org/2009/08/skos-reference/skos.html\n  https://www.w3.org/2006/07/SWD/SKOS/skos-and-owl/master.html\n  https://www.w3.org/2004/02/skos/vocabs\n\n### Standardization hubs/initiatives/tools\n\n- **SkoHub**: KOS-based content subscription (for structural metadata such as taxonomies)\n  \u003chttps://skohub.io\u003e\n\n- **RDA Registry**: contains vocabularies that represent the RDA entities, elements, and controlled terminologies\n  \u003chttp://www.rdaregistry.info\u003e\n\n- **DARIAH Vocabs Services**\n  \u003chttps://vocabs.dariah.eu/en/\u003e\n\n- **Standardization Survival Kit**: A collection of research use case scenarios illustrating best practices in Digital Humanities and Heritage research\n  \u003chttp://ssk.huma-num.fr\u003e\n\n- **PARTHENOS Virtual Research Environment**: integrates cloud storage with services and tools for Digital Humanities\n  \u003chttps://parthenos.d4science.org/web/parthenos_vre\u003e\n  - [Training Suite](https://training.parthenos-project.eu): provides training modules 
and resources in DH\n\n- **Linked Pipes**: Registry of web-based linked data services based on Wikidata\n  \u003chttp://linkedpipes.xyz/\u003e\n\n### Specialized Ontologies / Vocabularies / Taxonomies for producing semantic metadata\n\n- Example from legal history: Regulatory Matters of Police Ordinances\n  https://github.com/rg-mpg-de/vocabs-polmat\n\n#### Bibliographic Data\n\n- **The Bibliographic Ontology (BIBO)**\n  https://bibliontology.com/\n\n- **FRBR-aligned Bibliographic Ontology (FaBiO)**\n  http://www.sparontologies.net/ontologies/fabio#fabio_3\n\n- **Comparison of FaBiO and BIBO**\n  https://opencitations.wordpress.com/2011/06/29/comparison-of-bibo-and-fabio/\n\n- **Categorising bibliographic resources with FaBiO and SKOS**\n  https://opencitations.wordpress.com/2011/06/29/categorising-bibliographic-resources-with-fabio-and-skos/\n\n- **BIBO2SPAR**: RDF Mapping of BIBO to the SPAR Ontologies\n  https://opencitations.wordpress.com/2011/06/29/bibo2spar-an-rdf-mapping-of-bibo-to-the-spar-ontologies/\n\n- **Citation Typing Ontology (CiTO)**\n  https://sparontologies.github.io/cito/current/cito.html\n\n- **OpenCitations Data Model** (referencing many pertinent ontologies)\n  https://opencitations.net/model\n  [article (2020)](https://doi.org/10.1007/978-3-030-62466-8_28)\n\n- **Web of Science** field tags\n  https://images.webofknowledge.com/images/help/WOS/hs_wos_fieldtags.html\n\n#### Scholars\n\n- **GND Ontology**\n  \u003chttps://d-nb.info/standards/elementset/gnd\u003e ([WebVOWL](http://visualdataweb.de/webvowl/#iri=https://d-nb.info/standards/elementset/gnd_20191015.rdf))\n\n- **Scholarly Ontology**\n  https://scholarlyontology.herokuapp.com/\n\n- **FOAF ontology**\n  http://xmlns.com/foaf/spec/\n\n- **SCoRO, the Scholarly Contributions and Roles Ontology**\n  https://sparontologies.github.io/scoro/current/scoro.html\n\n- **RELATIONSHIP: A vocabulary for describing relationships between people**\n  https://vocab.org/relationship/\n\n- **BIO: A vocabulary for 
biographical information**\n  https://vocab.org/bio/\n\n#### Organizations\n\n- The Organization Ontology\n  https://www.w3.org/TR/vocab-org\n\n### Software tools\n\n#### Web-based\n\n- Skosmos: Open source web-based SKOS browser and publishing tool\n  \u003chttp://skosmos.org/\u003e ([Demo](http://skosmos.dev.finto.fi/en/))\n\n- DARIAH Vocabs Editor \u003chttps://github.com/acdh-oeaw/vocabseditor\u003e\n\n- DARIAH Vocabs API Server \u003chttps://vocabs-api.acdh.oeaw.ac.at/\u003e\n\n- Mobi: platform that links native data sources into a knowledge graph and features an ontology editor\n  \u003chttps://mobi.inovexcorp.com/docs\u003e\n\n- Vitro: a general-purpose web-based ontology and instance editor with customizable public browsing, part of VIVO\n  \u003chttps://github.com/vivo-project/Vitro\u003e\n\n- WebVOWL: Web-based Visualization of Ontologies\n  \u003chttp://vowl.visualdataweb.org/webvowl.html\u003e ([Example](http://visualdataweb.de/webvowl/#iri=https://raw.githubusercontent.com/athenarc/scholarly-ontology/main/ScholarlyOntology_Schema_and_ActyivityTypes_v1.3.owl))\n\n- RDFShape: web service offering RDF format conversion, validation, querying and OWL inference\n  \u003chttps://rdfshape.weso.es/\u003e\n\n#### Desktop Applications\n\n- RDF Studio: RDF Vocabulary Writer For Windows\n  http://www.linkeddatatools.com/rdf-studio\n\n- Protégé Ontology Editor\n  https://protegeproject.github.io/protege/\n\n\n#### Libraries/Specifications/Tools\n\n- jsonld.js: JSON-LD (Linked Data) processor for JavaScript\n  https://github.com/digitalbazaar/jsonld.js\n\n- Graffoo: Graphical Framework for OWL Ontologies\n  https://essepuntato.it/graffoo/\n  https://essepuntato.it/graffoo/specification/\n\n- Turtle Editors (Online/Desktop)\n  https://marketplace.visualstudio.com/items?itemName=markstoehr.skos-ttl-editor\n  https://perfectkb.github.io/yate/\n  http://onto.fel.cvut.cz/turtle-editor/turtle-editor.html\n\n- ontologics: R package for handling ontologies\n  https://cran.r-project.org/web/packages/ontologics\n\n## 3. 
Metadata retrieval and generation\n\n### Metadata providers (with Web API)\n\n#### General\n\n- **Wikidata**: a free, collaborative, multilingual, secondary database, collecting structured data to provide support for Wikipedia, Wikimedia Commons, the other wikis of the Wikimedia movement, and to anyone in the world.\n  \u003chttps://www.wikidata.org\u003e\n\n  - [Introduction](https://www.wikidata.org/wiki/Wikidata:Introduction)\n  - **Projects/Initiatives**:\n      - [WikiProject Source Metadata](https://www.wikidata.org/wiki/Wikidata:WikiProject_Source_MetaData)\n      - [WikiCite initiative](https://meta.wikimedia.org/wiki/WikiCite)\n        - [Roadmap (\u0026 scaling problems)](https://www.wikidata.org/wiki/Wikidata:WikiCite/Roadmap)\n      - [LD4 Wikidata Affinity Group](https://www.wikidata.org/wiki/Wikidata:WikiProject_LD4_Wikidata_Affinity_Group)\n\n- **Wikidata API clients/Tools**:\n  Programmatic Wikidata edits should be made with a bot account, see https://www.wikidata.org/wiki/Wikidata:Bots\n  - CLI:\n      - https://github.com/maxlath/wikibase-cli\n  - Python:\n      - https://github.com/LeMyst/WikibaseIntegrator\n      - https://github.com/SuLab/WikidataIntegrator\n      - https://github.com/andrewtavis/wikirepo\n  - JavaScript/NodeJS:\n      - https://github.com/maxlath/wikibase-sdk\n      - https://github.com/maxlath/wikibase-edit\n  - R:\n      - https://github.com/TS404/WikidataR\n  - Ruby:\n      - https://github.com/wilg/wikidata\n\n- **Wikidata data model**:\n  - [Bibliographic Properties](https://www.wikidata.org/wiki/Template:Bibliographic_properties)\n  - [Property \"cites work\"](https://www.wikidata.org/wiki/Property_talk:P2860)\n\n- **OpenAlex**: An open and comprehensive catalog of scholarly papers, authors, institutions, and more\n  https://docs.openalex.org/api\n\n- **OpenAIRE Knowledge Graph**: maps the Scholarly Communication Knowledge Model, a collection of interlinked descriptions of concepts, entities, relationships and events\n  
https://graph.openaire.eu/what-is-the-openaire-graph\n\n- **SemOpenAlex**: Scholarly knowledge graph with over 26 billion RDF triples built upon OpenAlex\n  https://semopenalex.org\n\n- **The General Index**:\n  - https://archive.org/details/GeneralIndex\n  - https://www.nature.com/articles/d41586-019-02142-1\n\n#### Articles / Citations / Bibliometric data\n\n- **OpenAlex**: An open and comprehensive catalog of scholarly papers, authors, institutions, and more\n  https://docs.openalex.org/api\n\n- **Semantic Scholar**: A free, AI-powered research tool for scientific literature\n  - API: https://www.semanticscholar.org/product/api#Documentation\n  - Snapshot: https://api.semanticscholar.org/corpus\n\n- **Open Academic Graph**: large knowledge graph unifying two billion-scale academic graphs, Microsoft Academic Graph (MAG) and AMiner (large snapshot, no API)\n  https://www.aminer.cn/oag\n\n- **OpenCitations**\n  \u003chttps://opencitations.net\u003e\n  - [API](https://opencitations.net/index/api/v1)\n  - [Article (2020)](https://doi.org/10.1162/qss_a_00023)\n  - Initiative for Open Citations\n      \u003chttps://i4oc.org/\u003e\n  - CROCI, the Crowdsourced Open Citations Index (for items with a DOI)\n      https://opencitations.net/index/croci\n\n- **Crossref**\n  https://www.crossref.org\n  - [API](https://www.crossref.org/education/retrieve-metadata/rest-api/)\n  - [NodeJS API Client](https://www.npmjs.com/package/crossref)\n  - [SimpleTextQuery](https://doi.crossref.org/simpleTextQuery)\n\n- **Unpaywall**: An open database of free scholarly articles\n  - API: https://unpaywall.org/products/api\n  - Python client: https://pypi.org/project/unpywall/\n\n- **Internet Archive Scholar**\n  \u003chttps://scholar.archive.org/\u003e\n\n  - **Fatcat**: a scalable, versioned, API-oriented catalog of bibliographic entities and file metadata.\n      \u003chttps://api.fatcat.wiki/redoc\u003e\n\n  - **Refcat**, the Internet Archive Scholar Index\n      - Info: 
\u003chttp://blog.archive.org/2021/10/19/internet-archive-releases-refcat-the-ia-scholar-index-of-over-1-3-billion-scholarly-citations/\u003e\n\n- **Web of Science** (commercial, requires a license)\n  - https://pypi.org/project/wos/\n  - https://developer.clarivate.com/apis/wos\n  - https://github.com/rafguns/wosfile\n\n- **Scopus**: Like Web of Science, but from Elsevier\n  https://www.scopus.com\n\n- **Google Scholar** (no API; scraping libraries exist but often break, since Google actively prevents scraping)\n  - https://pypi.org/project/scholarly/\n\n#### Books\n\n- **OpenLibrary**\n  - https://openlibrary.org/dev/docs/api/books\n  - https://github.com/jayfajardo/openlibrary\n\n- **Google Books API**\n  - https://developers.google.com/books/docs/v1/using\n  - https://medium.com/@akramhelil/google-books-api-with-rails-or-ruby-a931cece427a\n\n- **Share-VDE: linked data for libraries**\n  https://wiki.share-vde.org/wiki/Main_Page\n\n- **WorldCat Search API (commercial)**\n  - \u003chttps://www.oclc.org/developer/develop/web-services/worldcat-search-api.en.html\u003e\n  - \u003chttps://platform.worldcat.org/api-explorer/apis/wcapi\u003e\n\n##### National Libraries\n\n- **Deutsche Nationalbibliothek (DNB)**\n  - [DNB Linked Data](https://www.dnb.de/DE/Professionell/Metadatendienste/Datenbezug/LDS/lds_node.html)\n  - [DNB SRU Interface](https://www.dnb.de/EN/Professionell/Metadatendienste/Datenbezug/SRU/sru_node.html)\n      - [EXPLAIN XML](https://services.dnb.de/sru/dnb?operation=explain\u0026version=1.1)\n      - [Interfaces (PDF)](https://www.dnb.de/SharedDocs/Downloads/DE/Professionell/Metadatendienste/linkedDataZugriff.pdf?__blob=publicationFile\u0026v=3)\n\n- lobid.org: provides Linked Open Data (LOD) services\n  - [GND](https://lobid.org/gnd/api)\n  - [hbz](https://lobid.org/resources)\n\n- Library of Congress SRU\n  \u003chttps://www.loc.gov/standards/sru/\u003e\n\n- 
https://blog.ldodds.com/2014/10/08/accessing-the-british-national-bibliography-using-sparql/\n\n\n#### Scholars\n\n- VIAF: Virtual International Authority File (Persons)\n  \u003chttps://viaf.org/\u003e\n\n  Clients:\n  - R: https://rdrr.io/cran/viafr/\n  - PHP: https://packagist.org/packages/gbv/viaf-jskos\n  - NodeJS: https://github.com/phette23/viaf-npm\n\n- DNB authority data (Normdaten), via dariah.eu (Persons)\n  [https://wiki.de.dariah.eu](https://wiki.de.dariah.eu/display/publicde/DARIAH-DE+Normdatendienste#DARIAHDENormdatendienste-GemeinsameNormdatei(GND))\n\n- GND via [lobid.org](http://lobid.org/gnd) (Linked Open Data API)\n  OpenRefine interface:\n  https://lobid.org/gnd/reconcile\n  https://blog.lobid.org/2018/08/27/openrefine.html\n\n- orcid.org API\n  \u003chttps://info.orcid.org/documentation/integration-guide/registering-a-public-api-client/\u003e\n\n- Google Scholar (only scholar-maintained info, see below)\n  \u003chttps://scholar.google.com\u003e\n\n- ISNI: global standard number for contributors to creative works and those active in their distribution\n  - Online Search: \u003chttps://isni.oclc.org/\u003e\n  - Linked Data: \u003chttps://isni.org/page/linked-data\u003e\n\n#### Venues/Journals/Sources\n\n- ZDB (Zeitschriftendatenbank)\n  - API: https://zeitschriftendatenbank.de/api\n  - via lobid.org: https://blog.lobid.org/2018/09/04/zdb.html\n- OpenAlex sources: https://docs.openalex.org/api-entities/sources\n\n#### Institutions\n\n- Research Organization Registry (ROR)\n  \u003chttps://ror.org\u003e\n\n- ISNI: global standard number for contributors to creative works and those active in their distribution\n  - Online Search: \u003chttps://isni.oclc.org/\u003e\n  - Linked Data: \u003chttps://isni.org/page/linked-data\u003e\n\n### Metadata extraction\n\n#### End-to-end solutions\n\n- Annif: 
Tool for automated subject indexing and classification of documents\n  https://annif.org/\n\n#### OCR Software\n\n- OCR-D: complete open-source OCR workflow for libraries \u0026 archives, using multiple OCR engines\n  https://ocr-d.de/en/use\n\n- OCR4All: OCR as a web application (comes as a Docker image, intended for smaller \u0026 mainly historical projects)\n  https://github.com/OCR4all/docker_image\n\n- Tesseract: open-source OCR engine\n  https://github.com/tesseract-ocr/tesseract\n\n  - Tesseract to PAGE: analyses a document page with Tesseract and converts the result to PAGE XML format\n      https://www.primaresearch.org/tools/TesseractOCRToPAGE\n  - TesseractXplore: a graphical interface to Tesseract\n      \u003chttps://github.com/JKamlah/tesseractXplore\u003e\n  - Node.js clients\n      https://github.com/zapolnoch/node-tesseract-ocr\n      https://www.npmjs.com/package/node-ts-ocr\n\n- **Apache Tika**: a content analysis toolkit with Tesseract OCR integration\n  https://tika.apache.org/\n  https://medium.com/@masreis/text-extraction-and-ocr-with-apache-tika-302464895e5f\n\n- **PDFSandwich**: combines Tesseract, ImageMagick's convert, and unpaper\n  http://www.tobias-elze.de/pdfsandwich/\n\n- **OCRmyPDF**: adds an optical character recognition (OCR) text layer to scanned PDF files, making them searchable.\n  https://ocrmypdf.readthedocs.io/en/latest\n\n- **Kreuzberg OCR**: High-performance, lightweight Python library for text extraction from documents. 
Extract text from PDFs, images, office documents, and more with both async and sync APIs.\n  https://github.com/Goldziher/kreuzberg\n\n- **ABBYY Cloud OCR**: commercial OCR service; expensive, but with very good results for modern texts\n  https://cloud.ocrsdk.com\n  https://github.com/cboulanger/abbyy-cloud-ocr\n\n#### Other OCR engines/applications (mainly for historical or handwritten documents)\n\n- **eScriptorium**: a project providing digital recognition of handwritten documents using machine learning techniques.\n  https://escriptorium.fr/\n\n- **Transkribus**: platform for the digitisation, AI-powered text recognition, transcription and searching of historical documents\n  https://readcoop.eu/de/transkribus/\n\n- **Kraken**: turn-key OCR system\n  http://kraken.re/\n\n#### OCR Workflows\n\n- Creating an OCR Workflow (Post-Processing)\n  https://github.com/ithaka/constellate-notebooks/blob/master/OCR/ocr-workflow-2.ipynb\n\n#### Document Layout Description\n\n- **Analyzed Layout and Text Object (ALTO)**: XML Schema for describing the layout and content of physical text\n  resources, such as pages of a book or a newspaper.\n  https://www.loc.gov/standards/alto/\n  https://github.com/Mewel/abbyy-to-alto\n  https://github.com/ironymark/AbbyyToAlto\n\n- **Page Analysis and Ground-Truth Elements (PAGE)**\n  http://www.primaresearch.org/publications/ICPR2010_Pletschacher_PAGE\n  https://www.primaresearch.org/tools/PAGEViewer\n  https://github.com/PRImA-Research-Lab/PAGE-XML\n  https://github.com/PRImA-Research-Lab/prima-page-converter\n\n- **ocr-fileformat**: validate and transform between OCR file formats (hOCR, ALTO, PAGE, FineReader)\n  https://github.com/UB-Mannheim/ocr-fileformat\n\n- https://github.com/UB-Mannheim/crass\n- **pikepdf**: Python library for reading and writing PDF files\n  https://pikepdf.readthedocs.io/en/latest/\n\n#### OCR Postprocessing (Spellchecking / Error Correction)\n\n- Survey of Automatic Spelling Correction\n  https://www.mdpi.com/2079-9292/9/10/1670/pdf\n\n- Automatic evaluation of OCR quality, using 
https://github.com/saffsd/langid.py\n  https://ryanfb.github.io/etc/2015/03/16/automatic_evaluation_of_ocr_quality.html\n  https://gist.github.com/cboulanger/cb4a99f7e03fb86141e511f15e3cfc5e (Implementation)\n\n- **Pocoweb**: platform for manual and semi-automatic post-correction\n  https://github.com/cisocrgroup/pocoweb (browser-based; Docker/server deployment)\n\n- **LanguageTool**: style and grammar checker for 25+ languages (server-based)\n  https://github.com/languagetool-org/languagetool\n\n#### Document ingestion (content structure analysis)\n\n- **pub2tei**: a set of style sheets for converting XML documents encoded in various scientific publisher formats (such as JATS) into a common TEI format\n  https://github.com/kermitt2/Pub2TEI\n\n#### PDF text extraction \u0026 manipulation\n\n- PyMuPDF: high-performance Python library for data extraction, analysis, conversion \u0026 manipulation of PDF (and other) documents.\n  https://pymupdf.readthedocs.io\n\n- scissors: PDF editing library for Node.js\n  https://www.npmjs.com/package/scissors\n\n- Fixing page numbers (PDF page labels)\n  https://github.com/lovasoa/pagelabels-py\n\n#### Metadata enhancement / correction / annotation\n\n- OpenRefine (to clean up, normalize/enrich existing data and reconcile with authority databases)\n  https://openrefine.org/\n  OpenRefine command line tool: https://github.com/opencultureconsulting/orcli\n\n- Recogito: collaborative semantic annotation\n  https://recogito.pelagios.org/\n\n### Web Scraping\n\n- **ParseHub**: a web scraping tool (free plan available).\n  https://www.parsehub.com/\n\n- **Hyphe Browser**: desktop web browser for building a web corpus while viewing the pages of the websites, so that the user can easily curate and categorize them.\n  https://medialab.sciencespo.fr/en/tools/hyphe-browser/\n\n- **Issue Crawler**: network mapping software that crawls specified sites and captures their outlinks\n  
http://www.govcom.org/Issuecrawler_instructions.htm\n\n- **minet**: a web mining command-line tool \u0026 library for Python (\u003e= 3.7) that collects and extracts data from a wide variety of web sources, such as raw webpages, Facebook, CrowdTangle, YouTube, Twitter and Media Cloud\n  https://github.com/medialab/minet\n\n## 4. Corpus Linguistics\n\n### General issues\n\n- Graduate course on Corpus Linguistics\n  https://alvinntnu.github.io/NTNU_ENC2036_LECTURES/\n\n- List of Tools for Corpus Linguistics\n  https://corpus-analysis.com/\n\n- Computerlinguistische Werkzeuge zur Erschließung und Exploration großer Textsammlungen aus der Perspektive fachspezifischer Theorie (German; English: Computational linguistics tools for indexing and exploring large text collections from the perspective of discipline-specific theory)\n  https://zfdg.de/sb001_013\n\n- Text und Data Mining mit urheberrechtlich geschützten Textbeständen (German; English: Text and data mining with copyright-protected text collections)\n  https://zfdg.de/2020_006\n\n- Text Preprocessing for NLP and Machine Learning Tasks (terms and workflows)\n  https://medium.com/sciforce/text-preprocessing-for-nlp-and-machine-learning-tasks-3e077aa4946e\n\n### Creation \u0026 Analysis of Text Corpora\n\n#### Software\n\n- RStudio (IDE), used with corpus linguistics packages\n  https://www.rstudio.com/\n\n- MALLET: semi-automated topic modeling analysis\n  http://mallet.cs.umass.edu/topics.php\n  Tutorial: https://programminghistorian.org/en/lessons/topic-modeling-and-mallet\n\n- CorpusExplorer (free, Windows/Mono)\n  https://notes.jan-oliver-ruediger.de/software/corpusexplorer-overview/\n  https://github.com/notesjor/CorpusExplorer.Terminal.Console\n\n- WordCruncher (free, Windows/Mac)\n  https://www.wordcruncher.com\n\n- Open Semantic Search Server\n  https://www.opensemanticsearch.org\n\n#### Services\n\n- Constellate: builds corpora from JSTOR data\n  https://constellate.org/builder/\n\n- Unpaywall API\n  https://unpaywall.org/products/api\n\n### Text mining\n\n### NLP tasks/problems\n\n#### Keyword extraction\n\n- https://www.analyticsvidhya.com/blog/2022/03/keyword-extraction-methods-from-documents-in-nlp/\n\n#### Text 
classification\n\n- https://towardsdatascience.com/text-classification-with-state-of-the-art-nlp-library-flair-b541d7add21f\n- http://ethen8181.github.io/machine-learning/deep_learning/multi_label/fasttext.html\n\n#### Document segmentation\n\n- **PDF Document Segmentation Application**: browser-based tool for segmenting non-OCRed PDFs into individual, machine-readable text files\n  https://github.com/lizfischer/document-segmentation\n- https://towardsdatascience.com/nlp-splitting-text-into-sentences-7bbce222ef17\n- https://spacy.io/usage/linguistic-features#sbd\n\n#### Topic Modelling\n\n- https://towardsdatascience.com/topic-modeling-with-lsa-plsa-lda-nmf-bertopic-top2vec-a-comparison-5e6ce4b1e4a5\n- https://keyatm.github.io/keyATM/index.html\n- https://github.com/koheiw/seededlda\n\n#### Libraries\n\n##### Python\n\n- **Comparison**\n  - https://medium.com/activewizards-machine-learning-company/comparison-of-top-6-python-nlp-libraries-c4ce160237eb\n\n- **tmtoolkit**\n  https://tmtoolkit.readthedocs.io\n- **NLTK**\n  https://www.nltk.org\n- **Gensim**\n  https://radimrehurek.com/gensim\n- **Flair**\n  https://github.com/flairNLP/flair\n- **spaCy**\n  https://spacy.io\n- **Textacy**\n  https://textacy.readthedocs.io\n- **Stanza**\n  https://stanfordnlp.github.io/stanza\n\n##### R\n\n- **Text Mining with R - an open access book**\n  https://www.tidytextmining.com\n\n- **Text mining in R for the social sciences and digital humanities**\n  \u003chttps://tm4ss.github.io/docs\u003e\n\n- **IRaMuTeq**: R interface for Multidimensional Text and Questionnaire Analysis (French)\n  http://www.iramuteq.org/\n\n- **quanteda**\n  https://quanteda.io/\n  - Automatisierte Inhaltsanalyse mit R (German; English: Automated content analysis with R, using quanteda)\n      http://inhaltsanalyse-mit-r.de\n- **tm**\n  https://cran.r-project.org/web/packages/tm/\n  - Using the tm package\n      https://rpubs.com/tsholliger/301914\n  - A Tutorial of Text Mining in R Using TM Package\n      
https://medium.com/text-mining-in-data-science-a-tutorial-of-text/text-mining-in-data-science-51299e4e594\n\n- **fulltext**: integration of rOpenSci R packages to create a single interface to many bibliographic data sources\n  https://github.com/ropensci/fulltext\n\n##### JavaScript\n\n- **wink-nlp**\n  https://github.com/winkjs/wink-nlp\n\n### Text annotation for machine learning\n\n- TEI Publisher (with Docker image)\n  https://github.com/eeditiones/tei-publisher-app\n\n- TextAnnotator\n  http://www.textannotator.texttechnologylab.org/\n\n- Annotate them All: community annotation of scientific texts with Wikidata items\n  - Info: \u003chttps://sprint.elifesciences.org/annotate-them-all/\u003e\n  - GitHub: https://github.com/lubianat/ann\n\n- INCEpTION: a semantic annotation platform offering intelligent assistance and knowledge management\n  https://inception-project.github.io/\n\n- doccano: text annotation for humans (text data only)\n  \u003chttps://doccano.github.io/doccano\u003e\n\n- Label Studio: open-source data labeling tool\n  \u003chttps://labelstud.io\u003e\n\n- CATMA\n  \u003chttps://catma.de/\u003e\n\n- CitExt: citation annotation based on AnyStyle\n  https://github.com/cboulanger/citext\n\n**Commercial:**\n\n- Prodigy: an annotation tool powered by active learning\n  \u003chttps://prodi.gy\u003e\n\n- tagtog: The Text Annotation Tool to Train AI\n  \u003chttps://tagtog.net\u003e\n\n- LightTag Text Annotation Tool\n  \u003chttps://www.lighttag.io/\u003e\n  - Free Academic Tier\n      \u003chttps://www.lighttag.io/signup/academic/\u003e\n\n### Possibly relevant corpora\n\n- F. Vogel, H. 
Hamann et al., JuReKo - Juristisches Referenzkorpus (legal reference corpus).\n  https://www.cal2.eu/core-projects-and-associated-projects/jureko-juristisches-referenzkorpus\n\n- GIANT: The 1-Billion Annotated Synthetic Bibliographic-Reference-String Dataset for Deep Citation Parsing\n  https://isg.beel.org/blog/2019/12/10/giant-the-1-billion-annotated-synthetic-bibliographic-reference-string-dataset-for-deep-citation-parsing-pre-print/\n\n## 5. Bibliometrics / Citation Analysis\n\n### Initiatives\n\n- **WikiCite**\n  https://meta.wikimedia.org/wiki/WikiCite\n\n- **OpenCitations**\n  https://opencitations.net/\n\n- **Kompetenzzentrum Bibliometrie** (Competence Centre for Bibliometrics): a German initiative to promote bibliometric research\n  https://www.bibliometrie.info/\n\n### Research graph web services\n\n- **Scholia**: knowledge graph based on Wikidata\n  https://scholia.toolforge.org/\n\n- **OpenAIRE Research Graph**\n  https://graph.openaire.eu/\n\n- **Inciteful**: provides a Paper Discovery tool and a Literature Connector tool\n  https://help.inciteful.xyz/\n\n- **Connected Papers**: relies on semanticscholar.org to visualize a citation graph\n  \u003chttp://connectedpapers.com\u003e\n\n- **LitMaps**: research discovery\n  https://www.litmaps.com/\n\n- **ResearchRabbit**\n  https://researchrabbitapp.com/\n\n### Scholarship\n\n- [Bibliography (Zotero)](https://www.zotero.org/groups/4370759/computational_sociolegal_and_historical_legal_studies/collections/JTUVEEAP)\n\n### Automatic article corpus aggregation\n\n- Open Access PDF harvester and ingester\n  https://github.com/kermitt2/article-dataset-builder\n\n### Metadata extraction/annotation software and services\n\n- Evaluation of Open-Source Bibliographic Reference and Citation Parsers\n  https://arxiv.org/ftp/arxiv/papers/1802/1802.01168.pdf\n\n- Scholarcy Reference Extraction API\n  \u003chttps://ref.scholarcy.com/api/\u003e\n\n#### PDF citation extraction\n\n- GROBID: a machine learning library for extracting, parsing and re-structuring raw documents such as PDF 
into structured XML/TEI encoded documents\n  \u003chttps://github.com/kermitt2/grobid\u003e\n  - NodeJS client (there are also Python \u0026 Java clients)\n      https://github.com/kermitt2/grobid-client-node\n\n- AnyStyle (Ruby)\n  \u003chttps://anystyle.io\u003e\n  \u003chttps://github.com/inukshuk/anystyle\u003e\n\n- refext: extracts reference strings from research papers in PDF format (Java, based on CERMINE)\n  \u003chttps://github.com/mkrnr/refext\u003e\n\n- refcat: large-scale citation graph generation tools\n  \u003chttps://gitlab.com/internetarchive/refcat\u003e\n\n- CERMINE (Content ExtRactor and MINEr): extracts metadata and content from PDF files containing academic publications\n  \u003chttps://github.com/CeON/CERMINE\u003e\n\n- EXparser: a tool for extracting and segmenting reference strings from PDF documents\n  https://exparser.readthedocs.io/en/latest/\n\n- Science Parse: parses scientific papers (in PDF form) and returns them in structured form (Java)\n  https://github.com/allenai/science-parse\n\n### Data exchange formats\n\n- CSL-JSON\n  https://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html\n  https://aurimasv.github.io/z2csl/typeMap.xml\n\n- RIS\n  https://en.wikipedia.org/wiki/RIS_(file_format)\n\n- BibTeX\n  https://www.bibtex.com/g/bibtex-format/\n\n- Web of Science Export Format (the only format listed here with explicit support for cited-reference data)\n  https://images.webofknowledge.com/images/help/WOS/hs_wos_fieldtags.html\n\n- Zotero Data Schema\n  https://github.com/zotero/zotero-schema\n\n### Matching/normalization of citations\n\n- fuzzycat: bibliographic fuzzy matching for fatcat.wiki\n  \u003chttps://gitlab.com/internetarchive/fuzzycat/\u003e\n\n- biblio-glutton: framework dedicated to bibliographic information\n  \u003chttps://github.com/kermitt2/biblio-glutton\u003e\n\n- List of Title Word Abbreviations based on the ISO 4 system for the abbreviation of serial titles\n  
https://www.issn.org/services/online-services/access-to-the-ltwa/\n\n### Applications for data analysis\n\n- SciMAT (Science Mapping Analysis Tool): tool for performing science mapping analyses under a longitudinal framework\n  https://sci2s.ugr.es/scimat/\n\n- CiteSpace: software for visualizing and analyzing trends and patterns in scientific literature\n  http://cluster.cis.drexel.edu/~cchen/citespace/\n\n- CitNetExplorer\n  \u003chttps://www.citnetexplorer.nl/\u003e\n\n- VOSviewer\n  \u003chttps://www.vosviewer.com/\u003e ([Keynote presentation](https://www.youtube.com/watch?v=3aSKhFeXIU4), [Tutorial Slides](https://de.slideshare.net/NeesJanvanEck/issi2015-tutorial-vosviewerandcitnetexplorer))\n\n- BiblioTools/BiblioMaps: create maps of science based on bibliographic data\n  http://www.sebastian-grauwin.com/bibliomaps/\n\n- Headstart: web-based knowledge mapping software providing visualization and connectors to a number of academic search engines through rOpenSci, including BASE, PubMed, PLOS and DOAJ\n  https://github.com/OpenKnowledgeMaps/Headstart\n\n- Cited Reference Explorer\n  \u003chttps://andreas-thor.github.io/cre/\u003e\n\n- BibExplorer: processes curricula, extracts article metadata, and calculates bibliometric indicators\n  https://github.com/alandefreitas/bibexplorer\n\n- PubTrends: a tool for exploring scientific literature, analyzing the topics of a research field and finding similar papers (uses PubMed and Semantic Scholar)\n  https://github.com/JetBrains-Research/pubtrends\n\n- VIVO: member-supported, open-source software and an ontology for representing scholarship\n  \u003chttps://duraspace.org/vivo/about/\u003e\n\n- Publish or Perish: a free software program that retrieves and analyzes academic citations from a variety of data sources (incl. 
Google Scholar and Microsoft Academic Search)\n  https://harzing.com/resources/publish-or-perish\n\n- Sci2: the Science of Science (Sci2) Tool, a modular toolset designed specifically for the study of science\n  https://github.com/CIShell/sci2\n\n### Multi-purpose libraries\n\n#### Python\n\n- metaknowledge: library for computational research in bibliometrics, scientometrics, and network analysis\n  https://metaknowledge.readthedocs.io\n\n- Tethne: integrated bibliographic and corpus analysis (Python **2.7**)\n  http://diging.github.io/tethne/\n  https://pythonhosted.org/tethne/index.html\n\n- étudier: drives a non-headless browser to collect a citation graph from Google Scholar around a particular citation or set of search results.\n  \u003chttps://github.com/edsu/etudier\u003e\n\n- Deep Reference Parsing: a deep learning architecture for reference mining from literature in the arts and humanities; parses individual references into their components\n  - \u003chttps://github.com/dhlab-epfl/LinkedBooksDeepReferenceParsing\u003e\n  - [article (2018)](https://doi.org/10.3389/frma.2018.00021)\n\n- BiblioPy: co-citation analysis (Python 2)\n  https://github.com/Greenwicher/BiblioPy\n\n#### Working with Web of Science data\n\n- https://github.com/rafguns/wosfile\n- https://pypi.org/project/WOSplus/\n- https://pypi.org/project/wostools/\n\n#### R\n\n- bibliometrix: R library for comprehensive science mapping analysis (works with data extracted from the main bibliographic databases: *Scopus*, *Web of Science*, *Digital Science Dimensions*, *The Lens*, *Cochrane Database of Systematic Reviews (CDSR)*, and *RISmed PubMed/MedLine*)\n  - \u003chttps://bibliometrix.org/\u003e\n  - See papers where this has been used: \u003chttps://bibliometrix.org/Papers.html\u003e\n  - biblioshiny: the Shiny interface for bibliometrix\n      https://www.bibliometrix.org/home/index.php/layout/biblioshiny\n\n- biblionetwork: creates bibliographic coupling and co-citation networks\n  
https://agoutsmedt.github.io/biblionetwork/\n\n- scimeetr: Analyse data from WoS/Scopus\n  https://github.com/MaximeRivest/scimeetr\n\n- Connect RStudio to Zotero\n  https://github.com/paleolimbot/rbbt\n  https://rstudio.github.io/visual-markdown-editing/#/citations\n\n- Querying CrossRef Data with R\n  https://poldham.github.io/abs/crossref.html\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcboulanger%2Fltkg-tools","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcboulanger%2Fltkg-tools","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcboulanger%2Fltkg-tools/lists"}