https://github.com/clariah/hhucap
Current historical studies of career mobility often focus on linking personal records such as baptism records. More qualitative sources, such as biographies, contain vital information as well, but are labour-intensive to process. We propose combining Robust Semantic Parsing and Linked Data conversion tools to automatically derive career patterns from 35,000 biographies in the Biography Portal for the period 1815-1940. Substantively, we ask what career patterns looked like and how they changed over the long nineteenth century. Methodologically, we evaluate to what extent current CLARIAH tools can automate this process. We will advance the semantic parsing tools by improving the set of linguistic expressions related to HISCO, adding an OCR cleaning step to the pipeline, and experimenting with alternative CLARIAH tools for Dutch. This will result in a detailed report on the performance of CLARIAH tools on this data.
- Host: GitHub
- URL: https://github.com/clariah/hhucap
- Owner: CLARIAH
- License: MIT
- Created: 2017-03-01T07:45:13.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2020-03-20T14:47:52.000Z (almost 5 years ago)
- Last Synced: 2024-04-14T05:11:56.238Z (8 months ago)
- Topics: advertisements, biographies, biographynet, career-mobility, clariah, newspaper, nlp
- Language: TeX
- Size: 8.34 MB
- Stars: 1
- Watchers: 14
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# hhucap
Current historical studies of career mobility often focus on linking personal records such as baptism records. More qualitative sources, such as biographies, contain vital information as well, but are labour-intensive to process. We propose combining Robust Semantic Parsing and Linked Data conversion tools to automatically derive career patterns from 35,000 biographies in the Biography Portal for the period 1815-1940. Substantively, we ask what career patterns looked like and how they changed over the long nineteenth century. Methodologically, we evaluate to what extent current CLARIAH tools can automate this process. We will advance the semantic parsing tools by improving the set of linguistic expressions related to HISCO, adding an OCR cleaning step to the pipeline, and experimenting with alternative CLARIAH tools for Dutch. This will result in a detailed report on the performance of CLARIAH tools on this data.

Update 2020-03-20:
The code for the simple tagger tool is available via: https://github.com/cltl/SimpleTagger