{"id":13482591,"url":"https://github.com/dlwh/epic","last_synced_at":"2025-12-30T17:26:04.827Z","repository":{"id":57740175,"uuid":"1639100","full_name":"dlwh/epic","owner":"dlwh","description":"**Archived** Epic is a high performance statistical parser written in Scala, along with a framework for building complex structured prediction models.","archived":true,"fork":false,"pushed_at":"2020-02-19T05:14:32.000Z","size":23078,"stargazers_count":469,"open_issues_count":30,"forks_count":82,"subscribers_count":43,"default_branch":"master","last_synced_at":"2024-08-01T17:32:36.878Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://scalanlp.org/","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dlwh.png","metadata":{"files":{"readme":"README-NEURAL.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2011-04-20T06:00:04.000Z","updated_at":"2024-04-12T15:28:57.000Z","dependencies_parsed_at":"2022-09-01T12:31:24.697Z","dependency_job_id":null,"html_url":"https://github.com/dlwh/epic","commit_stats":null,"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dlwh%2Fepic","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dlwh%2Fepic/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dlwh%2Fepic/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dlwh%2Fepic/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dlwh","download_url":"https://codeload.github.com/dlwh/epic/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https:/
/github.com","kind":"github","repositories_count":222262400,"owners_count":16957575,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T17:01:03.576Z","updated_at":"2025-12-17T13:07:06.615Z","avatar_url":"https://github.com/dlwh.png","language":"Scala","funding_links":[],"categories":["人工智能","Packages","函式庫"],"sub_categories":["Spring Cloud框架","Libraries","書籍"],"readme":"The neural CRF parser is a high-performing constituency parser.\n\n\n\n## Preamble\n\nThe neural CRF parser is described in:\n\n\"Neural CRF Parsing\" Greg Durrett and Dan Klein. ACL 2015.\n\nIt is an extension of the span parser described in\n\n\"Less Grammar, More Features\" David Hall, Greg Durrett, and Dan Klein. ACL 2014.\n\nand is based on the Epic parsing framework. See https://github.com/dlwh/epic\nfor more documentation about the span parser and the Epic framework.\nSee http://www.eecs.berkeley.edu/~gdurrett/ for papers and BibTeX.\n\nQuestions? Bugs? Email me at gdurrett@eecs.berkeley.edu\n\n\n\n## Setup\n\nYou need three things to run the neural CRF parser:\n\n1) The compiled .jar; run ```sbt assembly``` to produce this\n\n2) A treebank: the Penn Treebank or one of the SPMRL treebanks\n\n3) Some sort of word vectors. These can either be in the .bin format\nof Mikolov et al. (2013) or the .txt format of Bansal et al. (ACL 2014).  
For\nEnglish, the best performance comes from using Bansal et al.'s vectors:\n\nhttp://ttic.uchicago.edu/~mbansal/codedata/dependencyEmbeddings-skipdep.zip\n\nFor other languages, you can train suitable vectors on monolingual data using\n```word2vec``` with the following arguments:\n\n    -cbow 0 -size 100 -window 1 -sample 1e-4 -threads 8 -binary 0 -iter 15\n\nThese are mildly tuned, and using a small window size is important, but other\nsettings are likely to work well too.\n\n\n\n\n## Usage\n\nTo run the parser on new text (tokenized, one-sentence-per-line), use the following command:\n\n    java -Xmx4g -cp path/to/assembly.jar epic.parser.ParseText --model neuralcrf.parser \\\n      --tokenizer whitespace --sentences newline --nthreads 8 [files]\n\nYou can download the ```neuralcrf.parser``` model from:\n\nhttp://nlp.cs.berkeley.edu/projects/neuralcrf.shtml\n\n(As of March 1, 2017, this model does not work with the latest version of epic. If you want\nto use the pre-trained model, use the commit with hash\n\n8968e0966da28101744ce6f5bbb0de4345d9c594\n\nfrom March 30, 2016.)\n\nTo train a new parser as described in the neural CRF paper, run the following command\n(note that you need to fill in paths for -cp, --treebank.path, and --word2vecPath):\n\n    java -Xmx47g -cp path/to/assembly.jar epic.parser.models.NeuralParserTrainer \\\n      --cache.path constraints.cache \\\n      --opt.useStochastic \\\n      --treebank.path path/to/wsj/ \\\n      --evalOnTest \\\n      --includeDevInTrain \\\n      --trainer.modelFactory.annotator epic.trees.annotations.PipelineAnnotator \\\n      --ann.0 epic.trees.annotations.FilterAnnotations  \\\n      --ann.1 epic.trees.annotations.ForgetHeadTag \\\n      --ann.2 epic.trees.annotations.Markovize \\\n      --ann.2.horizontal 0 \\\n      --ann.2.vertical 0 \\\n      --modelFactory epic.parser.models.PositionalNeuralModelFactory \\\n      --opt.batchSize 200 \\\n      --word2vecPath path/to/skipdep_embeddings.txt \\\n      
--threads 8\n\nTo run on SPMRL treebanks, modify the arguments to the command above as follows:\n\n1) Add the following arguments (replace ${LANG} as appropriate):\n\n    --treebankType spmrl \\\n    --binarization head \\\n    --supervisedHeadFinderPtbPath path/to/gold/ptb/train/train.${LANG}.gold.ptb \\\n    --supervisedHeadFinderConllPath path/to/gold/conll/train/train.${LANG}.gold.conll \\\n    --ann.3 epic.trees.annotations.SplitPunct\n\n2) Modify --treebank.path to point to the X_SPMRL/gold/ptb directory.\n\nOptions to configure the neural network and training are largely defined in ```epic.parser.models.PositionalNeuralModelFactory```.\n\n### Miscellaneous Notes\n\nTo run on the development set, simply remove ```--evalOnTest``` and\n```--includeDevInTrain``` from the arguments.\n\nYou should use the official version of ```evalb``` on the output files (gold\nand guess) rather than relying on the native scorer in the Epic parser. For\nSPMRL, you should use the version distributed with the shared task.\n\nNote that the X-bar grammar and coarse pruning masks (constraints) are cached\nbetween runs in the same directory, which speeds up training and testing\nconsiderably, since generating the masks is time-consuming.\n\nFinally, note that multiple parsers cannot be trained simultaneously in\nthe same directory, since certain files (such as pruning masks from the\ncoarse model) will collide.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdlwh%2Fepic","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdlwh%2Fepic","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdlwh%2Fepic/lists"}