https://github.com/jwalsh/syntree-generator
A tool for converting French literary text into S-expression syntax trees for linguistic analysis, with visualization capabilities
abstract-syntax-tree constituency-parsing emacs french linguistics literary-analysis nlp org-mode parser proust python s-expression spacy syntax-analysis syntax-tree
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/jwalsh/syntree-generator
- Owner: jwalsh
- Created: 2025-03-04T21:06:12.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2025-03-04T22:10:22.000Z (2 months ago)
- Last Synced: 2025-03-04T22:23:13.495Z (2 months ago)
- Topics: abstract-syntax-tree, constituency-parsing, emacs, french, linguistics, literary-analysis, nlp, org-mode, parser, proust, python, s-expression, spacy, syntax-analysis, syntax-tree
- Language: Python
- Homepage: https://wal.sh
- Size: 129 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.org
README
#+TITLE: syntree-generator
#+AUTHOR: Jason Walsh
#+EMAIL: [email protected]
* Syntree Generator
A tool for converting French literary text into S-expression syntax trees for linguistic analysis, with visualization capabilities.
[[./static/screenshots/syntax-tree-ui.png]]
** Overview
Syntree Generator converts natural language text into structured Abstract Syntax Trees (ASTs) represented in S-expression format. It's particularly optimized for analyzing French literary texts, such as Proust's works, by mapping syntactic dependencies to constituent structure.
The tool breaks down sentences into their grammatical components (nouns, verbs, phrases, clauses) and generates formal representations that can be visualized and studied. It includes a web UI for interactive exploration of the syntax trees and supports extracting samples for use with external visualization tools.
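The mapping from dependencies to constituents can be illustrated with a minimal sketch. This is not the project's actual conversion code: the ~Token~ class, the ~PHRASE_LABELS~ table, and the ~to_sexp~ function are hypothetical names, and the input is assumed to be already tagged with a part of speech and a head index per token (the kind of information a dependency parser such as spaCy provides). Each head word is grouped with its dependents, in surface order, under a phrase label.
#+BEGIN_SRC python
from dataclasses import dataclass


@dataclass
class Token:
    """One word of an already-parsed sentence (hypothetical structure)."""
    text: str
    pos: str   # coarse part-of-speech tag, e.g. "NOUN"
    head: int  # index of the governing token; -1 marks the root


# Assumed mapping from a head's part of speech to a phrase label.
PHRASE_LABELS = {"NOUN": "NP", "PROPN": "NP", "VERB": "VP", "ADJ": "AP"}


def to_sexp(tokens, index=None):
    """Render the subtree headed by tokens[index] as an S-expression."""
    if index is None:
        # Start from the root of the sentence.
        index = next(i for i, t in enumerate(tokens) if t.head == -1)
    token = tokens[index]
    leaf = f'({token.pos} "{token.text}")'
    children = [i for i, t in enumerate(tokens) if t.head == index]
    if not children:
        return leaf
    # The root phrase is labelled S; other phrases take their head's label.
    label = "S" if token.head == -1 else PHRASE_LABELS.get(token.pos, "XP")
    # Keep surface word order: dependents and the head are sorted by position.
    parts = [
        leaf if i == index else to_sexp(tokens, i)
        for i in sorted(children + [index])
    ]
    return f"({label} {' '.join(parts)})"


sentence = [
    Token("Le", "DET", 1),
    Token("chat", "NOUN", 2),
    Token("dort", "VERB", -1),
]
print(to_sexp(sentence))
# (S (NP (DET "Le") (NOUN "chat")) (VERB "dort"))
#+END_SRC
The sketch uses -1 for the root, whereas spaCy marks the root by making the token its own head, so real parser output would need a small adapter. The labels the tool actually emits may differ from the ones assumed here.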
** Features
- Converts text to constituency-based syntax trees in S-expression format
- Optimized for French literary texts with special focus on complex syntactic structures
- Web-based visualization of syntax trees
- Integration with spaCy for linguistic analysis
- Sample extraction for use with external tools
- Command-line interface for batch processing
- Configurable chunking for processing large texts
- Emacs integration for syntax highlighting and advanced tree visualization
** Installation
The project uses Poetry for dependency management.
#+BEGIN_SRC bash
# Clone the repository
git clone https://github.com/jwalsh/syntree-generator.git
cd syntree-generator
# Install dependencies
make setup
#+END_SRC
** Usage
*** Basic usage
To parse a text file and generate S-expressions:
#+BEGIN_SRC bash
# Using the shell script
./_.sh path/to/input.txt path/to/output.lisp
# Or using make
make run INPUT_FILE=data/pg15288.txt OUTPUT_FILE=output/proust.lisp
#+END_SRC
*** Getting samples
To generate a sample of S-expressions that can be easily loaded into the S-expression Grammar Analyzer:
#+BEGIN_SRC bash
make samples SAMPLE_SIZE=5
#+END_SRC
This will create a file with the extension ~.sample.lisp~ containing 5 sample S-expressions.
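For readers who want to pull a few expressions out of a generated file without ~make~, the following is a rough sketch of one way to do it. It is not how the ~samples~ target is implemented: ~split_sexps~ and ~take_samples~ are hypothetical helpers, and taking the first N expressions (rather than, say, a random selection) is an assumption.
#+BEGIN_SRC python
def split_sexps(text):
    """Return the top-level parenthesized expressions found in text.

    Simplifications: parentheses inside double-quoted strings are ignored,
    but escaped quotes and Lisp comments are not handled.
    """
    expressions = []
    depth = 0
    start = None
    in_string = False
    for i, ch in enumerate(text):
        if ch == '"':
            in_string = not in_string
        elif in_string:
            continue
        elif ch == "(":
            if depth == 0:
                start = i  # remember where a top-level expression begins
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                expressions.append(text[start:i + 1])
    return expressions


def take_samples(text, sample_size=5):
    """Return the first sample_size top-level expressions (assumed policy)."""
    return split_sexps(text)[:sample_size]


example = (
    '(S (NP (DET "Le") (NOUN "chat")) (VERB "dort"))\n'
    '(S (VERB "Viens"))\n'
    '(S (NP "a)b"))\n'
)
for sexp in take_samples(example, sample_size=2):
    print(sexp)
#+END_SRC
To run this against real output, read the file first, for example ~Path("output/proust.lisp").read_text(encoding="utf-8")~ with ~pathlib~, and pass the resulting string to ~take_samples~.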
*** Visualization
To view the syntax trees in the web UI:
#+BEGIN_SRC bash
make serve
# Then visit http://localhost:8765 in your browser
#+END_SRC
*** Advanced Emacs Integration
For users with Emacs, the project provides enhanced syntax highlighting and visualization:
#+BEGIN_SRC bash
# Copy the provided .emacs.d/init.el or load publish.el
emacs -l publish.el
# Then open any .lisp file in the examples directory
# Use C-c t to visualize the tree structure
#+END_SRC
** Literary Texts
The repository includes org-mode scripts to download various French literary texts for testing and analysis:
#+BEGIN_SRC bash
# Download all texts
make download-texts
# Process specific texts
make run INPUT_FILE=data/pg2650.txt OUTPUT_FILE=output/swann.lisp
make run INPUT_FILE=data/pg6099.txt OUTPUT_FILE=output/baudelaire.lisp
#+END_SRC
** Examples
The ~examples/~ directory contains S-expression examples at various complexity levels:
- Simple examples with basic subject-verb structures
- Medium examples with prepositional phrases and modifiers
- Complex examples with relative/subordinate clauses
- Very complex examples typical of Proust's style
** Documentation
The documentation is available in the org-mode files and can be published to HTML:
#+BEGIN_SRC bash
# Generate all documentation
make docs
# View the documentation
open docs/index.html
#+END_SRC
** Development
*** Running Tests
#+BEGIN_SRC bash
make test
#+END_SRC
*** Code Formatting
#+BEGIN_SRC bash
make format
#+END_SRC
*** Capturing Screenshots
#+BEGIN_SRC bash
# Setup shot-scraper
./setup-shot-scraper.sh
# Capture screenshots of the web UI
make screenshots
#+END_SRC
** License
MIT License
Copyright (c) 2025 Jason Walsh
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.