https://github.com/spenserblack/gengo
A linguist-inspired language classifier with multiple file source handlers
language-statistics rust
- Host: GitHub
- URL: https://github.com/spenserblack/gengo
- Owner: spenserblack
- License: apache-2.0
- Created: 2023-01-22T16:11:26.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-03-31T11:44:30.000Z (3 months ago)
- Last Synced: 2025-04-04T13:12:14.320Z (2 months ago)
- Topics: language-statistics, rust
- Language: Rust
- Homepage:
- Size: 1.09 MB
- Stars: 27
- Watchers: 2
- Forks: 12
- Open Issues: 16
Metadata Files:
- Readme: README.md
- Contributing: docs/CONTRIBUTING.md
- License: LICENSE-APACHE
- Codeowners: CODEOWNERS
README
# gengo (言語)
A [linguist][linguist]-inspired language classifier with multiple file source handlers
## Comparison
| Feature/Behavior | [linguist][linguist] | gengo |
| :---------------------------------: | :------------------: | :--------: |
| **Analyze Git Revision** | Yes | Yes |
| **Analyze Directory** | No | Yes |
| **Requires Git Repository** | Yes | No |
| **Detect Language by Extension** | Yes | Yes |
| **Detect Language by Filename** | Yes | Yes |
| **Detect by Filepath Pattern** | No | Yes |
| **Detect Language with Heuristics** | Yes | Yes |
| **Detect Language with Classifier** | Yes | Not Yet ;) |

## Installation
View [the installation documentation][install-docs].
## Usage
This tool has multiple file sources. Each file source can have unique usage to take advantage of its
strengths and work around its weaknesses.

### Directory File Source
This is a very generic file source that tries not to make many assumptions about your environment
and workspace.

#### Ignoring Files
You can use a `.gitignore` file and/or an `.ignore` file to prevent files from
being scanned. See the [`ignore`][ignore-crate] crate for more details.

### Git File Source
The git file source is highly opinionated -- it tries to act like a git utility, and uses git tools.
Its goal is to behave similarly to [linguist]. This means that this file source does *not* need any
actual files present, and can work on a bare repository, making it suitable for usage with a Git
server.

#### Overrides
Like [linguist][linguist], you can override behavior using a `.gitattributes` file:
replace each `linguist-FOO` attribute with `gengo-FOO`. _Unlike_ linguist,
`gengo-detectable` will _always_ include a file in statistics (linguist
will still exclude such files if they're generated or vendored).

```gitattributes
# .gitattributes

# boolean attributes:
# These can be *negated* by prefixing with `-` (`-gengo-documentation`).
# Mark a file as documentation
*.html gengo-documentation
# Mark a file as generated
my-built-files/* gengo-generated
# Mark a file as vendored
deps/* gengo-vendored

# string attributes:
# Override the detected language for a file
# Use the Language enum's variant name (see docs.rs for more details)
templates/*.js gengo-language=PlainText
```

You will need to commit your `.gitattributes` file for it to take effect.

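To make the three attribute forms above concrete (set, negated with `-`, and `name=value`), here is a minimal standalone sketch that reads one `.gitattributes` line at a time. It is not gengo's implementation and does not use gengo's API: the `Attr` enum and `parse_line` function are hypothetical names chosen for illustration, and the parser deliberately ignores parts of the real format such as quoted patterns, `!attr`, and macro attributes.

```rust
/// One parsed `gengo-*` attribute (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum Attr {
    /// Boolean attribute that is set, e.g. `gengo-vendored`.
    Set(String),
    /// Boolean attribute negated with a `-` prefix, e.g. `-gengo-vendored`.
    Unset(String),
    /// String attribute, e.g. `gengo-language=PlainText`.
    Value(String, String),
}

/// Splits a line into its path pattern and its `gengo-*` attributes.
/// Returns `None` for blank lines and comments. Attributes without the
/// `gengo-` prefix are skipped.
fn parse_line(line: &str) -> Option<(String, Vec<Attr>)> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    let mut fields = line.split_whitespace();
    let pattern = fields.next()?.to_string();
    let attrs = fields
        .filter_map(|field| {
            if let Some(name) = field.strip_prefix('-') {
                name.starts_with("gengo-")
                    .then(|| Attr::Unset(name.to_string()))
            } else if let Some((name, value)) = field.split_once('=') {
                name.starts_with("gengo-")
                    .then(|| Attr::Value(name.to_string(), value.to_string()))
            } else {
                field
                    .starts_with("gengo-")
                    .then(|| Attr::Set(field.to_string()))
            }
        })
        .collect();
    Some((pattern, attrs))
}

fn main() {
    let sample = "\
# Mark a file as vendored
deps/* gengo-vendored
docs/* -gengo-documentation
templates/*.js gengo-language=PlainText
";
    for line in sample.lines() {
        if let Some((pattern, attrs)) = parse_line(line) {
            println!("{pattern}: {attrs:?}");
        }
    }
}
```

To check how git itself resolves attributes for a path, `git check-attr -a -- <path>` lists the attributes that apply to that path.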
[ignore-crate]: https://docs.rs/ignore
[install-docs]: ./docs/INSTALLATION.md
[linguist]: https://github.com/github-linguist/linguist