{"id":22797702,"url":"https://github.com/agent-hellboy/codefile","last_synced_at":"2025-03-30T18:30:03.380Z","repository":{"id":266325535,"uuid":"898026842","full_name":"Agent-Hellboy/codefile","owner":"Agent-Hellboy","description":null,"archived":false,"fork":false,"pushed_at":"2024-12-12T16:05:02.000Z","size":17,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-05T21:01:14.556Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Agent-Hellboy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-03T16:53:06.000Z","updated_at":"2024-12-12T16:05:06.000Z","dependencies_parsed_at":"2024-12-03T18:28:10.500Z","dependency_job_id":"e5f4d71f-3701-4273-9796-817003a3c6ae","html_url":"https://github.com/Agent-Hellboy/codefile","commit_stats":null,"previous_names":["agent-hellboy/codefile"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Agent-Hellboy%2Fcodefile","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Agent-Hellboy%2Fcodefile/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Agent-Hellboy%2Fcodefile/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Agent-Hellboy%2Fcodefile/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Agent-Hellboy","download_url":"https://codeload.github
.com/Agent-Hellboy/codefile/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246362994,"owners_count":20765208,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-12T06:06:46.475Z","updated_at":"2025-03-30T18:30:03.353Z","avatar_url":"https://github.com/Agent-Hellboy.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# codefile\n\n`codefile` is a Go library for detecting the programming language of a given file. It uses content-based detection with weighted keyword matching, so it can identify a file's language even when the file has no extension.\n\nDetection follows the [TOFU](#tofu) algorithm described below.\n\n---\n\n[![Go Reference](https://pkg.go.dev/badge/github.com/Agent-Hellboy/codefile.svg)](https://pkg.go.dev/github.com/Agent-Hellboy/codefile)\n[![Go Report Card](https://goreportcard.com/badge/github.com/Agent-Hellboy/codefile)](https://goreportcard.com/report/github.com/Agent-Hellboy/codefile)\n[![codecov](https://codecov.io/gh/Agent-Hellboy/codefile/branch/main/graph/badge.svg)](https://codecov.io/gh/Agent-Hellboy/codefile)\n\n## Features\n\n- **Content-Based Detection**:\n  - Detects programming languages by inspecting file content for unique constructs and patterns.\n- **Weighted Scoring**:\n  - Each language feature is assigned a weight to improve detection accuracy.\n- **Efficient Scanning**:\n  - Inspects only the first 20 lines of a file to keep detection fast.\n\n---\n\n## Installation\n\nInstall the package using `go get`:\n\n```bash\ngo get 
github.com/Agent-Hellboy/codefile\n```\n\n## Usage\n\n### Basic Language Detection\n\nDetect the programming language of a file:\n\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\n\t\"github.com/Agent-Hellboy/codefile\"\n)\n\nfunc main() {\n\tfilePath := \"example.py\"\n\t// DetectCodeFileType returns the detected language and whether detection succeeded.\n\tlanguage, ok := codefile.DetectCodeFileType(filePath)\n\tif ok {\n\t\tfmt.Printf(\"The language of the file is: %s\\n\", language)\n\t} else {\n\t\tfmt.Println(\"Language could not be detected.\")\n\t}\n}\n\n// Example output:\n// The language of the file is: Python\n```\n\n### Supported Languages\n\nThe library supports the following programming languages out of the box:\n\n- Python\n- Go\n- C++\n- Java\n- JavaScript\n- TypeScript\n- Shell\n\n### TOFU\n\nThe TOFU algorithm proceeds in five steps:\n\n1. **Tokenization**:\n   - Parse the file content and split it into meaningful tokens (e.g., keywords, operators, literals).\n   - Consider language-specific symbols such as `;`, `{}`, and `()`.\n2. **Frequency Analysis**:\n   - Count the occurrences of each token in the file.\n   - Use these frequencies to weight the probability of a match for each programming language.\n3. **Weighted Matching**:\n   - Compare the token distribution with predefined language profiles.\n   - Each profile lists a language's common keywords, operators, and constructs, each with an associated weight.\n4. **Confidence Scoring**:\n   - Compute a confidence score for each language based on:\n     - Token frequency.\n     - Unique constructs (e.g., `package main` for Go, `#include` for C++).\n     - Weighted patterns.\n5. **Threshold Comparison**:\n   - If the highest confidence score exceeds a predefined threshold, classify the file as that language.\n   - If no score exceeds the threshold, classify the language as \"Unknown.\"\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fagent-hellboy%2Fcodefile","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fagent-hellboy%2Fcodefile","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fagent-hellboy%2Fcodefile/lists"}