{"id":17154234,"url":"https://github.com/gurleensethi/codeinfoextractor","last_synced_at":"2026-04-16T00:33:02.556Z","repository":{"id":98501464,"uuid":"300789947","full_name":"gurleensethi/CodeInfoExtractor","owner":"gurleensethi","description":"An application for parsing out comment information from source code.","archived":false,"fork":false,"pushed_at":"2020-10-20T21:36:16.000Z","size":123,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-08-11T04:28:21.622Z","etag":null,"topics":["gradle","java","parser"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gurleensethi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-10-03T03:47:52.000Z","updated_at":"2020-10-10T02:41:02.000Z","dependencies_parsed_at":null,"dependency_job_id":"a0a336e5-506a-4f0c-bab9-6aef9b16d25c","html_url":"https://github.com/gurleensethi/CodeInfoExtractor","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/gurleensethi/CodeInfoExtractor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gurleensethi%2FCodeInfoExtractor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gurleensethi%2FCodeInfoExtractor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gurleensethi%2FCodeInfoExtractor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gurleensethi%2FCode
InfoExtractor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gurleensethi","download_url":"https://codeload.github.com/gurleensethi/CodeInfoExtractor/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gurleensethi%2FCodeInfoExtractor/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31866347,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-15T15:24:51.572Z","status":"ssl_error","status_checked_at":"2026-04-15T15:24:39.138Z","response_time":63,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gradle","java","parser"],"created_at":"2024-10-14T21:48:40.073Z","updated_at":"2026-04-16T00:33:02.536Z","avatar_url":"https://github.com/gurleensethi.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CodeInfoExtractor\n\nAn application for parsing out comment information from source code.\n\n## Running the project\n\n### From the command line using gradle\n\nMake sure you have gradle installed on your local machine.\n\n```cmd\n./gradlew run --args \"\u003cfile_path\u003e\"\n```\n\nExample usage: \n\n#### Single File\n\n```cmd\n./gradlew run --args samplefiles/SampleFile.java\n```\n\n#### Multiple Files\n\n```cmd\n./gradlew run --args \"samplefiles/SampleFile.java samplefiles/sample-file.ts samplefiles/sample_file.py\"\n```\n\n### Using 
`jar` file\n\n#### Prebuilt `jar` file\n\nDownload the jar file from the latest [release](https://github.com/gurleensethi/CodeInfoExtractor/releases).\n\n```cmd\njava -jar codeinfoextractor.jar samplefiles/SampleFile.java\n```\n\n#### Manually building `jar` file\n\nBuild the jar.\n\n```cmd\n./gradlew jar\n```\n\nRun using the built jar (the built jar file can be found in `build/libs/`).\n\n```cmd\njava -jar build/libs/codeinfoextractor.jar samplefiles/SampleFile.java\n```\n\n## Design Overview\n\nThe core consists of 3 main parts: `InfoExtractor`, `FileLoader` and `ILanguageParser`. Below is a diagrammatic representation of the design.\n\n![](https://raw.githubusercontent.com/gurleensethi/CodeInfoExtractor/master/images/design-diagram.png)\n\n- **ILanguageParser**: Contract that every parser should implement. It contains one method named `parse(String data)` that takes in the contents of the passed file as a `String`. A parser should implement the parsing logic in this method and return a `LanguageParseResult`.\n\n- **FileLoader**: Helps in loading code files and filtering out ones that are inappropriate (such as files with no extension).\n\n- **InfoExtractor**: Main class to be used for parsing the code data. 
Using the factory method pattern, it allows seamless registration of new parsers.\n\n#### `InfoExtractor` usage\n\n```java\nInfoExtractor infoExtractor = new InfoExtractor();\n\ninfoExtractor.registerParser(\"java\", JavaParser::new);\ninfoExtractor.registerParser(\"ts\", TypescriptParser::new);\ninfoExtractor.registerParser(\"py\", PythonParser::new);\n\nfinal List\u003cLanguageParseResult\u003e results = infoExtractor.parseFiles(sourceCodeFileList);\n```\n\n## Adding New Parsers\n\nTo add a new parser:\n\n- Implement the `ILanguageParser` interface and add the desired logic in the `parse` method.\n\n```java\npublic class MyParser implements ILanguageParser {\n    @Override\n    public LanguageParseResult parse(String data) {\n        final LanguageParseResult result = new LanguageParseResult();\n        \n        // My parsing logic.\n        \n        return result;\n    }\n}\n```\n\n- Register the parser when using `InfoExtractor`, along with the file extension for which the parser should be used.\n\n```java\nInfoExtractor infoExtractor = new InfoExtractor();\ninfoExtractor.registerParser(\"myextension\", MyParser::new);\n```\n\n## Prewritten parsers\n\nThe project already contains parsers for `Java`, `Python` and `Typescript`. These can be found in the `codeinfoextractor.parsers` package.\n\nAll of these parsers use `regex` to find comments in source code.\n\nBelow are some assumptions made when writing these parsers:\n\n- Java/Typescript:\n  - Single-line comments start with `//`.\n  - Block comments start with `/*` and end when the closest `*/` is found.\n- Python:\n  - Single-line comments start with `#` and are not immediately preceded or followed by another line starting with `#`.\n  - Every line of a block comment starts with `#`. 
At least 2 contiguous lines starting with `#` are required for them to be considered a block comment.\n\n#### False positives\n\nAlthough the regexes used in the prewritten parsers detect single-line and block comments, they can also produce false positives in some cases.\n\nExample:\n\n```java\n// This is a comment\nSystem.out.println(\"// This is not a comment\");\n```\n\nThe Java parser will detect two comments in this case, since the `//` inside the string literal is also matched.\n\nThese parsers can be improved later without affecting the rest of the project, since every parser is isolated.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgurleensethi%2Fcodeinfoextractor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgurleensethi%2Fcodeinfoextractor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgurleensethi%2Fcodeinfoextractor/lists"}