{"id":13571475,"url":"https://github.com/antlr/codebuff","last_synced_at":"2025-04-05T23:11:36.565Z","repository":{"id":66220454,"uuid":"49157731","full_name":"antlr/codebuff","owner":"antlr","description":"Language-agnostic pretty-printing through machine learning (uh, like, is this possible? YES, apparently).","archived":false,"fork":false,"pushed_at":"2020-08-06T16:37:22.000Z","size":6227,"stargazers_count":451,"open_issues_count":3,"forks_count":83,"subscribers_count":27,"default_branch":"master","last_synced_at":"2025-03-29T22:08:51.020Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/antlr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2016-01-06T19:49:54.000Z","updated_at":"2025-03-29T08:02:28.000Z","dependencies_parsed_at":null,"dependency_job_id":"100f78f4-f3c9-40b4-9bb7-35c1953fbda3","html_url":"https://github.com/antlr/codebuff","commit_stats":null,"previous_names":[],"tags_count":22,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antlr%2Fcodebuff","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antlr%2Fcodebuff/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antlr%2Fcodebuff/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antlr%2Fcodebuff/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/antlr","download_url":"https://codeload.github.com/antlr/codebuff/tar.gz/refs/heads/master","host"
:{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247411239,"owners_count":20934653,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T14:01:02.332Z","updated_at":"2025-04-05T23:11:36.543Z","avatar_url":"https://github.com/antlr.png","language":"Java","funding_links":[],"categories":["Java","SQL"],"sub_categories":["Formatters"],"readme":"# CodeBuff smart formatter\n\nBy Terence Parr (primary developer), Fangzhou (Morgan) Zhang (help with initial development), Jurgen Vinju (co-author of academic paper, help with empirical results and algorithm discussions).\n\n[kaby76](https://github.com/kaby76) has done a [C# port](https://github.com/kaby76/cs-codebuff).\n\n## Abstract\n\nCode formatting is not particularly exciting but many researchers would consider it either unsolved or not well-solved.  The two well-established solutions are:\n\n1.  Build a custom program that formats code for a specific language with ad hoc techniques, typically subject to parameters such as \"*always put a space between operators*\".\n2.  Define a set of formal rules that map input patterns to layout instructions such as \"*line these expressions up vertically*\".\n\nBoth techniques are painful and finicky.  \n\nThis repository is a step towards what we hope will be a universal code formatter that uses machine learning to look for patterns in a corpus and to format code using those patterns.  \n\nIt requires Java 8. See `pom.xml` for dependencies (e.g., ANTLR 4.x, ...).\n\n*Whoa!* It appears to work.  
Academic paper, [Towards a Universal Code Formatter through Machine Learning](http://arxiv.org/abs/1606.08866) accepted to SLE2016.  Sample output is in the paper or the next section. Video from [Terence's presentation](https://www.youtube.com/watch?v=Mni2HVGGUdo).\n\n## Sample output\n\nAll input is completely squeezed of whitespace/newlines so only the output really matters when examining CodeBuff output. You can check out the [output](https://github.com/antlr/codebuff/tree/master/output) dir for leave-one-out formatting of the various [corpora](https://github.com/antlr/codebuff/tree/master/corpus). But, here are some sample formatting results.\n\n### SQL\n\n```sql\nSELECT *\nFROM DMartLogging\nWHERE DATEPART(day, ErrorDateTime) = DATEPART(day, GetDate())\n      AND DATEPART(month, ErrorDateTime) = DATEPART(month, GetDate())\n      AND DATEPART(year, ErrorDateTime) = DATEPART(year, GetDate())\nORDER BY ErrorDateTime\n    DESC\n```\n\n```sql\nSELECT\n    CASE WHEN SSISInstanceID IS NULL\n        THEN 'Total'\n    ELSE SSISInstanceID END SSISInstanceID\n    , SUM(OldStatus4) AS OldStatus4\n    , SUM(Status0) AS Status0\n    , SUM(Status1) AS Status1\n    , SUM(Status2) AS Status2\n    , SUM(Status3) AS Status3\n    , SUM(Status4) AS Status4\n    , SUM(OldStatus4 + Status0 + Status1 + Status2 + Status3 + Status4) AS InstanceTotal\nFROM\n    (\n        SELECT\n            CONVERT(VARCHAR, SSISInstanceID)             AS SSISInstanceID\n            , COUNT(CASE WHEN Status = 4 AND\n                              CONVERT(DATE, LoadReportDBEndDate) \u003c\n                              CONVERT(DATE, GETDATE())\n                        THEN Status\n                    ELSE NULL END)             AS OldStatus4\n            , COUNT(CASE WHEN Status = 0\n                        THEN Status\n                    ELSE NULL END)             AS Status0\n            , COUNT(CASE WHEN Status = 1\n                        THEN Status\n                    ELSE NULL END)             AS 
Status1\n            , COUNT(CASE WHEN Status = 2\n                        THEN Status\n                    ELSE NULL END)             AS Status2\n            , COUNT(CASE WHEN Status = 3\n                        THEN Status\n                    ELSE NULL END)             AS Status3\n--, COUNT ( CASE WHEN Status = 4 THEN Status ELSE NULL END ) AS Status4\n            , COUNT(CASE WHEN Status = 4 AND\n                              DATEPART(DAY, LoadReportDBEndDate) = DATEPART(DAY, GETDATE())\n                        THEN Status\n                    ELSE NULL END)             AS Status4\n        FROM dbo.ClientConnection\n        GROUP BY SSISInstanceID\n    ) AS StatusMatrix\nGROUP BY SSISInstanceID\n```\n\n### Java\n\n```java\npublic class Interpreter {\n    ...\n    public static final Set\u003cString\u003e predefinedAnonSubtemplateAttributes = new HashSet\u003cString\u003e() {\n                                                                              {\n                                                                                  add(\"i\");\n                                                                                  add(\"i0\");\n                                                                              }\n                                                                          };\n...\n    public int exec(STWriter out, InstanceScope scope) {\n        final ST self = scope.st;\n        if ( trace ) System.out.println(\"exec(\"+self.getName()+\")\");\n        try {\n            setDefaultArguments(out, scope);\n            return _exec(out, scope);\n        }\n        catch (Exception e) {\n            StringWriter sw = new StringWriter();\n            PrintWriter pw = new PrintWriter(sw);\n            e.printStackTrace(pw);\n            pw.flush();\n            errMgr.runTimeError(this,\n                                scope,\n                                ErrorType.INTERNAL_ERROR,\n                                \"internal error: 
\"+sw.toString());\n            return 0;\n        }\n    }\n...\n    protected int _exec(STWriter out, InstanceScope scope) {\n        final ST self = scope.st;\n        int start = out.index(); // track char we're about to write\n        int prevOpcode = 0;\n        int n = 0; // how many char we write out\n        int nargs;\n        int nameIndex;\n        int addr;\n        String name;\n        Object o, left, right;\n        ST st;\n        Object[] options;\n        byte[] code = self.impl.instrs;        // which code block are we executing\n        int ip = 0;\n        while ( ip\u003cself.impl.codeSize ) {\n            if ( trace|| debug ) trace(scope, ip);\n            short opcode = code[ip];\n            //count[opcode]++;\n            scope.ip = ip;\n            ip++; //jump to next instruction or first byte of operand\n            switch ( opcode ) {\n                case Bytecode.INSTR_LOAD_STR:\n                    // just testing...\n                    load_str(self, ip);\n                    ip += Bytecode.OPND_SIZE_IN_BYTES;\n                    break;\n                case Bytecode.INSTR_LOAD_ATTR:\n                    nameIndex = getShort(code, ip);\n                    ip += Bytecode.OPND_SIZE_IN_BYTES;\n                    name = self.impl.strings[nameIndex];\n                    try {\n                        o = getAttribute(scope, name);\n                        if ( o== ST.EMPTY_ATTR ) o = null;\n                        }\n                    catch (STNoSuchAttributeException nsae) {\n                        errMgr.runTimeError(this, scope, ErrorType.NO_SUCH_ATTRIBUTE, name);\n                        o = null;\n                    }\n                    operands[++sp] = o;\n                    break;\n...\n```\n\n### ANTLR\n\n```\nreferenceType : classOrInterfaceType | typeVariable | arrayType ;\n\nclassOrInterfaceType\n    :   (   classType_lfno_classOrInterfaceType\n        |   interfaceType_lfno_classOrInterfaceType\n        )\n      
  (   classType_lf_classOrInterfaceType\n        |   interfaceType_lf_classOrInterfaceType\n        )*\n    ;\n```\n\n```\nclassModifier\n    :   annotation\n    |   'public'\n    |   'protected'\n    |   'private'\n    |   'abstract'\n    |   'static'\n    |   'final'\n    |   'strictfp'\n    ;\n```\n\n```\ntypeSpecifier\n    :   (   'void'\n        |   'char'\n        |   'short'\n        |   'int'\n        |   'long'\n        |   'float'\n        |   'double'\n        |   'signed'\n        |   'unsigned'\n        |   '_Bool'\n        |   '_Complex'\n        |   '__m128'\n        |   '__m128d'\n        |   '__m128i'\n        )\n    |   '__extension__' '(' ('__m128' | '__m128d' | '__m128i') ')'\n    |   atomicTypeSpecifier\n    |   structOrUnionSpecifier\n    |   enumSpecifier\n    |   typedefName\n    |   '__typeof__' '(' constantExpression ')' // GCC extension\n    ;\n```\n\n## Build complete jar\n\nTo make a complete jar with all of the dependencies, do this from the repo main directory:\n\n```bash\n$ mvn clean compile install\n```\n\nThis will leave you with artifact `target/codebuff-1.4.19.jar` or whatever the version number is and put the jar into the usual maven local cache.\n\n## Formatting files\n\nTo use the formatter, you need to use class `org.antlr.codebuff.Tool`.  Commandline usage:\n\n* `-g` *grammar-name*. The grammar must be run through ANTLR and be compiled (and in the `CLASSPATH`). For example, for `Java8.g4`, use `-g Java8`, not the filename. For separated grammar files, like `ANTLRv4Parser.g4` and `ANTLRv4Lexer.g4`, use `-g ANTLRv4`. If the grammar is in a package, use fully-qualified like `-g org.antlr.codebuff.ANTLRv4`.\n* `-rule` *start-rule*. Start rule of the grammar where parsing of a full file starts, such as `compilationUnit` in `Java.g4`.\n* `-corpus` *root-dir-of-samples*\n* [`-files` *file-extension]*. E.g., use `java`, `g4`, `c`, ...\n* [`-indent` *num-spaces]*.  
This defaults to 4 spaces of indentation.\n* [`-comment` *line-comment-name*]. As a failsafe, CodeBuff allows you to specify the token name for single-line comments, such as `LINE_COMMENT`, within the grammar so that it can ensure there is a line break after a single-line comment.\n* [`-o` *output-file*]. Filename with optional path to where output should go.\n* *file-to-format*. Filename (with optional path) must be last.\n\nOutput goes to standard out unless you use `-o`.\n \n```bash\n$ java -jar target/codebuff-1.4.19.jar  \\\n       -g org.antlr.codebuff.ANTLRv4 \\\n       -rule grammarSpec \\\n       -corpus corpus/antlr4/training \\\n       -files g4 \\\n       -indent 4 \\\n       -comment LINE_COMMENT \\\n       T.g4\n```\n\n```bash\n$ java -jar target/codebuff-1.4.19.jar \\\n       -g org.antlr.codebuff.Java \\\n       -rule compilationUnit \\\n       -corpus corpus/java/training/stringtemplate4 \\\n       -files java \\\n       -comment LINE_COMMENT \\\n       T.java\n```\n\nThese examples work for the grammars specified because they are already inside the complete jar. For parsers compiled outside of the jar, you might need to do something like:\n\n```bash\n$ java -cp target/codebuff-1.4.19.jar:$CLASSPATH \\\n       org.antlr.codebuff.Tool  \\\n       -g org.antlr.codebuff.ANTLRv4 \\\n       -rule grammarSpec -corpus corpus/antlr4/training \\\n       -files g4 -indent 4 -comment LINE_COMMENT T.g4\n```\n\n### Grammar requirements\n\nAll whitespace should go to the parser on a hidden channel. For example, here is a rule that does that:\n\n```\nWS  :\t[ \\t\\r\\n\\f]+ -\u003e channel(HIDDEN)\t;\n```\n\nComments should also go to a hidden channel:\n\n```\nBLOCK_COMMENT\n\t:\t'/*' .*? 
('*/' | EOF)  -\u003e channel(HIDDEN)\n\t;\n\nLINE_COMMENT\n\t:\t'//' ~[\\r\\n]*  -\u003e channel(HIDDEN)\n\t;\n```\n\nYou can have line comments match newlines if you want.\n\n## Speed tests\n\nThe paper cites some speed tests for training and formatting time for\n\n* [guava corpus](https://github.com/antlr/codebuff/tree/master/corpus/java/training/guava) and [java grammar](https://github.com/antlr/codebuff/blob/master/grammars/org/antlr/codebuff/Java.g4)\n* [guava corpus](https://github.com/antlr/codebuff/tree/master/corpus/java/training/guava) and [java8 grammar](https://github.com/antlr/codebuff/blob/master/grammars/org/antlr/codebuff/Java8.g4)\n* [antlr corpus](https://github.com/antlr/codebuff/tree/master/corpus/antlr4/training) and [antlr parser grammar](https://github.com/antlr/codebuff/blob/master/grammars/org/antlr/codebuff/ANTLRv4Parser.g4), [antlr lexer grammar](https://github.com/antlr/codebuff/blob/master/grammars/org/antlr/codebuff/ANTLRv4Lexer.g4)\n\nFirst, here is my machine configuration:\n\n\u003cimg src=images/imac.png width=250\u003e\n\nMemory speed seems to make a big difference given how much we have to trawl through memory. The tests shown below were done with 1867 MHz DDR3 RAM.  We set an initial heap of 4G and a 1M stack size.  
First build everything:\n\n```bash\n$ mvn clean compile install\n```\n\nThen you can run the speed tests as shown in following subsections.\n\n#### ANTLR corpus\n\n```bash\n$ java -Xmx4G -Xss1M -cp target/codebuff-1.4.19.jar org.antlr.codebuff.validation.Speed -antlr corpus/antlr4/training/Java8.g4\nLoaded 12 files in 172ms\nantlr training of /Users/parrt/antlr/code/codebuff/corpus/antlr4/training/Java8.g4 = 353ms formatting = 340ms\nantlr training of /Users/parrt/antlr/code/codebuff/corpus/antlr4/training/Java8.g4 = 188ms formatting = 161ms\nantlr training of /Users/parrt/antlr/code/codebuff/corpus/antlr4/training/Java8.g4 = 145ms formatting = 153ms\nantlr training of /Users/parrt/antlr/code/codebuff/corpus/antlr4/training/Java8.g4 = 130ms formatting = 129ms\nantlr training of /Users/parrt/antlr/code/codebuff/corpus/antlr4/training/Java8.g4 = 123ms formatting = 113ms\nantlr training of /Users/parrt/antlr/code/codebuff/corpus/antlr4/training/Java8.g4 = 114ms formatting = 116ms\nantlr training of /Users/parrt/antlr/code/codebuff/corpus/antlr4/training/Java8.g4 = 93ms formatting = 90ms\nantlr training of /Users/parrt/antlr/code/codebuff/corpus/antlr4/training/Java8.g4 = 80ms formatting = 90ms\nantlr training of /Users/parrt/antlr/code/codebuff/corpus/antlr4/training/Java8.g4 = 73ms formatting = 88ms\nantlr training of /Users/parrt/antlr/code/codebuff/corpus/antlr4/training/Java8.g4 = 72ms formatting = 71ms\nantlr training of /Users/parrt/antlr/code/codebuff/corpus/antlr4/training/Java8.g4 = 71ms formatting = 69ms\nantlr training of /Users/parrt/antlr/code/codebuff/corpus/antlr4/training/Java8.g4 = 71ms formatting = 73ms\nantlr training of /Users/parrt/antlr/code/codebuff/corpus/antlr4/training/Java8.g4 = 76ms formatting = 63ms\nantlr training of /Users/parrt/antlr/code/codebuff/corpus/antlr4/training/Java8.g4 = 70ms formatting = 70ms\nantlr training of /Users/parrt/antlr/code/codebuff/corpus/antlr4/training/Java8.g4 = 70ms formatting = 69ms\nantlr training of 
/Users/parrt/antlr/code/codebuff/corpus/antlr4/training/Java8.g4 = 73ms formatting = 70ms\nantlr training of /Users/parrt/antlr/code/codebuff/corpus/antlr4/training/Java8.g4 = 70ms formatting = 68ms\nantlr training of /Users/parrt/antlr/code/codebuff/corpus/antlr4/training/Java8.g4 = 71ms formatting = 66ms\nantlr training of /Users/parrt/antlr/code/codebuff/corpus/antlr4/training/Java8.g4 = 70ms formatting = 70ms\nantlr training of /Users/parrt/antlr/code/codebuff/corpus/antlr4/training/Java8.g4 = 73ms formatting = 72ms\nmedian of [5:19] training 72ms\nmedian of [5:19] formatting 70ms\n```\n\n#### Guava corpus, Java grammar\n\n```bash\n$ java -Xms4G -Xss1M -cp target/codebuff-1.4.19.jar org.antlr.codebuff.validation.Speed -java_guava corpus/java/training/guava/cache/LocalCache.java\nLoaded 511 files in 1949ms\njava_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1984ms formatting = 2669ms\njava_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1747ms formatting = 3166ms\njava_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1784ms formatting = 2811ms\njava_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1507ms formatting = 1742ms\njava_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1499ms formatting = 2832ms\njava_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1582ms formatting = 2663ms\njava_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1499ms formatting = 2807ms\njava_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1561ms formatting = 2815ms\njava_guava training of 
/Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1521ms formatting = 2136ms\njava_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1545ms formatting = 2811ms\njava_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1501ms formatting = 2800ms\njava_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1506ms formatting = 2581ms\njava_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1494ms formatting = 2838ms\njava_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1494ms formatting = 2789ms\njava_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1497ms formatting = 2621ms\njava_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1501ms formatting = 2714ms\njava_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1506ms formatting = 2816ms\njava_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1512ms formatting = 2733ms\njava_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1515ms formatting = 2587ms\njava_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1508ms formatting = 2430ms\nmedian of [5:19] training 1506ms\nmedian of [5:19] formatting 2733ms\n```\n\n#### Guava corpus, Java8 grammar\n\nLoad time here is very slow (2.5min) because the Java8 grammar is meant to reflect the language spec. It has not been optimized for performance. 
Once the corpus is loaded, training and formatting times are about the same as for Java grammar.\n\n```bash\n$ java -Xms4G -Xss1M -cp target/codebuff-1.4.19.jar \\\n       org.antlr.codebuff.validation.Speed \\\n       -java8_guava corpus/java/training/guava/cache/LocalCache.java\nLoaded 511 files in 159947ms\njava8_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 2238ms formatting = 23312ms\njava8_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1913ms formatting = 2368ms\njava8_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1855ms formatting = 2277ms\njava8_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1856ms formatting = 2267ms\njava8_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1868ms formatting = 2348ms\njava8_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1890ms formatting = 2263ms\njava8_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1866ms formatting = 2328ms\njava8_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1855ms formatting = 2247ms\njava8_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1856ms formatting = 2243ms\njava8_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1871ms formatting = 2204ms\njava8_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1863ms formatting = 2244ms\njava8_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1850ms formatting = 2212ms\njava8_guava training of 
/Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1861ms formatting = 2215ms\njava8_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1877ms formatting = 2257ms\njava8_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1843ms formatting = 2249ms\njava8_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1842ms formatting = 2205ms\njava8_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1869ms formatting = 2343ms\njava8_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1864ms formatting = 2225ms\njava8_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1851ms formatting = 2260ms\njava8_guava training of /Users/parrt/antlr/code/codebuff/corpus/java/training/guava/cache/LocalCache.java = 1871ms formatting = 2200ms\nmedian of [5:19] training 1863ms\nmedian of [5:19] formatting 2244ms\n```\n\n## Generating graphs from paper\n\nIn the *Towards a Universal Code Formatter Through Machine Learning* paper, we have three graphs to support our conclusions. This section shows how to reproduce them. (Note that these jobs take many minutes to run; maybe up to 30 minutes for one of them on a fast box.)\n\nThe Java code generates Python code that uses matplotlib. 
The result of running the python is a PDF of the graph (that also pops up in a window).\n\n### Box plot with median error rates\n\nTo generate:\n\n\u003cimg src=\"images/leave_one_out.png\" width=\"400\"\u003e\n\ndo this:\n\n```bash\n$ mvn clean compile install\n$ java -Xms8G -Xss1M -cp target/codebuff-1.4.19.jar org.antlr.codebuff.validation.LeaveOneOutValidator\n...\nwrote python code to python/src/leave_one_out.py\n$ cd python/src\n$ python leave_one_out.py \u0026\n```\n\n### Plot showing effect of corpus size on error rate\n\nTo generate:\n\n\u003cimg src=\"images/subset_validator.png\" width=\"400\"\u003e\n\ndo this:\n\n```bash\n$ mvn clean compile install\n$ java -Xms8G -Xss1M -cp target/codebuff-1.4.19.jar org.antlr.codebuff.validation.SubsetValidator\n...\nwrote python code to python/src/subset_validator.py\n$ cd python/src\n$ python subset_validator.py \u0026\n```\n\n### Plot showing effect of varying model parameter k\n\nTo generate:\n\n\u003cimg src=\"images/vary_k.png\" width=\"400\"\u003e\n\ndo this:\n\n```bash\n$ mvn clean compile install\n$ java -Xms8G -Xss1M -cp target/codebuff-1.4.19.jar org.antlr.codebuff.validation.TestK\n...\nwrote python code to python/src/vary_k.py\n$ cd python/src\n$ python vary_k.py \u0026\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantlr%2Fcodebuff","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fantlr%2Fcodebuff","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantlr%2Fcodebuff/lists"}