Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/ecastra/comment-extractor

Powerful comment extraction
https://github.com/ecastra/comment-extractor

babel comments javascript typescript

Last synced: 25 days ago
JSON representation

Powerful comment extraction

Awesome Lists containing this project

README

        

## A Deep Dive into JavaScript/TypeScript Comment Extraction: Handling Complex Scenarios

This guide delves into the sophisticated workings of a robust JavaScript and TypeScript comment extractor, highlighting its ability to handle various comment types, nested structures, and tricky edge cases. This extractor goes beyond basic comment parsing, seamlessly navigating complex code constructs like class declarations, generators, JSX elements, and even comments within regular expressions.

**Key Features:**

* **Supports Various Comment Types:** The extractor handles single-line comments (`//`), multi-line comments (`/* */`), HTML comments (``), and JSDoc comments (`/** */`).
* **Handles Nested Comments:** Accurately recognizes nested multiline comments, including those within other comments. For instance, it correctly identifies the boundaries of `/* /* inner comment */ outer comment */`.
* **Manages Special Constructs:** Safely skips over comments within strings, template literals, regular expressions, JSX expressions, and other constructs.
* **Recognizes Comments Near Special Characters:** Handles comments adjacent to punctuation characters and symbols, including those in template literals.
* **Efficient Performance:** Utilizes bitmasks, lookup tables, and optimized character type checks for fast comment extraction.
* **Robust Error Handling:** Includes safeguards to handle potential errors during comment parsing and provides informative feedback.
* **Handles Comments in Class Declarations:** Accurately identifies and extracts comments within class declarations, even those interspersed with semicolons.
* **Handles Comments in Class Generators:** Accurately identifies and extracts comments within class generators, ensuring correct parsing of the `*` symbol.
* **Handles Comments in JSX Fragments:** Extracts comments that appear before and after opening and closing JSX fragment brackets.
* **Supports Comments Inside Regular Expressions:** Extracts comments that appear inside regular expressions.
* **User-Configurable Comment Types:** Allows the user to specify which comment types to extract using the `commentTypes` option.

**The `nested` Property:**

The comment extractor includes a `nested` property in its `Comment` object to indicate whether a comment is nested within another comment. This property is crucial for accurately extracting nested comments and for providing contextual information about comment placement.

**Example:**

```javascript
/* This is the outer comment.
* This is the inner comment.
*/
```

The extractor will return two separate `Comment` objects:

* **Outer comment:** `nested: false`
* **Inner comment:** `nested: true`

**Examples:**

**Example 1: Nested Comments**

```javascript
/* This is the outer comment.
* This is the inner comment.
*/
```

**Example 2: Comments with different types and sequences:**

```javascript
// This is a single-line comment at the beginning.
/* This is a multiline comment
spanning multiple lines.
It contains a nested single-line comment: // This is a nested comment
and another nested multiline comment: /* Nested multiline comment */
*/

/** This is a JSDoc comment with a nested single-line comment: // This is a nested comment
* It spans multiple lines.
*/

class MyClass {
// This is a comment inside the class.
;;;/*++*/;;; // This is another comment with semicolons.
method() {
// This is a comment inside a method.
}

*generatorMethod() {
// This is a comment inside the generator method.
yield 1;
// This is another comment inside the generator method.
yield 2;
}

// This is another comment inside the class.
}

X =
/* arrow */
> y;
// This is a comment after a punctuator.

/* This is a multiline comment
spanning multiple lines.
It contains a nested single-line comment: // This is a nested comment
and another nested multiline comment: /* Nested multiline comment */
*/
// This is a single-line comment at the end.
```

**Example 3: Comments in JSX Fragments:**
```javascript
// This is a comment before the JSX fragment
<>
{/* This is a comment inside the JSX fragment. */}

Hello!

{/* This is another comment inside the JSX fragment. */}
>;
// This is a comment after the JSX fragment.
```

**Example 4: Comments inside Regular Expressions:**
```javascript
const regex = /This is a regular expression with a comment /* inside *//; // This is a comment after the regex.
```

**Using the Comment Extractor in a Printer:**

```typescript
// ... (Assuming you have a code printer function) ...

function printNode(node, commentPlacement = 'after') {
let output = "";

// ... existing logic to print the node ...

// Collect comments associated with the node
const comments = collectComments(source, node.getStart(), node.getEnd());

// Add comments based on placement and type
if (commentPlacement === 'before') {
comments.forEach((comment) => {
if (comment.type === 'html') {
output += `\n`;
} else if (comment.type === 'jsdoc') {
output += `\n/** ${source.substring(comment.start, comment.end)} */`;
} else { // Single-line or multiline comments
output += `\n// ${comment.type === 'multiline' ? '/*' : ''}${source.substring(comment.start, comment.end)}${comment.type === 'multiline' ? '*/' : ''}`;
}
});
output += `\n${node.getText()}`; // Print the node after comments
} else { // commentPlacement === 'after' (default)
output += `\n${node.getText()}`; // Print the node first
comments.forEach((comment) => {
if (comment.type === 'html') {
output += `\n`;
} else if (comment.type === 'jsdoc') {
output += `\n/** ${source.substring(comment.start, comment.end)} */`;
} else { // Single-line or multiline comments
output += `\n// ${comment.type === 'multiline' ? '/*' : ''}${source.substring(comment.start, comment.end)}${comment.type === 'multiline' ? '*/' : ''}`;
}
});
}

return output;
}

// Example usage
const output = printNode(myNode, 'before');
console.log(output);
```