An open API service indexing awesome lists of open source software.

https://github.com/ewdlop/automatatheory


https://github.com/ewdlop/automatatheory

alan-turing church-encoding church-turning-thesis context-free-grammar factor-terms simplifed-expressions

Last synced: 4 months ago
JSON representation

Awesome Lists containing this project

README

        

# AutomataTheory

## Regular language

```markdown
If you need a **regex pattern that matches alphabets including accented characters**, you can use the following pattern:
```
```regex
[\p{L}\p{M}]+
```

### Explanation:
```markdown
- `\p{L}`: Matches any Unicode letter, including accented characters.
- `\p{M}`: Matches diacritical marks (for combining accents).
- `+`: Ensures it matches one or more characters.
```

### Example Matches:
```markdown
- **Café**
- **résumé**
- **naïve**
- **français**
- **Schrödinger**
```
```markdown
If your regex engine **does not support Unicode properties (`\p{L}` and `\p{M}`)**, you can use a broader character set:
```
```regex
[a-zA-ZÀ-ÖØ-öø-ÿ]+
```
```markdown
This explicitly includes:
- `À-Ö`: Uppercase accented letters
- `Ø-ö`: More accented characters
- `ø-ÿ`: Lowercase accented letters

Would you like this pattern for a specific programming language?
```

```markdown
To create regular expressions that match **Chinese** or **Japanese** characters, you can utilize Unicode script properties, provided your regex engine supports them.

**For Chinese Characters:**

- **Unicode Property Syntax:** Use `\p{Han}` to match any Han character, which includes Chinese ideographs.
```
```regex
\p{Han}+
```
```markdown
This pattern matches one or more consecutive Chinese characters.
```
**For Japanese Characters:**

```markdown
Japanese text comprises Hiragana, Katakana, and Kanji (which are also Han characters). To match these:

- **Hiragana:** Use `\p{Hiragana}`
- **Katakana:** Use `\p{Katakana}`
- **Kanji:** Use `\p{Han}`

To match any Japanese character, combine these properties:
```

```regex
[\p{Hiragana}\p{Katakana}\p{Han}]+
```

```markdown
This pattern matches one or more characters that are either Hiragana, Katakana, or Kanji.

**Important Considerations:**

- **Regex Engine Support:** Not all regex engines support Unicode property escapes. Engines like Perl, Java, and JavaScript (with the `/u` flag) do, but others might not. Always verify your specific environment's capabilities.

- **Alternative Approaches:** If your environment lacks support for Unicode properties, you can use explicit Unicode ranges. For example, to match Chinese characters:

```
```regex
[\u4E00-\u9FFF]+
```

```markdown
This matches characters in the CJK Unified Ideographs block, commonly used for Chinese.

For Japanese Hiragana and Katakana:
```
```regex
[\u3040-\u309F\u30A0-\u30FF]+
```
```markdown
This matches characters in the Hiragana and Katakana blocks.

**References:**

- [Regular Expressions for Japanese Text](https://www.localizingjapan.com/blog/2012/01/20/regular-expressions-for-japanese-text/)
- [Regular Expression to Find Chinese Characters](https://salesforce.stackexchange.com/questions/127565/regular-expression-to-find-chinese-characters)

These resources provide additional insights and examples for handling Chinese and Japanese text with regular expressions.
```