https://github.com/makarms/text-probe
Simple and extensible PHP library for text analysis and pattern matching, designed to help developers probe, parse, and manipulate text efficiently.
addresses-parsing contributions-welcome datetime discord email good-first-contribution good-first-issue good-first-pr help-wanted mbstring open-source phone-number php regex regexp slack telegram text-analysis uuid
- Host: GitHub
- URL: https://github.com/makarms/text-probe
- Owner: MakarMS
- License: mit
- Created: 2025-07-23T22:08:45.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-09-29T23:08:21.000Z (9 days ago)
- Last Synced: 2025-09-30T01:09:53.543Z (8 days ago)
- Topics: addresses-parsing, contributions-welcome, datetime, discord, email, good-first-contribution, good-first-issue, good-first-pr, help-wanted, mbstring, open-source, phone-number, php, regex, regexp, slack, telegram, text-analysis, uuid
- Language: PHP
- Homepage:
- Size: 95.7 KB
- Stars: 7
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
README
# TextProbe
**TextProbe** is a simple and extensible PHP library for text analysis and pattern matching. It is designed to help
developers probe, parse, and manipulate text efficiently using customizable rules and matchers.

## Features
- Easy-to-use API for text matching and parsing
- Extensible architecture: write your own matchers and rules
- Suitable for parsing logs, user input, or any structured text

## Installation
You can install the library via [Composer](https://getcomposer.org/):
```bash
composer require makarms/text-probe
```

## Available Probes
The library comes with several built-in probes to detect common patterns in text:
### Contact & Identity
- `DiscordNewUsernameProbe`: extracts Discord usernames in the new format (e.g., `@username`), enforcing Discord's
  updated naming rules (length, characters, no consecutive dots).
- `DiscordOldUsernameProbe`: extracts classic Discord usernames in the format `username#1234`, ensuring proper
  structure and valid discriminator.
- `EmailProbe`: extracts email addresses.
- `PhoneProbe`: extracts phone numbers (supports various formats).
- `SlackUsernameProbe`: extracts Slack usernames (e.g., `@username`), supporting Slack-specific username rules such as
  allowed characters, length limits, and no consecutive dots.
- `TelegramUserLinkProbe`: extracts t.me links pointing to Telegram users.
- `TelegramUsernameProbe`: extracts Telegram usernames (e.g., `@username`).
### Date & Time
- `DateProbe`: extracts dates in various formats (e.g., YYYY-MM-DD, DD/MM/YYYY, 2nd Jan 2023).
- `DateTimeProbe`: extracts combined date and time in multiple common formats.
- `TimeProbe`: extracts times (e.g., 14:30, 14:30:15, optional AM/PM).
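
To give a feel for what a probe such as `TimeProbe` has to recognize, here is a rough standalone sketch built on plain `preg_match_all`. The function name `findTimes`, the regular expression, and the choice to report character (rather than byte) offsets are all assumptions made for illustration; none of this is taken from the library's source.

```php
<?php
declare(strict_types=1);

/**
 * Find time-like substrings and report where they sit in the text.
 * Hypothetical helper: the pattern below is an illustrative guess,
 * not the regex that TextProbe itself uses.
 *
 * @return list<array{match: string, start: int, end: int}>
 */
function findTimes(string $text): array
{
    // HH:MM, optional :SS, optional AM/PM suffix (e.g. 14:30, 14:30:15, 7:45 PM).
    $pattern = '/\b(?:[01]?\d|2[0-3]):[0-5]\d(?::[0-5]\d)?(?:\s?[AaPp][Mm])?\b/u';

    preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);

    $found = [];
    foreach ($matches[0] as [$value, $byteOffset]) {
        // preg_* returns byte offsets; convert them to character offsets
        // so multibyte (UTF-8) text before the match is counted correctly.
        $start = mb_strlen(substr($text, 0, $byteOffset), 'UTF-8');
        $found[] = [
            'match' => $value,
            'start' => $start,
            'end'   => $start + mb_strlen($value, 'UTF-8'), // exclusive end
        ];
    }

    return $found;
}

print_r(findTimes('Standup at 09:15, demo at 14:30:15, dinner at 7:45 PM.'));
```

The sketch requires the `mbstring` extension. A real probe would likely need to handle more edge cases (for example, rejecting `13:00 PM`), which is one reason to prefer the built-in probes over hand-rolled patterns.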
### Finance
- `BankCardNumberProbe`: extracts bank card numbers in common formats: plain digits (e.g., 4111111111111111), digits
  separated by spaces (e.g., 4111 1111 1111 1111) or dashes (e.g., 4111-1111-1111-1111). Only Luhn-valid numbers are
  returned by default.

### Geolocation
- `GeoCoordinatesProbe`: extracts geographic coordinates in various formats (decimal or degrees/minutes/seconds,
  N/S/E/W).

### Social & Tags
- `HashtagProbe`: extracts hashtags from text (e.g., #example), supporting Unicode letters, numbers, and underscores,
  detecting hashtags in any position of the text.

### UUID & Identifiers
- `UUIDProbe`: extracts any valid UUID (v1-v6) without checking the specific version. Supports standard UUID formats
  with hyphens.
- `UUIDv1Probe`: extracts UUID version 1, matching the format `xxxxxxxx-xxxx-1xxx-xxxx-xxxxxxxxxxxx`, commonly used for
  time-based identifiers.
- `UUIDv2Probe`: extracts UUID version 2, matching the format `xxxxxxxx-xxxx-2xxx-xxxx-xxxxxxxxxxxx`, typically used in
  DCE Security contexts.
- `UUIDv3Probe`: extracts UUID version 3, matching the format `xxxxxxxx-xxxx-3xxx-xxxx-xxxxxxxxxxxx`, generated using
  MD5 hashing of names and namespaces.
- `UUIDv4Probe`: extracts UUID version 4, matching the format `xxxxxxxx-xxxx-4xxx-xxxx-xxxxxxxxxxxx`, randomly
  generated and commonly used for unique identifiers.
- `UUIDv5Probe`: extracts UUID version 5, matching the format `xxxxxxxx-xxxx-5xxx-xxxx-xxxxxxxxxxxx`, generated using
  SHA-1 hashing of names and namespaces.
- `UUIDv6Probe`: extracts UUID version 6, matching the format `xxxxxxxx-xxxx-6xxx-xxxx-xxxxxxxxxxxx`, an ordered
  version for better indexing and sorting.

### Web & Network
- `DomainProbe`: extracts domain names, including internationalized (Unicode) domains.
- `IPv4Probe`: extracts IPv4 addresses, supporting standard formats and excluding reserved/bogus ranges if necessary.
- `IPv6Probe`: extracts IPv6 addresses, including compressed formats, IPv4-mapped addresses, and zone indexes (e.g.,
  %eth0).
- `LinkProbe`: extracts hyperlinks, including ones with IP addresses, ports, or without a protocol.
- `MacAddressProbe`: extracts MAC addresses in standard formats using colons or hyphens (e.g., 00:1A:2B:3C:4D:5E or
  00-1A-2B-3C-4D-5E), accurately detecting valid addresses while excluding invalid patterns.
- `UserAgentProbe`: extracts User-Agent strings from text, supporting complex structures like multiple product tokens,
  OS information, and browser identifiers.

You can implement your own probes by creating classes that implement the `IProbe` interface.

Each probe also accepts a custom validator for the values it returns: pass an instance of a class implementing the
`IValidator` interface to the probe's constructor to override the default validation logic.

For example, `BankCardNumberProbe` uses a default validator based on the Luhn algorithm, but you can provide your
own validator if you want to enforce additional rules, such as limiting to specific card issuers or formats.

## Usage Example
```php
require __DIR__ . '/vendor/autoload.php';

use TextProbe\TextProbe;
use TextProbe\Probes\Contact\EmailProbe;

$text = "Please contact us at info@example.com for more details.";

// Register the probes to run against the text.
$probe = new TextProbe();
$probe->addProbe(new EmailProbe());

$results = $probe->analyze($text);

// Print each match with its probe type and position in the text.
foreach ($results as $result) {
    echo sprintf(
        "[%s] %s (position %d-%d)\n",
        $result->getProbeType()->name,
        $result->getResult(),
        $result->getStart(),
        $result->getEnd()
    );
}
```

### Expected output:
```
[EMAIL] info@example.com (position 21-37)
```
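
The Luhn check that `BankCardNumberProbe` applies by default (see Available Probes) can be illustrated on its own. The following is a minimal standalone sketch of the checksum, assuming separators are stripped before validation; the function name `passesLuhn` is hypothetical, and this is not the library's validator class. The method signature of `IValidator` is not shown in this README, so wiring such a check into a custom validator is deliberately left out.

```php
<?php
declare(strict_types=1);

/**
 * Standalone Luhn checksum test (hypothetical helper, not part of TextProbe).
 * Spaces and dashes are stripped first so grouped card numbers work too.
 */
function passesLuhn(string $candidate): bool
{
    $digits = preg_replace('/[\s-]+/', '', $candidate);
    if ($digits === null || $digits === '' || !ctype_digit($digits)) {
        return false;
    }

    $sum = 0;
    $double = false; // every second digit, counting from the right, is doubled

    for ($i = strlen($digits) - 1; $i >= 0; $i--) {
        $n = (int) $digits[$i];
        if ($double) {
            $n *= 2;
            if ($n > 9) {
                $n -= 9; // same as adding the two digits of the product
            }
        }
        $sum += $n;
        $double = !$double;
    }

    // A number is Luhn-valid when the total is a multiple of 10.
    return $sum % 10 === 0;
}

var_dump(passesLuhn('4111 1111 1111 1111')); // bool(true)
var_dump(passesLuhn('4111-1111-1111-1112')); // bool(false)
```

A stricter validator of the kind mentioned above (for instance, one limited to specific issuers) could combine a check like this with a prefix or length test on the digit string.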