{"id":43308405,"url":"https://github.com/friteuseb/nlp_tools","last_synced_at":"2026-03-04T12:13:03.684Z","repository":{"id":279383504,"uuid":"938152505","full_name":"friteuseb/nlp_tools","owner":"friteuseb","description":"usefull extension, add nlp methods and text analysis ","archived":false,"fork":false,"pushed_at":"2025-12-24T18:25:16.000Z","size":53,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-12-26T07:50:10.123Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/friteuseb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-02-24T14:01:20.000Z","updated_at":"2025-12-25T10:58:24.000Z","dependencies_parsed_at":"2025-02-25T09:39:31.773Z","dependency_job_id":"5dc747d5-2ec0-40e1-b596-9082e11afaed","html_url":"https://github.com/friteuseb/nlp_tools","commit_stats":null,"previous_names":["friteuseb/nlp_tools"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/friteuseb/nlp_tools","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/friteuseb%2Fnlp_tools","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/friteuseb%2Fnlp_tools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/friteuseb%2Fnlp_tools/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/friteuseb%2Fnlp_tools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/friteuseb","download_url":"https://codeload.github.com/friteuseb/nlp_tools/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/friteuseb%2Fnlp_tools/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29381022,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-12T20:34:40.886Z","status":"ssl_error","status_checked_at":"2026-02-12T20:23:00.490Z","response_time":55,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-01T21:00:22.048Z","updated_at":"2026-02-12T21:00:34.135Z","avatar_url":"https://github.com/friteuseb.png","language":"PHP","funding_links":[],"categories":["Natural Language Processing"],"sub_categories":["Recommended core stack"],"readme":"# NLP Tools for TYPO3\n\n[![TYPO3 12](https://img.shields.io/badge/TYPO3-12-orange.svg)](https://get.typo3.org/version/12)\n[![TYPO3 13](https://img.shields.io/badge/TYPO3-13-orange.svg)](https://get.typo3.org/version/13)\n[![TYPO3 14](https://img.shields.io/badge/TYPO3-14-orange.svg)](https://get.typo3.org/version/14)\n[![License](https://img.shields.io/packagist/l/cywolf/nlp-tools.svg)](https://packagist.org/packages/cywolf/nlp-tools)\n\nA comprehensive TYPO3 extension for 
Natural Language Processing, compatible with TYPO3 v12, v13 and v14.\n\n## Installation\n\n```bash\ncomposer require cywolf/nlp-tools\n```\n\nActivate the extension in the TYPO3 extension manager.\n\n## Available Features\n\n### 1. Stop Words Management\n\nFilter stop words in different languages (FR, EN, DE, ES).\n\n```php\nuse Cywolf\\NlpTools\\Service\\StopWordsFactory;\n\nclass YourClass {\n    protected StopWordsFactory $stopWordsFactory;\n\n    public function __construct(StopWordsFactory $stopWordsFactory) \n    {\n        $this-\u003estopWordsFactory = $stopWordsFactory;\n    }\n\n    public function stopWordsExample(): void \n    {\n        // Get stop words for a language\n        $frenchStopWords = $this-\u003estopWordsFactory-\u003egetStopWords('fr');\n\n        // Check if a word is a stop word\n        if ($frenchStopWords-\u003eisStopWord('le')) {\n            // It's a stop word\n        }\n\n        // Get the complete list of stop words\n        $allStopWords = $frenchStopWords-\u003egetStopWords();\n    }\n}\n```\n\n### 2. Language Detection\n\nAutomatic language detection service based on n-grams.\n\n```php\nuse Cywolf\\NlpTools\\Service\\LanguageDetectionService;\n\nclass YourClass {\n    protected LanguageDetectionService $languageDetector;\n\n    public function __construct(LanguageDetectionService $languageDetector) \n    {\n        $this-\u003elanguageDetector = $languageDetector;\n    }\n\n    public function detectionExample(): string \n    {\n        $text = \"This is an example of English text\";\n        return $this-\u003elanguageDetector-\u003edetectLanguage($text); // Returns 'en'\n    }\n}\n```\n\n### 3. 
Text Analysis\n\nComplete text analysis service including tokenization, stemming, and removal of stop words.\n\n```php\nuse Cywolf\\NlpTools\\Service\\TextAnalysisService;\n\nclass YourClass {\n    protected TextAnalysisService $textAnalyzer;\n\n    public function __construct(TextAnalysisService $textAnalyzer) \n    {\n        $this-\u003etextAnalyzer = $textAnalyzer;\n    }\n\n    public function analysisExample(): array \n    {\n        $text = \"Here is an example text to analyze\";\n\n        // Tokenization\n        $tokens = $this-\u003etextAnalyzer-\u003etokenize($text);\n\n        // Stemming\n        $stemmed = $this-\u003etextAnalyzer-\u003estem($text, 'en');\n\n        // Remove stop words\n        $withoutStopWords = $this-\u003etextAnalyzer-\u003eremoveStopWords($text, 'en');\n\n        return [\n            'tokens' =\u003e $tokens,\n            'stemmed' =\u003e $stemmed,\n            'cleaned' =\u003e $withoutStopWords\n        ];\n    }\n}\n```\n\n### 4. Text Vectorization\n\nService for converting text into vector representations for machine learning.\n\n```php\nuse Cywolf\\NlpTools\\Service\\TextVectorizerService;\n\nclass YourClass {\n    protected TextVectorizerService $vectorizer;\n\n    public function __construct(TextVectorizerService $vectorizer) \n    {\n        $this-\u003evectorizer = $vectorizer;\n    }\n\n    public function vectorizationExample(): array \n    {\n        $texts = [\n            \"This is the first document to analyze\",\n            \"A second document with different content\",\n            \"And finally a third example\"\n        ];\n\n        // Create TF-IDF vectors\n        $tfIdfData = $this-\u003evectorizer-\u003ecreateTfIdfVectors($texts, 'en');\n        \n        // Create document-term matrix\n        $dtmData = $this-\u003evectorizer-\u003ecreateDocumentTermMatrix($texts, 'en');\n        \n        // Calculate similarity between two vectors\n        $similarity = 
$this-\u003evectorizer-\u003ecosineSimilarity(\n            $tfIdfData['vectors'][0],\n            $tfIdfData['vectors'][1]\n        );\n        \n        // Calculate similarity matrix\n        $similarityMatrix = $this-\u003evectorizer-\u003ecalculateSimilarityMatrix($tfIdfData['vectors']);\n        \n        return [\n            'tfidf' =\u003e $tfIdfData,\n            'dtm' =\u003e $dtmData,\n            'similarity' =\u003e $similarity,\n            'matrix' =\u003e $similarityMatrix\n        ];\n    }\n}\n```\n\n### 5. Text Clustering\n\nService for automatically grouping similar texts together.\n\n```php\nuse Cywolf\\NlpTools\\Service\\TextClusteringService;\n\nclass YourClass {\n    protected TextClusteringService $clustering;\n\n    public function __construct(TextClusteringService $clustering) \n    {\n        $this-\u003eclustering = $clustering;\n    }\n\n    public function clusteringExample(): array \n    {\n        $texts = [\n            \"The cat sleeps on the couch\", \n            \"My dog plays in the garden\",\n            \"I like cats and domestic felines\",\n            \"The dog is man's best friend\",\n            \"Pets bring joy\"\n        ];\n\n        // K-means clustering (k=2 groups)\n        $kMeansClusters = $this-\u003eclustering-\u003ekMeansClustering($texts, 2, 'en');\n        \n        // Hierarchical clustering\n        $hierarchicalClusters = $this-\u003eclustering-\u003ehierarchicalClustering(\n            $texts, \n            0.6, // Distance threshold\n            'en'\n        );\n        \n        // Similarity-based clustering\n        $similarityClusters = $this-\u003eclustering-\u003esimilarityBasedClustering(\n            $texts,\n            0.7, // Similarity threshold\n            'en'\n        );\n        \n        return [\n            'kmeans' =\u003e $kMeansClusters,\n            'hierarchical' =\u003e $hierarchicalClusters,\n            'similarity' =\u003e $similarityClusters\n        ];\n    
}\n}\n```\n\n### 6. Topic Modeling\n\nService for extracting themes and topics from text collections.\n\n```php\nuse Cywolf\\NlpTools\\Service\\TopicModelingService;\n\nclass YourClass {\n    protected TopicModelingService $topicModeling;\n\n    public function __construct(TopicModelingService $topicModeling) \n    {\n        $this-\u003etopicModeling = $topicModeling;\n    }\n\n    public function topicsExample(): array \n    {\n        $texts = [\n            \"The new economic policy favors local businesses\",\n            \"The government announces an economic recovery plan\",\n            \"Researchers have discovered a new medical treatment\",\n            \"A scientific study reveals the impact of climate on health\",\n            \"The stock market saw a strong rise following economic announcements\"\n        ];\n\n        // Extract topics\n        $topics = $this-\u003etopicModeling-\u003eextractTopics(\n            $texts,\n            2, // Number of topics to extract\n            5  // Number of terms per topic\n        );\n        \n        // Extract representative terms from a group of texts\n        $terms = $this-\u003etopicModeling-\u003eextractTopicTerms(\n            $texts,\n            10 // Number of terms to extract\n        );\n        \n        // Extract key phrases from a text\n        $keyPhrases = $this-\u003etopicModeling-\u003eextractKeyPhrases(\n            $texts[0],\n            3 // Number of phrases to extract\n        );\n        \n        return [\n            'topics' =\u003e $topics,\n            'terms' =\u003e $terms,\n            'key_phrases' =\u003e $keyPhrases\n        ];\n    }\n}\n```\n\n## Example of use in a TYPO3 extension\n\n### Services.yaml configuration\n\n```yaml\nservices:\n  _defaults:\n    autowire: true\n    autoconfigure: true\n    public: false\n\n  YourVendor\\YourExtension\\:\n    resource: '../Classes/*'\n\n  YourVendor\\YourExtension\\Service\\TextProcessingService:\n    public: true\n```\n\n### 
Service class\n\n```php\nnamespace YourVendor\\YourExtension\\Service;\n\nuse Cywolf\\NlpTools\\Service\\TextAnalysisService;\nuse Cywolf\\NlpTools\\Service\\LanguageDetectionService;\nuse Cywolf\\NlpTools\\Service\\TextClusteringService;\nuse Cywolf\\NlpTools\\Service\\TopicModelingService;\n\nclass TextProcessingService \n{\n    protected TextAnalysisService $textAnalyzer;\n    protected LanguageDetectionService $languageDetector;\n    protected TextClusteringService $clustering;\n    protected TopicModelingService $topicModeling;\n\n    public function __construct(\n        TextAnalysisService $textAnalyzer,\n        LanguageDetectionService $languageDetector,\n        TextClusteringService $clustering,\n        TopicModelingService $topicModeling\n    ) {\n        $this-\u003etextAnalyzer = $textAnalyzer;\n        $this-\u003elanguageDetector = $languageDetector;\n        $this-\u003eclustering = $clustering;\n        $this-\u003etopicModeling = $topicModeling;\n    }\n\n    public function processText(string $text): array \n    {\n        // Language detection\n        $language = $this-\u003elanguageDetector-\u003edetectLanguage($text);\n\n        // Complete analysis\n        return [\n            'language' =\u003e $language,\n            'tokens' =\u003e $this-\u003etextAnalyzer-\u003etokenize($text),\n            'stemmed' =\u003e $this-\u003etextAnalyzer-\u003estem($text, $language),\n            'without_stopwords' =\u003e $this-\u003etextAnalyzer-\u003eremoveStopWords($text, $language),\n            'key_phrases' =\u003e $this-\u003etopicModeling-\u003eextractKeyPhrases($text, 3, $language)\n        ];\n    }\n    \n    public function analyzeMultipleTexts(array $texts): array\n    {\n        // Clustering and topic analysis\n        $clusters = $this-\u003eclustering-\u003ekMeansClustering($texts, 3);\n        $topics = $this-\u003etopicModeling-\u003eextractTopics($texts, 3);\n        \n        return [\n            'clusters' =\u003e $clusters,\n    
        'topics' =\u003e $topics\n        ];\n    }\n}\n```\n\n## Using with cache\n\nTo improve performance, you can inject a TYPO3 cache into the services:\n\n```php\nuse TYPO3\\CMS\\Core\\Cache\\CacheManager;\nuse Cywolf\\NlpTools\\Service\\TextAnalysisService;\n\nclass YourController\n{\n    protected TextAnalysisService $textAnalyzer;\n    protected CacheManager $cacheManager;\n    \n    public function __construct(\n        TextAnalysisService $textAnalyzer,\n        CacheManager $cacheManager\n    ) {\n        $this-\u003etextAnalyzer = $textAnalyzer;\n        $this-\u003ecacheManager = $cacheManager;\n    }\n    \n    public function yourAction(): void\n    {\n        // Get the cache\n        $cache = $this-\u003ecacheManager-\u003egetCache('nlp_tools');\n        \n        // Pass it to a service for faster calculations\n        $this-\u003etextAnalyzer-\u003esetCache($cache);\n        \n        // Sample input (illustrative placeholder)\n        $text = 'Here is an example text to analyze';\n        \n        // Use the service normally\n        $tokens = $this-\u003etextAnalyzer-\u003etokenize($text);\n    }\n}\n```\n\n## TYPO3 Compatibility\n\nThis extension is compatible with:\n- TYPO3 v12.4+\n- TYPO3 v13.0+\n- TYPO3 v14.0+\n\n**PHP Requirements:** PHP 8.1 or higher\n\n## Important Notes\n\n- Language detection uses TYPO3 language configuration if available\n- Stemming uses a simplified internal implementation, with fallback to the Snowball library\n- Services can be injected via TYPO3's dependency injection\n- Clustering algorithms are optimized for acceptable performance even on large text collections\n- Use caching to improve performance on repeated operations","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffriteuseb%2Fnlp_tools","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffriteuseb%2Fnlp_tools","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffriteuseb%2Fnlp_tools/lists"}