{"id":25215632,"url":"https://github.com/deezer/weslang","last_synced_at":"2025-06-17T04:33:02.202Z","repository":{"id":18499546,"uuid":"21695652","full_name":"deezer/weslang","owner":"deezer","description":"A language detection Web Service","archived":false,"fork":false,"pushed_at":"2017-05-09T11:19:52.000Z","size":69732,"stargazers_count":53,"open_issues_count":3,"forks_count":10,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-04-01T14:01:47.293Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/deezer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-07-10T13:43:24.000Z","updated_at":"2025-01-27T22:25:42.000Z","dependencies_parsed_at":"2022-07-30T13:49:25.914Z","dependency_job_id":null,"html_url":"https://github.com/deezer/weslang","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/deezer/weslang","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2Fweslang","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2Fweslang/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2Fweslang/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2Fweslang/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/deezer","download_url":"https://codeload.github.com/deezer/weslang/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2Fweslang/sbom","host":{"
name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260293519,"owners_count":22987591,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-02-10T18:15:12.520Z","updated_at":"2025-06-17T04:33:02.141Z","avatar_url":"https://github.com/deezer.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"Weslang\n=======\nWeslang is a standalone WEb Service to detect the LANGuage of a given piece\nof text.\n\n[![Build Status](https://travis-ci.org/deezer/weslang.svg?branch=master)](https://travis-ci.org/deezer/weslang)\n\nIt works by executing both [CLD2](https://code.google.com/p/cld2/) and\n[Language-Detection](https://code.google.com/p/language-detection/).\n\nThe exposed API is very simple:\n\n```\nhost:8080/detect?q=\u003cTEXT\u003e\n```\n\nThe endpoint also supports `POST` requests in case longer payloads are required.\n\nThe response is a JSON document like the following:\n\n```json\n{\n    \"language\": \"en\",\n    \"confidence\": 0.99\n}\n```\n\nHere `language` is the ISO 639-1 code of the detected language. The only\nexception is Chinese, for which the locale is also returned: the result is\neither `zh-cn` or `zh-tw`.\n\nAdditional endpoints for checking the health of the web service are exposed\nat `localhost:9001`. Among them are:\n\n* http://localhost:9001/health\n* http://localhost:9001/metrics\n* http://localhost:9001/env\n* http://localhost:9001/info\n* http://localhost:9001/mappings\n* http://localhost:9001/trace\n\nThese endpoints are provided automatically by the Spring-Boot 
framework, via the\n[Actuator plugin](http://docs.spring.io/spring-boot/docs/current-SNAPSHOT/reference/htmlsingle/#production-ready).\n\nComponents\n==========\n\nThis project includes two components that could easily be projects of their\nown.\n\nJava Bindings for CLD2\n----------------------\nUsing [JNA](https://github.com/twall/jna), a Java interface is exposed for\ndetecting the language via the [CLD2 library](https://code.google.com/p/cld2/).\n\nThe code lives in `//java/com/deezer/research/cld2`.\n\nFork of Language-Detection\n--------------------------\nIn `//third-party/java/language-detection-v2` we have a fork of\n[language-detection](https://code.google.com/p/language-detection/).\n\nThe main changes are the removal of the randomization and some performance\nimprovements. See the file `THIRD_PARTY.yaml` in that folder for a comprehensive\nlist of changes.\n\nBuilding and Testing\n====================\n\nBuilding and testing this project requires [BUCK](https://facebook.github.io/buck/)\nand Java 7. As a consequence, it cannot be built under Windows.\n\n```bash\n$ buck test --all\n$ buck build //java/com/deezer/research/language:detection_app\n```\n\nRunning\n=======\n\nThe build command generates a file called\n`buck-out/gen/java/com/deezer/research/language/detection_app.jar`, which is a\nself-contained binary.\n\nTo run it, execute:\n\n```bash\n$ java -jar detection_app.jar\n```\n\nIf for some reason you can't or don't want to run both detectors, select the\ndetectors to use with a Spring profile:\n\n```bash\n$ java -jar detection_app.jar --spring.profiles.active=java_only\n$ java -jar detection_app.jar --spring.profiles.active=cld2\n$ java -jar detection_app.jar --spring.profiles.active=both\n```\n\nCurrently the CLD2 bindings are only generated for `linux-x86-64`, so on other\nplatforms the CLD2 detector probably won't work. 
In such a case, just execute it\nwith the `java_only` profile.\n\nLaunching the service using Docker\n----------------------------------\nThe service can also be launched using Docker. This requires `docker` and `docker-compose` to be installed. See the [Docker home page](https://docs.docker.com/) for installation instructions.\n\nOnce Docker is installed, execute:\n```bash\n$ docker-compose up -d\n```\nThis builds the image of the service and launches it. To see which port the service is exposed on,\nexecute `docker-compose ps`. This displays a table similar to the example below:\n```bash\n$ docker-compose ps\n    Name                   Command               State                        Ports\n----------------------------------------------------------------------------------------------------\nweslang_api_1   java -jar buck-out/gen/jav ...   Up      0.0.0.0:32774-\u003e8080/tcp\n```\nIn the example above, the container's port `8080` is published on host port `32774`.\n\nOn Mac OS X, Docker runs inside a VM. To find the VM's IP address, execute the following command:\n```bash\n$ boot2docker ip\n192.168.59.103\n```\nWith the values from the examples above, the service can be queried as follows (the one-liner uses the Python 2 `urllib` API):\n```bash\n$ python -c \"import urllib;\\\nprint(urllib.urlopen('http://192.168.59.103:32774/detect?q=hello%20world').read());\" | \\\npython -m json.tool\n{\n    \"confidence\": 0.7706013715278043,\n    \"language\": \"en\"\n}\n```\n\nThe health endpoint, which listens on port 9001 inside the container, can be accessed by executing the following command:\n```bash\n$ docker exec CONTAINER_ID python -c \"import urllib;\\\nprint(urllib.urlopen('http://127.0.0.1:9001/health').read());\" | \\\npython -m json.tool\n{\n    \"status\": \"UP\"\n}\n```\nThe container ID can be obtained by executing `docker ps`. 
For example:\n```bash\n$ docker ps\nCONTAINER ID        IMAGE                COMMAND                CREATED             STATUS ...\na88826098461        weslang_api:latest   \"java -jar buck-out/   21 hours ago        Up 21 hours\n```\nHere the CONTAINER_ID is `a88826098461`.\n\nCredits\n=======\n\nThis project is made possible by several open source projects:\n\n* [Spring-Boot](http://projects.spring.io/spring-boot/): Java framework.\n* [CLD2](https://code.google.com/p/cld2/): The language detector built into Chrome.\n* [Language-Detection](https://code.google.com/p/language-detection/): Language detector provided by Cybozu Labs.\n* [BUCK](https://facebook.github.io/buck/): Build system released by Facebook.\n* [Guava](https://code.google.com/p/guava-libraries/): Additional Java libraries provided by Google.\n* [JNA](https://github.com/twall/jna): Library to easily integrate C libraries with Java.\n\nLicense\n=======\nThis project is released under the Apache 2.0 License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeezer%2Fweslang","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdeezer%2Fweslang","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeezer%2Fweslang/lists"}