{"id":20053770,"url":"https://github.com/plevold/unicode-in-fortran","last_synced_at":"2026-03-19T13:18:41.944Z","repository":{"id":88354834,"uuid":"457907342","full_name":"plevold/unicode-in-fortran","owner":"plevold","description":null,"archived":false,"fork":false,"pushed_at":"2022-02-10T19:07:33.000Z","size":5,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-11-29T10:59:25.592Z","etag":null,"topics":["fortran","strings","unicode"],"latest_commit_sha":null,"homepage":"","language":"Fortran","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/plevold.png","metadata":{"files":{"readme":"README.adoc","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-02-10T18:51:11.000Z","updated_at":"2023-10-25T19:20:01.000Z","dependencies_parsed_at":"2023-09-25T01:43:50.856Z","dependency_job_id":null,"html_url":"https://github.com/plevold/unicode-in-fortran","commit_stats":{"total_commits":2,"total_committers":2,"mean_commits":1.0,"dds":0.5,"last_synced_commit":"1ab8fbb154bef172e947c9c8573f950359e2c308"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/plevold/unicode-in-fortran","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/plevold%2Funicode-in-fortran","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/plevold%2Funicode-in-fortran/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/plevold%2Funicode-in-fortran/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/plevold%2Funicode-in-fortran/manifests","owner_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/owners/plevold","download_url":"https://codeload.github.com/plevold/unicode-in-fortran/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/plevold%2Funicode-in-fortran/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30297950,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-09T13:46:43.843Z","status":"ssl_error","status_checked_at":"2026-03-09T13:46:42.821Z","response_time":61,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fortran","strings","unicode"],"created_at":"2024-11-13T12:29:33.855Z","updated_at":"2026-03-09T14:07:21.796Z","avatar_url":"https://github.com/plevold.png","language":"Fortran","funding_links":[],"categories":[],"sub_categories":[],"readme":"// To get include::-s working on GitHub use asciidoctor-reducer on the file\n// ./doc/README.adoc in order to generate ./README.adoc\n= Unicode in Fortran\n\n== Introduction\n\nWARNING:: The following examples are based on my own experiences and testing.\n    I'm neither a Unicode expert nor a compiler maintainer.\n    If you find anything wrong with the examples please open an\n    https://github.com/plevold/unicode-in-fortran/issues[issue].\n\nUsing Unicode characters in your programs is not necessarily hard.\nThere is however very little 
information about Fortran and Unicode available.\nThis repository is a collection of examples and some explanations on how\nto use Unicode in Fortran.\n\nMost of what is written here is based on recommendations from\nthe http://utf8everywhere.org/[UTF-8 Everywhere Manifesto].\nI would highly recommend that you read that as well to get a better understanding\nof what Unicode is and is not.\n\n== Compilers\n\nThe examples used here have been verified to work on the following compiler/OS\ncombinations:\n\n|===\n| Compiler | Version | Operating System | Status\n| gfortran | 9.3.0   | Linux            | ✅\n|          | 10.3.0  | Windows 10       | ✅\n| ifort    | 2021.5.0| Linux            | ✅\n|===\n\n== Creating and Printing Unicode Strings\n\nFirst, make sure that\n\n* Your terminal emulator is set to UTF-8.\n* Your source file encoding is set to UTF-8.\n\nWith the notable exception of Windows CMD and PowerShell, UTF-8 is probably the default\nencoding in your terminal.\nIf you're using Windows CMD or PowerShell you need to use a modern terminal emulator\nlike https://github.com/microsoft/terminal[Windows Terminal] and follow the instructions\nhttps://akr.am/blog/posts/using-utf-8-in-the-windows-terminal[here].\nIf that's too much hassle you can consider switching to\nhttps://gitforwindows.org/[Git for Windows] instead, which will give you a nice Bash\nterminal on Windows.\n\nWith that in place, insert Unicode characters directly into a string literal in your\nsource code.\nIf you're using Visual Studio Code there's an https://marketplace.visualstudio.com/items?itemName=brunnerh.insert-unicode[extension]\nthat can help you with inserting Unicode characters in your source files.\nUsing escape sequences like `\\u1F525` requires setting special compiler flags, and\ndifferent compilers seem to handle this somewhat differently.\nUnless you know for sure that you want to stick with one compiler forever I would\nnot recommend doing this.\n\nIf you're storing it in a variable, 
use the default character kind _or_ `c_char`\nfrom `iso_c_binding`.\n*Do not* try to use e.g. `selected_char_kind('ISO_10646')` to create \"wide\" (longer than one byte)\ncharacters.\nFor one thing, Intel Fortran does not support this as of this writing.\nAlso, if you're going to pass character arguments to procedures you'll either have to\nconvert between the default and the `ISO_10646` character kinds or you need to\nhave two versions of each procedure that might need to accept both wide and default\ncharacter kinds.\nAs we will later see, this is never really needed so you will only create extra work\nfor yourself.\n\n*Example:*\n[source,fortran]\n----\nprogram write_to_console\n    implicit none\n    character(len=:), allocatable :: chars\n\n    chars = 'Fortran is 💪, 😎, 🔥!'\n    write(*,*) chars\nend program\n----\n\nThis should output\n\n[source]\n----\n❯ fpm run --example write_to_console\n Fortran is 💪, 😎, 🔥!\n----\n\nAs we can see in the output from the example above, the emojis are printed just as we\ninserted them in the source file.\n\n\n== Determining the Length of a Unicode String\n\nSome might be confused that\n\n[source,fortran]\n----\nprogram unicode_len\n    implicit none\n    character(len=:), allocatable :: chars\n\n    chars = 'Fortran is 💪, 😎, 🔥!'\n    write(*,*) len(chars)\n    if (len(chars) /= 28) error stop\nend program\n----\n\noutputs\n\n[source]\n----\n❯ fpm run --example unicode_len\n          28\n----\n\nwhile if we manually count the number of characters we see in the string literal\nthen we end up with 19 characters.\nThis is because in Unicode what we perceive as one character might consist of\nmultiple bytes.\nSuch a perceived character is referred to as a _grapheme cluster_ and is crucial when rendering text.\nDetermining the number of grapheme clusters and their width when rendered on the\nscreen is a complex task which we will not go into here.\nFor more information see the http://utf8everywhere.org/#characters[UTF-8 Everywhere Manifesto]\nand 
https://hsivonen.fi/string-length/[It's Not Wrong that \"🤦🏼‍♂️\".length == 7].\n\nWe're mainly concerned about storing the characters in memory though, as our\nterminal emulator or text editor takes care of displaying the results on our screen.\nFor this it is useful to think of the character variable as a sequence of bytes\nrather than a sequence of what we perceive as one character.\nWhen `len(chars) == 28` that means that we need 28 elements in our variable to\nstore the string.\n\n== Searching for Substrings\n\nSubstrings can be searched for using the regular `index` intrinsic just like\nstrings with just ASCII characters:\n\n[source,fortran]\n----\nprogram unicode_index\n    implicit none\n    character(len=:), allocatable :: chars\n    integer :: i\n\n    chars = '📐: 4.0·tan⁻¹(1.0) = π'\n    i = index(chars, 'n')\n    write(*,*) i, chars(i:i)\n    if (i /= 14) error stop\n    i = index(chars, '¹')\n    if (i /= 18) error stop\n    write(*,*) i, chars(i:i + len('¹') - 1)\nend program\n----\n\noutputs\n\n[source]\n----\n❯ fpm run --example unicode_index\n          14 n\n          18 ¹\n----\n\nThere is no need for any special handling thanks to the design of Unicode:\n\n[quote,'http://utf8everywhere.org/#textops[UTF-8 Everywhere Manifesto]']\nAlso, you can search for a non-ASCII, UTF-8 encoded substring in a UTF-8 string as if it was a plain byte array—there is no need to mind code point boundaries. This is thanks to another design feature of UTF-8 — a leading byte of an encoded code point can never hold value corresponding to one of trailing bytes of any other code point.\n\nKeep in mind though that what looks like a single character (a grapheme cluster)\nmight be more than one byte long so `chars(i:i)` will not necessarily output the\ncomplete match.\n\n== Reading and Writing to File\n\nReading and writing Unicode characters from and to a file is as easy as writing ASCII text:\n\n[source,fortran]\n----\nprogram file_io\n    implicit none\n\n    ! 
Write to file\n    block\n        character(len=:), allocatable :: chars\n        integer :: unit\n\n        chars = 'Fortran is 💪, 😎, 🔥!'\n        open(newunit=unit, file='file.txt')\n        write(unit, '(a)') chars\n        write(*, '(a)') ' Wrote line to file: \"' // chars // '\"'\n        close(unit)\n    end block\n\n    ! Read back from the file\n    block\n        character(len=100) :: chars\n        integer :: unit\n\n        open(newunit=unit, file='file.txt', action='read')\n        read(unit, '(a)') chars\n        write(*,'(a)') 'Read line from file: \"' // trim(chars) // '\"'\n        close(unit)\n        if (trim(chars) /= 'Fortran is 💪, 😎, 🔥!') error stop\n    end block\n\nend program\n\n----\n\nThe `open` statement in Fortran allows one to specify `encoding='UTF-8'`.\nIn testing with `ifort` and `gfortran`, however, this does not seem to have any\nimpact on the file written.\nFor example, specifying `encoding` does not seem to add a\nhttps://en.wikipedia.org/wiki/Byte_order_mark[Byte Order Mark (BOM)] with\neither `gfortran` or `ifort`.\n\n== Conclusion\n\nWe've seen that using Unicode characters in Fortran is actually not that hard!\nOne needs to remember that what we perceive as a character is not necessarily\na single element in our character variables.\nApart from that, using Unicode characters in Fortran should really be quite\nstraightforward.\n\n== Contributing\n\nIf you've tested these examples with compiler/OS combinations other than those listed\nat the top, feel free to submit a https://github.com/plevold/unicode-in-fortran/pulls[pull request]\nand add them to the list.\n\nIf you're having problems with some of the examples posted here, feel free to open an\nhttps://github.com/plevold/unicode-in-fortran/issues[issue] so that we can\ncollectively keep the information correct and up to 
date.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fplevold%2Funicode-in-fortran","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fplevold%2Funicode-in-fortran","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fplevold%2Funicode-in-fortran/lists"}