Help improve Unicode support on Windows!

#1

Situation

For many Windows users, interesting Unicode characters do not display properly in the console due to the various details and defaults of Windows, as reported here.

In the current release, some of these characters are turned into + and X on Windows to make things look acceptable.

It seems like some languages (like Python?) have workarounds for getting the interesting characters to display better, but at this time, Haskell does not have something similar.

Possibility

I periodically ask around in the Haskell IRC if anything has changed, and today I got some interesting information! I have distilled it into the following Haskell program:

import System.IO (hPutStrLn, hSetEncoding, stdout, utf8)
import System.Win32.Console (setConsoleCP)

chars :: String
chars =
  "● ✗ ─ ┘ ┤ ┬ ↓"

main :: IO ()
main =
  do  putStrLn chars            -- this should look bad!
      setConsoleCP 65001        -- weird fix for Windows?
      hSetEncoding stdout utf8  -- apparently this is needed?
      hPutStrLn stdout chars    -- this may look okay?

Warnings:

  • I have not tested this code!
  • You will need to have the Win32 package installed, which gives access to SetConsoleCP.
  • This program probably only compiles on Windows.

Request

If you have Haskell set up on your Windows computer, can you try compiling and running this program? What is the output? I am hoping to see the first line look bad, and the second line look nice. But let me know whatever it is!

If you mess around with this, please ask any questions in the #elm-dev channel on Slack! I put a message there linking to this post, so we can talk on the subcomments of that. I want to keep this thread pretty clean, focusing on results like:

Using Windows 10 and cmd.exe it prints out:

ΓùÅ ΓùÅ ΓùÅ ΓùÅ ΓùÅ ΓùÅ ΓùÅ
● ✗ ─ ┘ ┤ ┬ ↓


Note: It is possible to make this code work cross platform with various weird Haskell things, but I wanted to avoid getting into that at this stage. Just want to confirm that we can get nice characters at this stage!

#2

Using Windows 10 and cmd.exe (as well as Powershell) it results in:

<stdout>: commitBuffer: invalid argument (invalid character)

If I run it in interactive mode (ghci):

*** Exception: <stdout>: hPutChar: invalid argument (invalid character)

If I remove the first line it prints out:

ΓùÅ Γ£ù ΓöÇ Γöÿ Γöñ Γö¼ Γåô
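The garbled line above is consistent with the UTF-8 bytes of each character being displayed one byte at a time under code page 437. That is an inference from the output, not something confirmed in this thread. Here is a minimal pure-Haskell sketch that reproduces it. The names utf8Bytes, cp437Char, and mojibake are made up for illustration, and the CP437 table is deliberately partial (only the bytes these seven characters need):

```haskell
module Main (main) where

import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Maybe (fromMaybe)
import Data.Word (Word8)
import System.IO (hSetEncoding, stdout, utf8)

-- | Encode one character as its UTF-8 byte sequence.
utf8Bytes :: Char -> [Word8]
utf8Bytes c
  | n < 0x80    = [byte n]
  | n < 0x800   = [byte (0xC0 .|. shiftR n 6), cont n]
  | n < 0x10000 = [byte (0xE0 .|. shiftR n 12), cont (shiftR n 6), cont n]
  | otherwise   = [ byte (0xF0 .|. shiftR n 18), cont (shiftR n 12)
                  , cont (shiftR n 6), cont n ]
  where
    n      = ord c
    byte   = fromIntegral :: Int -> Word8
    cont x = byte (0x80 .|. (x .&. 0x3F))  -- continuation byte: 10xxxxxx

-- | Interpret a single byte as a code page 437 character.
-- PARTIAL table (an assumption for this sketch): bytes that are not
-- listed come out as '?'.
cp437Char :: Word8 -> Char
cp437Char b
  | b < 0x80  = chr (fromIntegral b)
  | otherwise = fromMaybe '?' (lookup b table)
  where
    table =
      [ (0x80, 'Ç'), (0x86, 'å'), (0x8F, 'Å'), (0x93, 'ô')
      , (0x94, 'ö'), (0x97, 'ù'), (0x98, 'ÿ'), (0x9C, '£')
      , (0xA4, 'ñ'), (0xAC, '¼'), (0xE2, 'Γ')
      ]

-- | What a string looks like when its UTF-8 bytes are shown as CP437.
mojibake :: String -> String
mojibake = map cp437Char . concatMap utf8Bytes

main :: IO ()
main =
  do  hSetEncoding stdout utf8
      putStrLn (mojibake "● ✗ ─ ┘ ┤ ┬ ↓")
      -- prints: ΓùÅ Γ£ù ΓöÇ Γöÿ Γöñ Γö¼ Γåô
```

If this reading is right, the bytes reaching the console were already correct UTF-8, and only the console's output code page was wrong.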

#3

For the unmodified version, I get the same output as @gigobyte. Here is a version of the program that worked for me:

module Main where
import System.IO (hPutStrLn, hSetEncoding, stdout, utf8)
import System.Win32.Console (setConsoleCP, setConsoleOutputCP)

chars =  "● ✗ ─ ┘ ┤ ┬ ↓"

main = do
    -- putStrLn chars -- can't do this yet, have to set the encoding first
    hSetEncoding stdout utf8 -- This has to be the first action, otherwise you get the "invalid character" error
    setConsoleOutputCP 65001 -- The "Output" variant is for printing, the other one is for console input. Probably want to set both
    hPutStrLn stdout chars -- This now prints unicode 🎉!

Interestingly, even though Windows reports being in code page 850 after the program exits, the console still appears to be using the UTF-8 code page. This means that after a successful run of this program, even a “broken” version that doesn’t call setConsoleOutputCP will print the correct string. To “break” the console again, I had to explicitly do this:

chcp 850

I have tested this on the latest update of Windows 10 Professional:

  • ConEmu (a better console emulator for Windows): ● ✗ ─ ┘ ┤ ┬ ↓ (everything works)
  • Built-in Windows cmd.exe and powershell.exe: ● ✗ ─ ┘ ┤ ┬ ↓ BUT: the default Consolas font does not support the ✗ character, and these console emulators do not have nice fallbacks in such cases, so the ✗ is printed as �!
  • VS Code Built-In Terminal: ● ✗ ─ ┘ ┤ ┬ ↓ (everything works)

Unfortunately, I don’t have Git Bash installed.

#4

Interesting results! Thank you!

@jreusch, based on your information, it sounds like we could do something like this:

import System.IO (hPutStrLn, hSetEncoding, stdout, utf8)
import System.Win32.Console (getConsoleOutputCP, setConsoleOutputCP)

main :: IO ()
main =
  do  cp <- getConsoleOutputCP
      setConsoleOutputCP 65001
      hSetEncoding stdout utf8
      hPutStrLn stdout "● ✗ ─ ┘ ┤ ┬ ↓"
      setConsoleOutputCP cp

I assume that would set the code page back to the initial setting.
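One gap in the program above is that the code page is not restored if printing throws an exception. Here is a minimal, untested sketch of one way to handle that with bracket from Control.Exception. The helper name withUtf8Console is made up for illustration. The CPP guard on mingw32_HOST_OS is my assumption about how to keep the file compiling on other platforms, where the helper simply runs the action unchanged:

```haskell
{-# LANGUAGE CPP #-}
module Main (main) where

import System.IO (hPutStrLn, hSetEncoding, stdout, utf8)

#if defined(mingw32_HOST_OS)
import Control.Exception (bracket)
import System.Win32.Console (getConsoleOutputCP, setConsoleOutputCP)
#endif

-- | Run an action with the console output code page set to 65001 (UTF-8),
-- then put the previous code page back, even if the action throws.
-- Hypothetical helper, not part of the Win32 package.
withUtf8Console :: IO a -> IO a
#if defined(mingw32_HOST_OS)
withUtf8Console action =
  bracket
    (do old <- getConsoleOutputCP   -- remember the current code page
        setConsoleOutputCP 65001    -- switch the console to UTF-8
        return old)
    setConsoleOutputCP              -- restore the remembered code page
    (\_ -> action)
#else
-- Assumption: other platforms need no code page change.
withUtf8Console action = action
#endif

main :: IO ()
main =
  withUtf8Console $
    do  hSetEncoding stdout utf8
        hPutStrLn stdout "● ✗ ─ ┘ ┤ ┬ ↓"
```

On Windows this still needs the Win32 package. Whether restoring the code page is actually desirable is an open question given the report above that the console keeps behaving as UTF-8 after the program exits.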

You mentioned that in cmd.exe and powershell.exe some characters are not supported by the default font. Which ones exactly? If we can get a fraction of these characters, maybe it’s still worth pursuing this direction.

(The fallback is to do the ASCII art with only ASCII characters on Windows. I went through and converted some instances, and they work alright. Maybe that has its own charm!)

#6

(In my previous post, I forgot to change some settings back, so most of the output/screenshots are not what you would see on a vanilla Windows system.)

Of the symbols you posted, only the ✗ seems to not work. I’ve tried a bunch of other characters people usually seem to use, and most of them appear to be broken. These are the ones that work:

● ─ ┘ ┤ ┬ ↓      ◌ ?⃝ ● ․ ─ … › ♥ ↑ ↓ ← → ?⃝ ½ ⅓ ¼ ⅕ ⅙ ⅛ ⅔ ⅖ ¾ ⅗ ⅜ ⅘ ⅚ ⅝ ⅞ 

On terminals other than cmd.exe and powershell.exe, most things seem to work fine. (The terminal built into VS Code especially seems to have really good Unicode support.)
