Hello, I've been looking in various places for a while now for someone else who uses OpenBSD, without success, so I'm hoping someone here will read this.
I'm wondering what output you get from file(1) on a file you know contains UTF-8-encoded text.
On my system (7.3-stable) the output is "Non-ISO extended-ASCII text", and I'm trying to figure out if this is how it should be, or if I did something wrong setting up the system.
So, if you have a computer with OpenBSD and a minute to spare, could you try running file(1) on a UTF-8 file and see if it identifies it as UTF-8 or "Non-ISO extended-ASCII text"?
```c
static int
text_try_test(const void *base, size_t size, int (*f)(u_char))
{
	const u_char	*data = base;
	size_t		 offset;

	/* Succeed only if every byte in the buffer satisfies f. */
	for (offset = 0; offset < size; offset++) {
		if (!f(data[offset]))
			return (0);
	}
	return (1);
}

const char *
text_get_type(const void *base, size_t size)
{
	/* Tried from narrowest to broadest; note there is no UTF-8 test. */
	if (text_try_test(base, size, text_is_ascii))
		return ("ASCII");
	if (text_try_test(base, size, text_is_latin1))
		return ("ISO-8859");
	if (text_try_test(base, size, text_is_extended))
		return ("Non-ISO extended-ASCII");
	return (NULL);
}
```
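So, as things stand, file(1) cannot identify a file as UTF-8: text_get_type() only checks for ASCII, ISO-8859, and "Non-ISO extended-ASCII", so UTF-8 text containing any non-ASCII characters can only land in one of the latter two categories (or none). The magic database in /etc/magic can recognize UTF-7 or UTF-8-EBCDIC text because those files are identified by a BOM, but as you said, UTF-8 does not require a BOM. So it looks like we are stuck here :)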