Your log files are binary already. What do you think 'cat' is, other than a binary log reader?
Binary log files are as easy to read as ASCII with the right translator. Tagged binary formats are trivially self-describing, and just as resistant to corruption as ASCII.
The problem is, people think they'll be able to understand corrupt ASCII logs, but in reality they basically never deal with them. What they actually deal with is truncated logs, and truncation can be handled just as easily in a binary format.
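To make "tagged" concrete, here's a minimal sketch of a self-describing, append-only record format. It's Python with a made-up TLV (tag-length-value) layout, purely illustrative and nothing like journald's actual on-disk format:

    import struct, time

    # Hypothetical tag IDs -- illustrative, not journald's real format.
    TAG_TIMESTAMP = 1
    TAG_MESSAGE = 2

    def write_record(f, message):
        # Each field is (tag: u8, length: u32 LE, payload). A reader
        # needs no external schema to walk the file, and a truncated
        # tail just means the last record is dropped.
        for tag, payload in [
            (TAG_TIMESTAMP, struct.pack("<d", time.time())),
            (TAG_MESSAGE, message.encode("utf-8")),
        ]:
            f.write(struct.pack("<BI", tag, len(payload)))
            f.write(payload)

    with open("app.log.bin", "ab") as f:  # append-only, like a journal
        write_record(f, "service started")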
Given that journald can be configured with a log size of 0, and to output everything straight to syslog, this is such a non-complaint it's absurd. Especially when the hot thing in syslog management is usually to stream everything into a database of some kind to make it easy to search!
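For reference, the setup being described is roughly this in /etc/systemd/journald.conf (Storage=none drops all local journal storage while forwarding still works):

    [Journal]
    Storage=none
    ForwardToSyslog=yes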
EDIT: Oh yeah, and this is assuming you're not compressing your logs. Which you probably are. In which case, they're already a very complicated, very corruptible binary format that you need a dedicated reader app (zcat) to access.
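To illustrate how ungraceful compressed logs are about damage: DEFLATE gives you nothing after the corrupt point, so a reader recovers the intact prefix and then just stops. A quick sketch, with a placeholder filename:

    import gzip

    # A truncated .gz stream fails hard at the point of damage:
    # Python's gzip module raises EOFError when the stream ends
    # mid-member. Everything read before that is the intact prefix.
    try:
        with gzip.open("app.log.1.gz", "rt") as f:
            for line in f:
                print(line, end="")
    except EOFError:
        print("-- archive truncated; lines above were recovered --")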
> Tagged binary formats are trivially self-describing, and just as resistant to corruption as ASCII.
The advantage of human-readable logs is that the semantics and rules of the language it's written in serve as a built-in forward error-correction measure. It's easy to see that the string "the qu()&!`8zwn fox jumps over the la" is a corruption and truncation of "the quick brown fox jumps over the lazy old dog", especially if you're already used to seeing the original string under normal operation. By comparison, it's much harder to tell whether or not an arbitrary byte string is corrupt, and journald does not make use of any forward error-correction measures to assist in this.
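The catch is that this signal is semantic, so a machine can only approximate it crudely. A rough printability check, for instance (illustrative only; note that the corrupted string above is entirely printable and a human still spots it instantly):

    import string

    PRINTABLE = set(string.printable.encode())

    def looks_corrupt(line, threshold=0.95):
        # A crude mechanical proxy for "does this read like text":
        # flag lines with too many unprintable bytes. It catches gross
        # damage, but nothing like the semantic checks a human applies.
        if not line:
            return False
        ok = sum(b in PRINTABLE for b in line)
        return ok / len(line) < threshold

    print(looks_corrupt(b"the quick brown fox jumps over the lazy old dog"))  # False
    print(looks_corrupt(b"the qu()&!\x00\x9c\x02zwn fox jumps over the la"))  # True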
> Given that journald can be configured with a log size of 0, and to output everything straight to syslog, this is such a non-complaint it's absurd.
If I'm using journald in this configuration, then why does journald need to exist at all? It's literally dead weight, and adds another point of failure to my logging facility.
> EDIT: Oh yeah, and this is assuming you're not compressing your logs. Which you probably are. In which case, they're already a very complicated, very corruptible binary format that you need a dedicated reader app (zcat) to access.
When recovery from compressed-log corruption is a real concern, tools like parchive [1] are layered on top of gzip to add recovery data, rather than relying on zcat alone.
Good points. As for compression: I think most people compress logs post-rotation. So you'll be unlikely to have a corrupted compressed log. Either the file will be compressed and the original removed, or compression will fail and the original will remain untouched.
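Roughly the dance being described, sketched in Python (not what logrotate literally does, but the same safety property):

    import gzip, os, shutil

    def rotate_compress(path):
        # Compress to a temporary name first; only swap it into place
        # and remove the original once the compressed copy is complete,
        # so a failure partway through never loses the log.
        with open(path, "rb") as src, gzip.open(path + ".gz.tmp", "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.replace(path + ".gz.tmp", path + ".gz")
        os.remove(path)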
It's more of an issue with whatever is written to disk by running processes. As you point out, it's debatable whether unicode/ascii vs "binary" is a sensible distinction. Still, I'd say taking a hex editor to a mangled but mostly-ascii text file is easier than doing the same to some (any) binary format; or do you know of an easy-to-use tool that will take a file description and give you back the data? Figuring out the integer encoding and offset of timestamps across different files is much easier with a rather redundant ascii timestamp than with some binary number.
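For example, faced with eight unexplained bytes in a hex dump, every decode "succeeds" and nothing in the bytes tells you which guess is right (a sketch; the value is fabricated for the demo):

    import struct

    # Pretend these 8 bytes turned up where a timestamp should be.
    raw = struct.pack("<Q", 1507000000000)  # ms since the epoch, LE

    # Seconds or milliseconds? Signed? Little- or big-endian? Float?
    for label, fmt in [("u64 LE", "<Q"), ("u64 BE", ">Q"), ("f64 LE", "<d")]:
        (value,) = struct.unpack(fmt, raw)
        print(label, value)
    # Only one interpretation is right. A redundant ASCII timestamp
    # like "2017-10-03 03:06:40" needs no guessing at all.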
If your filesystem eats your files... well, then that's a different problem.
Exactly my point: this isn't a complex database format. It's a tagged binary format, written in an append-only fashion. So you're only going to lose data if the tool decides to write bad data, but that's just as true of a text log format - your logs are useless if all those numbers don't actually relate to the values they claim to.
So any tool which can read a journald journal can happily do so until it hits hard corruption, which is about as well as you ever do with syslog. I'll gladly trade an unlikely and really narrow recovery profile for smaller, easily machine-readable, well-defined log files (in the sense that, to write an entry, someone wrote down the exact struct somewhere and had to keep using it that way; no regexes that fail on some case which happens once every million lines of log file). Especially since the compatibility layer is just "forward text logs to syslog".
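Concretely, a reader for the toy TLV format sketched upthread can do exactly that: keep everything up to the first record that stops making sense:

    import struct

    VALID_TAGS = {1, 2}  # the toy tags from the writer sketch upthread

    def read_records(path):
        # Walk the append-only file, yielding (tag, payload) until
        # clean EOF or hard corruption. Everything before the damage
        # survives, exactly like a truncated text log.
        with open(path, "rb") as f:
            while header := f.read(5):
                if len(header) < 5:
                    break  # truncated mid-header
                tag, length = struct.unpack("<BI", header)
                if tag not in VALID_TAGS or length > 1 << 20:
                    break  # hard corruption: stop here
                payload = f.read(length)
                if len(payload) < length:
                    break  # truncated mid-payload
                yield tag, payload

    for tag, payload in read_records("app.log.bin"):
        print(tag, payload)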
Fair enough. But you still then need to fit an additional tool into your recovery image. As long as it's possible to do that with a small static binary that can be expected to be available (say, built into a version of busybox), I don't have a great problem with it.