syslog parsing in rsyslog

Written by Rainer Gerhards (2008-09-23)

We regularly receive messages asking why rsyslog parses this or that message incorrectly. Of course, it turns out that rsyslog does the right thing, but the message sender does not. And also of course, this is not even of the slightest help to the end user experiencing the problem ;). So I thought I write this paper. It describes the problem source and shows potential solutions (aha!).

Syslog Standardization

The syslog protocol has not been standardized until relatively recently.The first document "smelling" a bit like a standard is RFC 3164, which dates back to August 2001. The problem is that this document is no real standard. It has assigned "informational" status by the IETF which means it provides some hopefully useful information but does not demand anything. It is impossible to "comply" to an informational document. This, of course, doesn't stop marketing guys from telling they comply to RFC3164 and it also does not stop some techs to tell you "this and that does not comply to RFC3164, so it is <anybody else but them>'s fault".

Then, there is RFC3195, which is a real standard. In it's section 3 it makes (a somewhat questionable) reference to (informational) RFC 3164 which may be interpreted in a way that RFC3195 standardizes the format layed out in RFC 3164 by virtue of referencing them. So RFC3195 seems to extend its standardization domain to the concepts layed out in RFC 3164 (which is why I tend to find that refrence questionable). In that sense, RFC3195 standardizes the format informationally described in RFC3164, Section 4. But it demands it only for the scope of RFC3195, which is syslog over BEEP - and NOT syslog over UDP. So one may argue whether or not the RFC3164 format could be considered a standard for any non-BEEP (including UDP) syslog, too. In the strict view I tend to have, it does not. Refering to the RFC3195 context usually does not help, because there are virtually no RFC3195 implementations available (at this time, I would consider this RFC a failure).

Now let's for a short moment assume that RFC3195 would somehow be able to demand RFC3164 format for non-BEEP syslog. So we could use RFC3164 format as a standard. But does that really help? Let's cite RFC 3164, right at the begining of section 4 (actually, this is the first sentence):

   The payload of any IP packet that has a UDP destination port of 514
   MUST be treated as a syslog message.

Think a bit about it: this means that whatever is send to port 514 must be considered a valid syslog message. No format at all is demanded. So if "this is junk" is sent to UDP port 514 - voila, we have a valid message (interestingly, it is no longer a syslog message if it is sent to port 515 ;)). You may now argue that I am overdoing. So let's cite RFC 3164, Section 5.4, Example 2:

  Example 2

        Use the BFG!

   While this is a valid message, it has extraordinarily little useful
   information.

As you can see, RFC3164 explicitely states that no format at all is required.

Now a side-note is due: all of this does not mean that the RFC3164 authors did not know what they were doing. No, right the contrary is true: RFC3164 mission is to describe what has been seen in practice as syslog messages and the conclusion is quite right that there is no common understanding on the message format. This is also the reason why RFC3164 is an informational document: it provides useful information, but does not precisely specify anything.

After all of this bashing, I now have to admit that RFC3164 has some format recommendations layed out in section 4. The format described has quite some value in it and implementors recently try to follow it. This format is usually meant when someone tells you that a software is "RFC3164 compliant" or expects "RFC3164 compliant messages". I also have to admit that rsyslog also uses this format and, in the sense outlined here, expects messages received to be "RFC3164 compliant" (knowingly that such a beast does not exist - I am simply lying here ;)).

Please note that there is some relief of the situation in reach. There is a new normative syslog RFC series upcoming, and it specifies a standard message format. At the time of this writing, the main documents are sitting in the RFC editor queue waiting for a transport mapping to be completed. I personally expect them to be assigned RFC numbers in 2009.

Practical Format Requirements

From a practical point of view, the message format expected (and generated by default in legacy mode) is:


<PRI>TIMESTAMP SP HOST SP TAG MSG(Freetext)

SP is the ASCII "space" character and the definition of the rest of the fields can be taken from RFC3164. Please note that there also is a lot of confusion on what syntax and semantics the TAG actually has. This format is called "legacy syslog" because it is not well specified (as you know by now) and has been "inherited from the real world".

Rsyslog offers two parsers: one for the upcoming RFC series and one for legacy format. We concentrate on the later. That parser applies some logic to detect missing hostnames, is able to handle various ways the TIMESTAMP is typically malformed. In short it applies a lot of guesswork in trying to figure out what a message really means. I am sure the guessing algorithm can be improved, and I am always trying that when I see new malformed messages (and there is an ample set of them...). However, this finds its limits where it is not possible to differentiate between two entities which could be either. For example, look at this message:


<144>Tue Sep 23 11:40:01 taghost sample message

Does it contain a hostname? Mabye. The value "taghost" is a valid hostname. Of course, it is also a valid tag. If it is a hostname, the tag's value is "sample" and the msg value is "message". Or is the hostname missing, the tag is "taghost" and msg is "sample message"? As a human, I tend to say the later interpretation is correct. But that's hard to tell the message parser (and, no, I do not intend to apply artificial intelligence just to guess what the hostname value is...).

One approach is to configure the parser so that it never expects hostnames. This becomes problematic if you receive messages from multiple devices. Over time, I may implement parser conditionals, but this is not yet available and I am not really sure if it is needed comlexity...

Things like this, happen. Even more scary formats happen in practice. Even from mainstream vendors. For example, I was just asked about this message (which, btw, finally made me write this article here):


"<130> [ERROR] iapp_socket_task.c 399: iappSocketTask: iappRecvPkt returned error"

If you compare it with the format RFC3164 "suggests", you'll quickly notice that the message is "a bit" malformed. Actually, even my human intelligence is not sufficient to guess if there is a TAG or not (is "[ERROR]" a tag or part of the message). I may not be the smartest guy, but don't expect me to program a parser that is smarter than me.

To the best of my konwledge, these vendor's device's syslog format can be configured, so it would proabably be a good idea to include a (sufficiently well-formed) timestamp, the sending hostname and (maybe?) a tag to make this message well parseable. I will also once again take this sample and see if we can apply some guesswork. For example, "[" can not be part of a well-formed TIMESTAMP, so logic can conclude there is not TIMESTAMP. Also, "[" can not be used inside a valid hostname, so logic can conclude that the message contains no hostname. Even if I implement this logic (which I will probably do), this is a partial solution: it is impossible to guess if there is a tag or not (honestly!). And, even worse, it is a solution only for those set of messages that can be handled by the logic described. Now consider this hypothetical message:


"<130> [ERROR] host.example.net 2008-09-23 11-40-22 PST iapp_socket_task.c 399: iappSocketTask: iappRecvPkt returned error"

Obviously, it requires additional guesswork. If we iterate over all the cases, we can very quickly see that it is impossible to guess everything correct. In the example above we can not even surely tell if PST should be a timezone or some other message property.

A potential solution is to generate a parser-table based parser, but this requires considerable effort and also has quite some runtime overhead. I try to avoid this for now (but I may do it, especially if someone sponsors this work ;)). Side-note: if you want to be a bit scared about potential formats, you may want to have a look at my paper "On the Nature of Syslog Data".

Work-Around

The number one work-around is to configure your devices so that they emit (sufficiently) well-formed messages. You should by now know what these look like.

If that cure is not available, there are some things you can do in rsyslog to handle the situation. First of all, be sure to read about rsyslog.conf format and the property replacer and properties specifically. You need to understand that everything is configured in rsyslog. And that the message is parsed into properties. There are also properties available which do not stem back directly to parsing. Most importantly, %fromhost% property holds the name of the system rsyslog received the message from. In non-relay cases, this can be used instead of hostname. In relay cases, there is no cure other than to either fix the orginal sender or at least one of the relays in front of the rsyslog instance in question. Similarly, you can use %timegenerated% instead of %timereported%. Timegenerated is the time the message hit rsyslog for the first time. For non-relayed, locally connected peers, Timegenerated should be a very close approximation of the actual time a message was formed at the sender (depending, of course, on potential internal queueing inside the sender). Also, you may use the %rawmsg% property together with the several extraction modes the property replacer supports. Rawmsg contains the message as it is received from the remote peer. In a sense, you can implement a post-parser with this method.

To use these properties, you need to define your own templates and assign them. Details can be found in the above-quoted documentation. Just let's do a quick example. Let's say you have the horrible message shown above and can not fix the sending device for some good reason. In rsyslog.conf, you used to say:


*.* /var/log/somefile

Of course, things do not work out well with that ill-formed message. So you decide to dump the rawmsg to the file and pull the remote host and time of message generation from rsyslog's internal properties (which, btw, is clever, because otherwise there is no indication of these two properties...). So you need to define a template for that and make sure the template is used with your file logging action. This is how it may look:


$template, MalfromedMsgFormater,"%timegenerated% %fromhost% %rawmsg:::drop-last-lf%\n"
*.* /var/log/somefile;MalformedMsgFormatter

This will make your log much nicer, but not look perfect. Experiment a bit with the available properties and replacer extraction options to fine-tune it to your needs.

Wrap-Up

Syslog message format is not sufficiently standardized. There exists a weak "standard" format, which is used by a good number of implementations. However, there exist many others, including mainstream vendor implementations, which have a (sometimes horribly) different format. Rsyslog tries to deal with anomalies but can not guess right in all instances. If possible, the sender should be configured to submit well-formed messages. If that is not possible, you can work around these issues with rsyslog's property replacer and template system.

I hope this is a useful guide. You may also have a look at the rsyslog troubleshooting guide for further help and places where to ask questions.

[manual index] [rsyslog site]