db-parser: high speed log message parser
During the development of SSB I created a “small” syslog-ng extension for high-speed log message parsing. Originally we wanted to include logcheck-like functionality in SSB, but regexp-based message matching/classification seemed slow to me. Logcheck takes an artificial ignorance approach to log handling: it keeps a pattern list that matches “known” or “normal” messages and presents the administrator with the rest. However, regular expressions are hard to write (especially properly), hard to understand later, and do not really scale if you have hundreds of them. (The general problem with log messages is that they usually have no well-defined structure and are meant mainly for humans rather than for computer processing. Recently some vendors have tried to come up with XML-based messages, with limited success so far.)
Instead of using regular expressions, db-parser was born, and it was released in the open-source edition of syslog-ng 3.0. (3.0 introduced the parser concept as well, see Bazsi’s posts on parsers and on db-parser.) The idea was to provide a high-speed, non-regexp message classification system based on a pattern database (hence the name db-parser). The algorithm is built on a radix tree with longest prefix matching, and it can use parsers to match the variable parts of messages. The patterns (consisting of literal and parser parts) are put into a radix tree, and matching is a simple search for the longest prefix in the tree. In case of a match, a simple class can be assigned to the message, and the class can be used later in filters.
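To make the idea a bit more concrete, here is a minimal sketch in Python. It is not the syslog-ng implementation: for brevity it uses a plain per-character trie instead of a compressed radix tree, and the pattern representation (a list of literal strings and `(parser, field)` tuples), the parser names and all function names are hypothetical, made up for this illustration.

```python
# Hypothetical parsers: each returns how many characters it consumes
# at the start of `text`, or 0 if it does not match.
def parse_number(text):
    n = 0
    while n < len(text) and text[n].isdigit():
        n += 1
    return n

def parse_string(text):
    # Consume everything up to the next space.
    n = 0
    while n < len(text) and text[n] != " ":
        n += 1
    return n

def parse_ipv4(text):
    pos = 0
    for octet in range(4):
        if octet > 0:
            if pos >= len(text) or text[pos] != ".":
                return 0
            pos += 1
        digits = parse_number(text[pos:pos + 3])
        if digits == 0 or int(text[pos:pos + digits]) > 255:
            return 0
        pos += digits
    return pos

PARSERS = {"NUMBER": parse_number, "STRING": parse_string, "IPV4": parse_ipv4}

class Node:
    def __init__(self):
        self.children = {}  # literal character -> Node
        self.parsers = []   # list of (parser_name, field_name, Node)
        self.klass = None   # class assigned if a pattern ends at this node

def insert(root, pattern, klass):
    node = root
    for part in pattern:
        if isinstance(part, str):
            # Literal part: one trie edge per character.
            for ch in part:
                node = node.children.setdefault(ch, Node())
        else:
            # Parser part: reuse an existing parser edge or add a new one.
            for name, field, child in node.parsers:
                if (name, field) == part:
                    node = child
                    break
            else:
                child = Node()
                node.parsers.append((part[0], part[1], child))
                node = child
    node.klass = klass

def match(node, text, pos=0, fields=None):
    """Return (matched_length, class, fields) for the longest matching
    pattern prefix, or None if no pattern matches."""
    fields = fields or {}
    best = (pos, node.klass, fields) if node.klass is not None else None
    candidates = []
    if pos < len(text) and text[pos] in node.children:
        candidates.append(match(node.children[text[pos]], text, pos + 1, fields))
    for name, field, child in node.parsers:
        length = PARSERS[name](text[pos:])
        if length:
            captured = {**fields, field: text[pos:pos + length]}
            candidates.append(match(child, text, pos + length, captured))
    for cand in candidates:
        if cand is not None and (best is None or cand[0] > best[0]):
            best = cand
    return best

root = Node()
insert(root, ["Accepted password"], "generic")
insert(root, ["Accepted password for ", ("STRING", "user"), " from ",
              ("IPV4", "src_ip"), " port ", ("NUMBER", "src_port")], "login")

result = match(root, "Accepted password for joe from 10.0.0.1 port 2222 ssh2")
print(result)
```

Both patterns share the literal prefix "Accepted password", so they share the same path in the tree. The longer pattern wins for the sample message because it matches a longer prefix, and the values consumed by the parser parts come back in the fields dictionary.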
Parsers are important not only because they match the variable parts of a message, but also because they can extract those parts and store them in variables, which can be referenced later in filters or in template macros. A possible use would be to match firewall messages, parse out the message details (such as IP addresses, ports, etc.) and store the extracted fields in an SQL database column by column. (syslog-ng has native SQL support as well.)
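The “column by column” idea could look roughly like the sketch below. It uses Python's built-in sqlite3 module as a stand-in and does not show syslog-ng's own SQL support; the table name, the column names and the sample values are assumptions made up for the example.

```python
import sqlite3

# Hypothetical fields, as a pattern match on a firewall message might extract them.
parsed = {
    "src_ip": "10.0.0.1",
    "src_port": "2222",
    "dst_ip": "192.168.1.5",
    "dst_port": "22",
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE firewall_log ("
    "src_ip TEXT, src_port INTEGER, dst_ip TEXT, dst_port INTEGER)"
)

# One extracted field per column; ports are converted to integers first.
conn.execute(
    "INSERT INTO firewall_log (src_ip, src_port, dst_ip, dst_port) "
    "VALUES (?, ?, ?, ?)",
    (
        parsed["src_ip"],
        int(parsed["src_port"]),
        parsed["dst_ip"],
        int(parsed["dst_port"]),
    ),
)
conn.commit()

# With separate columns, questions like "who connected to port 22?"
# become plain SQL instead of text searches over raw log lines.
rows = conn.execute(
    "SELECT src_ip, src_port FROM firewall_log WHERE dst_port = ?", (22,)
).fetchall()
print(rows)
```

The point of storing the fields separately is that the database can filter and aggregate on them directly, which is not possible when the whole message sits in one text column.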
It took a bit of time until the first adopters started to use db-parser, but the time has come. Last week I was asked some db-parser related questions on the syslog-ng mailing list. I am very happy about that, especially because my idea is to set up a community website for db-parser where anyone can share their own patterns with others, providing quality patterns that cover a wide range of applications for anyone interested.