All of the current log analysis and reporting systems share a problem (other than log4j): they are resource intensive. For those of us on a limited budget, the minimum requirement for running an Elasticsearch/Kibana setup is one DigitalOcean droplet (or the equivalent). That's still better than the bad old days of custom (brittle) scripts for log analysis and aggregation. I want to split the difference between the money (Elasticsearch/Kibana) and time (scripts) sides of that dilemma and use the tools web developers are already using to build my own custom analysis site.

For the web developer, large chunks of a log analysis system are already in the toolkit. Working backwards: any web developer can knock out a basic front end that talks to an API. A log analysis API doesn't need to be complicated. You may only need a few records for system monitoring, or you may want full traffic analysis. Either way, any stack can expose an endpoint that serves raw or analyzed records to the front end. To get that data, it helps if the logs are structured and accessible to the API, i.e. in some kind of database (as opposed to the file system where logs usually live). "Structured", "records", and "web development" usually add up to JSON and MongoDB, but any structured format (YAML, TOML, BespON) or database will work. The last step is structuring the log data and getting it into MongoDB, which is where Fluentd comes in.
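
As a sketch of what that endpoint could look like (Express, the official MongoDB Node.js driver, and the port are my assumptions here, not requirements; the database and collection names are the ones used later in this post):

// Minimal sketch of a log API endpoint. Assumes a local mongod and the
// my_db/my_collection names used in the Fluentd match section below.
import express from "express";
import { MongoClient } from "mongodb";

const client = new MongoClient("mongodb://localhost:27017");
const app = express();

// Hand the front end the most recent records, raw; any analysis could
// happen here or in an aggregation pipeline instead.
app.get("/api/logs", async (_req, res) => {
  const records = await client
    .db("my_db")
    .collection("my_collection")
    .find({})
    .sort({ t: -1 })
    .limit(100)
    .toArray();
  res.json(records);
});

async function main() {
  await client.connect();
  app.listen(3000, () => console.log("log API listening on :3000"));
}

main().catch(console.error);

The t field it sorts on is the flattened timestamp this post produces further down.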

Fluentd can aggregate, structure, and route logging data from the system level (syslog) to the app level (all stacks can push data to it), and its resource usage is minuscule. It stores log records as JSON documents in MongoDB collections (which can be capped), and it is probably already available on your systems if you do any Node.js web development. While Fluentd can do most of the structuring and transforming of log records, MongoDB, ironically, produces log data that MongoDB can't store directly. So Fluentd has to be told how to transform MongoDB's records before storing them back in MongoDB.

Most of the Fluentd configuration is straightforward. You need a source section to gather the MongoDB log data:

<source>
  @type tail
  path /var/log/mongodb/mongod.log
  pos_file /var/log/td-agent/mongod.log.pos
  tag mongodb

  <parse>
    @type json
  </parse>
</source>

adjusting the paths as appropriate. That will load the JSON log records from MongoDB into Fluentd; use whatever tag you like. Once the data is in Fluentd, it must be transformed into the correct format. You can look at an entry in your own logs or at this example (indentation is mine):

{
  "t": {
    "$date": "2020-10-31T22:48:12.218-05:00"
  },
  "s": "I",
  "c": "COMMAND",
  "id": 20459,
  "ctx": "initandlisten",
  "msg": "Setting featureCompatibilityVersion",
  "attr": {
    "newVersion": "4.4"
  }
}

We really need something like

{
  "t": "2020-10-31T22:48:12.218-05:00",
  "s": "I",
  "c": "COMMAND",
  "id": 20459,
  "ctx": "initandlisten",
  "msg": "Setting featureCompatibilityVersion",
  "attr": {
    "newVersion": "4.4"
  }
}

to make the timestamp accessible as a top-level field. Fluentd's record_transformer filter makes easy work of this; just make sure the tags match.

<filter mongodb>
  @type record_transformer
  enable_ruby true
  <record>
    t ${record['t']['$date']}
  </record>
</filter>

The line in the record section should be obvious even if you don't know Ruby from Python: it replaces the nested t.$date value with a flat t field. Finally, store the record wherever you want; just make sure the tags match again.

<match mongodb>
  @type mongo
  database my_db
  collection my_collection
  host localhost
  port 27017
  user mongodb_username
  password mongodb_password
</match>
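
Since capped collections came up earlier: the mongo output plugin can create one for you. Something like the following two lines inside the match section should do it (the size is my own placeholder; check the fluent-plugin-mongo documentation for the exact options your version supports):

  capped
  capped_size 100m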

This setup will get all of the logging output from MongoDB into Fluentd and stored in a MongoDB collection. Most of it won't be necessary, but once you determine which log records you actually want to analyze, you can filter out the rest with additional filter sections placed before the storing match section, as in the sketch below.
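
For example, a grep filter can drop whole log components before they reach the match section. The components excluded here (NETWORK and CONNPOOL) are just placeholders for whatever you find noisy in your own logs:

<filter mongodb>
  @type grep
  <exclude>
    key c
    pattern /^(NETWORK|CONNPOOL)$/
  </exclude>
</filter>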