Monitor your cluster of Tomcat applications with Logstash and Kibana

A month ago I wrote a post about the potential of Logstash, ElasticSearch, and Kibana used together. The example back then was fairly simple, so today's goal is to see how
one can make the most of those tools in an IT infrastructure with real-life problems. The objective is to show how to monitor, in a central place, the logs coming from a cluster of Tomcat servers.

Problem: monitoring a cluster of Tomcat applications

Let's take a cluster of 3 identical nodes, each of which hosts 3 Tomcat applications; that makes 9 applications to monitor. In front of the cluster stands a load balancer, so a customer can be on any node at any time.

Now, if an error happens in a user application and no log management system is in place, one will need to log into each and every node of the cluster and analyze the logs. I know this can be scripted, but you get the point: it is not the optimal way.

This post aims to show how this problem can be tackled using Logstash, Redis, ElasticSearch and Kibana to build a strong, highly scalable and customizable log management system.

Tools

Here we will apply the following scheme from the Logstash website. The only difference is that Kibana will be used instead of the embedded Logstash web interface.

[Diagram: getting-started centralized overview, from the Logstash documentation]

What does what

  • Logstash: as the log shipper (on each node) and the log indexer (on the central server)
  • Redis: as the broker, used as a queuing system
  • ElasticSearch: as the log store and search engine, it stores and indexes the logs
  • Kibana: as the front-end viewer, a nice UI with useful extra features

Installation

Logstash

Logstash comes as a jar; it is bundled with everything it needs to run.
The jar file is available for download on the Logstash website.

To run it, simply execute:
java -jar logstash-1.1.9-monolithic.jar agent -f CONFFILE -l LOGFILE

This will start a Logstash instance that will act based on the CONFFILE it has been started with. To make this a bit cleaner, it is recommended to daemonize it so it can be started/stopped and launched at boot time with the traditional tools. The java-service-wrapper library will let you daemonize Logstash in no time; a traditional init.d script works as well.
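
For example, a minimal init-style wrapper could look like the sketch below. The jar, config, log, and pid file locations are assumptions, not values from this post; adapt them to your layout.

    #!/bin/sh
    # Minimal sketch of an init-style script for a Logstash agent (shipper or indexer).
    # All paths below are assumptions; adjust them to your installation.
    JAR=/opt/logstash/logstash-1.1.9-monolithic.jar
    CONF=/etc/logstash/logstash.conf
    LOG=/var/log/logstash/logstash.log
    PIDFILE=/var/run/logstash.pid

    case "$1" in
      start)
        nohup java -jar "$JAR" agent -f "$CONF" -l "$LOG" >/dev/null 2>&1 &
        echo $! > "$PIDFILE"
        ;;
      stop)
        [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")" && rm -f "$PIDFILE"
        ;;
      *)
        echo "Usage: $0 {start|stop}"
        exit 1
        ;;
    esac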

Logstash needs to be installed both on the cluster nodes (shippers) and on the central server where the logs will be gathered, stored and indexed (indexer).

Redis

Redis is a highly scalable key-value store; it will be used as the broker here and will be installed on the central server.
Redis installation is pretty straightforward: packages are available for every main Linux distribution (CentOS users will need to install EPEL first; see the sketch after the steps below).

  1. Edit /etc/redis.conf, change bind 127.0.0.1 to bind YOUR.IP.ADDR.ESS
  2. Make sure Redis is configured to start at boot (chkconfig/update-rc.d)
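
On a CentOS box, the whole installation could look roughly like the sketch below (package and service names are the usual ones, but double-check them for your distribution and release):

    # install EPEL (via the epel-release package or the EPEL rpm for your release), then Redis
    yum install epel-release
    yum install redis
    # enable Redis at boot and start it
    chkconfig redis on
    service redis start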

Important: make sure firewall rules are in place so the Logstash shippers are able to communicate with Redis on the central server.
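
A minimal iptables sketch on the central server, assuming Redis listens on its default port (6379) and NODE.IP.ADDR.ESS stands for one of the cluster nodes:

    # allow a shipper node to reach Redis on the broker/central server
    iptables -A INPUT -p tcp -s NODE.IP.ADDR.ESS --dport 6379 -j ACCEPT
    # persist the rule (CentOS-style; use your distribution's mechanism otherwise)
    service iptables save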

ElasticSearch

Unfortunately, ElasticSearch cannot be found in the package repositories of most Linux distributions yet. Debian users are a bit luckier, since the team at elasticsearch.org provides them with a .deb; for users of other distributions, installation will need to be manual. ElasticSearch will be installed on the central server.

Get the archive from elasticsearch.org and, as with Logstash, I would recommend using the java-service-wrapper library to daemonize it.
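
A manual installation could go along these lines; the version number and the install path are assumptions, grab whatever tarball is current on elasticsearch.org:

    # download the tarball from elasticsearch.org/download, then:
    tar xzf elasticsearch-0.20.5.tar.gz -C /opt
    ln -s /opt/elasticsearch-0.20.5 /opt/elasticsearch
    # -f keeps ElasticSearch in the foreground, handy while testing the setup
    /opt/elasticsearch/bin/elasticsearch -f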

You need to edit the configuration file elasticsearch.yml, uncomment network.host: 192.168.0.1 and replace it with network.host: YOUR.IP.ADDR.ESS. The rest of the configuration needs to be tuned based on the expected workload.
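
In other words, the relevant part of elasticsearch.yml ends up looking like the excerpt below (the cluster name is just an example, not a required value):

    # elasticsearch.yml (excerpt)
    cluster.name: logstash-cluster
    network.host: YOUR.IP.ADDR.ESS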

Kibana

Kibana does not have packages yet; the source code needs to be retrieved from GitHub or from the Kibana website itself. Installation is straightforward:

  1. wget https://github.com/rashidkpc/Kibana/archive/v0.2.0.tar.gz
  2. tar xzf v0.2.0.tar.gz && cd Kibana-0.2.0
  3. gem install bundler
  4. bundle install
  5. vim KibanaConfig.rb
  6. bundle exec ruby kibana.rb

Main fields to configure in KibanaConfig.rb (an excerpt follows the list):

  • Elasticsearch: the host:port of your ES server
  • KibanaPort: the port Kibana listens on
  • KibanaHost: the address Kibana is bound to
  • Default_fields: the fields you’d like to see on your dashboard
  • (Extra) Smart_index_pattern: the index pattern Kibana should look into
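
As an illustration, a trimmed-down KibanaConfig.rb could contain something like the following; the constant names are the ones listed above, while the values shown are assumptions to adapt to your setup:

    module KibanaConfig
      # where ElasticSearch lives (host:port)
      Elasticsearch = "YOUR.IP.ADDR.ESS:9200"
      # address and port Kibana itself listens on
      KibanaHost = '0.0.0.0'
      KibanaPort = 5601
      # fields shown by default on the dashboard
      Default_fields = ['@message']
      # index pattern Kibana should look into (Logstash's daily indexes)
      Smart_index_pattern = 'logstash-%Y.%m.%d'
    end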

Kibana will be installed on the central server. Look into `sample/kibana` and `kibana-daemon.rb` to see how to daemonize it.

Configuration

Tomcat Servers

Here, Logstash will monitor two kinds of logs: the application logs and the access logs.

Access Logs

In order to enable access logs in Tomcat, edit your /usr/share/tomcat7/conf/server.xml and add the AccessLog valve:

<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
    prefix="localhost_access_log." suffix=".txt" renameOnRotate="true"
    pattern="%h %l %u %t &quot;%r&quot; %s %b" />

Application Logs

Here a library that outputs log4j messages directly in the Logstash json_event format will be used (special thanks to @lusis for the hard work), so no grokking will be required.

Configure log4j.xml

Edit the /usr/share/tomcat7/webapps/myapp1/WEB-INF/classes/log4j.xml:

<appender name="MYLOGFILE" class="org.apache.log4j.DailyRollingFileAppender">
    <param name="File" value="/path/to/my/log.log"/>
    <param name="Append" value="false"/>
    <param name="DatePattern" value="'.'yyyy-MM-dd"/>
    <layout class="net.logstash.log4j.JSONEventLayout"/>
</appender>
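
For the appender to be used it also has to be referenced by a logger, and the jsonevent-layout jar providing net.logstash.log4j.JSONEventLayout has to be on the application's classpath (typically in WEB-INF/lib). A minimal sketch, where the INFO level is an assumption:

    <!-- reference the appender from the root logger so events actually reach it -->
    <root>
        <priority value="INFO"/>
        <appender-ref ref="MYLOGFILE"/>
    </root>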

Logstash configuration files

Shipper

An example of what a Logstash shipper config file could look like:

	input {

		# application logs, already emitted in json_event format by JSONEventLayout
		file {
			path => '/path/to/my/log.log'
			format => 'json_event'
			type => 'log4j'
			tags => ['myappX-nodeX']
		}

		# Tomcat access logs, shipped as plain text and parsed by the grok filter below
		file {
			path => '/var/log/tomcat7/localhost_access_log..txt'
			format => 'plain'
			type => 'access-log'
			tags => ['nodeX']
		}

	}

	filter {

		# only applied to access-log events: extract the interesting fields from each access log line
		grok {
			type => "access-log"
			pattern => "%{IP:client} \- \- \[%{DATA:datestamp}\] \"%{WORD:method} %{URIPATH:uri_path}%{URIPARAM:params} %{DATA:protocol}\" %{NUMBER:code} %{NUMBER:bytes}"
		}

		# split the query string into individual key/value fields
		kv {
			type => "access-log"
			fields => ["params"]
			field_split => "&?"
		}

		# URL-decode every field so the values are human readable
		urldecode {
			type => "access-log"
			all_fields => true
		}

	}

	output {

		redis {
			host => "YOUR.IP.ADDR.ESS"
			data_type => "list"
			key => "logstash"
		}

	}
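
While tuning the grok pattern it can help to temporarily add a stdout output next to the redis one, so parsed events can be inspected directly on the shipper's console. A sketch:

	output {

		# temporary debug output: print every parsed event to the console
		stdout { debug => true }

		redis {
			host => "YOUR.IP.ADDR.ESS"
			data_type => "list"
			key => "logstash"
		}

	}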

Indexer

An example of what a Logstash indexer config file could look like:

	input {

		redis {
			host => "YOUR.IP.ADDR.ESS"
			type => "redis-input"
			data_type => "list"
			key => "logstash"
			format => "json_event"
		}

	}

	output {

		elasticsearch {
			host => "YOUR.IP.ADDR.ESS"
		}

	}

Testing

  1. Make sure Redis + ElasticSearch + Logstash (indexer) + Kibana are started
  2. Make sure all the Logstash shippers are started
  3. Go to YOUR.IP.ADDR.ESS:5601 and enjoy a nicely structured stream of logs (a quick sanity check follows)
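
If nothing shows up, a quick way to check that events actually reach ElasticSearch is to query it directly, assuming ES listens on its default port (9200):

    # list recent access-log events stored in the Logstash daily indexes
    curl 'http://YOUR.IP.ADDR.ESS:9200/logstash-*/_search?q=type:access-log&pretty=true'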

The Lucene query language can be used in the header text box to query and filter results. Kibana will have a new interface soon that will let one build actual customizable dashboards of logs; take a peek at the demo, it does look promising.

Find below some screenshots (current version of Kibana) of what the configuration described in this post provides:

[Screenshot: access log analysis]
[Screenshot: application log analysis]
[Screenshot: application log analysis, details]

You can see that the events are tagged ‘pentaho-node1’, so it is now easy to know which application (pentaho) on which node (node1) produced the error.

Kibana has some excellent features; take time to get to know them.

Conclusion

Last month's Twitter example was not oriented toward real-life problems, but with this post one can see all the power behind these tools. Once they are set up correctly, one can gather all the logs of a cluster and explore them in one place, and easily figure out where issues are coming from, all at one's fingertips. QED


Comments

  1. I would be interested to know about the CPU consumption of Logstash under load, and what kind of load (lines of logs / sec) it can handle?

    • To be honest with you, I haven't done a load test of this infrastructure yet.

      The servers I deployed it on are not heavily loaded, so I know for sure I won't reach Logstash's limits; CPU consumption in my case is really low.
      Yesterday @jordansissel posted a tweet about dealing with 12,400 events / sec (https://twitter.com/jordansissel/status/309717587267297280). I don't know which inputs he was dealing with or the hardware he was on, but that can give you an idea of the load Logstash can support.

      I hope it helps,

  2. Vijay Sharma says:

    I have log files dropped into a folder. I want to use Logstash and Kibana with ElasticSearch without the use of a broker like Redis here; would that be possible?

    • Absolutely. Use the file input to read/parse the files you want, and output the stream to an ElasticSearch instance. I recommend not using the embedded ES version. Then configure your KibanaConfig.rb to read the ES indexes that store your logs, and you're ready to view your logs with Kibana without Redis.

      • Vijay Sharma says:

        Thanks for the reply, really appreciate it. I am doing as suggested, but when I try to read a log file, nothing happens (it says caching input file or something), though it responds to stdin input, with the ElasticSearch output recording and displaying those events in Kibana. It is just not reading/recording the input log files, and not giving any error.

      • Just to be sure: in your file input, are you either using ‘start_position => true’ or populating the log file while Logstash is running?

      • Vijay Sharma says:

        Yeah, just noticed that, and it's working. Now the issue is with the log file dates: it's logging them with the current date, not the date specified in the logs, with a _groklogparser error, though I just have a blank handler. (It once worked and recorded values from the log datetime somehow, don't know how.)

  3. Connie says:

    Which version of log4j are you using?

  4. vikas says:

    What happens to the logs written while Logstash is down? Does Logstash pick them up when it is started again, or are these logs discarded? Do we need to change any configuration for this?

    • It depends on your inputs.

      For the file input, you have sincedb_path / sincedb_interval, which let you pick up from the last position written to the sincedb file. It does not guarantee that you won't lose some data, though.

      For twitter, irc kinds of inputs (unkept/unstored logs), I think there is no way to pick up the logs missed while Logstash was down.

      For redis/rabbitmq etc. (stored logs), I think Logstash will pick them up by itself with no further configuration.

      I haven't tried it myself for Redis. It needs double checking.

  5. pr0m3 says:

    Thanks for the great post! I do have a question… what is the point of having an indexer in the above code? It doesn't do anything, right? Just consuming from Redis and redirecting to ElasticSearch? Isn't it better then to redirect from Redis to ES directly? Thanks in advance.
