Collaboratively work on a Puppet module. Vagrant + r10k.

Once a Puppet module is on the Forge it is quite easy to share it with other people so they can try it out. But until then, it can be somewhat cumbersome. There are several reasons why releasing a module to the Forge can be delayed :

  • a dependent module has not been officially released yet
  • a pull request is pending on another module one relies on
  • a pull request has been merged but one needs to wait for the maintainer to actually cut a new release

All those reasons make collaborating on a Puppet module a little bit harder than a simple puppet module install.

In order to tackle this issue and make collaboration on (or a demo of) a Puppet module easier, let’s see how this problem can be solved painlessly, and in a version-controlled way, using Vagrant and r10k.


r10k is a project that allows one to specify her Puppet module dependencies in a file (the Puppetfile) and make a proper deployment on a puppet master or into a local folder.

Those dependencies can be expressed either as a Forge module version number or as a git repository. When specifying git, the checkout can be pinned to master, a specific branch, a specific commit, etc.

mod 'puppetlabs/ntp', '3.0.3'

mod 'stdlib',
  :git => 'git://'
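For example, a Puppetfile mixing both styles, with the git dependency pinned to a branch carrying a pending pull request, might look like this (the repository URL and ref below are illustrative):

```ruby
# Forge module, pinned to an exact released version
mod 'puppetlabs/ntp', '3.0.3'

# Git-hosted module, checked out at a specific ref (branch, tag or commit)
mod 'stdlib',
  :git => 'git://github.com/example/puppetlabs-stdlib.git',
  :ref => 'feature/pending-pr'
```

This is what makes the workflow possible: collaborators get the exact in-progress code, not whatever the Forge last published.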

Vagrant (provisioner)

I am sure you’ve heard about Vagrant at least once. It’s a great project that makes collaboration easier by scripting the boot and provisioning of a VM. No more excuses like “it works on my machine”.
By simply using the same Vagrantfile and the same base box, two users are sure to get the same result. One of the great features of Vagrant is provisioning on VM creation.
This is the feature that will be used here. To make the point, the following Vagrantfile will be used :


Vagrant.configure(VAGRANTFILE_API_VERSION) do |config| = "MYBASEBOX"
  config.vm.provision "puppet" do |puppet|
    puppet.module_path = 'modules'

By default, Vagrant will look for a manifest to run in ./manifests/default.pp, hence it is not necessary to specify it here since we will be dropping the file in the correct path.
With the aforementioned Vagrantfile, Vagrant will boot a VM using the MYBASEBOX box and run the manifests/default.pp manifest, passing the modules/ folder located in the same directory as your Vagrantfile as the --modulepath option to puppet.

The script – a.k.a. “the glue” – or whatever you want to call it, will simply create the appropriate folders and run the r10k install command to retrieve the modules specified in the Puppetfile.
It will then copy an example manifest into manifests/default.pp for Vagrant to run. This is what a basic script could look like :


if ! gem list r10k | grep -q r10k; then
  gem install r10k
mkdir -p modules
mkdir -p manifests
PUPPETFILE=./Puppetfile PUPPETFILE_DIR=modules r10k --verbose 3 puppetfile install
cp modules/path/to/example.pp manifests/default.pp

It is up to you to customize it to your needs.
In this example a file called example.pp was located at the root of the module, but you may prefer to keep it in an examples folder; simply adapt the script accordingly.
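Put together, the repository shared with collaborators can stay very small (the layout is illustrative, and the script name is hypothetical since the post does not fix one):

```
.
├── Puppetfile    # r10k dependencies (Forge versions and/or git refs)
├── Vagrantfile   # boots the base box and provisions with puppet
├── example.pp    # example manifest, copied to manifests/default.pp
└──      # installs r10k and populates modules/ and manifests/
```

Everything else (modules/, manifests/) is generated on the fly and can stay out of version control.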

Show time

Long story short, simply run the following command and watch the magic happen

vagrant up


By keeping the Puppetfile and the script in a git repository, one can easily and repeatedly share the progress on a given Puppet module at any time, without having to wait for an upstream release. QED.

Enable network namespaces in CentOS 6.4

By default, CentOS 6.4 does not support network namespaces. If one wants to test the new virtualization platforms (Docker, OpenStack, etc.) on a CentOS server, not all features will be available.
In OpenStack for example, Neutron won’t work as expected, since it actually needs network namespaces to create networks.

Fortunately, RedHat – through RDO – provides a kernel that gets this feature backported.

So, before updating the kernel, if one runs :

#> ip netns list

s/he will be presented with the following error message : Object "netns" is unknown, try "ip help".

The following steps need to be performed to install the new kernel and enable the network namespace feature

#> yum install -y
#> yum install kernel iproute
#> reboot

And that’s it. Really.

Now one can run

#> ip netns add spredzy
#> ip netns list

spredzy should be displayed.

If everything is working, one should have kernel and iproute packages similar to the following installed :


Note : the "openstack" tag in the kernel version and the "netns" tag in the iproute version.
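A quick way to verify which packages ended up installed, following the `#>` convention used above (the exact version strings will vary):

```
#> uname -r       # the running kernel should carry the "openstack" tag
#> rpm -q iproute # the installed iproute should carry the "netns" tag
```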

The Foreman PXE provisioning with Libvirt

More than just a Puppet management interface, The Foreman can handle the whole lifecycle of servers, from their creation and provisioning (PXE + kickstart/preseed) to their management (Puppet). Today’s blog post will highlight how to use the provisioning feature of The Foreman with Libvirt’s DHCP server (dnsmasq), for local testing purposes.

Prerequisites

  • An instance of a VM running The Foreman on libvirt. For this post, version 1.3.0 of The Foreman is used, and CentOS 6.4 will be deployed.

Create the Operating System (The Foreman)

The Operating System

At first, simply fill in the first four fields and click submit. We will get back to it at a later point.

Path : More -> Provisioning -> Operating Systems -> New Operating System

Edit OS

The Architecture

Add an architecture that will be supported for a set of OSes.

Path : More -> Provisioning -> Architectures -> New Architecture

Edit Architecture

The Installation Media

In our case, the CentOS installation media already exists; one still has to click on CentOS and specify RedHat as the Operating System family.

If you have a local mirror of the CentOS repositories you can simply make the path point to it; installation will be much faster.

Path : More -> Provisioning -> Installation Media

Edit Installation Media

The Partition Table

A RedHat default partition table is already present; for the purpose of the demo we will be using it, but you might want to create your own. Do not forget to specify the Operating System family.

Path : More -> Provisioning -> Partition Tables

Edit Partition Tables

The Templates

The provisioning template section is where one defines one’s kickstart/preseed, PXE, gPXE, etc. scripts.

One can define snippets that can be embedded within scripts.

For the demo’s purpose we will be using two pre-existing scripts

  • Kickstart Default PXELinux (PXELinux)
  • Kickstart Default (provision)

Once one clicks on the template, one needs to go to the Association tab on the presented page to associate it with the proper OS. Here it needs to be done twice: for the Kickstart Default PXELinux and for the Kickstart Default scripts.

Path : More -> Provisioning -> Provisioning Templates

Edit Provisioning Template

The Operating System

And back to the Operating System to bind it all together.

Path : More -> Provisioning -> Operating Systems -> CentOS 6.4

First you should be presented with the following page; pick the right options (Architecture, Partition Table, Installation Media) for your OS.

Edit OS – OS

Now go to the Templates tab and associate the templates accordingly.

Edit OS – Templates

You can now save the OS.

Create the domain (The Foreman)

Here, nothing fancy; simply fill in what is prompted. In the current scenario we don’t use The Foreman as a DNS server.

Path : More -> Provisioning -> Domains -> New Domain

Edit Domain

Create the Subnet (The Foreman)

Here the Network Address is the one from your libvirt dnsmasq configuration. Normally you can guess it from a simple ifconfig eth0; otherwise, on the host run virsh net-dumpxml default, assuming you run the default network. The same applies to the Network Mask.

Select the appropriate domain (cf. Create the Domain) and then, most importantly, make sure the smart proxy name is selected in the TFTP Proxy box.

Path : More -> Provisioning -> Subnets -> New Subnet

Edit Subnet

Create the VM with PXE boot (Libvirt)

Create the New VM with a PXE boot

node1 – PXE

For now you can stop the VM, since the DHCP server is not configured yet. Please note the MAC address of the virtual machine; it will be needed in a later section.

Configure dnsmasq for IP attribution and PXE boot (Libvirt)

Note your foreman VM and your node1 VM MAC addresses.

Stop your foreman VM now.

1. Destroy the network

virsh net-destroy default

2. Edit the current network to assign static ip

virsh net-edit default


<ip address='' netmask=''>
    <dhcp>
        <range start='' end='' />
    </dhcp>
</ip>

into

<ip address='' netmask=''>
    <dhcp>
        <range start='' end='' />
        <host mac='52:54:00:CB:C3:C6' name='foreman' ip='' />
        <host mac='52:54:00:89:2A:7E' name='node1' ip='' />
        <bootp file='pxelinux.0' server='' />
    </dhcp>
</ip>

3. Restart the network

virsh net-start default
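Put together, the edited default network definition might look like the following sketch. All the addresses below are illustrative (the original post leaves them blank); the MAC addresses are the ones noted earlier:

```xml
<network>
  <name>default</name>
  <forward mode='nat'/>
  <bridge name='virbr0' stp='on' delay='0'/>
  <ip address='' netmask=''>
    <dhcp>
      <range start='' end=''/>
      <!-- static leases for the foreman server and the node to provision -->
      <host mac='52:54:00:CB:C3:C6' name='foreman' ip=''/>
      <host mac='52:54:00:89:2A:7E' name='node1' ip=''/>
      <!-- PXE: fetch pxelinux.0 from the TFTP server on the foreman VM -->
      <bootp file='pxelinux.0' server=''/>
    </dhcp>
  </ip>
</network>
```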

What is being done here at step 2 is a static assignment of IP addresses by the DHCP server, and the configuration of the PXE boot.

Static Assignment of IP Addresses

<host mac='52:54:00:CB:C3:C6' name='foreman' ip='' />

Here we tell dnsmasq that the device with MAC address '52:54:00:CB:C3:C6' will always be assigned the IP ''.

PXE Boot Configuration

<bootp file='pxelinux.0' server='' />

Here we tell devices that wish to PXE boot to fetch the file pxelinux.0 from the TFTP server running on

You can now start the foreman VM, but not node1 yet.

Create the Host (The Foreman)

Here, fill in the information as needed; the parts specific to PXE provisioning are the Network and Operating System tabs.

  • In the Network tab, fill in the MAC address, the configured domain and subnet, and the IP address assigned in the DHCP server.
  • In the Operating System tab, select the Operating System you want your VM to run. (cf. Create the Operating System)

Path : Hosts -> New Host

Edit Network Host

Edit Operating System Host

Start the VM (Libvirt)

Simply start the node1 VM; it will be assigned the static IP address and will retrieve pxelinux.0 from the foreman server, as specified in the DHCP server configuration. It might take some time while the installation runs.

Once the VM has automatically rebooted, go to the foreman hosts page: node1 will be in a ‘No Changes’ state, meaning the build was successful and Puppet connected. The VM is now fully managed by The Foreman.


One can configure as many OSes as one wants, with fully configurable kickstart/preseed scripts, themselves dynamically parametrizable. As of today, The Foreman is a solid solution to manage the whole lifecycle of servers, from creation to provisioning to management, providing the user with detailed – and filterable – reports of what is going on. On a personal note, I would say that if you are managing Puppet servers and you are not using The Foreman, you are doing it wrong. QED.


Samba standalone + OpenLDAP

On the web there are many tutorials about setting up a Samba server as one’s Domain Controller (DC), but very few about setting up a standalone Samba server relying on an external OpenLDAP for authentication. While the process itself is quite simple, it needs a lot of configuration on both ends, the Samba server and the OpenLDAP one, before it can be functional.

This post shows how to set up a Samba 3.6 server to rely on an external OpenLDAP 2.4 server, both being hosted on CentOS 6.4.

The Samba Server

Authorize the use of LDAP system-wide

In order for the Samba server to be able to rely on the OpenLDAP one, the use of LDAP needs to be enabled system-wide. To do so, the authconfig configuration needs to be updated the following way

authconfig --enableldap --update

This simply edits the /etc/nsswitch.conf file and appends ldap to the passwd, shadow, group, netgroup and automount entries.
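After the update, the relevant lines of /etc/nsswitch.conf should look roughly like this (a sketch of a default CentOS file after authconfig ran):

```
passwd:     files ldap
shadow:     files ldap
group:      files ldap
netgroup:   files ldap
automount:  files ldap
```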

Install the samba packages

Simply run

yum install samba samba-common

Note : this article is about Samba version 3.6 and not Samba 4, so do install the samba* packages and not the samba4* ones.

Copy and install the Samba schema in the OpenLDAP server

Note : since these steps need to be done before the smb.conf configuration, this section appears here, even though logically it belongs to “The OpenLDAP server”.

By default, the OpenLDAP server doesn’t speak the Samba language. One needs to add the Samba LDAP schema to it. From the Samba server, once the samba packages are installed, simply copy the samba.ldif file located at /usr/share/doc/samba-3.6.9/LDAP/samba.ldif to your OpenLDAP cn=schema directory

scp /usr/share/doc/samba-3.6.9/LDAP/samba.ldif user@openldap:/etc/openldap/slapd.d/cn=config/cn=schema

On the OpenLDAP server, the file needs to be renamed following the pattern cn={X}samba.ldif, where X is the highest number in use + 1. On a default OpenLDAP installation, the highest number in use is 11 (cn={11}collective.ldif); thus, the samba.ldif file needs to be renamed cn={12}samba.ldif.

Edit the cn={12}samba.ldif file at lines 1 and 3 so it looks like this

dn: cn={12}samba.ldif
objectClass: olcSchemaConfig
cn: cn={12}samba.ldif

Finally, restart the slapd service so the new schema can be loaded correctly.
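The renaming step can also be scripted; here is a minimal sketch, assuming the schema directory follows the default cn={N}name.ldif layout described above:

```shell
# next_schema_index DIR: print the next free cn={N} schema index in DIR
next_schema_index() {
  ls "$1" \
    | sed -n 's/^cn={\([0-9]*\)}.*/\1/p' \
    | sort -n | tail -1 \
    | awk '{ print $1 + 1 }'
}

# Usage (path from a default OpenLDAP online-config install):
#   cd /etc/openldap/slapd.d/cn=config/cn=schema
#   mv samba.ldif "cn={$(next_schema_index .)}samba.ldif"
```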

The smb.conf

In Samba there are 3 storage backends available by default.

  • smbpasswd – deprecated
  • tdbsam – the one enabled by default; it relies on a local database of users, filled via the smbpasswd -a command
  • ldapsam – relies on an external LDAP directory

To make your standalone Samba server rely on OpenLDAP, simply change this chunk of configuration

security = user
passdb backend = tdbsam

into

security = user
passdb backend = ldapsam:ldap://
ldap suffix = dc=wordpress,dc=com
ldap admin dn = cn=admin,dc=wordpress,dc=com

  • ldap suffix : the suffix of your DIT
  • ldap admin dn : this is optional. If the OpenLDAP server denies anonymous requests, then one needs to specify an admin dn entry. Also, if your LDAP tree does not have a SambaDomain entry yet, specifying the ldap admin dn configuration will create it automatically. If using ldap admin dn, one needs to set the admin dn password by running smbpasswd -W

Save and exit the file, then restart the smb service. After a few seconds one can run net getlocalsid and will be presented with a line looking like

SID for domain SAMBA-SERVER is: S-1-5-21-2844801791-3392433664-1093953107

If you set ldap admin dn in smb.conf, the SambaDomain entry was created automatically and net getlocalsid returns this value; if you created it manually, net getlocalsid should return your SambaDomain information.

Set samba to start automatically at boot time – chkconfig smb on – and the Samba server is all set to receive requests from existing LDAP users.

The OpenLDAP server

In order for an OpenLDAP server to be Samba-aware, some attributes need to be added to the appropriate entries. Make sure the Samba schema has been loaded into OpenLDAP, as explained earlier.


SambaDomain

This entry can be automatically created by the Samba server – if one wants – and contains general information about Samba’s behavior. The most important piece of information found here is the SID, the Security IDentifier of the domain. It will be needed for the configuration of the Samba group and user entries.


SambaGroupMapping

This is an auxiliary objectClass that should be added to every posixGroup entry one wants to use with Samba. It has only two mandatory attributes: the sambaSID, a unique ID within the SambaDomain, and the sambaGroupType, which defines the type of the group.

The SambaSID is composed of the SID + RID

  • SID : From the SambaDomain entry
  • RID : Relative IDentifier, a unique id within the SambaDomain

The defined SambaGroupType values are :

  • 2: Domain Group
  • 4: Local Group (alias)
  • 5: Builtin
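As an illustration, adding the SambaGroupMapping attributes to a hypothetical existing posixGroup could look like the LDIF below. The DN and RID (2001) are made up; the SID prefix is the one returned by net getlocalsid above, and the suffix matches the smb.conf example:

```ldif
dn: cn=engineering,ou=groups,dc=wordpress,dc=com
changetype: modify
add: objectClass
objectClass: sambaGroupMapping
add: sambaSID
sambaSID: S-1-5-21-2844801791-3392433664-1093953107-2001
add: sambaGroupType
sambaGroupType: 2
```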


SambaSamAccount

This is probably the trickiest, yet most scriptable, part. This is the auxiliary objectClass that should be added to every posixAccount entry one wants to use with Samba. It contains the Samba credentials. For Samba to authenticate an LDAP-hosted user, the latter needs to have the following attributes set

  • sambaAcctFlags : defines the user type (permissions)
  • sambaLMPassword : the LanMan password hash
  • sambaNTPassword : the NT password hash
  • sambaPwdLastSet : timestamp of the last password update
  • sambaSID : the unique identifier within the SambaDomain

To obtain this information, one can run this script; it needs the Perl module Crypt::SmbHash to be installed.

Usage : ./script username password

This will give an output similar to the following, with colon-separated fields

:0:47F9DBCCD37D6B40AAD3B435B51404EE:82E6D500C194BA5B9716495691FB7DD6:[U          ]:LCT-4C18B9FC

where the fields are, in order: the username (empty here), the uid, the LMPassword, the NTPassword, the AcctFlags and the last password change timestamp (LCT, in hex).

For the sambaSID value, refer to the SambaGroupMapping section; the same logic applies here.
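Putting it together, adding the Samba attributes to a hypothetical existing posixAccount could look like this. The DN and RID (3001) are made up, the hashes are the ones from the sample output above, and sambaPwdLastSet is an arbitrary epoch timestamp:

```ldif
dn: uid=jdoe,ou=people,dc=wordpress,dc=com
changetype: modify
add: objectClass
objectClass: sambaSamAccount
add: sambaSID
sambaSID: S-1-5-21-2844801791-3392433664-1093953107-3001
add: sambaAcctFlags
sambaAcctFlags: [U          ]
add: sambaLMPassword
sambaLMPassword: 47F9DBCCD37D6B40AAD3B435B51404EE
add: sambaNTPassword
sambaNTPassword: 82E6D500C194BA5B9716495691FB7DD6
add: sambaPwdLastSet
sambaPwdLastSet: 1362222000
```

Loaded with ldapmodify, this is enough for the user to authenticate against the Samba server.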

Once the SambaDomain entry and the SambaGroupMapping and SambaSamAccount attributes are applied where needed, the Samba server is ready to authenticate against the OpenLDAP server.


Making a standalone Samba server rely on an external OpenLDAP is not a difficult process, but it does involve quite a lot of configuration. In this article, neither the iptables nor the SELinux side of things has been addressed, but you should definitely set them up accordingly. Go ahead, add people to your DIT and see how they can access their own Samba share. QED

Effective backup/recovery process for OpenLDAP

Making sure to never lose any piece of data is a really difficult task. A point-in-time backup (snapshot) of a permanently living and changing environment does not meet the expectation of loss-less data.

In today’s post the focus will be on the OpenLDAP backup/recovery process, in order to never lose a bit of data – well, maybe the last transaction in case of a power outage.

Most online resources describe the OpenLDAP backup/recovery process as :

  • For backup : running a slapcat command in a cron job and sending the output to a backup server
  • For recovery : getting the last meaningful backup from the backup server and reloading it with a slapadd command

Simple, isn’t it? Well, it is simple, but it simply does not prevent important data loss. Let’s highlight two cases that demonstrate the limits of this backup plan.

Case 1

Let’s take a moderately busy service that inserts an average of 1,000 new users per day in its directory. Backups are made (using the slapcat command) every day at midnight. Now, one day at 8.00pm, a hard drive crashes (no RAID), or the filesystem gets corrupted, or whatever reason you want to come up with… It is time for recovery. We set up a new VM or a new drive, set up OpenLDAP again, get back the last meaningful backup and load it with a slapadd command. The OpenLDAP server is back to its yesterday state, but what about the 900 entries that got inserted today? Simply gone. That is why you must have a redundant set of OpenLDAP servers via replication. But replication is not a backup plan in itself.

Case 2

As a precaution you set up a master/slave scheme (a.k.a. provider/consumer in LDAP terms). So even if the main OpenLDAP server crashes you do have an up-to-date copy. But since to err is human, if an employee inadvertently removes an important set of data, this change will be replicated to all your slave OpenLDAP servers and the data won’t be recoverable. Restoring yesterday’s backup will leave you in the same state as Case 1, and data will have been lost.


Design of an infrastructure effective for the backup/recovery process

To be able to almost never lose a bit of OpenLDAP data, the infrastructure to deploy relies heavily on the accesslog module provided by OpenLDAP.

The accesslog overlay is used to keep track of all or selected operations on a particular DIT (the target DIT) by writing details of the operations as entries to another DIT (the accesslog DIT). The accesslog DIT can be searched using standard LDAP queries. Accesslog overlay parameters control whether to log all or a subset of LDAP operations (logops) on the target DIT, to save related information such as the previous contents of attributes or entries (logold and logoldattr) and when to remove log entries from the accesslog DIT

Definition from

Accesslogs are mainly used for replication/audit purposes. In the above schema, our slaves will never be masters of any other OpenLDAP server; they use their accesslog as a real-time accesslog backup in case the master OpenLDAP server becomes unavailable for any reason.

Backup Process

As simply as it is described by most resources out there, the backup process will be a slapcat command – run as a cron job – of the needed DIT and its relative accesslog DIT

#> slapcat -n 2 > maindit-bk.ldif
#> slapcat -n 3 > maindital-bk.ldif
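As a sketch, the corresponding cron entry could look like the fragment below (the paths and the backup destination host are illustrative):

```
# /etc/cron.d/ldap-backup -- nightly dump of the main DIT (database 2)
# and its accesslog DIT (database 3), then shipped to the backup server
0 0 * * *  root  slapcat -n 2 > /backup/maindit-bk.ldif
5 0 * * *  root  slapcat -n 3 > /backup/maindital-bk.ldif
10 0 * * * root  rsync -a /backup/ backup.example.com:/backup/ldap/
```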

Recovery Process

This is how the recovery process would work :

  1. Load the last meaningful backup of the needed DIT with the slapadd command
  2. Load the accesslog from either the backup or the slave’s accesslog – whichever fits best; do not forget to clean the accesslog if you are trying to recover from an erroneous action
  3. Set the DIT to be a consumer of the freshly loaded accesslog


Step 1 : Simulate data loss

#> service slapd stop
#> slapcat -n 2 > maindit-backup.ldif
#> service slapd start
#> ldapadd -x -w 'test' -D 'cn=Manager,dc=domain,dc=com' -f user.ldif
#> service slapd stop
#> slapcat -n 3 > maindit-accesslog-backup.ldif

At this time, there are two backup files :

  • maindit-backup.ldif, which has everything but the last entry
  • maindit-accesslog-backup.ldif, which does have the addition of the user

Step 2 : Recovering a clean OpenLDAP server

  1. Install a new VM with the appropriate packages and configuration [only if necessary]
  2. If you are reusing a corrupted OpenLDAP server, move the database files of your corrupted database away (mv /var/lib/ldap/{yourdbname}/*.bdb /backup/ldap/{yourdbname})
  3. Enable accesslog and syncprov modules
  4. Reload the needed DIT with slapadd
  5. Create an Accesslog db that will be used as provider
  6. Reload the accesslog db with its backup
  7. Configure syncrepl on the main DIT to be a consumer of the accesslog provider
  8. Restart slapd
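Step 7 deserves an illustration. A delta-syncrepl consumer stanza (slapd.conf style) could look roughly like the sketch below; the suffix and bind credentials are the ones used in this post’s examples, everything else (rid, retry interval, accesslog base) is illustrative:

```
syncrepl rid=001
  provider=ldap://localhost
  bindmethod=simple
  binddn="cn=Manager,dc=domain,dc=com"
  credentials=test
  searchbase="dc=domain,dc=com"
  logbase="cn=accesslog"
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
  syncdata=accesslog
  type=refreshAndPersist
  retry="60 +"
```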

At this point your OpenLDAP server is back up-to-date data-wise, and no data has been lost.


Not that simple, right? It needs a bit more than 2 lines of shell script. A long-observed behavior is that people/companies make backups but do not test recovery. Recoveries are tested when the backup plan is created, but then left aside and almost never exercised. Some companies, on the other hand, take recovery to the extreme and deploy last night’s backup to production every day; this way the recovery process is well tested and they don’t fear failure. Whichever way one decides to go, make sure to always have a loss-less backup/recovery plan, an up-to-date documentation that goes along with it, and your Nagios check_ldap plugin up and running. QED

Monitor your cluster of Tomcat applications with Logstash and Kibana

A month ago I wrote a post about the potential of Logstash + ElasticSearch + Kibana used all together. Back then the example used was fairly simple, so today’s goal is to see how
one can make the most out of those tools in an IT infrastructure with real-life problems. The objective is to show how to monitor, in a central place, logs coming from a cluster of Tomcat servers.

Problem : Monitor a cluster of tomcat applications

Let’s take a cluster of 3 identical nodes, each of which hosts 3 tomcat applications; this adds up to 9 applications to monitor. In front of this cluster stands a load balancer – so customers can land on any node at any time.

Now if an error happens in the user applications, unless one has a log management system in place, one will need to log into each and every node of the cluster and analyze the logs – I know it can be scripted, but you get the point: it’s not the optimal way.

This post aims to show how this problem can be tackled using Logstash, Redis, ElasticSearch and Kibana to build a strong – highly scalable and customizable – log management system.


Here we will apply the following scheme from the Logstash website. The only difference is that Kibana will be used instead of the embedded Logstash web interface.


What does what

  • Logstash: As a log shipper and a log indexer
  • Redis : As a broker – used as a queuing system
  • ElasticSearch : As a log indexer – store and index logs
  • Kibana : As a front-end viewer – a nice UI with useful extra features



Logstash

Logstash comes as a jar; it is bundled with everything it needs to run.
The jar file is available here

To run it simply execute :
java -jar logstash-1.1.9-monolithic.jar agent -f CONFFILE -l LOGFILE

This will start a Logstash instance that will act based on the CONFFILE it was started with. To make this a bit cleaner, it is recommended to daemonize it so it can be started/stopped and launched at boot time with traditional tools. The java-service-wrapper libraries will let you daemonize Logstash in no time; a traditional init.d script works too.

Logstash needs to be installed both on the cluster nodes (shippers) and on the central server where logs will be gathered, stored and indexed (indexer).


Redis

Redis is a highly scalable key-value store; it will be used as a broker here and installed on the central location.
Redis installation is pretty straightforward: packages are available for every main Linux distribution. (CentOS users will need to install EPEL first)

  1. Edit /etc/redis.conf, change bind to bind YOUR.IP.ADDR.ESS
  2. Make sure Redis is configured to start at boot (chkconfig/update-rc.d)

Important : make sure a specific set of firewall rules is in place so the Logstash shippers can communicate with Redis.
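For instance, on the central server an iptables rule like the following could open Redis’ default port (6379) to the cluster; the subnet below is illustrative:

```
#> iptables -I INPUT -p tcp --dport 6379 -s -j ACCEPT
#> service iptables save
```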

Elastic Search

Unfortunately, ElasticSearch cannot be found in the package repositories of most Linux distributions yet. Debian users are a bit luckier, since the team behind it provides them with a .deb; for other distributions, installation will need to be manual. ElasticSearch will be installed on the central location.

Get the source from the official site, and as with Logstash I would recommend using the java-service-wrapper libraries to daemonize it.

You need to edit the configuration file elasticsearch.yml to uncomment and set the network bind settings. The rest of the configuration needs to be tuned based on the expected workload.


Kibana

Kibana does not have packages yet; the source code needs to be retrieved from GitHub or the Kibana website itself. Installation is straightforward

  1. wget
  2. tar xzf v0.2.0.tar.gz && cd Kibana-0.2.0
  3. gem install bundler
  4. bundle install
  5. vim KibanaConfig.rb
  6. bundle exec ruby kibana.rb

Main fields to configure in KibanaConfig.rb:

  • Elasticsearch : the URL of your ES server
  • KibanaPort : the PORT to reach Kibana
  • KibanaHost : The URL Kibana is bound to
  • Default_fields : The fields you’d like to see on your dashboard
  • (Extra) Smart_index_pattern : The index Kibana should look into

Kibana will be installed on the central location. Look into `sample/kibana` and `kibana-daemon.rb` for how to daemonize it.


Tomcat Servers

Here, Logstash will monitor two kinds of logs: the application logs and the access logs.

Access Logs

In order to enable access logs in tomcat, edit your /usr/share/tomcat7/conf/server.xml and add the AccessLog valve

<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
    prefix="localhost_access_log." suffix=".txt" renameOnRotate="true"
    pattern="%h %l %u %t &quot;%r&quot; %s %b" />

Application Logs

Here a library that outputs log4j messages directly in the Logstash json_event format will be used – special thanks to @lusis for the hard work – so no grokking will be required.

Configure log4j.xml

Edit the /usr/share/tomcat7/wepapps/myapp1/WEB-INF/classes/log4j.xml

<appender name="MYLOGFILE" class="org.apache.log4j.DailyRollingFileAppender">
    <param name="File" value="/path/to/my/log.log"/>
    <param name="Append" value="false"/>
    <param name="DatePattern" value="'.'yyyy-MM-dd"/>
    <layout class="net.logstash.log4j.JSONEventLayout"/>
</appender>
Logstash file


An example of what a Logstash shipper config file could look like

	input {

		file {
			path => '/path/to/my/log.log'
			format => 'json_event'
			type => 'log4j'
			tags => 'myappX-nodeX'
		}

		file {
			path => '/var/log/tomcat7/localhost_access_log..txt'
			format => 'plain'
			type => 'access-log'
			tags => 'nodeX'
		}

	}

	filter {

		grok {
			type => "access-log"
			pattern => "%{IP:client} \- \- \[%{DATA:datestamp}\] \"%{WORD:method} %{URIPATH:uri_path}%{URIPARAM:params} %{DATA:protocol}\" %{NUMBER:code} %{NUMBER:bytes}"
		}

		kv {
			type => "access-log"
			fields => ["params"]
			field_split => "&?"
		}

		urldecode {
			type => "access-log"
			all_fields => true
		}

	}

	output {

		redis {
			host => "YOUR.IP.ADDR.ESS"
			data_type => "list"
			key => "logstash"
		}

	}


An example of what a Logstash indexer config file could look like

	input {

		redis {
			host => "YOUR.IP.ADDR.ESS"
			type => "redis-input"
			data_type => "list"
			key => "logstash"
			format => "json_event"
		}

	}

	output {

		elasticsearch {
			host => "YOUR.IP.ADDR.ESS"
		}

	}


  1. Make sure Redis + ElasticSearch + LogStash(indexer) + Kibana are started
  2. Make sure all LogStash (shipper) are started
  3. Go to YOUR.IP.ADDR.ESS:5601 and enjoy a nice structured workflow of logs

The Lucene query language can be used in the header text-box to query/filter results. Kibana will have a new interface soon that will let one customize an actual dashboard of logs; take a peek at the demo, it does look promising.

Below are some screenshots (current version of Kibana) of what the configuration based on this post provides :

Log access analysis

Application log analysis

Application log analysis – details

You can see that their tags are marked ‘pentaho-node1’, so it is now easy to know which application (pentaho) on which node (node1) produced the error.

Kibana has some excellent features; take time to get to know them.


Last month’s Twitter example was not oriented toward real-life problems, but with this post one can see all the power behind those tools. Once they are set up correctly, one can gather all the logs of a cluster and explore them, and easily figure out where issues are coming from – all at one’s fingertips. QED

java-service-wrapper or how to daemonize your java services for all major OSes

Java services (java programs in general) come in different flavors: either as a jar file, or with a shell script that will execute the java program after parsing your options, or in any other form. More often than not, java services do not come with a handy init.d script, leaving you the responsibility of writing and maintaining it on your own. Writing init.d scripts isn’t the most attractive task one can be asked to do, and it can get really cumbersome. It also raises the question of ‘What if I want to run it on another OS?’: the script will need to be adapted each and every time. In this blog post, Java-Service-Wrapper (JSW) will be introduced; JSW allows you to make your java services run as native Unix/Linux/Windows services in a painless way.

In order to show how java-service-wrapper works, Logstash (a log management tool, see this post if you want more details) will be installed as a Linux service.

Getting Logstash and running it manually

First things first, let’s download the Logstash jar here. It contains all the dependencies Logstash needs to run.

Then create the configuration file (ie. /etc/logstash/test.conf) Logstash will be run against :

input {
    file {
        path => '/var/log/secure'
        type => 'secure'
    }
}
output {
    file {
        path => '/tmp/test.json'
    }
}

Finally simply run : java -jar logstash-1.1.9-monolithic.jar agent -f /etc/logstash/test.conf
After a few seconds (maybe a minute or two), if you ssh to the box where Logstash is running, you should see the content of the /var/log/secure log lines in your /tmp/test.json file.

We have a working version of Logstash, but you can already see that starting and stopping it without it being a proper service will be a pain.

To keep things clean, we will move and rename logstash-1.1.9-monolithic.jar to /usr/local/bin/logstash.jar.

Getting java-service-wrapper

For this step simply download the appropriate tar ball from here

Once untarred you'll be presented with some files and directories; out of all of them, only 5 will be useful here:

  • lib/: copy it to /usr/local/lib
  • lib/wrapper.jar: copy it to /usr/local/lib
  • bin/wrapper: copy it to /usr/local/bin
  • src/bin/: copy it, rename it to /usr/local/bin/myservice_wrapper (ie. /usr/local/bin/logstash_wrapper), and add the executable bit if it is not already set
  • conf/wrapper.conf: copy it and rename it to /etc/myservice.conf (ie. /etc/logstash.conf)

And that is it for the installation of java-service-wrapper. From now on, if you need to add another java service, all you'll need to do is copy the shell script and wrapper.conf again, with the appropriate names.

Configuring the wrapper shell script, aka myservice_wrapper

The changes in this file are pretty straightforward; they concern the details of how your service will be run.

These are the minimum settings one needs to edit:

  • APP_NAME: your service name (ie. logstash)
  • APP_LONG_NAME: if you have a longer description
  • WRAPPER_CONF: /etc/myservice.conf (ie. /etc/logstash.conf)

Later, more in-depth changes can be made:

  • PRIORITY: specify nice value
  • PIDDIR: the path where the pid file should be stored
  • RUN_AS_USER: specify the user the service should be run as
  • USE_UPSTART: flag for using upstart

You can also define in this file the run levels at which your service should be started/stopped.
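For the Logstash example, the edited portion of the script might look like this sketch (APP_LONG_NAME and the optional values below are illustrative assumptions):

```shell
# Hypothetical excerpt of the wrapper shell script copied to
# /usr/local/bin/logstash_wrapper; values are the ones used in this post.
APP_NAME="logstash"
APP_LONG_NAME="Logstash log management agent"   # assumed description
WRAPPER_CONF="/etc/logstash.conf"

# Optional, more in-depth settings:
PRIORITY=""           # nice value; empty means default
PIDDIR="/var/run"     # where the pid file is stored
RUN_AS_USER=""        # run the service as this user instead of root
```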

wrapper.conf aka myservice.conf

Here, the configuration is a bit more complex. This file defines the way your java program will be called (classpath, libraries, parameters, etc…). We will see some aspects of the configuration here, but for full details of what is possible please refer to the official doc.

Simply specify the java executable path

Specify the class to execute when the Wrapper starts the application. There are 4 possibilities; refer to the doc for more information.

Basically, if your application comes as a jar, use WrapperJarApp; if it comes in a different way, use WrapperSimpleApp. Since Logstash comes as a jar, we will be using the WrapperJarApp class.


Log file to which all output to the console will be logged

Java library path to use

Java classpath to use

Additional Java parameters to pass to Java when it is launched. These are not parameters for your application, but rather parameters for the JVM.

And finally, the parameters we want to pass to our service.
We've seen previously that we ran logstash the following way: java -jar /usr/local/bin/logstash.jar agent -f /etc/logstash/test.conf

This will be translated with the following configuration
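A minimal /etc/logstash.conf along those lines could look like this sketch (the wrapper.* keys are standard JSW property names; the paths are the ones used in this post):

```
# Java executable
wrapper.java.command=java

# Class to execute: Logstash ships as a jar, so WrapperJarApp

# Wrapper classpath and native library path

# Additional JVM parameters (for the JVM, not the application)

# Application parameters: the jar, then its arguments
# (equivalent to: java -jar /usr/local/bin/logstash.jar agent -f /etc/logstash/test.conf)

# Log file to which all console output will be logged
```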

And we are done!


The first step of testing is to verify that the wrapper shell script works correctly.
Run: /usr/local/bin/logstash_wrapper console

If that works, you should see something similar to this:

wrapper  | --> Wrapper Started as Console
wrapper  | Java Service Wrapper Community Edition 32-bit 3.5.17
wrapper  |   Copyright (C) 1999-2012 Tanuki Software, Ltd. All Rights Reserved.
wrapper  |
wrapper  | 
wrapper  | Launching a JVM...
jvm 1    | WrapperManager: Initializing...
jvm 1    | {:message=>"Read config", :level=>:info, :file=>"/home/vagrant/logstash-1.1.9-monolithic.jar!/logstash/agent.rb", :line=>"329", :method=>"run"}
jvm 1    | {:message=>"Start thread", :level=>:info, :file=>"/home/vagrant/logstash-1.1.9-monolithic.jar!/logstash/agent.rb", :line=>"332", :method=>"run"}

Finally create a symbolic link /etc/init.d/logstash pointing to your /usr/local/bin/logstash_wrapper, and you are done.

service logstash start
service logstash status
service logstash stop

All three commands should be available.

Now if you change OSes, all you have to do is edit the file paths in wrapper.conf (ie. /etc/logstash.conf) to reflect the actual paths on the new OS, and nothing else. You will be up and running in no time.


Java-service-wrapper is one out of several options for this specific task. I am not claiming that it is the solution that will solve all your problems, but it is a strong option: it solves – in an understandable way – the java-service daemonization issue and the multiple-OS porting. JSW saves you time and effort in the long run. Now write the configuration file and daemonize it everywhere. QED

Powerful Analysis Tool using Logstash + ElasticSearch + Kibana

Reading about Logstash for the first time, I thought Yet Another Log Management Tool, but I was totally wrong.

As its author claims, a log is nothing more than:

date + content = LOG

Sure, all our system logs look that way (apache, nginx, mail, mysql, auth, etc…), but not only… What about a git commit, a tweet, a facebook status, a Nike+ run, a purchase, etc…?

  • Git: a git commit includes a timestamp with a message + commit details
  • Tweet: a tweet is a message posted at a specific point-in-time
  • Facebook status: a facebook status is a message posted at a specific point-in-time
  • Nike+ run: a run ends at a specific point-in-time and conveys extra data (distance, length, GPS tracks)
  • A purchase: a purchase is made at a specific point-in-time and conveys extra data (total amount, quantity of products bought, etc.)

So, more than a simple log management tool, Logstash – with the help of Kibana and ElasticSearch – can form a really powerful and fast analysis tool.

Installation & Demo (~10 minutes)


  • Java 1.6+ needs to be installed
  • The bundler gem needs to be installed

Download logstash


Create a logstash-twitter.conf

input {
    twitter {
        type           => "twitter"
        user           => "username"
        password       => "password"
        message_format => "json"
        keywords       => ["kibana", "logstash", "elasticsearch"]
    }
}

output {
    elasticsearch {
        embedded       => true
    }
}

Run logstash

java -jar logstash-1.1.9-monolithic.jar agent -f logstash-twitter.conf

Download & Install Kibana

curl -L | tar -xzvf -
cd Kibana-0.2.0
bundle install
ruby Kibana.rb

Now access it via http://localhost:5601/

Screenshot (since today was the CouchDB conference in Berlin, I assumed I would have gotten more input tracking the couchdb keyword)

Events list

[Screenshot: Kibana events list]

Event detail

[Screenshot: Kibana event detail]

And done: every time someone tweets about either Kibana, Logstash or ElasticSearch, you will have all the information about the tweet in a nice UI.

Technical Explanations

Logstash –

Logstash works as a pipeline system: inputs | filters | outputs. In our simple example we used only the inputs | outputs pipe. Logstash by default offers 26 different inputs and 45 different outputs (see the documentation – at the very bottom). Here the twitter input relies on the Twitter Streaming API to retrieve the tweets containing the keywords we mentioned, then sends them directly to our ElasticSearch instance to store and index them.
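For instance, a filter stage could be slotted between the two; the grok filter below is a hypothetical illustration using the Logstash 1.1.x configuration syntax:

```
input { ... }

filter {
    grok {
        type    => "twitter"
        pattern => "%{WORD:first_word}"
    }
}

output { ... }
```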

ElasticSearch –

ElasticSearch is a distributed, RESTful search server based on Apache Lucene. It fully supports Lucene's near real-time search. Its role here is to index and store all the events it is fed with. The server supports the Lucene Query Language.
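For example, assuming the default Logstash event fields of that era (@type, @message – an assumption worth checking against your own events), a Lucene query against the indexed tweets might look like:

```
@type:"twitter" AND @message:"elasticsearch"
```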

Note: here the embedded ElasticSearch component is used. For anything serious, I would advise setting up a standalone instance instead.

Kibana –

Kibana is the UI that sits on top of ElasticSearch. It gives you the interface to explore your data: select it, drill into it, filter it, group it, etc… Even though it is pretty basic, it lets you get the most out of your data.


Even if this example is really basic – and mostly useless – it shows how quickly a powerful analysis tool can be set up using this triplet. If the input you are looking for does not exist yet, simply create it (the same thing applies for the output).
Do not forget that everything with a date (a specific point-in-time) and a content is in some way a log. Now you know how to analyze and measure it. QED

Gitlab + Custom Hooks

With Gitlab (as with Github) it is straightforward to add post-receive web-hooks so actions can be taken after a push event. Unlike Github, Gitlab is normally self-hosted, which can lead to interesting possibilities with custom post-receive (or any other) hooks. Unfortunately it is not possible to add custom hooks directly from the web interface; it needs to be done under the hood.

Gitlab relies on Gitolite for its authorization process; we will make it rely on Gitolite for git hooks' management as well. We will stick to the Gitolite way of chaining hooks based on the doc, in the section on hook chaining.

How to make Gitlab aware of custom hooks?

Remember the following step during Gitlab's installation – copying Gitlab's custom post-receive hook to Gitolite's hooks directory:

cp ./lib/hooks/post-receive /home/git/.gitolite/hooks/common/post-receive

In order to make Gitlab aware of custom post-receive hooks, you need to edit the /home/git/.gitolite/hooks/common/post-receive file so it looks like this:

#!/usr/bin/env bash

# This file was placed here by GitLab. It makes sure that your pushed commits
# will be processed properly.

# Directory where the per-project custom post-receive hooks are stored
path_to_hook="/home/git/.gitolite/hooks/common/post-receive.secondary.d"

while read oldrev newrev ref
do
  reponame=`basename "$PWD" | sed s/\.git$//`
  env -i redis-cli rpush "resque:gitlab:queue:post_receive" "{\"class\":\"PostReceive\",\"args\":[\"$reponame\",\"$oldrev\",\"$newrev\",\"$ref\",\"$GL_USER\"]}" > /dev/null 2>&1
  if [ -x "$path_to_hook/$reponame" ]; then
    "$path_to_hook/$reponame" "$reponame" "$oldrev" "$newrev" "$ref" "$GL_USER"
  fi
done

How does it work?

Explanation of the differences from the previous version:

  • The path_to_hook variable indicates the directory where the custom post-receive hooks are stored
  • The if block: if an executable post-receive hook exists for this project, execute it
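For instance, the reponame extraction in the hook can be exercised on a sample bare-repository path (the path below is hypothetical):

```shell
# Reproduce the hook's reponame extraction on a sample path
repo_path="/home/git/repositories/customhooks.git"
reponame=$(basename "$repo_path" | sed 's/\.git$//')
echo "$reponame"   # → customhooks
```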

In practice

In practice there will be one post-receive hook per project, and the hook should be named after the project (do not forget to make it executable).

And that’s about it, from now on every time you will be pushing your project in Gitlab, it will execute the post-receive script located in $path_to_hook and named after the project itself.


Project name: customhooks
Post-receive hook location: /home/git/.gitolite/hooks/common/post-receive.secondary.d/customhooks


GIT_WORK_TREE=/var/www/blog git checkout -f

Note: the post-receive scripts can be written in any scriptable language, be it Shell, Ruby, Python, Perl, etc…

After pushing the customhooks project, I will have a copy of my actual project in the directory /var/www/blog. It's up to you now to have hooks as sophisticated as your needs require.


This post shows how to do it specifically for the post-receive hook, but the same logic can be applied to the other available hooks. Remember, Gitolite manages them, not Gitlab directly.
Even though Gitlab does not natively give you the possibility to add custom hooks, it is an easy feature to add. QED

Gitolite + OpenLDAP

While for a small project one can easily manage Gitolite authorization permissions manually, this task can get really cumbersome as the project grows and different roles get different permissions (ie. devel, qa, etc…).

Companies traditionally rely on a centralized system to handle their users, the groups they belong to, and as much information as they actually need (or not); one such system being LDAP. The purpose of this post is to see how to make Gitolite rely on information stored in an LDAP DIT to grant users permission to perform specific actions on the git repositories.

Prerequisite: in order to follow this post you will need a working Gitolite installation (v3.0+) and a reachable LDAP directory.

This is the LDIF file that will be used to handle authentication :

dn: cn=john,ou=group,dc=yanisguenane,dc=fr
cn: john
gidNumber: 20001
objectClass: top
objectClass: posixGroup
memberUid: john

dn: cn=jane,ou=group,dc=yanisguenane,dc=fr
cn: jane
gidNumber: 20002
objectClass: top
objectClass: posixGroup
memberUid: jane

dn: cn=devel,ou=group,dc=yanisguenane,dc=fr
cn: devel
gidNumber: 20003
objectClass: top
objectClass: posixGroup
memberUid: john

dn: uid=jane,ou=people,dc=yanisguenane,dc=fr
uid: jane
uidNumber: 10000
gidNumber: 10000
cn: jane
sn: jane
objectClass: top
objectClass: person
objectClass: posixAccount
objectClass: shadowAccount
loginShell: /bin/bash
homeDirectory: /home/jane

dn: uid=john,ou=people,dc=yanisguenane,dc=fr
uid: john
uidNumber: 10001
gidNumber: 10001
cn: john
sn: john
objectClass: top
objectClass: person
objectClass: posixAccount
objectClass: shadowAccount
loginShell: /bin/bash
homeDirectory: /home/john

Make Gitolite LDAP aware

Though by default Gitolite is unaware of LDAP (or of any authentication system), its author left an open door for Gitolite to query whichever authentication system one wants, be it LDAP or any other queryable system.

There are three rules to make that happen:

  • The query to the authentication system should be done via a script
  • The script should take the username as only parameter
  • The script should return a space-separated list of the groups the given user belongs to

An example of an LDAP script can be found here.
Note: it should be edited to match your LDAP DIT configuration; the script linked matches the LDIF used for this post.
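As an illustration, such a script could be sketched as follows (the base DN matches the LDIF above; the awk extraction and the empty-argument guard are assumptions of mine, and ldapsearch comes from the OpenLDAP client tools):

```shell
#!/usr/bin/env bash
# Hypothetical Gitolite membership script: given a username as $1,
# print a space-separated list of the groups the user belongs to.

# Pull the group names (cn) out of ldapsearch's LDIF output
extract_groups() {
    awk '/^cn: / { printf "%s ", $2 }'
}

username="$1"
if [ -n "$username" ]; then
    # Search the group subtree for entries whose memberUid matches the user
    ldapsearch -x -LLL -b "ou=group,dc=yanisguenane,dc=fr" \
        "(memberUid=$username)" cn | extract_groups
    echo
fi
```

Running it as `./ldap-query-groups-script john` against the LDIF above would print the groups john belongs to (john and devel).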

In order to make Gitolite LDAP aware, one needs to edit the file located at $GITOLITE_HOME/.gitolite.rc and add the following line (for v3, inside the %RC hash):

In v3

GROUPLIST_PGM           =>  '/path/to/ldap-query-groups-script',

In v2

$GL_GET_MEMBERSHIPS_PGM => '/path/to/ldap-query-groups-script',

And… done! Your Gitolite installation is LDAP aware!

How to use it

  • Add the authorized users to the Gitolite keychain

    As you would with a regular Gitolite setup, you need to add each user's public key to the Gitolite keychain. The name of the public key file (.pub) should match the LDAP username you want to set up.

    There are two ways to deal with this:

    • Full LDAP: fetch the SSH keys by querying your LDAP DIT – if they are stored there for each user
    • Basic: copy the user's public key file via your preferred way

  • Define the repositories and permissions

    Important: remember that for a given username, the script will return the list of groups the user belongs to. Hence, your repository configuration should be group based and not user based. A good practice is for each user to have an individual group, so you can still grant access to individual users.

    repo test-ldap-devel
        RW+    =    @devel
    repo test-ldap-jane
        RW+    =    @john @jane

  • Finally, push the changes

    Once configured to your needs, simply push the changes.


Session 1 – john

john@workstation-john: ssh-keygen -t rsa -b 1024 -N '' -f ~/.ssh/john
john@workstation-john: scp ~/.ssh/ git add && git commit -m "" && git push origin master
john@workstation-john: git clone
Cloning into test-ldap-devel...
warning: You appear to have cloned an empty repository.

Session 2 – jane

jane@workstation-jane: ssh-keygen -t rsa -b 1024 -N '' -f ~/.ssh/jane
jane@workstation-jane: scp ~/.ssh/ git add && git commit -m "" && git push origin master
jane@workstation-jane: git clone
Cloning into test-ldap-devel...
FATAL: R any test-ldap-devel jane DENIED by fallthru
(or you mis-spelled the reponame)
fatal: The remote end hung up unexpectedly

jane@workstation-jane: git clone
Cloning into test-ldap-jane...
warning: You appear to have cloned an empty repository.


As we can see in Jane's session, her attempt to clone test-ldap-devel was denied, but the one to clone test-ldap-jane did work. QED