SEO Dashboard : Tutorial #PaasLogs for Log File Analyser – Part 3

The next step is to configure PaaS Logs. PaaS Logs (Platform as a Service for logs) is a cloud computing service that delivers log management applications over the Internet.

This tutorial will help you understand the core concepts behind PaaS Logs and show you how to send your webserver logs to the engine.

Now you have your crawled URLs classified by number of inlinks, number of outlinks, section type, compliant type, active type, word count, depth, GA sessions, … ( More information can be found here ). We just have to send this SEO data to PaaS Logs.
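
For illustration, a line of the crawled-URLs CSV could look like this (the values are hypothetical; the column order matches the csv filter we will configure in Step 4):

    /blog/my-article;blog;TRUE;fast;TRUE;3;12;34;ok;ok;ok;10-50;500-1000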

For your information, this is the architecture I presented at the @QueduWeb SEO event to create a real-time log analyser. You can download my slides.

paaslogs architecture

Step 1 : Create your account
The first step is to create an account on http://www.runabove.com.
If you have already set up a lab, just log in at runabove.com.

create account

Step 2 : Fill in the form

fill in the form

sms

Step 3 : Activate PaasLogs

Activate the lab by clicking on “more Labs…” and then click on the PaaS Logs button.

Once activated, a new entry will appear in the navigation sidebar; click on the PaaS Logs button there to jump to the main interface.

The main interface of PaaS Logs will then appear.

enable paaslogs

On this page, only one action is available: Create user.
You will then get a username in the following form: ra-logs-XXXXX and the associated password.


NOTE THEM CAREFULLY: you will have to use these credentials with Kibana.

Step 4 : Discover the PaaS Logs interface

paaslogs account creation

On this interface you can see 5 items:

  • The Streams are the recipients of your logs.
  • The Dashboard is the global view of your logs.
  • The Inputs allow you to ask OVH to host your own dedicated collector, such as Logstash.
  • The Aliases allow you to access your data directly from Kibana.
  • The Roles give you control over who can read and access your streams or dashboards.

Streams

  1. In the Streams zone, click on the blue + button
  2. Choose a name and define a description
  3. Save the entry

You have created your first stream. By expanding your stream information, you will see your X-OVH-TOKEN.
This key is the only one you will need to address your stream. Under this token, you will have a direct link to your stream in Graylog.
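
If you want to check that the token routes messages to your stream, one way is to push a single test message in GELF format over TLS. This is only a sketch: the laas.runabove.com endpoint and the 12202 GELF/TLS port are assumptions here, so check the PaaS Logs help for the exact entry points, and replace the placeholder with your own X-OVH-TOKEN.

    # Send one GELF message (null-terminated, as GELF over TCP requires) to the platform over TLS
    printf '{"version":"1.1","host":"test","short_message":"hello stream","_X-OVH-TOKEN":"your-x-ovh-token"}\0' \
      | openssl s_client -quiet -connect laas.runabove.com:12202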



Dashboards

  1. In the Dashboards zone, click on the blue + button
  2. Choose a name and define a description
  3. Save the entry

You have created your first dashboard.


Aliases
To access your logs from Kibana, you will need to set up an Elasticsearch alias and link it to your Graylog streams, so here we go again:

  1. In the Aliases zone, click on the blue + button
  2. Choose a name and define a description
  3. Save the entry
  4. Once the status is marked as OK (refresh the page if necessary), click on the Associate link
  5. Select there the streams you want to associate with your alias



When your alias is created, you just have to associate it with your stream.

Kibana requires an index in which to store your dashboards and other settings. To create it in the Elasticsearch cluster:

Click on the Enable Kibana indice link.

Wait a few seconds; your index is now ready.


Inputs
To parse your logs, you will need to set up Logstash and configure some plugins:

  1. In the Inputs zone, click on the blue + button
  2. Choose a name and define a description
  3. Save the entry



Click on the Subscription link, choose the stream you created previously, and click on the “Attach this stream” button.



Now, you can configure your Logstash collector by opening port 5044.


Finally, you need to configure Logstash with:
– The input section
– The filter section
– Custom Grok patterns

In the input section, add:

input {
  beats {
    # Listen for Filebeat connections on port 5044, with TLS enabled
    port => 5044
    ssl => true
    ssl_certificate => "/etc/ssl/private/server.crt"
    ssl_key => "/etc/ssl/private/server.key"
  }
}
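
Once Logstash is started (see the Start button at the end of this step), you can verify that the collector answers TLS on port 5044 before configuring Filebeat. A quick check with openssl, using the input host displayed on your own input page (the host below is the one from my example in Step 5):

    openssl s_client -connect c002-5717e1b5d2ee5e00095cea38.in.laas.runabove.com:5044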


In the filter section, add the configuration below. It uses these plugins: grok, csv, elasticsearch, and dns.
Important: my login and password are specified for the elasticsearch plugin, but in the next version of PaaS Logs these lines won't be necessary.
The dns filter performs a reverse lookup on the fields listed in the reverse array. It is very useful to detect googlebot or msnbot.

filter {

  # Rename the Filebeat "source" field (the file path) to "filename"
  mutate {
    rename => { "source" => "filename" }
  }

  if [type] == "apache" {

    # Default SEO values, overwritten below if the URL is found in the crawl data
    mutate {
      add_field => {
        "section" => "nohtml"
        "active" => "FALSE"
      }
    }

    # Parse the access-log line with the common format first
    grok {
      match => { "message" => "%{OVHCOMMONAPACHELOG}" }
      patterns_dir => "/opt/logstash/patterns"
    }

    # If that failed, retry with the combined format (referrer + user agent)
    if ("_grokparsefailure" in [tags]) {
      mutate {
        remove_tag => [ "_grokparsefailure" ]
      }
      grok {
        match => [ "message", "%{OVHCOMBINEDAPACHELOG}" ]
        patterns_dir => "/opt/logstash/patterns"
      }
    }

    # Enrich the log event with the SEO fields of the matching crawled URL
    elasticsearch {
      hosts => "laas.runabove.com"
      index => "logsDataSEO"
      user => "ra-logs-XXX"
      password => "2OkHXXXXXXX"
      ssl => true
      query => 'type:csv AND request:"%{[request]}"'
      fields => [["section","section"],["active","active"],["speed","speed"],["compliant","compliant"],["depth","depth"],["inlinks","inlinks"],["outlinks","outlinks"],["status_title","status_title"],["status_description","status_description"],["status_h1","status_h1"],["group_inlinks","group_inlinks"],["group_wordcount","group_wordcount"]]
    }

    # Replace the client IP with its reverse DNS name
    dns {
      action => "replace"
      reverse => [ "clientip" ]
    }

    # Flag verified Googlebot and Bingbot hits
    if [clientip] =~ /googlebot.com/ {
      mutate {
        add_field => { "bot" => "google" }
      }
    }

    if [clientip] =~ /search.msn.com/ {
      mutate {
        add_field => { "bot" => "bing" }
      }
    }

  }

  if [type] == "csv" {
    # Parse the crawl export; one line per crawled URL
    csv {
      columns => ["request", "section","active", "speed", "compliant","depth","inlinks","outlinks","status_title","status_description","status_h1","group_inlinks","group_wordcount"]
      separator => ";"
    }
  }

}
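
To make the enrichment concrete: the CSV events are indexed first into logsDataSEO, then each Apache event triggers an elasticsearch lookup on its request field, and the SEO columns are copied onto the log event. A hypothetical enriched event (illustrative values only) would carry both log and crawl fields:

    {
      "type": "apache",
      "clientip": "crawl-66-249-66-1.googlebot.com",
      "bot": "google",
      "verb": "GET",
      "request": "/blog/my-article",
      "response_int": 200,
      "section": "blog",
      "active": "TRUE",
      "depth": "3",
      "inlinks": "12",
      "outlinks": "34"
    }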

In the Custom Grok Patterns section, add:

OVHCOMMONAPACHELOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion_num:float})?|%{DATA:rawrequest})" %{NUMBER:response_int:int} (?:%{NUMBER:bytes_int:int}|-)
OVHCOMBINEDAPACHELOG %{OVHCOMMONAPACHELOG} "%{NOTSPACE:referrer}" %{QS:agent}
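
As a sanity check, here is how OVHCOMMONAPACHELOG breaks down a classic access-log line (a made-up example):

    127.0.0.1 - - [10/May/2016:13:55:36 +0200] "GET /blog/my-article HTTP/1.1" 200 2326

    # extracted fields: clientip=127.0.0.1, ident=-, auth=-,
    # timestamp=10/May/2016:13:55:36 +0200, verb=GET, request=/blog/my-article,
    # httpversion_num=1.1, response_int=200, bytes_int=2326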

Click on the “Update Configuration” button and the configuration of PaaS Logs is finished.
If you are fast, you can do it in ten minutes.

Click on the Start button to run Logstash.

The server has restarted successfully!

start paaslogs

Step 5 : Using Filebeat to send logs to PaaS Logs
Filebeat is an open-source file harvester, mostly used to fetch log files and feed them into Logstash.

  • Install Filebeat : https://www.elastic.co/downloads/beats/filebeat
    curl -L -O https://download.elastic.co/beats/filebeat/filebeat_1.2.1_amd64.deb
    sudo dpkg -i filebeat_1.2.1_amd64.deb
    
  • Edit your filebeat.yml
    The Debian installation package will install the config file in the following directory: /etc/filebeat/filebeat.yml
    You can locate and edit your config file with these commands:

    $ sudo find / -name filebeat.yml
    $ nano /etc/filebeat/filebeat.yml
  • Change these 4 lines :
    • Log file path : /home/ubuntu/lib/apache2/log/access.log
    • CSV files path: /home/ubuntu/workspace/csv/crawled-urls-filebeat-*.csv
    • Host address : c002-5717e1b5d2ee5e00095cea38.in.laas.runabove.com:5044
    • OVH certificate path : /home/ubuntu/workspace/certificat/laas-ca.crt
      filebeat:
        prospectors:
          # Apache access logs
          -
            paths:
              - /home/ubuntu/lib/apache2/log/access.log
            input_type: log
            fields_under_root: true
            document_type: apache

          # Crawl exports; input_type must be "log" (Filebeat only accepts
          # log or stdin), the csv type is set through document_type
          -
            paths:
              - /home/ubuntu/workspace/csv/crawled-urls-filebeat-*.csv
            input_type: log
            fields_under_root: true
            document_type: csv

      output:
        logstash:
          hosts: ["c002-5717e1b5d2ee5e00095cea38.in.laas.runabove.com:5044"]
          worker: 1
          tls:
            certificate_authorities: ["/home/ubuntu/workspace/certificat/laas-ca.crt"]
  • Copy/paste laas-ca.crt into the path you chose previously.
    Here: /home/ubuntu/workspace/certificat/laas-ca.crt. You can also download it directly from this link (right-click then Save As): SSL CA cert.
  • Start Filebeat with one of these commands (if no logs show up, see the debugging tip below):
    sudo /etc/init.d/filebeat start
    sudo service filebeat start > /dev/null 2>&1 &
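
If nothing appears in Graylog, a useful check is to run Filebeat in the foreground with debug output on the publisher, which surfaces TLS or connection errors immediately (Filebeat 1.x flags):

    # Run in the foreground, log to stderr, debug the publish phase (Ctrl+C to stop)
    sudo filebeat -e -c /etc/filebeat/filebeat.yml -d "publish"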
    

Summary
Now you have configured PaaS Logs and Filebeat, and you just need to copy your log files or CSV files into the right directories.

In my example :
– /home/ubuntu/lib/apache2/log/
– /home/ubuntu/workspace/csv/

If you use Windows, you can use :
– c:/Filebeat-Paaslogs/logs
– c:/Filebeat-Paaslogs/csv
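
In that case, the prospector paths in filebeat.yml would look like this (a sketch; adjust the glob patterns to your own file names):

    filebeat:
      prospectors:
        -
          paths:
            - c:/Filebeat-Paaslogs/logs/*.log
          input_type: log
          fields_under_root: true
          document_type: apache
        -
          paths:
            - c:/Filebeat-Paaslogs/csv/*.csv
          input_type: log
          fields_under_root: true
          document_type: csv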

You can test your first dashboard in Graylog by clicking on the “dashboard” link and checking that your logs have been sent correctly to PaaS Logs.

In my next article, I will cover:
– Kibana dashboards
– Kibana visualisations

More information can be found here : https://community.runabove.com/kb/en/logs/