Logstash: Convert String to JSON with 2 Common Cases

By Minh Vu

Updated Nov 21, 2023

Disclaimer: All content on this website is derived directly from my own expertise and experiences. No AI-generated text or automated content creation tools are used.

In this tutorial, I will show you how to parse a string into a JSON object in Logstash.

In short:

  • To parse multiline JSON data, use the multiline codec followed by the json filter.
  • To parse a JSON string in a field, use the json filter with the source and target options.

Inputting Multiline JSON Data in Logstash

Suppose you have the following log file:

test.log
{"id":1,"title":"iPhone 9","description":"An apple mobile which is nothing like apple","price":549,"discountPercentage":12.96,"rating":4.69,"stock":94,"brand":"Apple","category":"smartphones","thumbnail":"/data/1/thumbnail.jpg","images":["/data/1/1.jpg","/data/1/2.jpg","/data/1/3.jpg","/data/1/4.jpg","/data/1/thumbnail.jpg"]}
{"id":2,"title":"iPhone X","description":"SIM-Free, Model A19211 6.5-inch Super Retina HD display with OLED technology A12 Bionic chip with ...","price":899,"discountPercentage":17.94,"rating":4.44,"stock":34,"brand":"Apple","category":"smartphones","thumbnail":"/data/2/thumbnail.jpg","images":["/data/2/1.jpg","/data/2/2.jpg","/data/2/3.jpg","/data/2/thumbnail.jpg"]}
{"id":3,"title":"Samsung Universe 9","description":"Samsung's new variant which goes beyond Galaxy to the Universe","price":1249,"discountPercentage":15.46,"rating":4.09,"stock":36,"brand":"Samsung","category":"smartphones","thumbnail":"/data/3/thumbnail.jpg","images":["/data/3/1.jpg"]}
{"id":4,"title":"OPPOF19","description":"OPPO F19 is officially announced on April 2021.","price":280,"discountPercentage":17.91,"rating":4.3,"stock":123,"brand":"OPPO","category":"smartphones","thumbnail":"/data/4/thumbnail.jpg","images":["/data/4/1.jpg","/data/4/2.jpg","/data/4/3.jpg","/data/4/4.jpg","/data/4/thumbnail.jpg"]}
{"id":5,"title":"Huawei P30","description":"Huawei’s re-badged P30 Pro New Edition was officially unveiled yesterday in Germany and now the device has made its way to the UK.","price":499,"discountPercentage":10.58,"rating":4.09,"stock":32,"brand":"Huawei","category":"smartphones","thumbnail":"/data/5/thumbnail.jpg","images":["/data/5/1.jpg","/data/5/2.jpg","/data/5/3.jpg"]}
{"id":6,"title":"MacBook Pro","description":"MacBook Pro 2021 with mini-LED display may launch between September, November","price":1749,"discountPercentage":11.02,"rating":4.57,"stock":83,"brand":"Apple","category":"laptops","thumbnail":"/data/6/thumbnail.png","images":["/data/6/1.png","/data/6/2.jpg","/data/6/3.png","/data/6/4.jpg"]}
 

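Remember to end the file with a newline. The file input only emits a line once it has read the newline that terminates it, so without it the last record is not processed.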

To convert these raw JSON lines into JSON objects, follow these steps:

  1. Input the log file using the multiline codec.
  2. Parse the message field as JSON using the json filter.

logstash.conf
input {
  file {
    path => "/home/dminhvu/elastic/test.log"
    start_position => "beginning" # read from the beginning of the file
    sincedb_path => "/dev/null" # do not persist the read position, so the file is reread on every restart
    codec => multiline { # multiline codec can merge several lines into one event
      pattern => "\n" # lines that match this pattern are merged
      what => "next" # a matching line is joined with the line that follows it
    }
  }
}
 
filter {
  json {
    source => "message" # parse the message field as JSON
  }
}
 
output {
  file {
    path => "/home/dminhvu/elastic/output.log"
    codec => "json_lines" # write to JSON lines format
  }
}
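
As a variation on the config above, and not the approach this tutorial uses, the input can decode each line itself with the json codec, which makes the json filter unnecessary. This is a minimal sketch that assumes every line of test.log is one complete JSON object; the paths are the same ones used above.

```logstash
input {
  file {
    path => "/home/dminhvu/elastic/test.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => "json" # decode each line as a JSON object while reading it
  }
}

output {
  file {
    path => "/home/dminhvu/elastic/output.log"
    codec => "json_lines"
  }
}
```

With this variation the parsed keys land at the top level of the event straight from the input, so there is no filter section to maintain.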

The comments in the config file explain each line. The rest of this section focuses on the filter section, which is where the parsing happens.

Here, the json filter parses the message field as JSON. The input plugin creates the message field and stores each raw line it reads there. Because no target is set, the parsed keys are added at the top level of the event.
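
Not every line in a real log is guaranteed to be valid JSON. The following is a minimal sketch of one way to mark such events, assuming the json filter keeps its default failure tag, _jsonparsefailure. The parse_status field name is hypothetical and only used for illustration.

```logstash
filter {
  json {
    source => "message" # parse the message field as JSON
  }

  # Assumption: tag_on_failure is left at its default, so events whose
  # message is not valid JSON carry the "_jsonparsefailure" tag.
  if "_jsonparsefailure" in [tags] {
    mutate {
      add_field => {
        "parse_status" => "failed" # hypothetical field name
      }
    }
  }
}
```

Events that parse successfully pass through unchanged, while failed ones keep their raw message and can be routed or inspected later.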

The output section writes the same data in JSON Lines format with some additional fields. You can use the jq command to format the output:

console
jq -S . output.log > output_formatted.log

So after using the json filter, all events will be parsed and each event should look like this:

output_formatted.log
{
  "@timestamp": "2023-11-21T04:16:10.943722335Z",
  "@version": "1",
  "brand": "Apple",
  "category": "smartphones",
  "description": "An apple mobile which is nothing like apple",
  "discountPercentage": 12.96,
  "event": {
    "original": "{\"id\":1,\"title\":\"iPhone 9\",\"description\":\"An apple mobile which is nothing like apple\",\"price\":549,\"discountPercentage\":12.96,\"rating\":4.69,\"stock\":94,\"brand\":\"Apple\",\"category\":\"smartphones\",\"thumbnail\":\"/data/1/thumbnail.jpg\",\"images\":[\"/data/1/1.jpg\",\"/data/1/2.jpg\",\"/data/1/3.jpg\",\"/data/1/4.jpg\",\"/data/1/thumbnail.jpg\"]}"
  },
  "host": {
    "name": "dminhvu"
  },
  "id": 1,
  "images": [
    "/data/1/1.jpg",
    "/data/1/2.jpg",
    "/data/1/3.jpg",
    "/data/1/4.jpg",
    "/data/1/thumbnail.jpg"
  ],
  "log": {
    "file": {
      "path": "/home/dminhvu/elastic/test.log"
    }
  },
  "message": "{\"id\":1,\"title\":\"iPhone 9\",\"description\":\"An apple mobile which is nothing like apple\",\"price\":549,\"discountPercentage\":12.96,\"rating\":4.69,\"stock\":94,\"brand\":\"Apple\",\"category\":\"smartphones\",\"thumbnail\":\"/data/1/thumbnail.jpg\",\"images\":[\"/data/1/1.jpg\",\"/data/1/2.jpg\",\"/data/1/3.jpg\",\"/data/1/4.jpg\",\"/data/1/thumbnail.jpg\"]}",
  "price": 549,
  "rating": 4.69,
  "stock": 94,
  "thumbnail": "/data/1/thumbnail.jpg",
  "title": "iPhone 9"
}
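
The event above still carries the raw line in the message field alongside the parsed fields. If that copy is not needed, one option is to drop it with remove_field. This is a minimal sketch; remove_field is a common filter option that is applied only when the filter succeeds, so events that fail to parse keep their message.

```logstash
filter {
  json {
    source => "message"
    remove_field => ["message"] # drop the raw string once it has been parsed
  }
}
```

This keeps the output smaller. Note that the copy under [event][original] is left untouched in this sketch.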

Now you can access the fields in the JSON object as usual with the syntax %{field_name} or %{[field_name][sub_field_name]}. To access an element of an array field, use %{[field_name][index]}.

For example, the following config adds new fields with the mutate filter, one of them inside a conditional statement:

logstash.conf
# ...
filter {
  json {
    source => "message"
  }
 
  mutate {
    add_field => {
      "dummy_description" => "%{[title]} is of brand %{[brand]}"
      "first_image" => "%{[images][0]}"
    }
  }
 
  if [discountPercentage] > 0 {
    mutate {
      add_field => {
        "dummy_discount" => "This product has %{[discountPercentage]}% discount"
      }
    }
  }
}
# ...
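
Parsing into the top level of the event can overwrite fields that already exist under the same name. A variation, sketched here under the assumption that a sub-field named product (a hypothetical name) suits your data, is to set target so the parsed object is nested, and to adjust the field references accordingly.

```logstash
filter {
  json {
    source => "message"
    target => "product" # hypothetical field that holds the parsed object
  }

  mutate {
    add_field => {
      # nested keys are referenced through the target field
      "dummy_description" => "%{[product][title]} is of brand %{[product][brand]}"
      "first_image" => "%{[product][images][0]}"
    }
  }
}
```

With this layout, the parsed id, title, and the other keys appear under product instead of next to @timestamp and host.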

Converting JSON String to JSON Object from a Field

Suppose you have another log file in which only part of each line is JSON, like this one:

test-2.log
POST /some/endpoint body={"id":1,"name":"Minh Vu","age":21} headers={"Content-Type":"application/json"}
POST /some/endpoint body={"id":2,"name":"WiseCode","age":22} headers={"Content-Type":"application/json"}
POST /some/endpoint body={"id":3,"name":"Nhi Pham","age":23} headers={"Content-Type":"application/json"}
 

Now you want to parse the body and headers fields as JSON objects. You can keep the same input and output sections as above, but you need to change the filter section:

  1. Use the grok filter to extract the necessary fields.
  2. Then use the json filter with a specified source and target (they can be the same field) to convert the string to a JSON object.

logstash.conf
input {
  file {
    path => "/home/dminhvu/elastic/test-2.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => "\n"
      what => "next"
    }
  }
}
 
filter {
  grok { # parse the message field with grok
    match => {
      "message" => "POST %{URIPATHPARAM:request} body=%{GREEDYDATA:body} headers=%{GREEDYDATA:headers}"
    }
  }
  json { # convert body to JSON object
    source => "body"
    target => "body"
  }
  json { # convert headers to JSON object
    source => "headers"
    target => "headers"
  }
}
 
output {
  file {
    path => "/home/dminhvu/elastic/output-2.log"
    codec => "json_lines"
  }
}
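
If a line does not match the grok pattern, the body and headers fields are never created, so there is nothing for the json filters to parse. The following minimal sketch guards against that case, assuming grok keeps its default failure tag, _grokparsefailure. Only the filter section changes.

```logstash
filter {
  grok {
    match => {
      "message" => "POST %{URIPATHPARAM:request} body=%{GREEDYDATA:body} headers=%{GREEDYDATA:headers}"
    }
  }

  # Assumption: tag_on_failure is left at its default for grok, so
  # unmatched lines carry the "_grokparsefailure" tag.
  if "_grokparsefailure" not in [tags] {
    json {
      source => "body"
      target => "body"
    }
    json {
      source => "headers"
      target => "headers"
    }
  }
}
```

Lines that do not follow the expected layout then pass through with only the grok failure tag, which makes them easy to find in the output.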

Once output-2.log is written, you can use the jq command to format it:

console
jq -S . output-2.log > output-2_formatted.log

One event of the output should look like this:

output-2_formatted.log
{
  "@timestamp": "2023-11-21T05:16:12.021382978Z",
  "@version": "1",
  "body": {
    "age": 21,
    "id": 1,
    "name": "Minh Vu"
  },
  "event": {
    "original": "POST /some/endpoint body={\"id\":1,\"name\":\"Minh Vu\",\"age\":21} headers={\"Content-Type\":\"application/json\"}"
  },
  "headers": {
    "Content-Type": "application/json"
  },
  "host": {
    "name": "dminhvu"
  },
  "log": {
    "file": {
      "path": "/home/dminhvu/elastic/test-2.log"
    }
  },
  "message": "POST /some/endpoint body={\"id\":1,\"name\":\"Minh Vu\",\"age\":21} headers={\"Content-Type\":\"application/json\"}",
  "request": "/some/endpoint"
}

As you can see, the body and headers fields are now JSON objects.
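
Because body and headers are now objects, their keys can be referenced like any other nested field. Here is a minimal sketch of a filter section that builds a new field from them; the summary field name is hypothetical.

```logstash
filter {
  grok {
    match => {
      "message" => "POST %{URIPATHPARAM:request} body=%{GREEDYDATA:body} headers=%{GREEDYDATA:headers}"
    }
  }
  json {
    source => "body"
    target => "body"
  }
  json {
    source => "headers"
    target => "headers"
  }

  mutate {
    add_field => {
      # hypothetical field built from the parsed body and headers
      "summary" => "%{[body][name]} (id %{[body][id]}) sent %{[headers][Content-Type]}"
    }
  }
}
```

The mutate filter comes after both json filters, so the nested keys already exist when the references are resolved.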

Conclusion

In this tutorial, I showed you how to parse a string into a JSON object in Logstash in 2 common cases:

  1. Inputting multiline JSON data: use the multiline codec followed by the json filter.
  2. Converting a JSON string in a field to a JSON object: use the json filter with the source and target options.

If you have any questions, please leave a comment below.
