In this tutorial, I will show you how to parse a string into a JSON object in Logstash.
In short:
To parse multiline JSON data, use the multiline codec followed by the json filter.
To parse a JSON string in a field, use the json filter with the source and target options.
Suppose you have the following log file:
test.log
{"id":1,"title":"iPhone 9","description":"An apple mobile which is nothing like apple","price":549,"discountPercentage":12.96,"rating":4.69,"stock":94,"brand":"Apple","category":"smartphones","thumbnail":"/data/1/thumbnail.jpg","images":["/data/1/1.jpg","/data/1/2.jpg","/data/1/3.jpg","/data/1/4.jpg","/data/1/thumbnail.jpg"]}
{"id":2,"title":"iPhone X","description":"SIM-Free, Model A19211 6.5-inch Super Retina HD display with OLED technology A12 Bionic chip with ...","price":899,"discountPercentage":17.94,"rating":4.44,"stock":34,"brand":"Apple","category":"smartphones","thumbnail":"/data/2/thumbnail.jpg","images":["/data/2/1.jpg","/data/2/2.jpg","/data/2/3.jpg","/data/2/thumbnail.jpg"]}
{"id":3,"title":"Samsung Universe 9","description":"Samsung's new variant which goes beyond Galaxy to the Universe","price":1249,"discountPercentage":15.46,"rating":4.09,"stock":36,"brand":"Samsung","category":"smartphones","thumbnail":"/data/3/thumbnail.jpg","images":["/data/3/1.jpg"]}
{"id":4,"title":"OPPOF19","description":"OPPO F19 is officially announced on April 2021.","price":280,"discountPercentage":17.91,"rating":4.3,"stock":123,"brand":"OPPO","category":"smartphones","thumbnail":"/data/4/thumbnail.jpg","images":["/data/4/1.jpg","/data/4/2.jpg","/data/4/3.jpg","/data/4/4.jpg","/data/4/thumbnail.jpg"]}
{"id":5,"title":"Huawei P30","description":"Huawei’s re-badged P30 Pro New Edition was officially unveiled yesterday in Germany and now the device has made its way to the UK.","price":499,"discountPercentage":10.58,"rating":4.09,"stock":32,"brand":"Huawei","category":"smartphones","thumbnail":"/data/5/thumbnail.jpg","images":["/data/5/1.jpg","/data/5/2.jpg","/data/5/3.jpg"]}
{"id":6,"title":"MacBook Pro","description":"MacBook Pro 2021 with mini-LED display may launch between September, November","price":1749,"discountPercentage":11.02,"rating":4.57,"stock":83,"brand":"Apple","category":"laptops","thumbnail":"/data/6/thumbnail.png","images":["/data/6/1.png","/data/6/2.jpg","/data/6/3.png","/data/6/4.jpg"]}
Make sure the file ends with a newline so that the last line is read as a complete event.
To convert the raw JSON string lines into JSON objects, follow these steps:
Read the log file using the multiline codec.
Parse the message field as JSON using the json filter.
logstash.conf
input {
  file {
    path => "/home/dminhvu/elastic/test.log"
    start_position => "beginning" # read from the beginning of the file
    sincedb_path => "/dev/null" # keep Logstash rereading the file when restarting
    codec => multiline { # multiline codec to read multiple lines as one event
      pattern => "\n" # each line is separated by a new line
      what => "next" # read the next part after the new line
    }
  }
}
filter {
  json {
    source => "message" # parse the message field as JSON
  }
}
output {
  file {
    path => "/home/dminhvu/elastic/output.log"
    codec => "json_lines" # write to JSON lines format
  }
}
The comments in the config file explain each line. The part that matters most after the input section is the filter section.
Here, we use the json filter to parse the message field as JSON. The message field is created when the input reads data into Logstash.
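The json filter keeps the raw message field after parsing, and by default it tags events it cannot parse with _jsonparsefailure. A minimal sketch, assuming you want to drop the raw string once parsing succeeds and mark lines that are not valid JSON (the parse_error field name is hypothetical and not part of the original config):

```conf
filter {
  json {
    source => "message"
    # remove_field is a common option that is applied only when the filter succeeds
    remove_field => ["message"]
  }
  # events that could not be parsed carry the default failure tag
  if "_jsonparsefailure" in [tags] {
    mutate {
      add_field => {
        "parse_error" => "message is not valid JSON" # hypothetical field name
      }
    }
  }
}
```

Whether to remove message is a design choice: keeping it makes debugging easier, while removing it avoids storing every record twice.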
The output section produces the same JSON data with some additional fields. You can use the jq command to format the output:
console
jq -S . output.log > output_formatted.log
After the json filter runs, every event is parsed, and each event should look like this:
output_formatted.log
{
  "@timestamp": "2023-11-21T04:16:10.943722335Z",
  "@version": "1",
  "brand": "Apple",
  "category": "smartphones",
  "description": "An apple mobile which is nothing like apple",
  "discountPercentage": 12.96,
  "event": {
    "original": "{\"id\":1,\"title\":\"iPhone 9\",\"description\":\"An apple mobile which is nothing like apple\",\"price\":549,\"discountPercentage\":12.96,\"rating\":4.69,\"stock\":94,\"brand\":\"Apple\",\"category\":\"smartphones\",\"thumbnail\":\"/data/1/thumbnail.jpg\",\"images\":[\"/data/1/1.jpg\",\"/data/1/2.jpg\",\"/data/1/3.jpg\",\"/data/1/4.jpg\",\"/data/1/thumbnail.jpg\"]}"
  },
  "host": {
    "name": "dminhvu"
  },
  "id": 1,
  "images": [
    "/data/1/1.jpg",
    "/data/1/2.jpg",
    "/data/1/3.jpg",
    "/data/1/4.jpg",
    "/data/1/thumbnail.jpg"
  ],
  "log": {
    "file": {
      "path": "/home/dminhvu/elastic/test.log"
    }
  },
  "message": "{\"id\":1,\"title\":\"iPhone 9\",\"description\":\"An apple mobile which is nothing like apple\",\"price\":549,\"discountPercentage\":12.96,\"rating\":4.69,\"stock\":94,\"brand\":\"Apple\",\"category\":\"smartphones\",\"thumbnail\":\"/data/1/thumbnail.jpg\",\"images\":[\"/data/1/1.jpg\",\"/data/1/2.jpg\",\"/data/1/3.jpg\",\"/data/1/4.jpg\",\"/data/1/thumbnail.jpg\"]}",
  "price": 549,
  "rating": 4.69,
  "stock": 94,
  "thumbnail": "/data/1/thumbnail.jpg",
  "title": "iPhone 9"
}
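While developing, it can be quicker to inspect events directly in the terminal instead of writing a file and formatting it with jq. A minimal sketch, assuming Logstash runs in the foreground so that its standard output is visible:

```conf
output {
  stdout {
    codec => rubydebug # pretty-print each event as it leaves the pipeline
  }
}
```

This only replaces the output section; the input and filter sections stay as they are.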
Now you can access the fields in the JSON object as usual with the syntax %{field_name} or %{[field_name][sub_field_name]}. To access array elements, you can use %{[field_name][index]}.
For example, I will add new fields using the mutate filter and a conditional statement:
logstash.conf
# ...
filter {
  json {
    source => "message"
  }
  mutate {
    add_field => {
      "dummy_description" => "%{[title]} is of brand %{[brand]}"
      "first_image" => "%{[images][0]}"
    }
  }
  if [discountPercentage] > 0 {
    mutate {
      add_field => {
        "dummy_discount" => "This product has %{[discountPercentage]}% discount"
      }
    }
  }
}
# ...
Suppose you have another log file in which only part of each line is JSON, like this one:
test-2.log
POST /some/endpoint body={"id":1,"name":"Minh Vu","age":21} headers={"Content-Type":"application/json"}
POST /some/endpoint body={"id":2,"name":"WiseCode","age":22} headers={"Content-Type":"application/json"}
POST /some/endpoint body={"id":3,"name":"Nhi Pham","age":23} headers={"Content-Type":"application/json"}
Now you want to parse the body and headers fields as JSON objects. You can keep the same input and output sections as above (with the new file paths), but you need to change the filter section:
Use the grok filter to extract the necessary fields.
Then use the json filter with source and target specified (they can be the same field) to convert each string to a JSON object.
logstash.conf
input {
  file {
    path => "/home/dminhvu/elastic/test-2.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => "\n"
      what => "next"
    }
  }
}
filter {
  grok { # parse the message field with grok
    match => {
      "message" => "POST %{URIPATHPARAM:request} body=%{GREEDYDATA:body} headers=%{GREEDYDATA:headers}"
    }
  }
  json { # convert body to JSON object
    source => "body"
    target => "body"
  }
  json { # convert headers to JSON object
    source => "headers"
    target => "headers"
  }
}
output {
  file {
    path => "/home/dminhvu/elastic/output-2.log"
    codec => "json_lines"
  }
}
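If a line does not match the grok pattern, the body and headers fields are never created, so there is nothing for the json filters to parse. A minimal sketch of one way to guard against that, assuming grok keeps its default _grokparsefailure tag (the guard is an addition for illustration, not part of the original config):

```conf
filter {
  grok {
    match => {
      "message" => "POST %{URIPATHPARAM:request} body=%{GREEDYDATA:body} headers=%{GREEDYDATA:headers}"
    }
  }
  # run the json filters only when grok extracted the fields
  if "_grokparsefailure" not in [tags] {
    json {
      source => "body"
      target => "body"
    }
    json {
      source => "headers"
      target => "headers"
    }
  }
}
```

Events that fail the grok match still reach the output with the _grokparsefailure tag, which makes them easy to find later.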
After Logstash writes output-2.log, you can use the jq command to format the output:
console
jq -S . output-2.log > output-2_formatted.log
One event of the output should look like this:
output-2_formatted.log
{
  "@timestamp": "2023-11-21T05:16:12.021382978Z",
  "@version": "1",
  "body": {
    "age": 21,
    "id": 1,
    "name": "Minh Vu"
  },
  "event": {
    "original": "POST /some/endpoint body={\"id\":1,\"name\":\"Minh Vu\",\"age\":21} headers={\"Content-Type\":\"application/json\"}"
  },
  "headers": {
    "Content-Type": "application/json"
  },
  "host": {
    "name": "dminhvu"
  },
  "log": {
    "file": {
      "path": "/home/dminhvu/elastic/test-2.log"
    }
  },
  "message": "POST /some/endpoint body={\"id\":1,\"name\":\"Minh Vu\",\"age\":21} headers={\"Content-Type\":\"application/json\"}",
  "request": "/some/endpoint"
}
As you can see, the body and headers fields are now JSON objects.
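Because body and headers are now objects, their keys can be referenced with the nested field syntax. A minimal sketch, assuming the json filters above have already run (the user_summary and content_type field names are hypothetical):

```conf
filter {
  mutate {
    add_field => {
      # nested keys are addressed as [parent][child]
      "user_summary" => "%{[body][name]} is %{[body][age]} years old"
      "content_type" => "%{[headers][Content-Type]}"
    }
  }
}
```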
In this tutorial, I showed you how to parse a string into a JSON object in Logstash in 2 common cases:
Inputting multiline JSON data: use the multiline codec followed by the json filter.
Converting a JSON string in a field to a JSON object: use the json filter with the source and target options.
If you have any questions, please leave a comment below.