In this tutorial, I will show you how to parse data from a CSV file in Logstash using the csv filter plugin. For example:
logstash.conf

input {
  file {
    path => "/home/dminhvu/elastic/example.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => "\n"
      what => "next"
    }
  }
}

filter {
  csv {
    separator => "," # specify the separator
    source => "message" # parse the message field as CSV
    columns => ["name", "age", "location", "gender"] # specify the columns
  }
  # do anything you want with the parsed CSV data
  # ...
}

output {
  file {
    path => "/home/dminhvu/elastic/output.log"
    codec => "json_lines" # write to JSON lines format
  }
}
Suppose I have the following CSV file:
example.csv

Minh Vu,21,Vietnam,Male
Desmond,25,United States,Male
Nhi,30,Hong Kong,Female
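Using the above config yields the following results (pretty-printed here; the json_lines codec writes each event on a single line, and Logstash also adds metadata fields such as @timestamp and @version, which are omitted):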
output.log

{
  "age": "21",
  "gender": "Male",
  "location": "Vietnam",
  "message": "Minh Vu,21,Vietnam,Male",
  "name": "Minh Vu"
}
{
  "age": "25",
  "gender": "Male",
  "location": "United States",
  "message": "Desmond,25,United States,Male",
  "name": "Desmond"
}
{
  "age": "30",
  "gender": "Female",
  "location": "Hong Kong",
  "message": "Nhi,30,Hong Kong,Female",
  "name": "Nhi"
}
As you can see, we parsed the CSV data into JSON format. Now, we can do anything we want with the parsed data.
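Note that every parsed value is a string (for example, "age": "21"). As one possible next step, the sketch below uses the csv filter's convert option to cast age to an integer and the common remove_field option to drop the raw message field once the row has been parsed successfully. Which fields to convert or remove is my assumption for illustration; adjust it to your own data.

```conf
filter {
  csv {
    separator => ","
    source => "message"
    columns => ["name", "age", "location", "gender"]
    convert => { "age" => "integer" } # assumption: age should be stored as a number
    remove_field => ["message"]       # assumption: the raw line is no longer needed
  }
}
```

With this filter, the events should carry age as a number (21 rather than "21") and no longer include the message field.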
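If your CSV file contains a header row, set the skip_header option to true to skip it. The filter drops any row whose values exactly match the configured columns, so the header names must match the column names character for character: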
example.csv

name,age,location,gender
Minh Vu,21,Vietnam,Male
Desmond,25,United States,Male
Nhi,30,Hong Kong,Female
The config will look like this:
logstash.conf

filter {
  csv {
    skip_header => true # skip the header row
    separator => "," # specify the separator
    source => "message" # parse the message field as CSV
    columns => ["name", "age", "location", "gender"] # specify the columns
  }
}
If your CSV file uses a custom separator, you can specify it using the separator option:

example.csv

Minh Vu|21|Vietnam|Male
Desmond|25|United States|Male
Nhi|30|Hong Kong|Female
The config will look like this:
logstash.conf

filter {
  csv {
    separator => "|" # specify the separator
    source => "message" # parse the message field as CSV
    columns => ["name", "age", "location", "gender"] # specify the columns
  }
}
If you don't want to specify the column names, set the autodetect_column_names option to true:
logstash.conf

filter {
  csv {
    autodetect_column_names => true # autodetect the columns
    separator => "," # specify the separator
    source => "message" # parse the message field as CSV
  }
}
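This only works when the first row of your CSV file contains the headers: the filter takes the column names from the first row it sees and does not emit that row as an event.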
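As a hedged sketch of how the two header options can be combined: according to the csv filter documentation, autodetect_column_names depends on the header row being processed first, so the pipeline should run with a single worker, and skip_header can be added to drop any later row that repeats the header. The combination below is an illustration under those assumptions, not a config taken from the examples above.

```conf
filter {
  csv {
    autodetect_column_names => true # take the column names from the first row seen
    skip_header => true             # also drop later rows identical to the header
    separator => ","
    source => "message"
    # no columns setting here: the names come from the header row
  }
}
```

To run with one worker, you can start Logstash with bin/logstash -f logstash.conf -w 1 (the config file location is an assumption), or set pipeline.workers: 1 in logstash.yml.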