As a software engineer, I always strive to handle data as efficiently as possible. JSON has been my go-to format for storing and transmitting data in a clear and structured way. But recently, when working on a new feature, I decided to switch to the JSON Lines (JSONL) format due to its streaming benefits. In this blog post, I will describe what JSONL is, explain its advantages over JSON and provide a simple example to get you started.
What Is JSONL?
JSONL is a format for storing and transmitting structured data that consists of separate JSON objects with each object represented as a single line of text. In contrast to traditional JSON format, JSONL files do not have a comma-separated list of objects wrapped in a single set of square brackets. Instead, each line of a JSONL file is a self-contained JSON object.
JSONL can be used in a variety of applications, such as log files, data pipelines, message queues or anywhere it is necessary to process data in a streaming fashion. You can see what JSONL looks like in the example section of this blog post.
Benefits Of JSONL Over JSON
Now that we know what JSONL is, let's take a look at some benefits of using JSONL over traditional JSON:
1. Stream processing: JSONL can be read and written incrementally, without needing to load the entire file into memory at once. This makes it more flexible for large datasets or applications that need to process data on the fly.
2. Error handling: In traditional JSON, a syntax error in one object makes the entire file unreadable. In contrast, a JSONL file that is read line by line can still be processed even if individual objects contain errors. This lets you report to the user that, for example, the object on line 16 is invalid, making it easier to handle errors in large datasets.
3. Space efficiency: JSONL can be slightly more space-efficient than traditional JSON because there are no enclosing square brackets and no commas between objects. The saving is usually negligible, but it does reduce the overall file size.
JSONL In Go Example
Let’s take a look at an example of reading and unmarshalling a JSONL file in Go. Let’s begin by creating a people.jsonl file, which contains a number of individual JSON objects, one per line:
{"name": "John Smith", "age": 30, "address": "London, England"}
{"name": "Michael Scott", "age": 40, "address": "New York, USA"}
{"name": "Brian Murphy", "age": 50, "address": "Dublin, Ireland"}
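Next, we’ll create a main.go file and define a struct to represent our JSON objects. The snippets in this post omit the import block for brevity; together they need encoding/json, fmt, io, log, and os.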
package main

type Person struct {
	Name    string `json:"name"`
	Age     int    `json:"age"`
	Address string `json:"address"`
}
Next, we create a function that reads a JSONL file and unmarshals the JSON objects into a Person struct one at a time. On each iteration, we print the person's name, age, and address.
func peopleFromFile(file io.Reader) error {
	decoder := json.NewDecoder(file)
	// Each iteration decodes the next JSON object in the stream
	for decoder.More() {
		var person Person
		if err := decoder.Decode(&person); err != nil {
			return err
		}
		// We can now process each JSON object individually without having to load the entire file at once
		fmt.Println(person.Name, person.Age, person.Address)
		// Iteration 1 - John Smith 30 London, England
		// Iteration 2 - Michael Scott 40 New York, USA
		// Iteration 3 - Brian Murphy 50 Dublin, Ireland
	}
	return nil
}
json.Decoder reads one JSON value at a time from the underlying stream and ignores whitespace, including newlines, between values. Since each line in a JSONL file is a valid JSON object, each call to Decode() consumes exactly one object and decodes it into the target Go data structure, while More() reports whether another value is waiting in the stream.
By using json.Decoder to handle the JSONL format, the peopleFromFile function can efficiently read and decode large JSONL files or streams without having to load the entire file into memory at once.
By accepting an io.Reader rather than a concrete file type, we can use the same code to read data from files, network connections, and any other source that implements the io.Reader interface.
Finally, we tie it all together by creating a main function that opens the file and passes it as a parameter into peopleFromFile.
func main() {
	file, err := os.Open("people.jsonl")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	// We can pass in any value that implements io.Reader, e.g. multipart.File or os.File
	if err = peopleFromFile(file); err != nil {
		log.Fatal(err)
	}
}
In this example, we read a file called people.jsonl and print the name, age, and address for each person in the file on the fly. If you would like to download the code example above, you can find it on our GitHub.
Conclusion
In conclusion, JSONL is a powerful format that offers several advantages over traditional JSON when data is processed as a stream. It can be read and written incrementally with low memory use, it lets you isolate errors to individual lines, and it is marginally smaller, making it an excellent choice for handling large datasets in Go. I hope this article has convinced you to give JSONL a try in your next project!
If you have any questions or comments, please leave them below. Also, don’t forget to subscribe to our newsletter to receive more informative posts like this directly in your inbox.
If you found this article helpful, please share it with your friends and colleagues.
Thank you for reading!
Bonus Section
To process gzip-compressed JSONL files, you only need to make a minor modification to the io.Reader parameter that is passed in to the peopleFromFile function, as shown below:
if fileType == model.FileTypeGzip {
	gzipReader, err := gzip.NewReader(file)
	if err != nil {
		return err
	}
	// Release the gzip reader when the enclosing function returns
	defer gzipReader.Close()
	file = gzipReader
}