You Need To Start Using JSONL In Your Golang Projects!

As a software engineer, I always strive to handle data as efficiently as possible. JSON has been my go-to format for storing and transmitting data in a clear and structured way. But recently, when working on a new feature, I decided to switch to the JSON Lines (JSONL) format due to its streaming benefits. In this blog post, I will describe what JSONL is, explain its advantages over JSON and provide a simple example to get you started.

What Is JSONL?

JSONL is a format for storing and transmitting structured data in which each record is a separate JSON object on its own line of text. In contrast to the traditional JSON format, a JSONL file is not a comma-separated list of objects wrapped in a single set of square brackets; instead, each line is a self-contained JSON object.

JSONL can be used in a variety of applications, such as log files, data pipelines, message queues or anywhere it is necessary to process data in a streaming fashion. You can see what JSONL looks like in the example section of this blog post.

Benefits Of JSONL Over JSON

Now that we know what JSONL is, let's take a look at some of the benefits of using JSONL over traditional JSON:

1. Stream processing: JSONL can be read and written incrementally, without needing to load the entire file into memory at once. This makes it more flexible for large datasets or applications that need to process data on the fly.

2. Error handling: In traditional JSON, a syntax error in one object makes the entire file unreadable. In contrast, a JSONL file can still be processed even if individual lines contain errors. This lets you report back to the user that, for example, object 16 is invalid, making it easier to handle errors in large datasets (see the sketch after this list).

3. Space efficiency: Although the difference is usually negligible, JSONL can be slightly more space-efficient than traditional JSON: each object sits on its own line, so there is no need for commas or enclosing brackets between objects, which can reduce the overall file size.
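To illustrate the error-handling point, here is a minimal sketch that decodes a JSONL stream line by line with bufio.Scanner, so a malformed line can be reported and skipped instead of aborting the whole file. Note that reportInvalidLines is a hypothetical helper, and this differs from the json.Decoder approach used in the main example later in this post:

import (
    "bufio"
    "encoding/json"
    "io"
)

// reportInvalidLines unmarshals each line independently, so a single
// bad line does not make the rest of the file unreadable. It returns
// the line numbers of any invalid JSON objects.
func reportInvalidLines(file io.Reader) ([]int, error) {
    var invalid []int
    scanner := bufio.NewScanner(file)
    line := 0
    for scanner.Scan() {
        line++
        var obj map[string]any
        if err := json.Unmarshal(scanner.Bytes(), &obj); err != nil {
            // Record the bad line number and keep going.
            invalid = append(invalid, line)
        }
    }
    return invalid, scanner.Err()
}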

JSONL In Go Example

Let’s take a look at an example of reading and unmarshalling a JSONL file in Go. We’ll begin by creating a people.jsonl file containing a number of individual JSON objects:

{"name": "John Smith", "age": 30, "address": "London, England"}
{"name": "Michael Scott", "age": 40, "address": "New York, USA"}
{"name": "Brian Murphy", "age": 50, "address": "Dublin, Ireland"}

Next, we’ll create a main.go file, import the packages we need, and define a struct to represent our JSON objects:

package main

import (
    "encoding/json"
    "fmt"
    "io"
    "log"
    "os"
)

type Person struct {
    Name    string `json:"name"`
    Age     int    `json:"age"`
    Address string `json:"address"`
}

Next, we create a function which reads a JSONL file and unmarshals the JSON objects into a Person struct one at a time. On each iteration, we print the person's details: their name, age, and address.

func peopleFromFile(file io.Reader) error {
    decoder := json.NewDecoder(file)
    // Each iteration will be the next JSON Object
    for decoder.More() {
        var person Person
        if err := decoder.Decode(&person); err != nil {
            return err
        }
        
        // We can now process each json object individually without having to load the entire file at once
        fmt.Println(person.Name, person.Age, person.Address)
        // Iteration 1 - John Smith 30 London, England
        // Iteration 2 - Michael Scott 40 New York, USA
        // Iteration 3 - Brian Murphy 50 Dublin, Ireland
    }
    
    return nil
}

Since each line in a JSONL file is a valid JSON object, the Decode() method can consume one object at a time and decode it into the target Go data structure.

By using the json.Decoder to handle JSONL format, the peopleFromFile function can efficiently read and decode large JSONL files or streams without having to load the entire file into memory at once.
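Reading is only half the story: writing JSONL is just as straightforward. Below is a minimal sketch using json.Encoder, whose Encode method appends a newline after each value, which is exactly the JSONL framing. The writePeople helper is hypothetical, not part of the example project:

func writePeople(w io.Writer, people []Person) error {
    encoder := json.NewEncoder(w)
    for _, person := range people {
        // Encode writes person as JSON followed by a newline character.
        if err := encoder.Encode(person); err != nil {
            return err
        }
    }
    return nil
}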

By representing a file as an io.Reader, we can use the same code to read data from files, network connections, and other sources that implement the io.Reader interface.
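As a quick illustration of that flexibility, the records below live in an in-memory string rather than a file, and peopleFromFile handles them unchanged (the two records are made-up examples, and this assumes the strings package is imported):

// strings.NewReader wraps a string in an io.Reader, so no file is needed.
data := `{"name": "Ada Lovelace", "age": 36, "address": "London, England"}
{"name": "Grace Hopper", "age": 85, "address": "Arlington, USA"}`

if err := peopleFromFile(strings.NewReader(data)); err != nil {
    log.Fatal(err)
}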

Finally, we stitch it all together by creating a main function which opens the file and passes it as a parameter into peopleFromFile.

func main() {
    file, err := os.Open("people.jsonl")
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()

    // We can pass in any file which implements io.Reader e.g. multipart.File or os.File
    if err = peopleFromFile(file); err != nil {
        log.Fatal(err)
    }
}

In this example, we read a file called people.jsonl and print the name, age, and address for each person in the file on the fly. If you would like to download the code example above, you can find it on our GitHub.

Conclusion

In conclusion, JSONL is a powerful format that offers several advantages over traditional JSON. It can be processed as a stream without loading everything into memory, it is more resilient to errors in individual objects, and it can be slightly more space-efficient, making it an excellent choice for handling large datasets in Go. I hope this article has convinced you to give JSONL a try in your next project!

If you have any questions or comments, please leave them below. Also, don’t forget to subscribe to our newsletter to receive more informative posts like this directly in your inbox.

If you found this article helpful, please share it with your friends and colleagues.

Thank you for reading!

Bonus Section

To process gzipped JSONL files, you only need to make a minor modification to the io.Reader that is passed into the peopleFromFile function, as shown below:

// fileType and model.FileTypeGzip come from elsewhere in the codebase
if fileType == model.FileTypeGzip {
    gzipReader, err := gzip.NewReader(file)
    if err != nil {
        return err
    }
    defer gzipReader.Close()
    file = gzipReader
}
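For instance, reading a hypothetical people.jsonl.gz file from disk could look like this (gzip here is the standard library's compress/gzip package):

file, err := os.Open("people.jsonl.gz")
if err != nil {
    log.Fatal(err)
}
defer file.Close()

gzipReader, err := gzip.NewReader(file)
if err != nil {
    log.Fatal(err)
}
defer gzipReader.Close()

// *gzip.Reader implements io.Reader, so it plugs straight into our function.
if err := peopleFromFile(gzipReader); err != nil {
    log.Fatal(err)
}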

2 responses

  1. These JSON Encoder and Decoder types can be used in a broad range of scenarios, such as reading and writing to HTTP connections, WebSockets, or files. Could you please add more details about how you are using this Encoder/Decoder for HTTP connections? Will this be fast and memory-efficient for HTTP connections too?

    1. Hi Rohan, thanks for your comment.

      If you need to import a large number of entities at once, using JSONL would be a great option. For instance, if you have an API that can create a person with basic details like their name and age and you want to import 50,000 people at one time, it could be problematic to do so with JSON. Loading the entire structure at once might cause you to run out of RAM.

      To avoid this issue, users can upload a JSONL file in the HTTP request as a Multipart File. The API can then read each JSON object one at a time and save them to the database in batches of 1000. While this operation may not return fast enough during an HTTP request, it can be done in another thread or Kubernetes pod in the background. To keep the user informed, you could return a response to inform them that their import is in process. This is not ideal, but importing 50,000 entities is always going to take time.
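
      As a rough sketch, such a handler might look like the following. Note that saveBatch is a hypothetical persistence helper, and the form field name and batch size are assumptions:

func importPeople(w http.ResponseWriter, r *http.Request) {
    // "file" is an assumed multipart form field name.
    f, _, err := r.FormFile("file")
    if err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }
    defer f.Close()

    // Stream-decode the upload one object at a time, exactly as in the post.
    decoder := json.NewDecoder(f)
    batch := make([]Person, 0, 1000)
    for decoder.More() {
        var p Person
        if err := decoder.Decode(&p); err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        batch = append(batch, p)
        if len(batch) == 1000 {
            saveBatch(batch) // hypothetical: persist 1,000 people at a time
            batch = batch[:0]
        }
    }
    if len(batch) > 0 {
        saveBatch(batch)
    }
    // 202 Accepted tells the client the import continues in the background.
    w.WriteHeader(http.StatusAccepted)
}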

      This blog post is primarily meant to showcase what can be achieved with JSONL. I didn’t delve into specific use cases as JSONL can be used in various scenarios. I hope this answers your question.
