Query a large JSON file with Java

I am trying to parse the JSON file below using Java. I need to be able to:

  • search the file by id, name, or any other field in an object;
  • search for empty values in a field as well.

The search should return the entire object. The file will be huge, and the search should still be time efficient.

[
  {
    "id": 1,
    "name": "Mark Robb",
    "last_login": "2013-01-21T05:13:41 -11:30",
    "email": "markrobb@gmail.com",
    "phone": "12345",
    "locations": [
        "Germany",
        "Austria"
    ]
  },
  {
    "id": 2,
    "name": "Matt Nish",
    "last_login": "2014-02-21T07:10:41 -11:30",
    "email": "mattnish@gmail.com",
    "phone": "456123",
    "locations": [
        "France",
        "Italy"
    ]
  }
]


This is what I have tried so far using the Jackson library.

public void findById(int id) throws IOException {
    List<Customer> customers = objectMapper.readValue(
            new File("src/main/resources/customers.json"),
            new TypeReference<List<Customer>>() {});

    for (Customer customer : customers) {
        if (customer.getId() == id) {
            System.out.println(customer.getName());
        }
    }
}

I just don't think this is an efficient method for a huge JSON file (about 20,000 customers per file), and there could be multiple files. Search time should not increase linearly. How can I make this time efficient? Should I use another library?

The most efficient way (in both CPU and memory) to parse is stream-oriented parsing instead of object mapping. It usually takes a bit more code to write, but it is usually a good deal :) Both Gson and Jackson support this lightweight technique. You should also avoid memory allocation on the main/hot path to prevent GC pauses. To illustrate the idea, I'll use a small GC-free library, https://github.com/anatolygudkov/green-jelly:

import org.green.jelly.*;    
import java.io.CharArrayReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

public class SelectById {
    public static class Customer {
        private long id;
        private String name;
        private String email;

        public void clear() {
            id = 0;
            name = null;
            email = null;
        }

        public Customer makeCopy() {
            Customer result = new Customer();
            result.id = id;
            result.name = name;
            result.email = email;
            return result;
        }

        @Override
        public String toString() {
            return "Customer{" +
                    "id=" + id +
                    ", name='" + name + '\'' +
                    ", email='" + email + '\'' +
                    '}';
        }
    }

    public static void main(String[] args) throws Exception {
        final String file = "\n" +
            "[\n" +
            "  {\n" +
            "    \"id\": 1,\n" +
            "    \"name\": \"Mark Robb\",\n" +
            "    \"last_login\": \"2013-01-21T05:13:41 -11:30\",\n" +
            "    \"email\": \"markrobb@gmail.com\",\n" +
            "    \"phone\": \"12345\",\n" +
            "    \"locations\": [\n" +
            "        \"Germany\",\n" +
            "        \"Austria\"\n" +
            "    ]\n" +
            "},\n" +
            "  {\n" +
            "    \"id\": 2,\n" +
            "    \"name\": \"Matt Nish\",\n" +
            "    \"last_login\": \"2014-02-21T07:10:41 -11:30\",\n" +
            "    \"email\": \"mattnish@gmail.com\",\n" +
            "    \"phone\": \"456123\",\n" +
            "    \"locations\": [\n" +
            "        \"France\",\n" +
            "        \"Italy\"\n" +
            "    ]\n" +
            " }\n" +
            "]\n";

        final List<Customer> selection = new ArrayList<>();

        final long selectionId = 2;

        final JsonParser parser = new JsonParser().setListener(
            new JsonParserListenerAdaptor() {
                private final Customer customer = new Customer();
                private String currentField;
                @Override
                public boolean onObjectStarted() {
                    customer.clear();
                    return true;
                }

                @Override
                public boolean onObjectMember(final CharSequence name) {
                    currentField = name.toString();
                    return true;
                }

                @Override
                public boolean onStringValue(final CharSequence data) {
                    switch (currentField) {
                        case "name":
                            customer.name = data.toString();
                            break;
                        case "email":
                            customer.email = data.toString();
                            break;
                    }
                    return true;
                }

                @Override
                public boolean onNumberValue(final JsonNumber number) {
                    if ("id".equals(currentField)) {
                        customer.id = number.mantissa();
                    }
                    return true;
                }

                @Override
                public boolean onObjectEnded() {
                    if (customer.id == selectionId) {
                        selection.add(customer.makeCopy());
                        return false; // we don't need to continue
                    }
                    return true;
                }
            }
        );

        // now let's read and parse the data with a buffer

        final CharArrayCharSequence buffer = new CharArrayCharSequence(1024);

        try (final Reader reader = new CharArrayReader(file.toCharArray())) { // replace by FileReader, for example
            int len;
            while((len = reader.read(buffer.getChars())) != -1) {
                buffer.setLength(len);
                parser.parse(buffer);
            }
        }
        parser.eoj();

        System.out.println(selection);
    }
}

This should run about as fast as Java allows (short of using SIMD instructions directly). To get rid of memory allocation (and GC pauses) entirely on the main path, replace the .toString() calls (each creates a new String instance) with something reusable, such as a StringBuilder.
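As an illustration of that reuse idea, here is a sketch in plain Java, independent of green-jelly; the class and method names are hypothetical:

```java
// Sketch: instead of calling data.toString() in every parser callback
// (which allocates a new String per record), copy the characters into a
// StringBuilder that is reset and reused across records.
public class ReusableField {
    private final StringBuilder value = new StringBuilder();

    // Would be called from a parser callback such as onStringValue(CharSequence)
    public void set(CharSequence data) {
        value.setLength(0); // reset, keeping the backing char array
        value.append(data); // copy characters; no String is allocated
    }

    // Compare contents without materializing a String
    public boolean contentEquals(String other) {
        return other.contentEquals(value);
    }
}
```

The backing char array grows once to the longest value seen and is then reused, so the steady state allocates nothing.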

The last thing that may affect overall performance is how the file is read, and RandomAccessFile is one of the best options we have in Java. Since your encoding seems to be ASCII, you can simply cast each byte to char before passing it to the JsonParser.
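A sketch of that reading approach (the class name and buffer size are illustrative; the byte-to-char widening is only safe because the input is ASCII):

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class AsciiChunkReader {
    // Widen ASCII bytes to chars without going through a charset decoder.
    // Valid only for 7-bit ASCII input, as in the question's data.
    public static int asciiToChars(byte[] bytes, int len, char[] chars) {
        for (int i = 0; i < len; i++) {
            chars[i] = (char) (bytes[i] & 0xFF);
        }
        return len;
    }

    // Read the file chunk by chunk and hand each chunk to the parser.
    public static void readFile(String path, int bufferSize) throws IOException {
        byte[] bytes = new byte[bufferSize];
        char[] chars = new char[bufferSize];
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            int len;
            while ((len = raf.read(bytes)) != -1) {
                asciiToChars(bytes, len, chars);
                // pass chars[0..len) to the JSON parser here
            }
        }
    }
}
```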


It should be possible to do this with Jackson. The trick is to use JsonParser to stream through the top-level array and then parse each record with ObjectMapper.readValue().

ObjectMapper objectMapper = new ObjectMapper();
File file = new File("customers.json");

try (JsonParser parser = objectMapper.getFactory().createParser(file))
{
    //Assuming top-level array
    if (parser.nextToken() != JsonToken.START_ARRAY)
        throw new RuntimeException("Expected top-level array in JSON.");

    //Now inside the array, parse each record
    while (parser.nextToken() != JsonToken.END_ARRAY)
    {
        Customer customer = objectMapper.readValue(parser, Customer.class);

        //Do something with each customer as it is parsed
        System.out.println(customer.id + ": " + customer.name);
    }
}

@JsonIgnoreProperties(ignoreUnknown = true)
public static class Customer
{
    public String id;
    public String name;
    public String email;
}

In terms of time efficiency it will still need to scan the entire file; there is not much you can do about that without an index or something fancier like parallel parsing. But it will be far more memory efficient than reading the entire JSON into memory, since this code only holds one Customer object at a time.
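If the same files are searched repeatedly, it can pay to scan each file once and build an in-memory index, so later lookups avoid re-reading the file. A minimal sketch with plain Java collections (the Customer shape and field names are simplified from the question; feeding the index from the streaming loop is assumed):

```java
import java.util.HashMap;
import java.util.Map;

public class CustomerIndex {
    public static class Customer {
        public final int id;
        public final String name;
        public Customer(int id, String name) { this.id = id; this.name = name; }
    }

    private final Map<Integer, Customer> byId = new HashMap<>();
    private final Map<String, Customer> byName = new HashMap<>();

    // Call once per record while streaming through the file.
    public void add(Customer c) {
        byId.put(c.id, c);
        byName.put(c.name, c);
    }

    // O(1) lookups afterwards; the file is never re-parsed.
    public Customer findById(int id) { return byId.get(id); }
    public Customer findByName(String name) { return byName.get(name); }
}
```

The trade-off is memory: the index holds every record, so for truly huge data sets a database (as suggested in the comments) is the more robust version of the same idea.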


Also:

if(customer.getId() == id) {

Use .equals() for comparing strings, not ==:

if (customer.getId().equals(id)) {
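To see why this matters: == compares object references, while .equals() compares contents, so two distinct String objects with the same characters fail the == test.

```java
public class StringCompare {
    public static void main(String[] args) {
        String a = "42";
        String b = new String("42"); // distinct object, same contents

        System.out.println(a == b);      // false: different references
        System.out.println(a.equals(b)); // true: same characters
    }
}
```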


You can try the Gson library. It provides a TypeAdapter class that converts Java objects to and from JSON via streaming serialization and deserialization.

The API is efficient and flexible, especially for huge files. Here is an example:

public class GsonStream {
    public static void main(String[] args) {
        Gson gson = new Gson();
        int id = 2; // the id to search for

        try (Reader reader = new FileReader("src/main/resources/customers.json")) {
            Type listType = new TypeToken<List<Customer>>(){}.getType();

            // Convert the JSON file to Java objects
            List<Customer> customers = gson.fromJson(reader, listType);

            List<String> names = customers
              .stream()
              .filter(c -> c.getId() == id)
              .map(Customer::getName)
              .collect(Collectors.toList());

            System.out.println(names);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

If you want to understand how to override the TypeAdapter abstract class, here is an example:

public class GsonTypeAdapter {
    public static void main(String[] args) {

        GsonBuilder builder = new GsonBuilder();
        builder.registerTypeAdapter(Customer.class, new CustomerAdapter());
        builder.setPrettyPrinting();
        Gson gson = builder.create();

        try (JsonReader reader = new JsonReader(new FileReader("src/main/resources/customers.json"))) {

            Customer customer = gson.fromJson(reader, Customer.class);
            System.out.println(customer);

            String jsonString = gson.toJson(customer);
            System.out.println(jsonString);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

class CustomerAdapter extends TypeAdapter<Customer> {
   @Override
   public Customer read(JsonReader reader) throws IOException {
      Customer customer = new Customer();
      reader.beginObject();
      String fieldName = null;

      while (reader.hasNext()) {
         JsonToken token = reader.peek();

         if (token.equals(JsonToken.NAME)) {
            // get the current field name
            fieldName = reader.nextName();
         } else if ("name".equals(fieldName)) {
            customer.setName(reader.nextString());
         } else if ("id".equals(fieldName)) {
            customer.setId(reader.nextInt());
         } else {
            // skip values of fields we are not interested in
            reader.skipValue();
         }
      }
      reader.endObject();
      return customer;
   }

   @Override 
   public void write(JsonWriter writer, Customer customer) throws IOException { 
      writer.beginObject(); 
      writer.name("name"); 
      writer.value(customer.getName()); 
      writer.name("id"); 
      writer.value(customer.getId()); 
      writer.endObject(); 
   } 
}  

class Customer { 
   private int id; 
   private String name;  

   public int getId() { 
      return id; 
   } 

   public void setId(int id) { 
      this.id = id; 
   }  

   public String getName() { 
      return name; 
   }  

   public void setName(String name) { 
      this.name = name; 
   }   

   public String toString() { 
      return "Customer[ name = " + name + ", id: " + id + "]"; 
   } 
}


Comments
  • It's not clear what you're asking right now because you seem to have missed editing the code blocks.
  • Yes, you need to edit this post and correct the problems with the missing code blocks.
  • I am trying to edit it. New user here. Please give me some time.
  • How "huge" are you saying? GB? TB? You need to inspect the whole file anyway, which must be a sequential process... If you really have that much data, and you must search it, a database makes more sense than a plain file... Also, src/main/resources doesn't exist when your code is actually compiled.
  • If you want to make it more efficient, don't use readValue, use readTree to get a JsonArray or JsonNode object. Then you skip deserializing the whole file into your own Java objects
  • Is it faster to read one object at a time rather than the entire file?
  • I suppose it depends on just how big the input file is. But if the file is big enough then it might not fit in memory or consume enough memory to make the garbage collector do a lot of work.
  • If you want to search multiple times on the same file, wouldn't it be expensive to keep reading the file every time?
  • Yes, if you need to search for records over and over you might want to look at indexing / using a database. You could still use this code to read the large files once and then add each record to the database I suppose.
  • You haven't used TypeAdapter in this sample code. Is that not required?
  • Well I added an example of how to override that abstract class.
  • Is TypeAdapter used to customise the way a file is read ? I am able to retrieve what I need using the first example you gave. Is there any reason I should still implement TypeAdapter ?
  • Jackson has been benchmarked as faster than Gson, last I checked
  • I was just reading about Jackson being faster with large files. I will probably implement that