Building a JSON Parser in Rust Pt. 1

September 7th, 2024

Introduction

Welcome to this four-part tutorial on building a JSON parser in Rust! Each step builds on the last, so make sure you go through these posts in order. I will be building alongside all of you readers, and am writing these posts as I build!

We will build a Rust tool that, given some input, determines whether it is valid or invalid JSON.

For context, JSON (JavaScript Object Notation) is a data-interchange format that is widely used across the software industry. You can read more about it here. On this website, you can also find the official grammar for JSON. Our parser won't support the entire grammar, but it will support the more common features.

This parser will support basic objects, arrays, strings, numbers, booleans, and null. This parser will not support exponents, escaping characters, hexadecimals, or whitespace characters. These can all be added (somewhat) easily as an extension to this parser!

Each step will explain smaller, more digestible code samples and present the resulting code at the end!

Step 0. Setup

Before we begin coding, you must have the Rust compiler installed on your machine. You can follow the instructions on the main Rust documentation website to do so.

Then, in your terminal, navigate to the directory that you want this project to live in and type cargo new rust-json-parser. This will create a new directory with a Rust project inside it. The entry point of this project is src/main.rs. Navigate to that file, and open it in your editor of choice!

Step 1. Parsing an empty object

In the first step, we will parse the most basic JSON object of all - the empty object {}. We will build the program so that it accepts a file as input, but accepting input from standard input should be a very simple extension.

Before we start, let's create some tests. Create a directory titled samples, and inside it create a file with any name you like. Populate the file with {}. Create another file and populate it with literally anything else.

In our main function, we want to do the following:

  1. Get the command line arguments.
  2. Get the JSON file and validate its existence.
  3. Parse the JSON, exiting with status 0 on success and 1 on failure.

Now, open src/main.rs and empty out the main function. To get command line arguments in Rust, we can use the std::env module.

let args = env::args().collect::<Vec<String>>();

This will get the command line arguments and populate them into a vector of strings.

At this point, we should validate the input and ensure that a filename was passed in as an argument. If it was not, we can print a usage message and quit with the exit function from the std::process module.

if args.len() < 2 {
    println!("Usage: {} <filename>", args[0]);
    exit(1);
}
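As a rough sketch, the same check can be pulled into a small function so that it can be exercised without touching the real command line. The name filename_from_args and the fallback program name are assumptions made for this illustration; they are not part of the tutorial's code.

```rust
/// Hypothetical helper: pulls the filename out of an argument list,
/// or returns a usage message if it is missing.
fn filename_from_args<I>(mut args: I) -> Result<String, String>
where
    I: Iterator<Item = String>,
{
    // By convention the first argument is the program name.
    // The fallback name here is only an assumption for the usage message.
    let program = args
        .next()
        .unwrap_or_else(|| "rust-json-parser".to_string());

    // The second argument, if present, is the file we want to parse.
    match args.next() {
        Some(filename) => Ok(filename),
        None => Err(format!("Usage: {} <filename>", program)),
    }
}
```

In main, this could be called as filename_from_args(std::env::args()), printing the error and exiting with status 1 in the Err case.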

Now, we can get the filename from the arguments, and validate the file's existence. To do so, we can use the std::fs module.

let filename = &args[1];

let contents = match fs::read_to_string(filename) {
    Ok(val) => val,
    Err(e) => {
        println!("Could not open file. Error: {}", e);
        exit(1);
    }
};

In this code block, we are setting contents to the contents of the file if it can be read, and printing an error and exiting if it cannot (for example, because the file does not exist).
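Step 1 mentioned that accepting input from standard input would be a simple extension. Here is a minimal sketch of one way to do that, assuming we fall back to standard input when no filename is given. The helpers read_all and read_input are hypothetical names introduced for this example.

```rust
use std::fs::File;
use std::io::{self, Read};

/// Hypothetical helper: reads everything from any reader into a String.
fn read_all<R: Read>(mut reader: R) -> io::Result<String> {
    let mut buf = String::new();
    reader.read_to_string(&mut buf)?;
    Ok(buf)
}

/// Hypothetical helper: reads the named file, or standard input
/// when no filename was given.
fn read_input(path: Option<&str>) -> io::Result<String> {
    match path {
        Some(p) => read_all(File::open(p)?),
        None => read_all(io::stdin()),
    }
}
```

With this in place, main could pass the optional second argument to read_input and handle the error exactly as in the match above.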

Now, we have accomplished the first two of our three goals.

For part 3 (and every later part of the parser), we will use Peekable from the std::iter module. It wraps an iterator and lets you look at (or peek at) the next element without consuming it. All of our parsing functions will accept a mutable reference to a Peekable as input, which we can advance as we validate the JSON.
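To make the difference between peeking and consuming concrete, here is a small sketch. The helpers next_is and consume_if are hypothetical names used only for illustration; the parser in this post calls peek and next directly.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical helper: reports whether the next character is `expected`
/// without consuming it.
fn next_is(chars: &mut Peekable<Chars>, expected: char) -> bool {
    // peek() returns Option<&char>, so we compare against a reference.
    chars.peek() == Some(&expected)
}

/// Hypothetical helper: consumes the next character only if it is `expected`.
fn consume_if(chars: &mut Peekable<Chars>, expected: char) -> bool {
    if next_is(chars, expected) {
        // next() advances the iterator past the character we just peeked at.
        chars.next();
        true
    } else {
        false
    }
}
```

Calling next_is twice in a row on "{}".chars().peekable() gives the same answer both times, because peeking does not move the iterator. Only consume_if (through next) advances it.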

To parse an empty object, first we need to peek at the first character and check that it is a {, and then we need to peek at the second character and check it is a }.

// Requires: use std::iter::Peekable; use std::str::Chars;
fn parse_object(chars: &mut Peekable<Chars>) -> Result<bool, String> {
    // Check the first character
    if let Some('{') = chars.peek() {
        chars.next();
    } else {
        // The error type is String, so convert the &str literal
        return Err("Invalid JSON: expected {".to_string());
    }

    // Check the second character
    if let Some('}') = chars.peek() {
        chars.next();
    } else {
        return Err("Invalid JSON: expected }".to_string());
    }

    Ok(true)
}

The return type of this function is a Result, which can denote a successful result or an error. Returning Ok(success_value) denotes that the function did not run into any errors, whereas returning Err(fail_value) denotes that the function ran into some kind of error while running. We will use this structure in many different functions to help us propagate errors in our parser.
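As a rough sketch of what propagating errors can look like, the ? operator returns early with the Err value of a Result and otherwise continues. The functions expect_char and parse_empty_object below are hypothetical names, not part of this tutorial's code. Note one difference from parse_object: this version consumes the character even when it does not match.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical helper: consumes the next character and checks it is `expected`.
fn expect_char(chars: &mut Peekable<Chars>, expected: char) -> Result<(), String> {
    match chars.next() {
        Some(c) if c == expected => Ok(()),
        Some(c) => Err(format!("Invalid JSON: expected {} but found {}", expected, c)),
        None => Err(format!(
            "Invalid JSON: expected {} but reached end of input",
            expected
        )),
    }
}

/// The empty-object check written with `?`: the first failing
/// expect_char call ends the function with its error.
fn parse_empty_object(chars: &mut Peekable<Chars>) -> Result<bool, String> {
    expect_char(chars, '{')?;
    expect_char(chars, '}')?;
    Ok(true)
}
```

The caller sees either Ok(true) or the first error message, without any explicit if/else around each check.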

Now that we've written this function, we can call it inside the main function.

let parse_result = parse_object(&mut contents.chars().peekable());

We should exit the program based on the success of this function.


fn main() {
    ...

    let parse_result = parse_object(&mut contents.chars().peekable());

    match parse_result {
        // Valid JSON: exit with status 0
        Ok(_) => exit(0),
        // Invalid JSON: print the error and exit with status 1
        Err(err) => {
            println!("{}", err);
            exit(1);
        }
    }
}

Now we can run the program like so: cargo run -- {path/to/test_file}. After a run with the valid test case, echo $? should print 0; after a run with the invalid test case, it should print 1.
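As an alternative to checking exit codes by hand, the same two cases could be covered with unit tests and run with cargo test. This is only a sketch: the module name parse_object_tests and the test names are assumptions, and parse_object is repeated here so the example compiles on its own.

```rust
use std::iter::Peekable;
use std::str::Chars;

// Copy of parse_object from this step, so the sketch is self-contained.
fn parse_object(chars: &mut Peekable<Chars>) -> Result<bool, String> {
    if let Some('{') = chars.peek() {
        chars.next();
    } else {
        return Err("Invalid JSON: expected {".to_string());
    }

    if let Some('}') = chars.peek() {
        chars.next();
    } else {
        return Err("Invalid JSON: expected }".to_string());
    }

    Ok(true)
}

#[cfg(test)]
mod parse_object_tests {
    use super::*;

    #[test]
    fn accepts_empty_object() {
        assert_eq!(parse_object(&mut "{}".chars().peekable()), Ok(true));
    }

    #[test]
    fn rejects_other_input() {
        // Wrong opening character
        assert!(parse_object(&mut "[]".chars().peekable()).is_err());
        // Missing closing brace
        assert!(parse_object(&mut "{".chars().peekable()).is_err());
    }
}
```

In the real project, only the test module would need to be added to src/main.rs, since parse_object already lives there.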

Here is the entire code sample for this step:

use std::{env, fs, iter::Peekable, process::exit, str::Chars};

fn main() {
    let args = env::args().collect::<Vec<String>>();

    // Ensure we pass a filename
    if args.len() < 2 {
        println!("Usage: {} <filename>", args[0]);
        exit(1);
    }

    // Get the filename
    let filename = &args[1];

    // Get the file contents, or exit
    let contents = match fs::read_to_string(filename) {
        Ok(val) => val,
        Err(e) => {
            println!("Could not open file. Error: {}", e);
            exit(1);
        }
    };

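    // Parse the contents as a stream of characters
    let parse_result = parse_object(&mut contents.chars().peekable());

    // Exit with 0 for valid JSON, 1 for invalid JSON
    match parse_result {
        Ok(_) => exit(0),
        Err(e) => {
            println!("{}", e);
            exit(1);
        }
    }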
}

fn parse_object(chars: &mut Peekable<Chars>) -> Result<bool, String> {
    // Check the first character
    if let Some('{') = chars.peek() {
        chars.next();
    } else {
        // The error type is String, so convert the &str literal
        return Err("Invalid JSON: expected {".to_string());
    }

    // Check the second character
    if let Some('}') = chars.peek() {
        chars.next();
    } else {
        return Err("Invalid JSON: expected }".to_string());
    }

    Ok(true)
}

And that's it for step 1! We now have a program that can successfully identify an empty object. Simple enough! Check out step 2 here.