Building a JSON Parser in Rust Pt. 3

September 7th, 2024

Welcome back to this four-part tutorial on building a JSON parser in Rust! If you haven't read parts 1 and 2, please do so first. For a quick refresher, in part 1 we built a parser that recognized empty JSON objects such as {}. In part 2, we extended this parser to support key-value pairs where all values are strings.

Step 3. Parsing booleans, numbers, null

This step will involve parsing booleans, numbers, and null values. A sample JSON file is given below:

{
    "key1": true,
    "key2": 12,
    "key3": null
}

Before we start, let's create a valid test case and an invalid test case. In the samples directory, create another file valid.json and populate it with the example from above. Then, create another file invalid.json and populate it with:

{
    "key1": []
}

The above JSON is "invalid", as we will only support strings, booleans, numbers, and null for values, not arrays.

To complete this step, we will write functions to parse booleans, numbers, and null values. We will create a new function called parse_value that will call our newly written functions based on the Peekable data structure that we pass to it.

Parsing values

Now that we have defined the subset of values that we want to parse, we can write a function that handles all of these cases. This function will call our helper functions (written later) based on the first character of the value. If the first character is a double quote, we will try to parse a string. If it is a minus sign or a digit, we will try to parse a number, and so on.

fn parse_value(chars: &mut Peekable<Chars>) -> Result<bool, String> {
    match chars.peek() {
        Some('"') => parse_string(chars),
        Some('-') | Some('0'..='9') => parse_number(chars),
        Some('t') => parse_literal(chars, "true"),
        Some('f') => parse_literal(chars, "false"),
        Some('n') => parse_literal(chars, "null"),
        _ => Err("Invalid JSON: failed to parse value".to_string())
    }
}
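To see the dispatch in isolation, here is a minimal sketch that only names the kind of value starting at the cursor. The classify function and its return strings are hypothetical and not part of the parser; the point is that peek() inspects the first character without consuming it, so the helper that runs next still sees the whole value.

```rust
use std::iter::Peekable;
use std::str::Chars;

// Hypothetical helper for illustration only: reports which branch
// parse_value would take, based on the first character.
fn classify(chars: &mut Peekable<Chars>) -> &'static str {
    // peek() returns a reference to the next character without advancing
    match chars.peek() {
        Some('"') => "string",
        Some('-') | Some('0'..='9') => "number",
        Some('t') | Some('f') => "boolean",
        Some('n') => "null",
        // anything else (arrays, nested objects, end of input) is unsupported
        _ => "unsupported",
    }
}
```

Calling classify on an iterator over [] falls through to the last arm, which is why invalid.json is rejected in this step.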

Parsing Booleans and null

For the purpose of this parser, we can view booleans and nulls as the same primitive type (a literal). This is a collection of characters that is not wrapped in double quotes. Let's write a function to parse a literal based on a string input we give it.

for ch in literal.chars() {
    if let Some(c) = chars.peek() {
        if c.eq(&ch) {
            chars.next();
        } else {
            return Err("Invalid JSON: failed to parse literal".to_string());
        }
    }
}

This piece of code goes through each character of the literal, and matches it with the next character in the Peekable data structure. If they match, we consume the character and move on. If they don't match, we return an error. However, if the input runs out early, chars.peek() returns None and the loop silently skips the remaining characters of the literal. We must therefore also ensure that we consume the entire literal, and not just a part of it. We can add a counter to do this.

fn parse_literal(chars: &mut Peekable<Chars>, literal: &str) -> Result<bool, String> {
    // number of characters of the literal matched so far
    let mut count = 0;
    for ch in literal.chars() {
        if let Some(c) = chars.peek() {
            if c.eq(&ch) {
                chars.next();
                count += 1;
            } else {
                return Err("Invalid JSON: failed to parse literal".to_string());
            }
        }
    }

    // error if the entire literal is not consumed
    // (len() counts bytes, which equals the character count for ASCII literals)
    if count != literal.len() {
        return Err("Invalid JSON: failed to parse literal".to_string());
    }

    Ok(true)
}

This check will ensure that we don't accept a value such as tru when we are expecting true.
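As an alternative to the counter, the same guarantee can be had by treating "no more input" as a mismatch right away. The sketch below is one possible variant, not the version used in the rest of this series; the name expect_literal is made up for illustration.

```rust
use std::iter::Peekable;
use std::str::Chars;

// Hypothetical variant of parse_literal: a missing character is handled
// in the same arm as a wrong character, so no counter is needed.
fn expect_literal(chars: &mut Peekable<Chars>, literal: &str) -> Result<bool, String> {
    for expected in literal.chars() {
        match chars.peek() {
            // next input character matches the literal: consume it
            Some(&c) if c == expected => {
                chars.next();
            }
            // wrong character, or the input ended early
            _ => return Err("Invalid JSON: failed to parse literal".to_string()),
        }
    }
    Ok(true)
}
```

With this version, an input of tru fails on the fourth iteration because peek() returns None, while true followed by a comma succeeds and leaves the comma for the caller.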

Parsing numbers

To parse numbers, we will first check if the first character is a -. If so, we consume it.

if let Some('-') = chars.peek() {
    chars.next();
}

Then, we consume as many digits as we can. We will also consume a decimal point if we see one.

let mut is_decimal = false;

while let Some(c) = chars.peek() {
    if c.is_ascii_digit() {
        chars.next();
    } else if *c == '.' {
        if is_decimal {
            return Err("Invalid JSON: failed to parse number".to_string());
        }
        is_decimal = true;
        chars.next();
    } else {
        break;
    }
}

The entire function looks like this:

fn parse_number(chars: &mut Peekable<Chars>) -> Result<bool, String> {
    // Tracks whether a decimal point has already been seen,
    // so that inputs such as 1.2.3 are rejected
    let mut is_decimal = false;

    // Consume leading minus sign if present
    if let Some('-') = chars.peek() {
        chars.next();
    }

    // Consume digits and at most one decimal point
    while let Some(c) = chars.peek() {
        if c.is_ascii_digit() {
            chars.next();
        } else if c == &'.' {
            if is_decimal {
                return Err("Invalid JSON: failed to parse number".to_string());
            } else {
                is_decimal = true;
                chars.next();
            }
        } else {
            break;
        }
    }

    return Ok(true);
}
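One thing to be aware of: the function above never checks that it actually saw a digit, so a lone - is accepted as a number. If that matters to you, a minimal sketch of a stricter variant could track this with an extra flag. The name parse_number_checked and the seen_digit flag are assumptions for illustration; this sketch still does not enforce every rule of the JSON number grammar (for example, it accepts a trailing decimal point as in 1.).

```rust
use std::iter::Peekable;
use std::str::Chars;

// Hypothetical stricter variant of parse_number: requires at least one digit.
fn parse_number_checked(chars: &mut Peekable<Chars>) -> Result<bool, String> {
    let mut seen_digit = false;
    let mut is_decimal = false;

    // Consume leading minus sign if present
    if let Some('-') = chars.peek() {
        chars.next();
    }

    // Consume digits and at most one decimal point
    while let Some(&c) = chars.peek() {
        if c.is_ascii_digit() {
            seen_digit = true;
        } else if c == '.' && !is_decimal {
            is_decimal = true;
        } else if c == '.' {
            // second decimal point
            return Err("Invalid JSON: failed to parse number".to_string());
        } else {
            // any other character ends the number
            break;
        }
        chars.next();
    }

    if seen_digit {
        Ok(true)
    } else {
        Err("Invalid JSON: failed to parse number".to_string())
    }
}
```

Like the original, this stops at the first character that cannot belong to a number and leaves it unconsumed, so parse_object can still see the comma or closing brace that follows.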

Changes to parse_object

In Part 2, our parse_object function tried to parse a string after seeing a colon. We can now change this to call parse_value instead. This will ensure that our parser accepts all supported value types as values, instead of just strings.

fn parse_object(chars: &mut Peekable<Chars>) -> Result<bool, String> {
    ...

    while let Some(_) = chars.peek() {
        ...

        if let Some(':') = chars.next() {
            if parse_value(chars).is_err() {  // changed from parse_string to parse_value
                return Err("Invalid JSON: failed to parse value".to_string());
            }
        } 
        
        ...
    }

    return Err("Invalid JSON".to_string());
}

Testing our changes

We can verify that the valid test passes by running cargo run -- samples/valid.json. Checking the exit code with echo $? should produce 0. Likewise, we can verify that the invalid test case fails by running cargo run -- samples/invalid.json. Checking the exit code with echo $? should produce 1, and the program should print 'Invalid JSON' to stdout.