Building a JSON Parser in Rust Pt. 2

September 7th, 2024

Welcome back to this four-part tutorial on building a JSON parser in Rust! If you haven't read part 1, please do so first. For a quick refresher, in part 1 we built a parser that recognized empty JSON objects such as {}.

Step 2. Parsing basic objects

In this step, we will parse basic objects that have key-value pairs, where all values are strings! A sample object is given below:

{
    "key1": "value1",
    "key2": "value2"
}

Before we start, let's create a valid test case and an invalid test case. In the samples directory, create another file (name it whatever you like) and populate it with the example from above. Then, create another file and populate it with:

{
    "key2": 12,
    "key3": null
}

The above JSON is currently "invalid", as we only support strings for values, not other JSON primitives.

To complete this step, we will modify the parse_object function that we wrote in part 1. Right now, the function only supports parsing an empty object. We will modify it to support parsing objects with key-value pairs.

When parsing a JSON object, the first thing we expect to see after the opening brace is a key. After the key comes a colon, and after the colon comes a value. After this, if we see a comma, we expect to see another key. If we see a closing brace, we are done parsing the object.

Hence, our logic for the parse_object function should be as follows:

  1. consume the opening brace
  2. if next char is a closing brace, end here
  3. consume a key
  4. consume the colon
  5. consume the value
  6. consume the comma and go back to step 3, or
  7. consume the closing brace

Let's write a block of code for each of these steps:

  1. Consume the opening brace
if let Some('{') = chars.peek() {
    chars.next(); // Consume the opening brace
} else {
    return Err("Invalid JSON: expected object to start with '{'".to_string());
}

  2. If next char is a closing brace, end here
// Consume the closing brace
if let Some('}') = chars.peek() {
    chars.next(); // Consume the closing brace
    return Ok(true); // Empty object
} else {
    // Here we will do steps 3-7 in a loop
}

Since we can have an arbitrary number of key-value pairs, steps 3 to 7 are done in a loop, and they will be explained in the context of that loop, as it doesn't make sense to do otherwise.

3-7. Consume key-value pairs

// While we still have characters to consume
while let Some(_) = chars.peek() {
    // 3. consume a key
    if parse_string(chars).is_err() {
        return Err("Invalid JSON: failed to parse key".to_string());
    }

    // 4. consume a colon
    if let Some(':') = chars.next() {
        // 5. consume a value
        if parse_string(chars).is_err() { 
            return Err("Invalid JSON: failed to parse value".to_string());
        }
    } else {
        return Err("Invalid JSON: no colon after key".to_string());
    }

    match chars.next() {
        // 6. consume a comma and go back to step 3, or
        Some(',') => skip_whitespace(chars),
        // 7. consume the closing brace
        Some('}') => return Ok(true),
        // If we see anything else, it's invalid JSON
        _ => {
            return Err("Invalid JSON: expected comma or closing brace".to_string());
        }
    }
}
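
The loop above calls skip_whitespace, which isn't shown in this part. As a rough sketch, assuming it takes the same Peekable<Chars> and simply advances past JSON whitespace, it could look something like this (the body is an assumption for illustration, not necessarily the version used elsewhere in this series):

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Assumed shape of the helper: advances past JSON whitespace
/// (space, tab, line feed, carriage return) and stops at the first
/// other character without consuming it.
fn skip_whitespace(chars: &mut Peekable<Chars>) {
    while let Some(&c) = chars.peek() {
        if matches!(c, ' ' | '\t' | '\n' | '\r') {
            chars.next(); // Whitespace: consume it and keep going
        } else {
            break; // Leave the next real token for the caller
        }
    }
}
```

Peeking before consuming matters here: the character that ends the run of whitespace (for example the opening quote of the next key) has to stay in the iterator for parse_string to see.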

This lays out the main logic of our parse_object function. Here is the full implementation of this function:

fn parse_object(chars: &mut Peekable<Chars>) -> Result<bool, String> {
    if let Some('{') = chars.peek() {
        chars.next(); 
    } else {
        return Err("Invalid JSON: expected object to start with '{'".to_string());
    }

    if let Some('}') = chars.peek() {
        chars.next(); 
        return Ok(true); 
    }

    while let Some(_) = chars.peek() {
        if parse_string(chars).is_err() {
            return Err("Invalid JSON: failed to parse key".to_string());
        }

        if let Some(':') = chars.next() {
            if parse_string(chars).is_err() { 
                return Err("Invalid JSON: failed to parse value".to_string());
            }
        } else {
            // Didn't see a colon, invalid JSON
            return Err("Invalid JSON: no colon after key".to_string());
        }

        match chars.next() {
            Some(',') => skip_whitespace(chars),
            Some('}') => return Ok(true),
            _ => {
                return Err("Invalid JSON: expected comma or closing brace".to_string());
            }
        }
    }

    return Err("Invalid JSON".to_string());
}

We still need to implement the parse_string function, which takes the Peekable<Chars> iterator as an argument and tries to consume a JSON string starting at its current position.

We can implement the function in the following steps:

  1. Consume the opening quote
if let Some('"') = chars.peek() {
    chars.next();
} else {
    return Err("Invalid JSON: Expected opening quote".to_string());
}

  2. Consume characters until finished OR until seeing the closing quote
while let Some(c) = chars.peek() {
    match c {
        '"' => {
            chars.next();
            // We have seen the closing '"', we can return Ok(true)!
            return Ok(true);
        },
        // Consume character (the block discards the Option returned by next)
        _ => { chars.next(); }
    }
}

Here is the full implementation of the parse_string function:

fn parse_string(chars: &mut Peekable<Chars>) -> Result<bool, String> {
    // Check that string starts with a '"'
    if let Some('"') = chars.peek() {
        chars.next();
    } else {
        return Err("Invalid JSON: Expected opening quote".to_string());
    }

    // Consume characters until finished OR until seeing the closing '"'
    while let Some(c) = chars.peek() {
        match c {
            '"' => {
                chars.next();
                // We have seen the closing '"', we can return true!
                return Ok(true);
            },
            // Consume character (the block discards the Option returned by next)
            _ => { chars.next(); }
        }
    }

    // If we reach here, that means an opening (") was never closed
    return Err("Invalid JSON: expected string to close".to_string());
}

Note: this function assumes that your strings don't contain any escaped characters. Implementing support for escape sequences is left as a challenge for the reader.
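
If you want a starting point for that challenge, here is one possible sketch. The name parse_string_with_escapes is made up for illustration. It accepts the escape sequences JSON defines (\", \\, \/, \b, \f, \n, \r, \t, and \u followed by four hex digits) and only validates them; it doesn't decode them or reject raw control characters.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical variant of parse_string that tolerates backslash escapes.
fn parse_string_with_escapes(chars: &mut Peekable<Chars>) -> Result<bool, String> {
    // Check that string starts with a '"'
    if let Some('"') = chars.peek() {
        chars.next();
    } else {
        return Err("Invalid JSON: Expected opening quote".to_string());
    }

    while let Some(c) = chars.next() {
        match c {
            // An unescaped quote closes the string
            '"' => return Ok(true),
            // A backslash must be followed by a valid escape
            '\\' => match chars.next() {
                Some('"' | '\\' | '/' | 'b' | 'f' | 'n' | 'r' | 't') => {}
                Some('u') => {
                    // \u must be followed by exactly four hex digits
                    for _ in 0..4 {
                        match chars.next() {
                            Some(h) if h.is_ascii_hexdigit() => {}
                            _ => return Err("Invalid JSON: invalid \\u escape".to_string()),
                        }
                    }
                }
                _ => return Err("Invalid JSON: unknown escape sequence".to_string()),
            },
            // Any other character is part of the string
            _ => {}
        }
    }

    // The input ended before the closing quote
    Err("Invalid JSON: expected string to close".to_string())
}
```

The key difference from parse_string is that the character after a backslash is consumed together with it, so an escaped quote such as \" no longer ends the string early.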

At this point we have implemented the parse_object and parse_string functions, so we can test the program against the valid and invalid test cases we created earlier. Run it with cargo run -- {path/to/test_file}. After a run with the valid test case, echo $? should print 0. After a run with the invalid test case, it should print 1, and you should see Invalid JSON: failed to parse value in stdout.
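
main is unchanged from part 1 and isn't repeated here. For a rough idea of how a parse result could map to the exit status checked with echo $?, here is a small sketch. exit_status is a hypothetical helper, not code from part 1, and your main may well do this differently.

```rust
/// Hypothetical helper: maps a parse result to a process exit status,
/// printing the error message to stdout when the input is invalid.
fn exit_status(result: &Result<bool, String>) -> i32 {
    match result {
        // Valid JSON: conventional success status
        Ok(_) => 0,
        // Invalid JSON: report why, then signal failure
        Err(message) => {
            println!("{message}");
            1
        }
    }
}
```

In main, this could be used as std::process::exit(exit_status(&parse_object(&mut chars))), assuming chars is the Peekable<Chars> built from the file contents.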

Here is the entire source code for this step:

fn main() {
    ... // Same as part 1
}

fn parse_string(chars: &mut Peekable<Chars>) -> Result<bool, String> {
    // Check that string starts with a '"'
    if let Some('"') = chars.peek() {
        chars.next();
    } else {
        return Err("Invalid JSON: Expected opening quote".to_string());
    }

    // Consume characters until finished OR until seeing the closing '"'
    while let Some(c) = chars.peek() {
        match c {
            '"' => {
                chars.next();
                // We have seen the closing '"', we can return true!
                return Ok(true);
            },
            // Consume character (the block discards the Option returned by next)
            _ => { chars.next(); }
        }
    }

    // If we reach here, that means an opening (") was never closed
    return Err("Invalid JSON: expected string to close".to_string());
}

fn parse_object(chars: &mut Peekable<Chars>) -> Result<bool, String> {
    if let Some('{') = chars.peek() {
        chars.next(); 
    } else {
        return Err("Invalid JSON: expected object to start with '{'".to_string());
    }

    if let Some('}') = chars.peek() {
        chars.next(); 
        return Ok(true); 
    }

    while let Some(_) = chars.peek() {
        if parse_string(chars).is_err() {
            return Err("Invalid JSON: failed to parse key".to_string());
        }

        if let Some(':') = chars.next() {
            if parse_string(chars).is_err() { 
                return Err("Invalid JSON: failed to parse value".to_string());
            }
        } else {
            // Didn't see a colon, invalid JSON
            return Err("Invalid JSON: no colon after key".to_string());
        }

        match chars.next() {
            Some(',') => skip_whitespace(chars),
            Some('}') => return Ok(true),
            _ => {
                return Err("Invalid JSON: expected comma or closing brace".to_string());
            }
        }
    }

    return Err("Invalid JSON".to_string());
}
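
The listing above does not skip whitespace between tokens (for example, between the colon and the value). If your input reaches parse_object with spaces and newlines still in it, a whitespace-tolerant variant might look like the sketch below. parse_object_ws is a hypothetical name, and skip_whitespace and parse_string are compact stand-ins included only so the block compiles on its own.

```rust
use std::iter::Peekable;
use std::str::Chars;

// Stand-in helper: consumes spaces, tabs, and line breaks.
fn skip_whitespace(chars: &mut Peekable<Chars>) {
    while chars.next_if(|&c| matches!(c, ' ' | '\t' | '\n' | '\r')).is_some() {}
}

// Compact stand-in for parse_string from this step (no escape handling).
fn parse_string(chars: &mut Peekable<Chars>) -> Result<bool, String> {
    if chars.next_if_eq(&'"').is_none() {
        return Err("Invalid JSON: Expected opening quote".to_string());
    }
    for c in chars.by_ref() {
        if c == '"' {
            return Ok(true); // Closing quote found
        }
    }
    Err("Invalid JSON: expected string to close".to_string())
}

// Hypothetical variant of parse_object that skips whitespace around every token.
fn parse_object_ws(chars: &mut Peekable<Chars>) -> Result<bool, String> {
    skip_whitespace(chars);
    if chars.next_if_eq(&'{').is_none() {
        return Err("Invalid JSON: expected object to start with '{'".to_string());
    }
    skip_whitespace(chars);
    if chars.next_if_eq(&'}').is_some() {
        return Ok(true); // Empty object
    }

    loop {
        // Key
        skip_whitespace(chars);
        parse_string(chars).map_err(|_| "Invalid JSON: failed to parse key".to_string())?;

        // Colon
        skip_whitespace(chars);
        if chars.next() != Some(':') {
            return Err("Invalid JSON: no colon after key".to_string());
        }

        // Value (strings only in this step)
        skip_whitespace(chars);
        parse_string(chars).map_err(|_| "Invalid JSON: failed to parse value".to_string())?;

        // Comma starts another pair, closing brace ends the object
        skip_whitespace(chars);
        match chars.next() {
            Some(',') => continue,
            Some('}') => return Ok(true),
            _ => return Err("Invalid JSON: expected comma or closing brace".to_string()),
        }
    }
}
```

Calling parse_object_ws(&mut text.chars().peekable()) on the two sample files' contents is a quick way to check both test cases without going through the command line.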

And that's it for step 2! We now have a program that can successfully identify objects whose keys and values are all strings. Simple enough! Check out step 3.