Welcome back to this four part tutorial on building a JSON parser in Rust! If you haven't read part 1, please do so first.
For a quick refesher, in part 1 we built a parser that recognized empty JSON objects such as {}
.
Step 2. Parsing basic objects
In this step, we will parse basic objects that have key-value pairs, where all values are strings! A sample object is given below:
{
"key1": "value1",
"key2": "value2"
}
Before we start, lets create a valid test case and an invalid test case.
In the samples
directory, create another file (call it whatever) and populate it with the example from above. Then, create another file and populate it with:
{
"key2": 12,
"key3": null
}
The above JSON is currently "invalid", as we only support strings for values, not other JSON primitives.
To complete this step, we will modify the parse_object
function that we wrote in part 1.
Right now, the function only supports parsing an empty object. We will modify it to support parsing objects with key-value pairs.
When parsing an JSON object, the first thing we expect to see is a key. After the key comes a colon, and after the colon comes a value. After this, if we see a comma, we expect to see another key. If we see a closing bracket, we are done parsing the object.
Hence, our logic for the parse_object function should be as follows:
- consume the opening brace
- if next char is a closing brace, end here
- consume a key
- consume the colon
- consume the value
- consume the comma and go back to step 3, or
- consume the closing brace
Let's write a block of code for each of these steps:
- Consume the opening brace
if let Some('{') = chars.peek() {
chars.next(); // Consume the opening brace
} else {
return Err("Invalid JSON: expected objected to start with '{'".to_string());
}
- If next char is a closing brace, end here
// Consume the closing brace
if let Some('}') = chars.peek() {
chars.next(); // Consume the closing brace
return Ok(true); // Empty object
} else {
// Here we will do steps 3-7 in a loop
}
Since we can have an arbitrary number of kv pairs, steps 3 to 7 are done in a loop, and they will explained in the context of that loop as it doesn't make sense to do otherwise.
3-6. Consume key-value pairs
// While we still have characters to consume
while let Some(_) = chars.peek() {
// 3. consume a key
if parse_string(chars).is_err() {
return Err("Invalid JSON: failed to parse key".to_string());
}
// 4. consume a colon
if let Some(':') = chars.next() {
// 5. consume a value
if parse_string(chars).is_err() {
return Err("Invalid JSON: failed to parse value".to_string());
}
} else {
return Err("Invalid JSON: no colon after key".to_string());
}
match chars.next() {
// 6. consume a comma and go back to step 3, or
Some(',') => skip_whitespace(chars),
// 7. consume the closing brace
Some('}') => return Ok(true),
// If we see anything else, it's invalid JSON
_ => {
return Err("Invalid JSON: expected comma or closing brace".to_string());
}
}
}
This lays out the main logic of our parse_object
function. Here is the full implementation of this function:
fn parse_object(chars: &mut Peekable<Chars>) -> Result<bool, String> {
if let Some('{') = chars.peek() {
chars.next();
} else {
return Err("Invalid JSON: expected objected to start with '{'".to_string());
}
if let Some('}') = chars.peek() {
chars.next();
return Ok(true);
}
while let Some(_) = chars.peek() {
if parse_string(chars).is_err() {
return Err("Invalid JSON: failed to parse key".to_string());
}
if let Some(':') = chars.next() {
if parse_string(chars).is_err() {
return Err("Invalid JSON: failed to parse value".to_string());
}
} else {
// Didn't see a colon, invalid JSON
return Err("Invalid JSON: no colon after key".to_string());
}
match chars.next() {
Some(',') => skip_whitespace(chars),
Some('}') => return Ok(true),
_ => {
return Err("Invalid JSON: expected comma or closing brace".to_string());
}
}
}
return Err("Invalid JSON".to_string());
}
We still need to implement the parse_string
function, which will take the Peekable
data stucture as an argument
and try to consume a JSON string from the current state of the structure.
We can implement the function in the following steps:
- consume the opening quote
if let Some('"') = chars.peek() {
chars.next();
} else {
return false;
}
- Consume characters until finished OR until seeing the closing quote
while let Some(c) = chars.peek() {
match c {
'"' => {
chars.next();
// We have seen the closing '"', we can return true!
return true;
},
// Consume character
_ => chars.next();
}
}
Here is the full implementation of the parse_string
function:
fn parse_string(chars: &mut Peekable<Chars>) -> Result<bool, String> {
// Check that string starts with a '"'
if let Some('"') = chars.peek() {
chars.next();
} else {
return Err("Invalid JSON: Expected opening quote".to_string());
}
// Consume characters until finished OR until seeing the closing '"'
while let Some(c) = chars.peek() {
match c {
'"' => {
chars.next();
// We have seen the closing '"', we can return true!
return Ok(true);
},
// Consume character
_ => chars.next();
}
}
// If we reach here, that means a opening (") was never closed
return Err("Invalid JSON: expected string to close".to_string());
}
Note: this function assumes that your strings don't escape any characters. Implementing support for escaping characters is left is a challenge for the reader.
At this point we have implemented the parse_object
and parse_string
functions. We can now test the program with the valid and invalid test cases we created earlier. Now, we can run the program as such: cargo run -- {path/to/test_file}
. After running with the valid test case, running echo $? should produce 0 and it should produce 1 after running the program with the invalid test case. For the invalid test case, you should see Invalid JSON: failed to parse value
in stdout.
Here is the entire soure code for this step:
fn main() {
... // Same as part 1
}
fn parse_string(chars: &mut Peekable<Chars>) -> Result<bool, String> {
// Check that string starts with a '"'
if let Some('"') = chars.peek() {
chars.next();
} else {
return Err("Invalid JSON: Expected opening quote".to_string());
}
// Consume characters until finished OR until seeing the closing '"'
while let Some(c) = chars.peek() {
match c {
'"' => {
chars.next();
// We have seen the closing '"', we can return true!
return Ok(true);
},
// Consume character
_ => chars.next();
}
}
// If we reach here, that means a opening (") was never closed
return Err("Invalid JSON: expected string to close".to_string());
}
fn parse_object(chars: &mut Peekable<Chars>) -> Result<bool, String> {
if let Some('{') = chars.peek() {
chars.next();
} else {
return Err("Invalid JSON: expected objected to start with '{'".to_string());
}
if let Some('}') = chars.peek() {
chars.next();
return Ok(true);
}
while let Some(_) = chars.peek() {
if parse_string(chars).is_err() {
return Err("Invalid JSON: failed to parse key".to_string());
}
if let Some(':') = chars.next() {
if parse_string(chars).is_err() {
return Err("Invalid JSON: failed to parse value".to_string());
}
} else {
// Didn't see a colon, invalid JSON
return Err("Invalid JSON: no colon after key".to_string());
}
match chars.next() {
Some(',') => ()
Some('}') => return Ok(true),
_ => {
return Err("Invalid JSON: expected comma or closing brace".to_string());
}
}
}
return Err("Invalid JSON".to_string());
}
And thats it for step 2! We now have a program that can successfully identify an empty object. Simple enough! Checkout out step 3.