As both a web framework and a web server, Atlas needs to parse HTTP requests.
My approach to this project is to build just enough of what I need before I can move onto the next piece of the project. I’m trying to reach a stage where the framework is actually useful for someone, even if there are lots of sharp edges and unhandled corner cases.
What is a “just enough” HTTP parser at such an early stage? When I thought through that question, I found that what my parser really needs to implement from the start is the request line of an HTTP message.
The HTTP Message Syntax and Routing specification includes an Augmented Backus-Naur Form (ABNF) of the HTTP message. Maybe you’ve never encountered ABNF before. ABNF is a technique for describing the structure of something. The form can describe things like programming languages and message formats. ABNF is a structured and consistent way of breaking a high level concept (such as an HTTP message) into its constituent pieces.
I’ll use the ABNF to show you how I picked the part that matters for Atlas currently. Here’s the ABNF of the HTTP message, taken directly from the specification:
HTTP-message = start-line
*( header-field CRLF )
CRLF
[ message-body ]
This says that a message must
- include a
start-line
- include zero or more
header-field
lines that end with a CRLF (i.e., a “carriage return” character and “line feed” character, which is"\r\n"
in Lua and many other languages) - include a CRLF by itself as a separator or terminator
- optionally include a
message-body
The job of the specification is to describe
in detail
what all those pieces mean.
Let’s keep digging.
What’s in the start-line
?
start-line = request-line / status-line
In this format,
the slash (/
) means “OR.”
Since HTTP-message
describes both the request format
and the response format,
the start-line
is where there is differentiation.
I’m building an HTTP parser
that needs to route incoming requests,
so I care about the request-line
currently.
request-line = method SP request-target SP HTTP-version CRLF
This is starting the look like the pieces that are required. The routing layer that I built for Atlas is concerned with two kinds of information:
- The HTTP request method
- The HTTP request path
“Method” and “path” are part of the language used
by the ASGI specification.
method
is the proper source
to match the ASGI “method”
and maps to HTTP method names
like GET
, POST
, and so on.
For “path,” we have to keep going.
request-target = origin-form
/ absolute-form
/ authority-form
/ asterisk-form
This is when we start seeing the different kinds
of HTTP scenarios pop up.
I’ll save you the trouble of reading this part
of the spec
and tell you that the origin-form
is what I was after.
origin-form = absolute-path [ "?" query ]
So,
that’s all the spec pieces.
What is “just enough” to get through
this stage?
I built the parser to read the method and path
from the request-line
.
- Method comes from the
method
in therequest-line
. - Path comes from the
absolute-path
in theorigin-form
.
In my research, I found multiple ways to handle parsing. A more advanced approach seems to use a state machine to iterate over the incoming network bytes to determine the parts of this request. This approach probably involves the least amount of overall work and buffering because these state machines can look at the bytes once to determine what they are for.
I’m using a less elegant approach to start.
With my approach,
I’m using the pattern matching capabilities of Lua
to look for a valid request-line
.
Here are the relevant snippets.
local REQUEST_LINE_PATTERN = "^(%u+) ([^ ]+) HTTP/([%d.]+)\r\n"
function Parser.parse(_, data) -- self, data
local meta = {type = "http"}
local method, target, version = string.match(data, REQUEST_LINE_PATTERN)
-- ...
end
I didn’t hightlight this early,
but the request line requires 2 space characters (SP
)
between the different parts.
Spaces are not allowed
in the other pieces of request-line
,
so that makes the Lua pattern easier to construct.
Somewhat hilariously, this is pretty much where I can stop on this parser for now. I did some extra error handling that you can check out, but this is the very basics of request handling. For the time being, I’m ignoring query strings, encoding, and probably a ton of other details that I’m not even aware of.
With a super basic HTTP parser in Atlas, what’s next?
I think it’s time for me to return to the template system in my web framework. This is where I started with Atlas over six months ago. Like much of the project, the templates are functional, but extremely limited.
The next steps with templates are twofold:
- Create a
render
function that wraps up all the details of finding the template file, rendering the output, and packaging the output into an HTTPResponse
object. - Extend the template system
to display context data
inside of template expressions
(e.g.,
{{ foo.bar }}
).
Once all that is done, I think I might be ready to venture into the area that I have feared to tread: Object Relational Mappers (ORM). I have no idea how to build an ORM and my raw SQL skills are definitely not as strong as I would like.
One of the excellent things about this project is that’s it’s forcing me to grow my knowledge. It’s extremely unlikely that I would have read the HTTP specs without a project to drive that. Atlas is also going to push me into obtaining a deeper understanding of data modeling and SQL. That’s an exciting part of this effort.
Thanks for reading! Have questions? Let me know on X at @mblayman.