In A List Apart’s series “From URL to Interactive“, Blogger Travis Leithead describes in his article “Tags to DOM” how HTML documents are parsed and converted into the Document Object Model (DOM). Through the course of his post, Leithead explains encoding, pre-parsing, tokenization, and tree construction.
The first step to parsing an HTML document is encoding. This stage is where the parser must sort through each part of the input and sent from the server. Since all information computers handle is binary, it is the parsers responsibility to figure out how to interpret the binary stream into readable data.
After all the data has been encoded, the next stage is pre-parsing. During this step, the parser looks to gather specific resources that might be costly to access later. For example, the parser might select an image file from a URL tagged with a “src” attribute.
Next, the parser moves on to tokenization. This process is when all the markup is split up into discrete tokens that can be organized into a discernable hierarchy. The parser reads through the text document and decides which tokens to declare based on the HTML syntax.
Once all the document has been tokenized, each token is organized into what is called the Document Object Model, or DOM. The DOM is a standardized tree structure that defines the structure of the web page, organizing it into sections with a parent-child relationship. The root of the tree is the #document/<html> tag, and its children are <head> and <body>, the body may contain multiple children, and this organization contextualizes the data into a readable layout. The DOM also contains information about the state of its objects, so this allows for a degree of interaction that make complex programs.
I would highly recommend this article and this whole series to any budding developer such as myself, as understanding the entire process of how programs transform from code to an interactive website is integral knowledge. And the authors at A List Apart do a great job being both thorough and concise in their explanations. I definitely feel more knowledgeable after having read this article and found it valuable.
From the blog CS@Worcester – Bit by Bit by rdentremont58 and used with permission of the author. All other rights reserved by the author.