Lecture 2 Structure: HTML Markup

Topics for today

So what is structure, and what is presentation? Take a look at these two headings. They look identical. But they do not have the same *meaning*. One is a top-level heading, and the other is regular text, styled to look big and bold. What difference does it make?
  • Same visual presentation (though this may differ per browser), but different meaning.
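
To make the distinction concrete, here is a minimal sketch of two such "headings". The heading text and the inline styles are illustrative assumptions (they approximate common browser defaults for a top-level heading), not the exact markup shown on the slide.

```html
<!-- A real top-level heading: browsers, screen readers, and search engines know its role -->
<h1>Structure vs. presentation</h1>

<!-- Regular text styled to look like a heading: similar pixels, but no heading semantics -->
<p style="font-size: 2em; font-weight: bold; margin: 0.67em 0;">Structure vs. presentation</p>
```

Both lines typically render as big bold text, but only the first one is exposed to software as a heading.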

Why do semantics matter?

Not everyone browsing the Web is sighted

Screen readers depend on semantically correct HTML

Not everyone browsing the Web is able to use a mouse

Keyboard navigation depends on semantically correct HTML


  • Focus determines where keyboard events go in the page at any given moment. It is widely understood to be the currently active element.
  • The Tab key moves focus to the next element in the tabbing order. Shift + Tab does the opposite, moving focus to the previous element in the tabbing order.
    • Try it yourself: Click on the page on the right and press Tab a few times.
  • By default, focus is rendered as a fuzzy blue outline or a dotted one, depending on the browser, but the styling can be changed with CSS. However, it is very important that some focus styling remains, as keyboard users depend on it.
  • Users with various disabilities use the keyboard for navigation, and power users often prefer it too, as they find it more efficient.
  • Some resources to read more about focus:

Common focusable elements
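
As a rough sketch, the elements below are focusable by default in most browsers, while a plain `<div>` is not. The names, labels, and outline styling are made up for illustration, and which elements the Tab key actually reaches can vary with browser and operating system settings.

```html
<!-- Natively focusable: reachable with the keyboard without extra attributes -->
<a href="#details">A link with an href</a>
<button type="button">A button</button>
<input type="text" name="query">
<select name="size">
  <option>Small</option>
  <option>Large</option>
</select>
<textarea name="notes"></textarea>

<!-- Not focusable by default: skipped in the tabbing order -->
<div>Just a div</div>

<style>
  /* If you restyle the focus indicator, replace it rather than removing it */
  :focus-visible {
    outline: 3px solid royalblue;
    outline-offset: 2px;
  }
</style>
```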


<input type="text" autofocus>
  • Use autofocus when there's a clear starting point.
  • Which dimension of usability does this help?

Not everyone browsing the Web is human

Software (search engines, social media sharing, etc.) depends on semantically correct HTML

Not every browser is visual

Machine readable content is future-proof

Semantic HTML is easier to maintain

				</div>
				</ul>
				</div>
				</article>
				</div>
				</article>
Semantic HTML is also easier to maintain. It's easier to know where you are in the code when there's more variability in the elements used, compared to when everything is a div or a span. Remember, your code is also an interface, both for yourself and for other people. Which dimensions of usability does this help?
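
The contrast can be sketched with two versions of the same content. The class names and text are hypothetical. Notice how the closing tags in the second version tell you what just ended.

```html
<!-- Everything is a div: the closing tags all look the same -->
<div class="post">
  <div class="post-header">
    <div class="post-title">Hello</div>
  </div>
  <div class="post-tags">
    <div class="tag">html</div>
  </div>
</div>

<!-- Semantic elements: each closing tag names the thing it closes -->
<article>
  <header>
    <h2>Hello</h2>
  </header>
  <ul>
    <li>html</li>
  </ul>
</article>
```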

article or li?

Sometimes it is not straightforward which element is most semantically appropriate, and there may be multiple correct answers. Here is an example: a list of products. Is an `<li>` more appropriate (since this is a *list*), or an `<article>` (since each product is a self-contained item)?
Let’s inspect what Amazon actually uses. A plain meaningless `<div>`?! Why is that? There are many potential reasons:
  • Frameworks typically don’t understand meaning, and generate divs.
  • It could be older code (`<article>` and friends are relatively new).
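
One possible answer, sketched below, is to combine the two: a list whose items each contain an article. This is only one reasonable option, not the single correct one, and the product names and prices are invented placeholders.

```html
<ul>
  <li>
    <article>
      <h2>Example product A</h2>
      <p>$10.00</p>
    </article>
  </li>
  <li>
    <article>
      <h2>Example product B</h2>
      <p>$25.00</p>
    </article>
  </li>
</ul>
```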

Tables & meaning

In the early days, when CSS was less powerful, this kind of thing used to be very common, (ab)using a table to put things next to each other. About 15 years ago, even entire websites were laid out with tables, one cell for the header, one cell for the sidebar, one cell for the content etc. What is wrong with it?

Tabular data

The `<table>` HTML element represents tabular data, that is, information presented in a two-dimensional table composed of rows and columns of cells containing data. Take a look at this browser compatibility *table*: that is certainly tabular data! What about this summary element? Is that tabular data?
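
For comparison, here is a minimal sketch of a table used for genuinely tabular data. The feature, browser names, and values are placeholders rather than real compatibility data.

```html
<table>
  <caption>Support for a hypothetical feature</caption>
  <thead>
    <tr>
      <th scope="col">Browser</th>
      <th scope="col">Supported</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th scope="row">Browser A</th>
      <td>Yes</td>
    </tr>
    <tr>
      <th scope="row">Browser B</th>
      <td>No</td>
    </tr>
  </tbody>
</table>
```

Every cell here relates to both its row header and its column header, which is a useful test for whether content is tabular in the first place.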

Semantic HTML

Document Object Model (DOM)

Trees (data structure)

You may have seen trees as a data structure before. Relationships between nodes (parent, child, sibling, ancestor, descendant) are borrowed from family trees. It is essential to understand tree relationships when using Web Platform technologies, as they are used in a variety of things.

DOM hierarchy

<li> HTML elements are <strong>objects</strong>, with a hierarchy called <em><abbr title="Document Object Model">DOM</abbr> tree</em> </li>
HTML start and end tags are not on/off instructions, but delimiters for the boundaries of HTML elements. Therefore, there is also a containment relationship between HTML elements that start *inside* others. These containment relationships create a tree, that is called *the DOM tree*.
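
As an illustration, the list item above (with the spaces just inside the `<li>` tags trimmed, for simplicity) produces the subtree sketched in the comment below. The drawing is an informal notation, not browser output.

```html
<li>HTML elements are <strong>objects</strong>, with a hierarchy called <em><abbr title="Document Object Model">DOM</abbr> tree</em></li>

<!--
li
├── "HTML elements are "
├── strong
│   └── "objects"
├── ", with a hierarchy called "
└── em
    ├── abbr
    │   └── "DOM"
    └── " tree"
-->
```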

Remember this?

What we wrote

				<!DOCTYPE html>
				<title>Hello world</title>
				<p>Hello <em>world</em> 👋

What the browser generated

				<!DOCTYPE html>
				<html>
				<head><title>Hello world</title></head>
				<body><p>Hello <em>world</em> 👋</p></body>
				</html>

Our DOM Tree

[Figure: DOM tree diagram. Legend: document node, element nodes, text nodes]
  • The browser’s internal model of our HTML is called the DOM Tree
  • It’s a hierarchy of objects of different types, such as:
    • Document: This is the root node, and does not correspond to any HTML element.
    • HTMLElement: Every HTML element, such as html, body, or em is of this type. Usually, they merely inherit from HTMLElement, and are an instance of a more specific type such as HTMLHtmlElement, HTMLBodyElement and so on.
    • Text: Text nodes, such as "Hello ", "world", and " 👋" in our example. These never contain any other node, so they are always leaves.
    • Comment: HTML comments (<!-- like this -->) are represented by objects of this type.
  • This hierarchy of objects is crucial for CSS styling, as we will see in a couple of lectures.
  • We can interact with and manipulate the DOM tree via JavaScript!
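
As a small taste of that last point, the sketch below inspects and then modifies the tree for the "Hello world" page. The script is an illustrative addition and uses only standard DOM properties.

```html
<!DOCTYPE html>
<title>Hello world</title>
<p>Hello <em>world</em> 👋</p>
<script>
  // `document` is the root Document node; querySelector finds the first matching element
  const em = document.querySelector("em");

  console.log(em instanceof HTMLElement);                  // true
  console.log(em.parentNode.nodeName);                     // "P"
  console.log(em.firstChild.nodeType === Node.TEXT_NODE);  // true

  // Manipulate the tree: replace the text inside <em>
  em.textContent = "DOM";
</script>
```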

Same content hierarchy, different DOM tree hierarchy

The structure of heading elements and sectioning elements creates the *content hierarchy* (also called *document outline*). This has to do with the meaning that is created by these elements. E.g. an `<h1>` inside a `<section>` has a different place in the document outline than an `<h1>` directly inside `<body>`. The DOM tree is a different hierarchy, and has to do with the types and relationships of elements on the page. An `<h1>` creates the same type of object in the DOM, regardless of where it appears.
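
A minimal sketch of the idea: the two fragments below communicate the same content hierarchy through their heading levels ("Pets" containing "Cats"), yet their DOM trees differ. The headings and text are placeholders.

```html
<!-- Version A: flat DOM. The h1, h2, and p are all siblings -->
<h1>Pets</h1>
<h2>Cats</h2>
<p>Cats are popular pets.</p>

<!-- Version B: same headings, but the h2 and p are now children of a section inside main -->
<main>
  <h1>Pets</h1>
  <section>
    <h2>Cats</h2>
    <p>Cats are popular pets.</p>
  </section>
</main>
```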

CSS selectors & the DOM

A large part of CSS selectors has to do with selecting elements based on their relationships with other elements in the DOM tree. Let's explore some of these. In the first lecture, we saw [*element selectors* (also called *type selectors*)](https://developer.mozilla.org/en-US/docs/Web/CSS/Type_selectors) that select all elements of a given type.
  • We can use whitespace to target elements based on a *descendant* relationship (*descendant combinator*)
  • We can use a `>` symbol to target direct children (*child combinator*)
  • We can use a `~` symbol to target siblings that come after (*sibling combinator*)
  • We can use a `+` symbol to target the next sibling (*adjacent sibling combinator*)
  • Things after colons, like the `:nth-child()` here, are called *pseudo-classes*, and further filter the selector they're attached to. This particular pseudo-class filters elements based on their index among their siblings. You can even use entire patterns, e.g. `:nth-child(3n+2)`, which matches the 2nd, 5th, 8th etc. children.
  • We will take a proper look at selectors later. The takeaway here is how strongly related the DOM is with selectors.
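
The sketch below puts each of these selectors next to markup it would match. The specific declarations and content are arbitrary choices for illustration.

```html
<style>
  /* Descendant combinator: every <a> anywhere inside <nav> */
  nav a { color: teal; }

  /* Child combinator: only <li> elements directly inside an <ol> */
  ol > li { padding: 0.25em; }

  /* Sibling combinator: every <p> that comes after an <h2> with the same parent */
  h2 ~ p { margin-left: 1em; }

  /* Adjacent sibling combinator: only a <p> immediately after an <h2> */
  h2 + p { font-weight: bold; }

  /* Pseudo-class: <li> elements that are the 2nd, 5th, 8th, ... child of their parent */
  li:nth-child(3n+2) { background: gold; }
</style>

<nav><ul><li><a href="#steps">Steps</a></li></ul></nav>
<h2 id="steps">Steps</h2>
<p>Bold and indented: this is the next sibling of the heading.</p>
<p>Indented only: this is a later sibling of the heading.</p>
<ol>
  <li>One</li> <li>Two</li> <li>Three</li> <li>Four</li>
  <li>Five</li> <li>Six</li> <li>Seven</li> <li>Eight</li>
</ol>
```

In the ordered list, "Two", "Five", and "Eight" get the gold background.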

The DOM Tree


Buttons & Forms

  • The button element creates buttons.
  • These buttons don’t do anything by themselves, but you can make them useful via JavaScript or form elements.
  • The action attribute of form elements controls which URL will receive the submission. By default, it's the URL of the current page.
  • What’s submitted? Only elements that are both enabled and have a name attribute.
  • target="_blank" works here too, to submit the form in a new tab
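
The points above can be sketched in a single form. The `/search` URL and the field names are hypothetical.

```html
<!-- Submits to /search in a new tab; with method="get" the data goes into the URL -->
<form action="/search" method="get" target="_blank">
  <label>Query <input type="text" name="q"></label>

  <!-- Disabled: not submitted, even though it has a name -->
  <input type="text" name="lang" value="en" disabled>

  <!-- No name attribute: not submitted -->
  <input type="text" value="ignored">

  <!-- Inside a form, a button submits the form by default -->
  <button type="submit">Search</button>

  <!-- type="button" does nothing until JavaScript gives it a behavior -->
  <button type="button">More options</button>
</form>
```

If the user types `html` and presses Search, only the `q` field is sent, so the new tab requests `/search?q=html`.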

Use appropriate controls

<input type="text" placeholder="YYYY-MM-DD">

				<input type="number" name="day">
				<select name="month">
				<input type="text" name="year">
<input type="date" />

Sometimes the differences are subtle

<input type="text">
<input type="email">
Also, the fact that these differences are subtle now, doesn't mean they will remain subtle. By using the right input type, you opt in to any future usability improvements.

Multiple choices: usability tradeoffs

				<label><input type="radio" name="letter"> A</label>
				<label><input type="radio" name="letter"> B</label>

				<select name="letter">

				<input list="letters" />
				<datalist id="letters">

Depending on the number of options, different form controls are more appropriate for usability:

  • For up to 5 options, prefer radios if there is enough space, as they are the most efficient. Users will be able to immediately scan how many options they have and what each option is, without clicking (or typing) anything to reveal this information, and make their selection with a single click.
  • Between 6 and 20 options, <select> menus are ideal.
  • For more than 20 options, search time for <select> menus becomes longer than typing the first few characters. In those cases, use <input> with <datalist> for autocomplete.
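
Here is a complete, minimal version of each of the three controls. The option values are placeholders, and the long lists are shortened for brevity.

```html
<!-- Few options: radio buttons, all visible at once -->
<fieldset>
  <legend>Size</legend>
  <label><input type="radio" name="size" value="s"> Small</label>
  <label><input type="radio" name="size" value="m"> Medium</label>
  <label><input type="radio" name="size" value="l"> Large</label>
</fieldset>

<!-- A moderate number of options: a select menu (only the first 3 of 12 shown) -->
<label>Month
  <select name="month">
    <option>January</option>
    <option>February</option>
    <option>March</option>
  </select>
</label>

<!-- Many options: a text input with autocomplete suggestions from a datalist -->
<label>Country
  <input list="countries" name="country">
</label>
<datalist id="countries">
  <option value="Argentina"></option>
  <option value="Australia"></option>
  <option value="Austria"></option>
</datalist>
```

Note that an `<input>` with a `<datalist>` only suggests values. Unlike radios and `<select>`, it still accepts arbitrary typed text.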

Two choices: Checkbox or radio?

When dealing with two options, a good rule of thumb is to use a checkbox if the answer is a simple yes/no, and radio buttons when the two options are alternatives.
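
A short sketch of this rule of thumb, with made-up field names and labels:

```html
<!-- A simple yes/no answer: one checkbox -->
<label><input type="checkbox" name="newsletter"> Subscribe to the newsletter</label>

<!-- Two alternatives, neither of which is "no": radio buttons -->
<fieldset>
  <legend>Delivery</legend>
  <label><input type="radio" name="delivery" value="pickup" checked> Pick up in store</label>
  <label><input type="radio" name="delivery" value="ship"> Ship to my address</label>
</fieldset>
```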

The order of options matters too!

“This is not bad design, this is very good design that just happens to be evil”
In UI design, this is called a *dark pattern*. We will talk a bit more about them later in the semester. [Watch the entire excellent talk here](https://vimeo.com/165123760).

Web apps with HTML?


				<h1>My tasks</h1>
				<p>0 done of 1 total</p>
				<ul>
					<li>
						<input type="checkbox" />
						<span>Do stuff</span>
					</li>
				</ul>

			<body mv-app="todo" mv-storage="local">
				<h1>My tasks</h1>
				<p>[count(done)] done of [count(task)] total</p>
				<ul>
					<li property="task" mv-multiple>
						<input property="done" type="checkbox" />
						<span property="taskTitle">Do stuff</span>
					</li>
				</ul>
				<button mv-action="delete(task where done)">
					Clear completed
				</button>
			</body>

Try Mavo out

			<link rel="stylesheet" href="https://get.mavo.io/mavo.css"/>
			<script src="https://get.mavo.io/mavo.js"></script>

or just visit play.mavo.io and start coding!

A brief, messy history of HTML

SGML: A meta-language to generate markup languages, with custom syntax, described by a DTD.

Tim Berners-Lee adapts IBM Starter Set, a markup language written in SGML, adds <a> and calls it HTML 1. A specification never existed.

HTML 1 elements

<title> <nextid> <a> <isindex> <plaintext> <listing> <p> <h1> <h2> <h3> <h4> <h5> <h6> <address> <dl> <dt> <dd> <ul> <li>

IETF publishes a specification (RFC) for HTML 2, using SGML properly, with a DTD. No software actually used the DTD to display HTML, only validators. In fact, most HTML documents were not valid SGML.

HTML 3.0 adds math, figures, tables, stylesheets. IETF rejects it as “too ambitious” and closes its HTML WG.

HTML moves to W3C, founded by Tim Berners-Lee. The far less ambitious HTML 3.2 is published, standardizing what was already implemented. Adds <style> and <script>, reserved for future use.

HTML 4 deprecates presentational HTML, adds frames, extends forms.

XHTML 1 is published. Like HTML 4, but with XML syntax. Separates parsing from semantics and makes HTML extensible. Draconian error handling (in theory).

XHTML 2: Fresh start, theoretically pure vocabulary with little concern for backwards compatibility. Browser vendors refuse to implement it, lose faith in the W3C, and found the WHATWG.

HTML 5 “paves the cowpaths”, standardizes long-supported features, adds video, audio, new input types, sectioning, generated graphics, <svg>, <math>, figures, custom data-* attributes. Breaks compatibility with SGML, defines its own compatibility-oriented parsing.

W3C and WHATWG continue to work on HTML 5 independently, often diverging. Browsers follow the WHATWG spec. On 28 May 2019, the W3C announces that the WHATWG will be the sole publisher of the HTML and DOM standards.