Metadata

Metadata provides specific information about the content and style of your pages. Meta tags are used to define metadata; they are placed in the head section of HTML or XHTML documents and are not visible to people visiting your site. Some of these tags give you a degree of control over how your web pages are described and ranked by search engines, and let you prevent particular pages from being indexed at all. The main ones are:

  • Title Tag. Include the most important keywords in the page title, because it appears in the user's browser window when they visit the page. It is also used as the page name when someone adds the page to their 'Favourites' or 'Bookmarks' lists. The text in the title tag is also important in determining how a search engine may rank your web page. All major crawlers use the text from this tag as the title of your page in search engine listings.
  • Description Tag. This tag gives you some control over how the content of your pages is described in search engine listings, although not all search engines use it. Google, for instance, may show it as the listing snippet but often generates its own description from the page content; others may use it partially.
  • Keywords Tag. Most search engines now ignore this tag, but if you do use it, it's best to place your most important terms first and to use words and phrases that appear within the page they relate to. It's now well known that Google looks at the words on a page when it is indexing. As a result, it's best to include words and phrases that your target audience would use within the text on your pages, especially within the first few paragraphs and headings.
  • Meta Robots Tag. This tag allows you to specify that a particular page should not be indexed by a search engine, e.g., pages that you only want to be viewed by staff.
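
As a minimal sketch of the last point, a staff-only page could carry a robots tag like the one below in its head section. The values 'noindex' and 'nofollow' are the standard robots directives. Whether a given crawler honours them is up to that crawler, so the tag should not be treated as a substitute for restricting access to the page.

```html
<!-- Ask well-behaved crawlers not to index this page or follow its links -->
<meta name="robots" content="noindex,nofollow" />
```
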
Below is a complete example of some metadata, taken from the 2005 Graduate Studies Prospectus site, with comments:

This tells the browser where to find the 'document type definition' - the definition of the version of HTML code used. In this case, XHTML 1.0 Strict:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

The language in use - 'en'. The server might set this:

<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">

The start of the 'head' of the page, the character set in use, and caching instructions for the browser:

<head>
<meta http-equiv="content-type" content="text/html; charset=iso-8859-15" />
<meta http-equiv="pragma" content="no-cache" />
<meta http-equiv="expires" content="Fri, 11 Aug 2006 00:00:01 GMT" />

Links to external stylesheets are preferred over embedded CSS. If you link to an external stylesheet file, the user only has to download it once, and it keeps your page code simpler to maintain. The stylesheets in use:

<link rel="stylesheet" type="text/css" href="/common/styles/safev2.css" media="screen" />
<link rel="stylesheet" type="text/css" href="/postgraduate/local/styles/local.css" media="screen" />
<link rel="stylesheet" type="text/css" href="/common/styles/cssp_print.css" media="print" />

Link to copyright:

<link rel="copyright" href="/copyright/" />

Administrative metadata:

<meta name="build" content="University of Oxford Admin Template v6.5" />
<meta name="generator" content="Oxford University Central Administration" />

Keywords and a description sentence used by some crawlers, followed by authorship and other miscellaneous tags:

<meta name="keywords" content="graduate, studies, prospectus, oxford, university, study, postgraduate, general, information" />
<meta name="description" content="Graduate Studies Prospectus" />
<meta name="author" content="Oxford University Public Relations" />
<meta name="publisher" content="University of Oxford" />
<meta name="MSSmartTagsPreventParsing" content="True" />
<meta http-equiv="IMAGETOOLBAR" content="No" />
<meta name="doc-rights" content="public" />
<meta name="doc-class" content="Static" />

Instructions to the 'robots' which index pages for search engines:

<meta name="ROBOTS" content="INDEX,FOLLOW" />
<meta name="revisit-after" content="30 days" />
<meta name="rating" content="Mature" />

Some 'Dublin Core' information. Dublin Core has not been widely adopted, but it does no harm to add some:

<meta name="DC.title" content="Graduate Studies Prospectus" />
<meta name="DC.subject" xml:lang="en-GB" content="Oxford University Graduate Studies Prospectus, for entry in 2006-7. Courses, research areas, facilities, accommodation, student funding, how to apply and links to department information." />
<meta name="DC.Date.modified" content="date last modified" />

The title is critical:

<title>Graduate Studies Prospectus : Oxford University Graduate Studies Prospectus 2006/07</title>

The end of the 'head':

</head>