Several HTML elements, most notably the A
element, may contain an attribute which takes a URL as value.
URLs, Uniform Resource Locators, are addresses of Web documents.
More generally, URLs can be used on the Web to refer to
"objects" on the Web or in other information systems.

The general syntax of absolute URLs is the following:

    scheme://host:port/path/filename

where

scheme
    identifies the type of the object and the protocol used to access it. The most common schemes are:
    http
        a Web document (to be accessed using Hypertext Transfer Protocol, HTTP)
    ftp
        a resource to be retrieved using FTP (File Transfer Protocol), usually a file on a so-called FTP server
    file
        a file on a particular computer; a file URL is hardly useful on the Web
    gopher
        a file on a Gopher server
    mailto
        an electronic mail address
    news
        a newsgroup or an article in Usenet news
    telnet
        for starting an interactive session via the Telnet protocol (which is part of TCP/IP)
host
    is the Internet domain name of the computer, for example www.hut.fi (or sometimes a numerical TCP/IP address); notice that typically, but not necessarily, Web servers have domain names starting with www
:port
    is the port number on the host; it can usually be omitted, in which case the default port of the scheme is used
path/
    is the directory path to the file
filename
    is the name of the file

Warning: Although many browsers allow you to omit the part http:// when specifying the URL of a document to be visited, you must not omit it when writing a normal URL into an HTML document. (Otherwise browsers will try to interpret it as a relative URL.)

Actually, this pattern is mainly for Web documents, i.e. http URLs. For other URLs, simplifications and special interpretations are applied. For example, a mailto URL is just of the form mailto:address where address is a normal Internet E-mail address (as specified in RFC 822). Please notice that appending anything to the E-mail address in a mailto URL is nonstandard and may result in lost mail without anyone noticing! (See also the discussion of mailto: URLs in the description of the A element.)

An http URL can also be followed by a fragment identifier: the complete reference then consists of an absolute URL, the # sign, and a name (which refers to a location within the document specified by the absolute URL). See the description of the A element for more information.

It is safest to enclose URLs in quotes when writing them as attribute values in HTML.

For an overview of URLs, see W3C material on addressing. As regards the technical specifications of the syntax of URLs, see RFC 1738 (absolute URLs) and RFC 1808 (relative URLs). In particular, the specifications say that within a URL only a limited set of characters can be used as such:

    letters and digits (A to Z, a to z, 0 to 9)
    the characters $-_.+!*'(),
    the characters ;/?:@=&# provided that they are used in the special meaning reserved for them in the RFCs mentioned above.

Other characters must be encoded. (The characters ;/?:@=&# must also be encoded if they are not used in the special meaning.) This encoding (which is defined by URL specifications, not HTML specifications) consists of the percent sign followed by two hexadecimal digits giving the code position of the character. For example, tilde (~) should be presented as %7E and space as %20. (Violating the rules is much more likely to cause problems in the latter case than in the former.)

When a URL occurs as an attribute value in HTML, there is another complication caused by the & character, which may have special use in query form submissions. In principle, that character should be escaped as &amp; or as &#38; (there is a footnote in the HTML 2.0 specification about this), and browsers should process it so that the actual URL passed to the processing CGI script has that notation replaced by a plain & character. (Notice that it must not be percent-encoded. This is a confusing issue, and CGI scripts should really be written so that the semicolon ; and not the ampersand & is used as the field separator.)
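
The rules in this section can be tried out with a short Python sketch. It is only an illustration under stated assumptions, not a reference implementation: the host www.example.org, the paths, and the helper name encode_rfc1738 are hypothetical, and encoding non-ASCII characters through UTF-8 is an assumption that the RFCs cited here do not make. A hand-written encoder is used because recent versions of Python's urllib.parse.quote leave the tilde unencoded.

```python
import string
from html import escape
from urllib.parse import urlsplit

# Characters that may appear "as such" according to the rules above.
SAFE = set(string.ascii_letters + string.digits + "$-_.+!*'(),")
# Characters allowed only when used in their reserved, special meaning.
RESERVED = set(";/?:@=&#")


def encode_rfc1738(text, keep_reserved=True):
    """Percent-encode every character outside the allowed set.

    Hypothetical helper. With keep_reserved=False the reserved
    characters are encoded too, as needed when they are plain data.
    Non-ASCII text is encoded via UTF-8, which is an assumption.
    """
    allowed = SAFE | RESERVED if keep_reserved else SAFE
    out = []
    for ch in text:
        if ch in allowed:
            out.append(ch)
        else:
            # One %XX group per byte of the character.
            out.extend("%%%02X" % b for b in ch.encode("utf-8"))
    return "".join(out)


# 1. Splitting an absolute URL into scheme, host, port, path, fragment.
parts = urlsplit("http://www.example.org:8080/docs/guide/index.html#intro")
print(parts.scheme, parts.hostname, parts.port, parts.path, parts.fragment)

# 2. Percent-encoding: tilde becomes %7E and space becomes %20.
print(encode_rfc1738("/~user/my file.html"))   # /%7Euser/my%20file.html

# 3. Reserved characters used as plain data are encoded as well.
print(encode_rfc1738("a&b=c", keep_reserved=False))  # a%26b%3Dc

# 4. Writing a query URL into a quoted HTML attribute: & becomes &amp;.
query_url = "http://www.example.org/cgi-bin/search?name=x&lang=en"
href = escape(query_url, quote=True)
print('<a href="%s">search</a>' % href)
```

Note that the two steps are separate layers: percent-encoding belongs to the URL itself, while writing &amp; is HTML markup that a browser undoes before the URL is sent to the server.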