<?xml version="1.0"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">

<!-- $Id: draft-liman-tld-names.xml,v 1.40 2011/04/12 08:20:42 liman Exp $ -->

<?rfc compact="no"?>
<?rfc toc="yes"?>
<?rfc symrefs="yes"?>
<?rfc strict="yes"?>
<!-- <?rfc-ext include-references-in-index="yes" ?> -->
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>


<!-- <rfc ipr="full2026" docName="draft-liman-tld-names-01"> -->
<!-- <rfc ipr="trust200902" -->
<rfc ipr="trust200902"
     docName="draft-liman-tld-names-06"
     updates="1123"
     category="std">

  <front>
    <title>Top Level Domain Name Specification</title>

    <author initials="L-J." surname="Liman" fullname="Lars-Johan Liman">
      <organization abbrev="Netnod">
        Netnod Internet Exchange
      </organization>
      <address>
       <postal>
           <street>Box 30194</street>
           <city>SE-104 25 Stockholm</city>
<!--       <code>SE-118 47</code>    # This doesn't work for Swedes -->
           <country>Sweden</country>
       </postal>
        <email>liman@netnod.se</email>
        <uri>http://www.netnod.se/</uri>
      </address>
    </author>

    <author initials="J." surname="Abley" fullname="Joe Abley">
      <organization>ICANN</organization>
      <address>
        <postal>
          <street>4676 Admiralty Way</street>
          <street>Suite 330</street>
          <city>Marina del Rey</city>
          <code>90292</code>
          <country>USA</country>
        </postal>
        <email>joe.abley@icann.org</email>
        <uri>http://www.icann.org/</uri>
      </address>
    </author>

    <date day="29" month="November" year="2011"/>

    <workgroup>Individual Submission</workgroup>
    
    <abstract>
      <t>The syntax for allowed Top-Level Domain (TLD) labels in
	the Domain Name System (DNS) is not clearly applicable to
	the encoding of Internationalised Domain Names (IDNs) as
	TLDs.</t>

      <t>This document provides a concise specification of TLD label
	syntax based on existing syntax documentation, extended
	minimally to accommodate IDNs.</t>

      <t>This document updates RFC1123.</t>
    </abstract>
  </front>

  <middle>
    <section title="Introduction">
      <t>The syntax of TLD labels ("TLD DNS-Labels", as defined in
	<xref target="definitions"/>) is specified 
	in <xref target="RFC1123"/>, where such labels are asserted
	to be "alphabetic" within a section of that document entitled
	"DISCUSSION". This can be interpreted as requiring
	that the hyphen character ("-") and numeric digits be
	excluded from TLD DNS-Labels.  Such a restriction would not
	accommodate the US-ASCII encoding of Internationalised
	Domain Names (IDNs), as specified in <xref
	target="RFC5890"/>.  A more detailed discussion
	of the existing specifications can be found in <xref
	target="background"/>.</t>

      <t>This document extends the syntax of allowable TLD DNS-Labels
	to support IDNs, but places some restrictions on the choice
	of IDN labels. These restrictions are intended to be
	consistent with the existing specification for US-ASCII TLD
	DNS-Labels.  See <xref target="specification"/> for the
	updated specification.</t>

      <t>This document focuses narrowly on the issue of allowable
	DNS-Labels in TLDs and does not (and is not intended to)
	make any other changes or clarifications to existing domain
	name syntax rules.</t>

      <t>It is carefully noted that the specification in this
	document is not the only factor in choosing suitable TLD
	DNS-Labels, and that many considerations external to the
	IETF are included in that wider policy.  See <xref
	target="policy"/> for more discussion of policy considerations.</t>
    </section>

    <section title="Definitions" anchor="definitions">
      <t>The term DNS-Label is used in this document to have precisely
	the same meaning as the term "label", as introduced in <xref
	target="RFC1034"/>, section 3.1.  A DNS-Label denotes one
	node in a DNS tree. A DNS-Label is zero to 63 octets in
	length.  The term "DNS-Label" refers exclusively to the
	"wire format" of the label, and not to any presentation
	format of the label.</t>

      <t>A Top-Level Domain (TLD) DNS-Label is the right-most
	("highest-level") DNS-Label in a fully-qualified domain
	name.</t>

      <t>The terms A-Label and U-Label are used in this document
        as defined in <xref target="RFC5890"/>.</t>
    </section>

    <section title="Background" anchor="background">
      <t><xref target="RFC0952"/> defines a host name as follows:

	<list style="empty">
	  <t>'A "name" ... is a text string up to 24 characters
	    drawn from the alphabet (A-Z), digits (0-9), minus sign
	    (-), and period (.).  Note that periods are only allowed
	    when they serve to delimit components of "domain style
	    names".  (See RFC-921, "Domain Name System Implementation
	    Schedule", for background).  No blank or space characters
	    are permitted as part of a name.  No distinction is
	    made between upper and lower case.  The first character
	    must be an alpha character.  The last character must
	    not be a minus sign or period.' [Unnumbered section
	    titled "ASSUMPTIONS", first paragraph]</t>
	</list>

	<xref target="RFC1123"/> reaffirms this definition, but
	makes one change to the syntax:

	<list style="empty">
	  <t>'The syntax of a legal Internet host name was specified
	    in RFC-952 [DNS:4].  One aspect of host name syntax is
	    hereby changed: the restriction on the first character
	    is relaxed to allow either a letter or a digit.  Host
	    software MUST support this more liberal syntax.' [Section
	    2.1]</t>
	</list>

	In addition, the DISCUSSION section of Section 2.1 says:

	<list style="empty">
	  <t>'However, a valid host name can never have the
	    dotted-decimal form #.#.#.#, since at least the
	    highest-level component label will be alphabetic.'
	    [Section 2.1]</t>
	</list>

	Some implementers may have understood the above phrase 'will
	be alphabetic' to be a protocol restriction.</t>

      <t>Neither <xref target="RFC0952"/> nor <xref target="RFC1123"/>
	explicitly states the reasons for these restrictions.  It
	might be supposed that human factors were a consideration;
	<xref target="RFC1123"/> appears to suggest that one of the
	reasons was to prevent confusion between dotted-decimal
	IPv4 addresses and host domain names.  In any case, it is
	reasonable to believe that the restrictions have been assumed
	in some deployed software, and that changes to the rules should
	be undertaken with caution.</t>

      <t>The Internationalised Domain Names in Applications 2008
        specification (IDNA2008)
        <xref target="RFC5891"/>
        <xref target="RFC5892"/>
	provides a protocol for encoding Unicode strings in DNS-Labels.
	The Unicode string used by applications is known as a
	U-Label; its corresponding encoding in the DNS is known as
	an A-Label.  The terms A-Label and U-Label are used in this
	document as defined in <xref target="RFC5890"/>.
	Valid A-Labels always contain non-alphabetic characters.</t>

      <t>In order to accommodate the wish to express TLD names in
	scripts other than Latin (or rather, the US-ASCII subset
	of Latin), it is necessary to allow non-alphabetic characters
	in the corresponding TLD DNS-Labels.  To minimize changes, the
	U-label form of a TLD name is restricted in ways functionally
	compatible with the restrictions (from <xref target="RFC0952"/>
	and <xref target="RFC1123"/>) on US-ASCII TLD names, by
	applying rules analogous to those already imposed on US-ASCII
	TLD DNS-Labels to TLD U-labels.</t>

      <t>However, deployed software that checks DNS top-level labels
	for conformance with an alphabetic restriction will not
	recognize such corresponding A-Labels (i.e., U-labels
	represented in their US-ASCII form).</t>
    </section>

    <section title="TLD Label Syntax Specification" anchor="specification">
      <t>This document relaxes the existing specification to allow
	TLD DNS-Labels to be well-formed A-Labels, but places
	restrictions on their corresponding U-Labels. That is, not
	every well-formed A-Label is a valid TLD DNS-Label.</t>

      <figure>
	<preamble>The ABNF expression that matches a valid TLD
	  DNS-Label is as follows:</preamble>

        <artwork type="abnf">
          tld-dns-label = traditional-tld-label / idn-label

          traditional-tld-label = 1*63(ALPHA)

          idn-label = Restricted-A-Label

          ALPHA    = %x41-5A / %x61-7A   ; A-Z / a-z
        </artwork>
      </figure>

      <t>A Restricted-A-Label is a DNS-Label which satisfies all
        the following conditions:

	<list style="numbers">
	  <t>the DNS-Label is a valid A-Label according to <xref
	    target="RFC5890"/>;</t>

	  <t>the derived property value of all code points, as defined
	    by <xref target="RFC5890"/>, is PVALID;</t>

	  <t>the general category of all code points, is one of {
	    Ll, Lo, Lm, Mn, Mc }.</t>
	</list>

	This new specification reflects current practice in
	registration of TLD names by the IANA, extended to accommodate
	IDNs.</t>
    </section>

    <section title="Policy Considerations" anchor="policy">
      <t>This document provides a technical specification that
	limits the set of TLD DNS-Labels that are available for
	assignment; it does not aim to encapsulate the full policy
	framework within which TLD names are chosen.</t>

      <t>At the time of writing, the policy under which TLD names
	are chosen is developed and maintained by ICANN in consultation
	with a wide base of stakeholders.  As the Internet continues
	to grow to serve new user communities, applications and
	services, it is to be expected that the corresponding policy
	will be changed accordingly.</t>
    </section>

    <section title="IANA Considerations">
      <t>While this document makes no requests of the IANA, management
	of the root zone is an IANA function. This document expands
	the set of strings permitted for delegation from the root
	zone, and hence establishes new limits for the corresponding
	IANA policy.</t>
    </section>

    <section title="Security Considerations">
      <t>This document is believed to have limited security
        implications.</t>

      <t>General discussion about the security effects of
	internationalized labels can be found in <xref
	target="RFC5890"/>, section 4.  Those
	considerations apply equally to TLD labels.</t>

      <t>The creation of new TLDs has the potential to conflict
	with software which (for example) predates and correspondingly
	does not accommodate new TLD names.  Such software problems
	might in turn lead to security vulnerabilities, e.g. in the
	case where a DNS name specified by a user is truncated or
	otherwise misinterpreted, causing an application to interact
	with a different remote host from that which the user
	intended.  It should be noted that this is not a new
	phenomenon, and has been observed following the creation
	of new (US-ASCII) TLD names prior to the publication of
	this document.</t>

<!--
  -- I'm not sure this is defensible, as-stated, so I've commented
  -- it out. -jabley
  --
      <t>In addition, the probability of a truncated TLD mapping
	into another valid TLD is small given the small number of
	TLDs that are actually in use today.</t>
  -->

      <t>The issue that some Unicode characters can be confused
	with each other is discussed at length in the Security
	Considerations section of <xref
	target="RFC5890"/>.</t>
    </section>

    <section title="Acknowledgements">
      <t>Tina Dam, Patrik Faltstrom, John Klensin, Thomas Narten
        and Andrew Sullivan contributed text to this document, and
        their contributions are hereby acknowledged.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <reference anchor='RFC1123'>
	<front>
	  <title>Requirements for Internet Hosts - Application and
	    Support </title>
	  <author initials='R.' surname='Braden' fullname='Robert
	    Braden'>
	    <organization>University of Southern California (USC),
	      Information Sciences Institute</organization>
	  </author>
          <date year='1989' month='October'/>
	</front>
        <seriesInfo name='STD' value='3'/>
	<seriesInfo name='RFC' value='1123'/>
      </reference>

      <reference anchor='RFC5890'>
	<front>
	  <title>Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework</title>
	  <author initials='J.' surname='Klensin' fullname='J. Klensin'>
	    <organization /></author>
	  <date year='2010' month='August' />
	</front>
	<seriesInfo name='RFC' value='5890' />
	<format type='TXT' octets='54245' target='http://www.rfc-editor.org/rfc/rfc5890.txt' />
      </reference>


      <reference anchor='RFC5891'>
	<front>
	  <title>Internationalized Domain Names in Applications (IDNA): Protocol</title>
	  <author initials='J.' surname='Klensin' fullname='J. Klensin'>
	    <organization /></author>
	  <date year='2010' month='August' />
	</front>
	<seriesInfo name='RFC' value='5891' />
	<format type='TXT' octets='38105' target='http://www.rfc-editor.org/rfc/rfc5891.txt' />
      </reference>

      <reference anchor='RFC5892'>
	<front>
	  <title>The Unicode Code Points and Internationalized Domain Names for Applications (IDNA)</title>
	  <author initials='P.' surname='Faltstrom' fullname='P. Faltstrom'>
	    <organization /></author>
	  <date year='2010' month='August' />
	</front>
	<seriesInfo name='RFC' value='5892' />
	<format type='TXT' octets='187370' target='http://www.rfc-editor.org/rfc/rfc5892.txt' />
      </reference>
    </references>

    <references title="Informative References">
      <reference anchor='RFC0952'>
	<front>
	  <title>DoD Internet host table specification</title>
	  <author initials='K.' surname='Harrenstien' fullname='K.
	    Harrenstien'>
	    <organization>SRI International</organization>
	  </author>
	  <author initials='M.' surname='Stahl' fullname='M. Stahl'>
	    <organization>SRI International</organization>
	  </author>
	  <author initials='E.' surname='Feinler' fullname='E. Feinler'>
	    <organization>SRI International</organization>
	  </author>
	  <date year='1985' day='1' month='October' />
	</front>
	<seriesInfo name='RFC' value='952' />
      </reference>

      <reference anchor='RFC1034'>
        <front>
          <title abbrev='Domain Concepts and Facilities'>Domain
            names - concepts and facilities</title>
          <author initials='P.' surname='Mockapetris' fullname='P.
            Mockapetris'>
          <organization>Information Sciences Institute
            (ISI)</organization></author>
          <date year='1987' day='1' month='November' />
        </front>
        <seriesInfo name='STD' value='13' />
        <seriesInfo name='RFC' value='1034' />
      </reference>
    </references>

    <section title="Change History">
      <t>This section (and sub-sections) should be removed before
        publication.</t>

      <section title="draft-liman-tld-names-06">
        <t>Add Mc as an allowable code-point, required for names
          in Devanagari script.</t>
      </section>

      <section title="draft-liman-tld-names-05">
	<t>New affiliation and address for Liman, due to company
	merger.</t>
      </section>

      <section title="draft-liman-tld-names-04">
        <t>Removed subjective and unverified statements regarding
        deployed software. Replaced with more generic text. Polishing
        a few expressions to make them less obtrusive. Removed
        confusing paragraph after ABNF table. Updated some references
        that are now published as RFCs.</t>
      </section>

      <section title="draft-liman-tld-names-03">
        <t>More wordsmithing, and explanatory text. Work on the IANA
        and the security considerations sections.</t>
      </section>

      <section title="draft-liman-tld-names-02">
        <t>Wordsmithing and rearrangement of text following discussions
	  with Joe Abley, Tina Dam, Thomas Narten and Andrew Sullivan.
	  Incorporated revised ABNF and associated specification
	  from Patrik Faltstrom. Tightened definitions and introduced
	  the term "DNS-Label" to avoid ambiguity with various other
	  uses of the word "label".</t>
      </section>

      <section title="draft-liman-tld-names-01">
	<t>Substantial comments and improvements supplied by Thomas
	  Narten and John Klensin.  Decided to go for a minimal
	  change approach.  Also noted that U-labels have to be
	  letters due to jumping digit problem.  Rewritten major
	  parts.</t>
      </section>

      <section title="draft-liman-tld-names-00">
	<t>First cut.  Prompted by Olafur Gudmundsson and Tina Dam.</t>
      </section>

      <section title="Version Identification Tag">
        <t>
	  $Id: draft-liman-tld-names.xml,v 1.40 2011/04/12 08:20:42 liman Exp $
        </t>
      </section>
    </section>
  </back>
</rfc>

