<?xml version="1.0" encoding="UTF-8"?>
<rfc xmlns:xi="http://www.w3.org/2001/XInclude"
     version="3"
     ipr="trust200902"
     category="info"
     submissionType="independent"
     docName="draft-rayner-proquint-11">

  <front>
    <title abbrev="Proquint">Proquints: Readable, Spellable, and Pronounceable Identifiers</title>
    <seriesInfo name="Internet-Draft" value="draft-rayner-proquint-11"/>

    <author fullname="Thomas Rayner" initials="T." surname="Rayner">
      <organization>Independent</organization>
      <address>
        <email>thmsrynr@outlook.com</email>
      </address>
    </author>

    <date year="2025" month="October" day="13"/>

    <abstract>
      <t>This document specifies "proquints" (PRO-nounceable QUINT-uplets), a
      human-friendly encoding that maps binary data to pronounceable identifiers
      using fixed consonant-vowel patterns. The concept was originally described
      by Daniel Shawcross Wilkerson in 2009.
      This document formalizes the format for archival and reference.</t>
    </abstract>

  </front>

  <middle>
    <section anchor="intro" numbered="true">
      <name>Introduction</name>
      <t>Proquints encode binary data as alternating consonant-vowel letters grouped
      into five-letter syllables, yielding identifiers that are readable, spellable,
      and pronounceable. The idea and specific letter tables were first described
      by Daniel Shawcross Wilkerson in 2009 (<xref target="WILKERSON2009"/>).
      This document does not claim originality for the concept; it reformulates and
      formalizes the description for archival purposes.</t>
      <t>Several schemes exist for encoding network addresses and other
      binary data as text. Proquints aim to combine human readability,
      accessibility, and long-term usability: they are designed to reduce
      transcription errors, to be friendlier for non-technical users, and to offer
      mnemonic qualities that can help in educational or operational contexts.
      Proquints are not meant to replace existing representations; they serve as
      a complementary format that can improve clarity in documentation, user
      interfaces, and spoken communication, particularly where accuracy and
      inclusivity matter.</t>
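      <t>For example, the ASCII string "F3r41OutL4w" encodes as
      "himug-lamuh-gajaz-lijuh-hubuh-lisab-", producing a stable, pronounceable
      representation. The trailing hyphen indicates that the 11-octet input was
      padded with a zero octet to complete the final 16-bit word.</t>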
      <t>The chosen consonant and vowel sets were optimized for pronounceability in
      Indo-European languages; speakers of other languages, or users of other
      alphabets or phonetic systems, may prefer alternative mappings.</t>
    </section>

    <section anchor="intro-networking" numbered="true">
      <name>Applicability to Networking</name>
      <t>While Proquints are general-purpose, they address concrete needs in 
      networked systems and operations. They provide a reversible, human-friendly 
      representation for binary identifiers commonly encountered by implementers and 
      operators, including:</t>
      <ul>
      <li>IP addresses and prefixes (e.g., IPv4 32-bit values map to two syllables; 
      IPv6 128-bit values map to eight syllables).</li>
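      <li>Textual IPv6 addresses. When the text form of an IPv6 address, rather
      than its 128-bit value, is encoded, encoders SHOULD use the fully expanded
      form (eight groups of four hexadecimal digits), without the zero compression
      ("::") or leading-zero suppression described in <xref target="RFC5952"/>,
      prior to conversion into a Proquint sequence. For example, the 39-octet ASCII
      string "2001:0db8:0000:0000:0000:ff00:0042:8329" encodes as the 20-syllable
      sequence
      "gamub-gabud-gomub-kidof-gobup-gabub-gabub-gomub-gabub-gabup-gabub-gabub-gonok-kimub-gabup-gabub-gibuf-gomum-gasuf-gohab-";
      the trailing hyphen signals the zero octet appended to the odd-length
      input.</li>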
      <li>Transport and application port numbers (16-bit) and other protocol fields 
      carried in diagnostics.</li>
      <li>Node, interface, device, and request identifiers in distributed systems, 
      where verbal or low-bandwidth exchange occurs during incident response or field 
      operations.</li>
      <li>Log correlation and ticketing, where minimizing transcription errors 
      improves mean time to resolution.</li>
      </ul>
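      <t>The following Python sketch, which is illustrative and not part of the
      format definition, shows how the binary forms of some of the values listed
      above might be encoded. The helper names (word_to_syllable and
      packed_to_proquint) are hypothetical, and the sketch assumes even-length
      input, so it omits padding.</t>
      <sourcecode type="python"><![CDATA[
import ipaddress

CONSONANTS = "bdfghjklmnprstvz"
VOWELS = "aiou"


def word_to_syllable(word: int) -> str:
    """Map one 16-bit word to a CVCVC syllable."""
    return (
        CONSONANTS[(word >> 12) & 0xF]
        + VOWELS[(word >> 10) & 0x3]
        + CONSONANTS[(word >> 6) & 0xF]
        + VOWELS[(word >> 4) & 0x3]
        + CONSONANTS[word & 0xF]
    )


def packed_to_proquint(data: bytes) -> str:
    """Encode a non-empty, even-length octet string (big-endian)."""
    if len(data) == 0 or len(data) % 2 != 0:
        raise ValueError("sketch handles even-length input only")
    syllables = []
    for i in range(0, len(data), 2):
        word = int.from_bytes(data[i:i + 2], "big")
        syllables.append(word_to_syllable(word))
    return "-".join(syllables)


ipv4 = packed_to_proquint(ipaddress.IPv4Address("127.0.0.1").packed)
ipv6 = packed_to_proquint(
    ipaddress.IPv6Address("2001:db8::ff00:42:8329").packed
)
port = packed_to_proquint((443).to_bytes(2, "big"))

print(ipv4)  # lusab-babad
print(ipv6)  # fabad-bukum-babab-babab-babab-zusab-badaf-mason
print(port)  # bakur
]]></sourcecode>
      <t>Because the packed binary value is encoded here, the result does not
      depend on how the address was written as text.</t>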
      <t>Proquints use only letters [a–z] and the ASCII hyphen (U+002D). The
      resulting tokens are case-insensitive and label-safe for many existing systems
      (e.g., DNS label contents, filenames, and URLs), subject to each system’s length
      and syntax limits. For example, a 32-bit IPv4 address encodes into two syllables
      (10 letters), and a 16-bit port number into one syllable (5 letters).</t>
      <t>Proquints are intended to complement, not replace, existing textual forms 
      (e.g., dotted-decimal IPv4, IPv6). Operators MAY use Proquints in user 
      interfaces, logs, documentation, and voice communication when human factors 
      (readability, memorability, error-resistance) are advantageous.</t>
      <t><strong>Process note:</strong> This document is submitted to the Independent 
      Stream to provide a stable archival reference for implementers and operators. 
      If substantial community interest develops in standardizing protocol use of 
      Proquints, the work MAY later be dispatched to the IETF for further processing.</t>
    </section>

    <section anchor="req" numbered="true">
      <name>Requirements Language</name>
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
      "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in BCP 14
      <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they
      appear in all capitals, as shown here.</t>
    </section>

    <section anchor="format" numbered="true">
      <name>Format</name>
      <t>A proquint encodes data in 16-bit blocks. Each block maps to a five-letter
      syllable of the form CVCVC (Consonant-Vowel-Consonant-Vowel-Consonant).</t>
      <t>The mapping tables are fixed:</t>
      <t>Consonants (indices 0..15):</t>
      <ul>
        <li><t>b d f g h j k l m n p r s t v z</t></li>
      </ul>
      <t>Vowels (indices 0..3):</t>
      <ul>
        <li><t>a i o u</t></li>
      </ul>
    </section>

    <section anchor="encoding" numbered="true">
      <name>Encoding</name>
      <ul>
        <li><t>Encoders MUST process the input as an ordered sequence of 16-bit words
        formed from the octet string in network byte order (big-endian).</t></li>
        <li><t>If the input contains an odd number of octets, encoders MUST append a 
        single zero octet (0x00) to complete the final 16-bit word and MUST signal this
        padding by appending a single trailing hyphen (U+002D HYPHEN-MINUS) to the end
        of the proquint string. Encoders MUST NOT append a trailing hyphen when the
        input length is even.</t></li>
        <li><t>Only a single trailing hyphen has special meaning, as the padding
        signal; multiple trailing hyphens are invalid. Hyphens between syllables
        carry no data, and decoders MUST ignore them.</t></li>
        <li><t>For each 16-bit word, map bits 15-12 to the first consonant, bits 11-10 
        to the first vowel, bits 9-6 to the second consonant, bits 5-4 to the second
        vowel, and bits 3-0 to the final consonant.</t></li>
        <li><t>Concatenate the syllables in word order. Encoders MAY insert a hyphen
        between adjacent syllables; doing so is not mandatory but is encouraged
        because it improves human readability.</t></li>
      </ul>
    </section>

    <section anchor="decoding" numbered="true">
      <name>Decoding</name>
      <ul>
      <li><t>Decoders MUST reverse the mapping in <xref target="encoding"/>.</t></li>
      <li><t>Decoders MUST accept upper- or lower-case input and MUST ignore interior
      hyphens. If the input ends in a single trailing hyphen, the decoder MUST:
      (1) decode the syllables to octets; (2) verify that the final octet is 0x00;
      and (3) remove that final octet. If a trailing hyphen is present and the final
      octet is not 0x00, the decoder MUST treat the input as invalid.</t></li>
      <li><t>If no trailing hyphen is present, the decoder MUST NOT remove any 
      trailing octet, even if it is 0x00.</t></li>
      <li><t>Inputs with a leading hyphen, consecutive hyphens (including multiple
      trailing hyphens), a trailing hyphen without any syllables, or a length not
      divisible by five letters (after removing hyphens) MUST be rejected.</t></li>
      </ul>
    </section>

    <section anchor="spec" numbered="true">
      <name>Encoding and Decoding Specification</name>

      <section anchor="tables" numbered="true">
        <name>Letter Tables and Indices</name>
        <t>Proquint encodes each 16-bit word as five letters in the pattern CVCVC
    (Consonant–Vowel–Consonant–Vowel–Consonant). The mapping tables and
    indices are fixed and normative.</t>
        <t>Consonant table (index 0..15):</t>
        <artwork><![CDATA[
Index  Hex  Bits  Consonant
-----  ---  ----  ---------
  0     0   0000     b
  1     1   0001     d
  2     2   0010     f
  3     3   0011     g
  4     4   0100     h
  5     5   0101     j
  6     6   0110     k
  7     7   0111     l
  8     8   1000     m
  9     9   1001     n
 10     A   1010     p
 11     B   1011     r
 12     C   1100     s
 13     D   1101     t
 14     E   1110     v
 15     F   1111     z
]]></artwork>
      <t>Vowel table (index 0..3):</t>
      <artwork><![CDATA[
Index  Bits  Vowel
-----  ----  -----
  0    00      a
  1    01      i
  2    10      o
  3    11      u
]]></artwork>
    </section>

    <section anchor="bitlayout" numbered="true">
      <name>Bit Layout</name>
      <t>Each 16-bit input value (bits 15..0, most significant bit first) MUST be
    mapped to letters in this order:</t>
      <artwork><![CDATA[
bits 15..12 -> first consonant  (C1)
bits 11..10 -> first vowel      (V1)
bits  9.. 6 -> second consonant (C2)
bits  5.. 4 -> second vowel     (V2)
bits  3.. 0 -> third consonant  (C3)
]]></artwork>
      <t>Encoders MUST process input as an ordered sequence of 16-bit words formed
      from the input octet string in network byte order (big-endian): octet[i]
      contributes bits 15..8 and octet[i+1] contributes bits 7..0 of the word.
      If the input contains an odd number of octets, encoders MUST append a single
      zero octet to complete the final 16-bit word and MUST signal this padding
      with a single trailing hyphen, as specified in <xref target="encoding"/>;
      the trailing hyphen is how the decoder recovers the original length.</t>
      <t>Encoders MAY insert ASCII hyphens (0x2D) between syllables for readability.
      Decoders MUST ignore interior hyphens, but not a trailing hyphen, which
      indicates padding.</t>
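      <t>As a non-normative worked example, the following Python sketch splits a
      word into the five table indices described above and reassembles it. The
      function names split_word and join_fields are illustrative assumptions.</t>
      <sourcecode type="python"><![CDATA[
def split_word(word: int) -> tuple[int, int, int, int, int]:
    """Return the (C1, V1, C2, V2, C3) indices of a 16-bit word."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("word must fit in 16 bits")
    return (
        (word >> 12) & 0xF,  # bits 15..12 -> C1
        (word >> 10) & 0x3,  # bits 11..10 -> V1
        (word >> 6) & 0xF,   # bits  9..6  -> C2
        (word >> 4) & 0x3,   # bits  5..4  -> V2
        word & 0xF,          # bits  3..0  -> C3
    )


def join_fields(c1: int, v1: int, c2: int, v2: int, c3: int) -> int:
    """Reassemble the 16-bit word from its five indices."""
    return (c1 << 12) | (v1 << 10) | (c2 << 6) | (v2 << 4) | c3


fields = split_word(0x1234)
print(fields)  # (1, 0, 8, 3, 4)
assert join_fields(*fields) == 0x1234
]]></sourcecode>
      <t>For 0x1234 the indices (1, 0, 8, 3, 4) select the letters d, a, m, u, h,
      which matches the "damuh" test vector.</t>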
    </section>

    <section anchor="encode-alg" numbered="true">
      <name>Encoding Algorithm (Pseudocode)</name>
      <artwork><![CDATA[
Input: bytes[]  // octet string
Output: string  // proquint

consonants = "bdfghjklmnprstvz"
vowels     = "aiou"

function encode(bytes):
  if len(bytes) == 0: error("empty input not allowed")

  out = ""
  i = 0
  pad = false

  while i < len(bytes):
    hi = bytes[i]; i += 1
    if i < len(bytes):
      lo = bytes[i]; i += 1
    else:
      lo = 0x00
      pad = true

    w  = (hi << 8) | lo
    c1 = consonants[(w >> 12) & 0xF]
    v1 = vowels    [(w >> 10) & 0x3]
    c2 = consonants[(w >>  6) & 0xF]
    v2 = vowels    [(w >>  4) & 0x3]
    c3 = consonants[(w      ) & 0xF]
    out += c1 + v1 + c2 + v2 + c3
    // optional: insert interior '-' between syllables for readability

  if pad and len(out) > 0:
    out += '-'   // trailing hyphen signals padding was added

  return out
]]></artwork>
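      <t>A runnable rendering of the pseudocode above might look like the
      following Python sketch. It is illustrative only; the sep parameter, which
      selects between the hyphenated and run-on forms, is an assumption of this
      sketch rather than part of the pseudocode.</t>
      <sourcecode type="python"><![CDATA[
CONSONANTS = "bdfghjklmnprstvz"
VOWELS = "aiou"


def encode(data: bytes, sep: str = "-") -> str:
    """Encode an octet string as a proquint.

    sep is "-" (hyphenated form) or "" (run-on form).
    """
    if sep not in ("-", ""):
        raise ValueError("separator must be '-' or ''")
    if not data:
        raise ValueError("empty input not allowed")
    padded = len(data) % 2 == 1
    if padded:
        data = data + b"\x00"  # complete the final 16-bit word
    syllables = []
    for i in range(0, len(data), 2):
        w = (data[i] << 8) | data[i + 1]  # network byte order
        syllables.append(
            CONSONANTS[(w >> 12) & 0xF]
            + VOWELS[(w >> 10) & 0x3]
            + CONSONANTS[(w >> 6) & 0xF]
            + VOWELS[(w >> 4) & 0x3]
            + CONSONANTS[w & 0xF]
        )
    out = sep.join(syllables)
    # A trailing hyphen signals that a padding octet was added.
    return out + "-" if padded else out
]]></sourcecode>
      <t>With this sketch, encode(b"\x01\x02\x03") returns "bahaf-basab-" and
      encode(b"\x12\x34\xf0\x0d", sep="") returns "damuhzabat".</t>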
      </section>

      <section anchor="decode-alg" numbered="true">
        <name>Decoding Algorithm (Pseudocode)</name>
        <artwork><![CDATA[
Input: string pq  // CVCVC syllables; interior hyphens optional;
                  // final hyphen signals padding
Output: bytes[]   // octet string

consonants = "bdfghjklmnprstvz"
vowels     = "aiou"

function indexOf(ch, table):
  pos = table.find(ch)
  if pos < 0: error("invalid character")
  return pos

function decode(pq):
  pq = toLowercase(pq)

  pad = false
  if length(pq) > 0 and pq[-1] == '-':
    pad = true
    if length(pq) >= 2 and pq[-2] == '-':
      error("multiple trailing hyphens")
    pq = pq[0:-1]   // remove the single trailing '-'

  if length(pq) > 0 and pq[0] == '-':
    error("leading hyphen not allowed")
  if contains(pq, "--"):
    error("consecutive interior hyphens not allowed")

  if pq == "": error("empty input not allowed")

  // If hyphens present: 
  //   split on '-' (no empty chunks allowed)
  // If no hyphens: 
  //   input MUST be non-empty and a multiple of 5, 
  //   then slice every 5 chars
  parts = []
  if contains(pq, "-"):
    parts = split(pq, "-")
    if any(p == "" for p in parts): error("invalid empty syllable")
  else:
    if (length(pq) % 5) != 0:
      error("run-on form length must be a multiple of 5")
    for i in range(0, length(pq), 5):
      parts.append(pq[i:i+5])

  out = new bytes[2 * length(parts)]
  k = 0
  for part in parts:
    if length(part) != 5: error("syllable length must be 5")
    c1 = indexOf(part[0], consonants)
    v1 = indexOf(part[1], vowels)
    c2 = indexOf(part[2], consonants)
    v2 = indexOf(part[3], vowels)
    c3 = indexOf(part[4], consonants)
    w = (c1 << 12) | (v1 << 10) | (c2 << 6) | (v2 << 4) | c3
    out[k]   = (w >> 8) & 0xFF
    out[k+1] =  w       & 0xFF
    k += 2

  if pad:
    if k == 0 or out[k-1] != 0x00:
      error("trailing hyphen requires final 0x00 padding byte")
    k -= 1   // drop the padding byte

  return out[0:k]
]]></artwork>
      <t>Decoders MUST accept input in either case (upper/lower) and MUST reject any
      character not in the defined consonant/vowel sets (after stripping hyphens).
      A zero octet introduced solely for padding is removed only when the input
      ends in a single trailing hyphen, as specified in <xref target="decoding"/>.</t>
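      <t>The decoding pseudocode can likewise be sketched in Python. This
      non-normative sketch follows the pseudocode's hyphen handling (hyphens
      either separate every syllable or are absent). It also rejects non-ASCII
      input before lowercasing; that check is an assumption added here so that
      Unicode case mapping cannot turn other characters into table letters.</t>
      <sourcecode type="python"><![CDATA[
CONSONANTS = "bdfghjklmnprstvz"
VOWELS = "aiou"
TABLES = (CONSONANTS, VOWELS, CONSONANTS, VOWELS, CONSONANTS)
SHIFTS = (12, 10, 6, 4, 0)


def decode(pq: str) -> bytes:
    """Decode a proquint string to octets, or raise ValueError."""
    if not pq.isascii():
        raise ValueError("non-ASCII input")
    pq = pq.lower()
    padded = pq.endswith("-")
    if padded:
        pq = pq[:-1]  # strip the single padding hyphen
    if not pq:
        raise ValueError("no syllables")
    if pq.startswith("-") or pq.endswith("-") or "--" in pq:
        raise ValueError("misplaced hyphen")
    if "-" in pq:
        parts = pq.split("-")
    else:
        parts = [pq[i:i + 5] for i in range(0, len(pq), 5)]
    out = bytearray()
    for part in parts:
        if len(part) != 5:
            raise ValueError("syllable length must be 5")
        w = 0
        for ch, table, shift in zip(part, TABLES, SHIFTS):
            idx = table.find(ch)
            if idx < 0:
                raise ValueError(f"invalid character {ch!r}")
            w |= idx << shift
        out += w.to_bytes(2, "big")
    if padded:
        if out[-1] != 0x00:
            raise ValueError("trailing hyphen requires final 0x00")
        del out[-1]  # drop the padding octet
    return bytes(out)
]]></sourcecode>
      <t>With this sketch, decode("bahaf-basab-") returns the three octets
      01 02 03, while decode("bahaf-basab") returns 01 02 03 00.</t>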
      </section>

      <section anchor="norms" numbered="true">
        <name>Normalization</name>
        <t>Encoders SHOULD produce lowercase output. Encoders MUST append a single
        trailing hyphen only when signaling padding (odd input length). Decoders MUST
        treat input as case-insensitive, MUST ignore interior hyphens, and MUST apply
        the trailing-hyphen padding rule defined in this document.</t>
        <t>Encoders and decoders MUST use the tables and ordering defined in
        <xref target="tables"/> and <xref target="bitlayout"/>. Substituting letters
        or re-ordering bits is not Proquint and will not interoperate.</t>
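        <t>One way an application might bring received text into lowercase,
        hyphenated form is sketched below in Python. The function name
        canonical_form is hypothetical. The sketch only normalizes case and
        hyphenation, ignoring interior hyphens as described above; checking the
        letters against the tables is left to the decoder.</t>
        <sourcecode type="python"><![CDATA[
def canonical_form(pq: str) -> str:
    """Return pq lowercased with one hyphen between syllables.

    A single trailing hyphen (the padding signal) is preserved.
    """
    if not pq.isascii():
        raise ValueError("non-ASCII input")
    pq = pq.lower()
    padded = pq.endswith("-")
    body = pq[:-1] if padded else pq
    if body.startswith("-") or body.endswith("-") or "--" in body:
        raise ValueError("misplaced hyphen")
    letters = body.replace("-", "")  # interior hyphens carry no data
    if not letters or len(letters) % 5 != 0:
        raise ValueError("length must be a positive multiple of 5")
    syllables = [letters[i:i + 5] for i in range(0, len(letters), 5)]
    return "-".join(syllables) + ("-" if padded else "")


print(canonical_form("DAMUHZABAT"))   # damuh-zabat
print(canonical_form("bahafbasab-"))  # bahaf-basab-
]]></sourcecode>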
      </section>

      <section anchor="vectors" numbered="true">
        <name>Test Vectors</name>
        <t>The following vectors are derived directly from this specification and can
        be used to verify independent implementations.</t>
        <artwork><![CDATA[
# Single-word (16-bit) values:
0x0000 -> babab
0xFFFF -> zuzuz
0x1234 -> damuh
0xF00D -> zabat
0xBEEF -> ruroz

# Two words (32-bit), big-endian byte order:
bytes:  0x12 0x34 0xF0 0x0D
words:  0x1234, 0xF00D
pq:     damuh-zabat      (with hyphen)  or  damuhzabat (without)

# Raw ASCII example ("F3r41OutL4w"),
# UTF-8 bytes, zero-padded to even length:
ASCII:  46 33 72 34 31 4F 75 74 4C 34 77
Length: 11 bytes
Pad:                                      00
Words:  0x4633 0x7234 0x314F 0x7574 0x4C34 0x7700
PQ:     himug-lamuh-gajaz-lijuh-hubuh-lisab- (interior hyphens optional)

# IPv6 address example (the 39-octet ASCII text of the address,
# zero-padded to 40 octets; not the 128-bit binary value)
IPv6: 2001:0db8:0000:0000:0000:ff00:0042:8329
PQ (displayed as multi-line):
    gamub-gabud-gomub-kidof-gobup-gabub-gabub-gomub-gabub-
    gabup-gabub-gabub-gonok-kimub-gabup-gabub-gibuf-gomum-
    gasuf-gohab-

# Padding examples
# Even-length input (no padding, no trailing hyphen):
Bytes:  01 02 03 00
Words:  0x0102, 0x0300
PQ:     bahaf-basab           (or "bahafbasab" without interior hyphen)
Out:    01 02 03 00

# Odd-length input with padding signaled by trailing hyphen:
Bytes:  01 02 03
Encoder Pads:                -> add 00 to form final word 0x0300
PQ:     bahaf-basab-          (trailing hyphen REQUIRED)
Decoder: decodes to 01 02 03 00, verifies last octet 00, then removes it
Out:    01 02 03

# Invalid (trailing hyphen but last octet != 00):
PQ:     bahaf-basad-
-> decode last word to ... 01 (not 00) => ERROR

# Invalid (multiple trailing hyphens):
PQ:     bahaf-basab--         => ERROR
]]></artwork>
        <t>Implementations MUST reproduce these outputs exactly, apart from the
        optional interior hyphens.</t>
      </section>

      <section anchor="errors" numbered="true">
        <name>Error Handling</name>
        <t>Decoders MUST reject input that: (1) contains characters outside the defined
        tables (after interior hyphen removal); (2) has a length not divisible by 5
        letters; or (3) violates the CVCVC pattern. How errors are signaled is
        application-specific, but decoders MUST reject invalid input rather than
        attempt to guess the intended value.</t>
        <t>A trailing hyphen MUST only be used to signal removal of a single trailing
        0x00 octet; any other usage is invalid.</t>
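        <t>The checks above, together with the hyphen rules used by the decoding
        pseudocode in <xref target="decode-alg"/>, can be approximated with a
        regular expression. The following Python sketch is one possible pre-check,
        not a replacement for the decoder: it cannot detect a trailing hyphen
        whose final octet is non-zero. The names PROQUINT_RE and is_well_formed
        are illustrative.</t>
        <sourcecode type="python"><![CDATA[
import re

C = "[bdfghjklmnprstvz]"
V = "[aiou]"
SYLLABLE = C + V + C + V + C

# Either every syllable boundary carries a hyphen or none does,
# followed by at most one trailing (padding) hyphen.
# re.ASCII keeps re.IGNORECASE from matching non-ASCII letters
# such as U+212A KELVIN SIGN.
PROQUINT_RE = re.compile(
    rf"(?:{SYLLABLE}(?:-{SYLLABLE})*|(?:{SYLLABLE})+)-?",
    re.IGNORECASE | re.ASCII,
)


def is_well_formed(pq: str) -> bool:
    """Return True if pq satisfies the letter and hyphen grammar."""
    return PROQUINT_RE.fullmatch(pq) is not None


print(is_well_formed("damuh-zabat"))    # True
print(is_well_formed("bahaf-basab--"))  # False
]]></sourcecode>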
      </section>

      <section anchor="compat" numbered="true">
        <name>Backward Compatibility</name>
        <t>Implementations that predate the padding rule in this specification may
        ignore a trailing hyphen and therefore retain the trailing 0x00 octet. To
        interoperate with such decoders, producers SHOULD avoid relying on padding
        removal when communicating with unknown peers (for example, by supplying
        even-length input).</t>
      </section>
    </section>

    <section anchor="security" numbered="true">
      <name>Security Considerations</name>
      <t>This document defines a reversible textual encoding. It provides no 
      confidentiality, integrity, or authenticity by itself.</t>

      <t><strong>Threat: Loss of confidentiality.</strong> Proquints are a lossless, 
      human-friendly representation of binary data. Encoding sensitive identifiers 
      (e.g., keys, tokens, internal IDs) as Proquints does not hide their value and may 
      make them easier to read, speak, or copy.</t>
      <t><em>Remediation:</em> Treat Proquints with the same confidentiality as the 
      underlying data. When transporting sensitive information, use authenticated 
      encryption or protected channels appropriate to the application (e.g., TLS, SSH, 
      end-to-end encryption). Avoid placing sensitive Proquints in unauthenticated logs, 
      screenshots, or voice channels.</t>

      <t><strong>Threat: Misuse as secrets or passwords.</strong> Because Proquints 
      are pronounceable, implementers might be tempted to use them directly as passwords 
      or shared secrets.</t>
      <t><em>Remediation:</em> Proquints MUST NOT be used as standalone authentication 
      secrets unless generated with appropriate entropy and policy for that purpose. 
      Where human entry is required, rely on established secret-generation and storage 
      practices; do not assume pronounceability confers security.</t>

      <t><strong>Threat: Transcription and spoofing errors.</strong> Human reproduction 
      (reading, hearing, typing) can introduce errors or social-engineering opportunities.</t>
      <t><em>Remediation:</em> Implement input validation exactly as specified in this
      document (tables, syllable structure, hyphen grammar). When used in safety- or
      security-relevant workflows, consider checksumming or context-binding at the
      application layer (e.g., include context or a MAC over the underlying binary
      data) and use redundancy when verbally communicating critical values.</t>

      <t><strong>Threat: Side-channel leakage via formatting.</strong> The trailing-hyphen 
      padding signal reveals the parity of the original octet length (odd/even).</t>
      <t><em>Remediation:</em> Applications that consider length parity sensitive SHOULD 
      avoid emitting the trailing-hyphen signal by ensuring even-length inputs or by 
      wrapping Proquints inside a protected container. In most operational uses this 
      leakage is not security-relevant.</t>

      <t><strong>Threat: Injection into protocol or storage contexts.</strong> Improper 
      normalization or acceptance of characters outside the defined alphabet could enable 
      injection or confusion in downstream systems.</t>
      <t><em>Remediation:</em> Encoders SHOULD emit lowercase and MAY include interior
      hyphens for readability only. Decoders MUST accept case-insensitive input, MUST
      ignore interior hyphens, MUST enforce the defined tables and CVCVC pattern, and MUST
      reject leading hyphens, multiple trailing hyphens, and consecutive interior hyphens.
      Do not accept any character other than the letters in the defined tables and the
      hyphen.</t>

      <t><strong>Threat: Transport-specific risks.</strong> Use of Proquints in URLs, 
      filenames, DNS, or messaging systems can expose data if those channels are 
      observable.</t>
      <t><em>Remediation:</em> When Proquints convey sensitive data, use secure transport
      appropriate to the context, apply access control and retention policies to logs,
      and avoid transmitting sensitive values over unprotected voice or chat. Proquints
      are ASCII-only and hyphenated; unpadded Proquints are generally safe for "LDH"
      contexts (letters–digits–hyphen), but a padded Proquint ends in a hyphen, which
      LDH labels such as DNS host name labels do not permit. Applications MUST observe
      each system's length and syntax limits.</t>
    </section>

    <section anchor="iana" numbered="true">
      <name>IANA Considerations</name>
      <t>This document has no IANA actions.</t>
    </section>

    <section anchor="process" numbered="true">
      <name>Process Note</name>
      <t>The intent of this document is archival and implementer guidance via the 
      Independent Stream. The author does not seek standardization at this time. If 
      significant deployment or protocol integration interest emerges, a future effort 
      MAY be dispatched within the IETF to consider Standards Track work.</t>
    </section>

    <section anchor="ack" numbered="true">
      <name>Acknowledgments</name>
      <t>The author thanks Daniel Shawcross Wilkerson for originating the proquint
      concept and publishing the initial specification in 2009 (<xref target="WILKERSON2009"/>).</t>
      <t>The author also thanks Lucas Bremgartner for his detailed review and thoughtful 
      suggestions. His insights substantially improved both the clarity and correctness of the 
      specification. His independent implementation also provided a valuable cross-check of the 
      design.</t>
    </section>
  </middle>

<back>
  <references>
    <name>References</name>

    <references anchor="normative">
      <name>Normative References</name>

      <referencegroup anchor="BCP14" target="https://www.rfc-editor.org/info/bcp14">
        <reference anchor="RFC2119" target="https://www.rfc-editor.org/info/rfc2119">
          <front>
            <title>Key words for use in RFCs to Indicate Requirement Levels</title>
            <author initials="S." surname="Bradner" fullname="Scott Bradner"/>
            <date month="March" year="1997"/>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="2119"/>
          <seriesInfo name="DOI" value="10.17487/RFC2119"/>
        </reference>

        <reference anchor="RFC8174" target="https://www.rfc-editor.org/info/rfc8174">
          <front>
            <title>Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words</title>
            <author initials="B." surname="Leiba" fullname="Barry Leiba"/>
            <date month="May" year="2017"/>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="8174"/>
          <seriesInfo name="DOI" value="10.17487/RFC8174"/>
        </reference>
      </referencegroup>

      <reference anchor="RFC5952" target="https://www.rfc-editor.org/info/rfc5952">
        <front>
          <title>A Recommendation for IPv6 Address Text Representation</title>
          <author fullname="S. Kawamura" initials="S." surname="Kawamura"/>
          <author fullname="M. Kawashima" initials="M." surname="Kawashima"/>
          <date month="August" year="2010"/>
        </front>
        <seriesInfo name="RFC" value="5952"/>
        <seriesInfo name="DOI" value="10.17487/RFC5952"/>
      </reference>
    </references>

    <references anchor="informative">
      <name>Informative References</name>
      <reference anchor="WILKERSON2009" target="https://arxiv.org/html/0901.4016">
        <front>
          <title>Proquints: Identifiers that are Readable, Spellable, and Pronounceable</title>
          <author initials="D.S." surname="Wilkerson" fullname="Daniel Shawcross Wilkerson"/>
          <date month="January" year="2009"/>
        </front>
        <seriesInfo name="arXiv" value="0901.4016"/>
      </reference>
    </references>

  </references>
</back>
</rfc>
