The Diamond Encryption Algorithm by Michael Paul Johnson Abstract--Diamond is a royalty-free, symmetric key block cipher encryption algorithm based on a combination of nonlinear functions. This block cipher may be implemented in hardware or software. Diamond uses a block size of 128 bits and a variable length key. A faster variant of Diamond uses a block size of 64 bits. Diamond is an incremental improvement over MPJ2 and MPJ. Index Terms--Diamond, encryption, cryptography, cryptanalysis, computer security, communications security, MPJ, MPJ2. INTRODUCTION General symmetric key block ciphers have numerous applications in computer security, communications security, detection of data tampering, and creation of message digests for authentication purposes. The longer any one such algorithm is used, and the more use it gets, the greater the incentive to break it, and the greater the probability that methods will be devised to break the algorithm. For example Michael J. Wiener has shown that breaking DES is within the capabilities of many nations and corporations [1]. This sort of reduction in the relative security of DES was anticipated several years ago. One proposed solution is the International Data Encryption Algorithm (IDEATM) cipher [2], which was described in [3] and [4] as the Improved Proposed Encryption Standard (IPES). Another one is the MPJ Encryption Algorithm [5], which evolved to the Diamond Encryption Algorithm. In the field of cryptography, it is good to have many good algorithms available. DESIGN OF DIAMOND Diamond was designed to be strong enough to provide security for the foreseeable future. It was also designed to be easy to generate keys for, and to be practical to implement in hardware, software, or in a hybrid implementation. Strength Three major factors influence the strength of a block cipher: (1) key length, (2) block size, and (3) resistance of the algorithm to attacks other than brute force (such as differential cryptanalysis) [3] [6]. The key length is variable to allow you to select your own trade-off between security and volume of keying material needed. The block size is chosen to make brute force attacks using precomputed tables require an obviously intractable amount of data storage. Diamond uses a variable length key of at least 40 bits. The use of at least a 128 bit key is recommended for long term protection of very sensitive data, as a hedge against the possibility of computing power increasing by several orders of magnitudes in the coming years. The block size is fixed at 128 bits, because larger block sizes are unlikely to make any practical difference in security, and because this in a convenient binary multiple. The problem of making sure that there is no known attack that is more efficient than brute force is much more difficult than simply selecting sizes for keys and blocks. This is attempted by creating a composite function of simpler nonlinear functions in such a way that the internal intermediate results cannot be solved for and such that there is a strong dependence of every output bit on every input bit and every key bit. An ideal 128 bit block cipher would use a z bit key to select one of 2z functions from the set of all one to one and onto functions that map one input block of 128 bits to one output block of 128 bits. Ideally, these 2z functions would be the most nonlinear, difficult to analyze functions out of the (2128)! possible functions. In practice, the key selects one of 2z functions from an arbitrary selection of possible functions numbering between 2z and (2128)!. The use of purely nonlinear functions makes a large portion of mathematical tools ineffective for cryptanalysis. Ease of Key Generation Key generation should be as simple as generating a random number by measuring some random physical process. Since there is no complex or secret strong key selection process, distributed key management protocols are practical. Distributed key management is preferable in many applications to centralized key management because there is no single point of failure at which the whole system could be compromised. Practical to Implement in Hardware or Software The prototype algorithm is implemented in a program for a personal computer. When properly implemented in hardware, Diamond should not significantly slow down any practical digital data stream. On the other hand, setting up a new key need not be as fast as the encryption and decryption operations, since (1) key change operations are less frequent than encryption and decryption operations, and (2) a slower key setup operation discourages brute force attacks. BASIS OF DESIGN The thought process that went into the design of Diamond is based on the following ideas: 1. Linear functions and combinations of functions can often be solved analytically in ways that are not obvious to the cipher designer, and should be avoided. This includes standard arithmetic functions, math in finite fields, and Boolean arithmetic. 2. Reversible block ciphers with a block size of n bits can be viewed as a simple substitution cipher on an alphabet of 2n characters, with a key that selects the permutation used. 3. Simple substitution ciphers can be represented with a look-up table or array, but in practice the array required is too big to fit comfortably in a computer's memory. 4. An adequate subset of the oversized look-up table can be simulated by simply interleaving rounds of substitution of sub-blocks with bit permutations that serve to spread functional dependencies across sub-block boundaries. DESCRIPTION OF ALGORITHM The Diamond Encryption Algorithm consists of three main parts: (1) key scheduling, (2) substitution steps, and (3) permutation steps. Encryption and decryption both consist of n rounds of substitution operations, where n is at least 10. Each substitution operation takes each of the 16 input bytes of 8 bits each, and substitutes another byte for it based on the contents of the substitution array for that byte position and round number. The key scheduling operation fills the internal substitution arrays based on the key. Between each substitution, a fixed permutation step uses a bit selection process to make each output byte a function of eight different input bytes. Unlike DES, every round alters every byte of the input block (instead of just half of the input block). After 5 rounds, bit of the output block is a nonlinear function of every byte of the input block and every bit of the key. The additional rounds after the fifth round serve to ensure that solving for the contents of the individual substitution arrays is more work than a brute force attack on the cipher. They also serve to increase the number of possible functional relationships that the key selects from, thus making this algorithm closer to the ideal block cipher, and making cryptanalysis more difficult. Key Scheduling There is one substitution array for each of the 16 bytes of the encryption block for each round. For a ten round implementation of Diamond, 160 substitution arrays are to be filled. Each of the 160 arrays contains 256 elements of one byte each. It is convenient to look at the set of substitution arrays as one three dimensional array, indexed by round, byte position within the 16 byte encryption block, and input byte value. A similarly indexed inverse substitution array is used during decryption. For the substitution to be reversible, each of the 256 possible values of an 8 bit byte must occur exactly once in the array. The process used to make this happen consists of five processes: (1) array filling, (2) element placement, (3) pseudorandom key expansion, (4) pseudorandom number normalization, and (5) array inversion. Although key scheduling can be done more quickly in a dedicated hardware implementation, a more economical hybrid design would do the key scheduling in firmware and the actual encryption or decryption in hardware. Array filling is simply a nested loop where all 160 substitution arrays are filled. It is concisely expressed in this pseudo code: For rounds := 1 to n For byte position := 1 to 16 For element value := 255 down to 0 Place this element. Element placement is done by placing the current element in one of the unfilled positions in the current array. The unfilled positions of the current array are numbered from 0 to the value of the element being placed. A number in this same range is then selected by generating a pseudorandom number normalized to this much smaller range. This offset is used to place the current element and mark that location as having been filled. In the trivial case where there is only one more unfilled element, no pseudorandom number is generated. Pseudorandom key expansion uses a simple method to provide key dependent bits as needed to place array elements. A pointer is set to the first 8-bit byte of the key. A 32 bit CRC accumulator is set to all ones (FFFFFFFF hexadecimal). This initial value is used rather than all zeros so that an all zero external key would not be weak. Every time a pseudorandom number is requested, the CRC is updated using the CCITT CRC-32 [7] using the key byte pointed to by the pointer. The pointer is then moved to the next key byte. After the pointer is moved beyond the end of the last key byte, the CRC is updated with the least significant byte of the size of the key (in bytes), then with the next to least significant byte of the size of the key (in bytes), then the pointer is moved back to the first byte of the key. If the actual key size used is not a multiple of 8 bits, then the unused bits of the last key byte are set to 1, with the used bits occupying the least significant bits of the byte. Although no upper limit is explicitly given for key size, increasing the key size provides no significant increase in security if more than approximately 28 672 n bits are used, where n is the number of rounds used. This upper limit is large enough that even fictional computers [8] would have difficulty with a brute force attack. To normalize the 32 bit accumulator value to the desired number range from 0 to n, first perform a logical "and" operation on the accumulator with the value 2m-1, where m is the smallest integer value such that 2m-1 n. This will select the minimum number of bits required to cover the range needed. If the resulting value is less than or equal to n, use it. If it is not, then repeat the above process with a new pseudorandom number. If, after 97 attempts the value is still not in range (a very low probability condition), simply subtract n from the value and use it. If the decryption mode of Diamond is to be used, calculate the inverse substitution arrays directly from the encryption substitution arrays as follows: For rounds := 1 to n For byte position := 1 to 16 For k := 0 to 255 do inverse array[array[k]] := k Substitution In each substitution round, each byte of the input block is replaced with the contents of the substitution array for that round, byte position, and byte value. For decryption, the same operation is performed with the inverse substitution array. In a hardware implementation, this is can be done quickly by simply addressing static RAM. Note that the substitution arrays used in the Diamond Encryption Algorithm are different from the S-Boxes used in ciphers like DES, in that (1) they are much larger, (2) there are more of them, and (3) they are not used in conjunction with a simpler operation with a key that could be solved for with differential cryptanalysis. Permutation Between each substitution round, a fixed permutation is performed. The purpose of this permutation step is to increase the effective block size of the cipher by making each output byte a function of 8 input bytes by simply selecting one bit from each of 8 input bytes. Every bit of the input block is used exactly once in the output block. In a hardware, this can be done with literal wire crossings. In software, efficiency is gained by ensuring that every bit ends up in the same position relative to a byte boundary as where it started. The specific permutation used for encryption takes the least significant bit of each byte from the input byte in the same position. The next most significant bit is taken from the input byte indexed as one byte higher (mod 16). The next most significant bit is taken from the input byte indexed as two higher (mod 16), and so on. For decryption, the inverse of this operation is the same, except the byte positions used are one byte lower (mod 16) instead of higher. After 2 rounds, every output byte is a function of 8 input bytes and all key bytes (if the key is less than 4080 bytes, which is likely). After 3 rounds, every output byte is a function of 15 input bytes and the key. After 4 rounds, every output byte is a function of every input byte and the key. The minimum of 6 additional rounds are intended to make cryptanalysis more difficult. CRYPTANALYSIS OF DIAMOND In the design of Diamond, several types of cryptanalytic attacks were considered. The reasons why I currently consider each of them to be computationally infeasible are listed below. If anyone finds an attack on Diamond that is better than a brute force attack on the key, please let me know. The following consists of rough order of magnitude estimations and hand waving, but they are of value anyway. To construct more exact proofs, actual construction of all of the cryptanalytic attacks that your opponent might try is required. It is conjectured that actual construction of the attacks mentioned would be much more complex than the following estimates indicate. Brute Force Exhaustive key search can be made intractable (beyond the reach of any likely enemy) by choosing a key length of around 120 bits. A loose lower bound of the cost of exhaustive key search can be placed with the following generous assumptions. Assuming a massively parallel machine can perform a trillion decryptions per second (with different keys) on each of a billion processors, then an exhaustive key search would take an average of about 42 million years. The user may wish to use smaller key sizes in some applications to save in key management costs, while still maintaining adequate protection for the value of the specific data. Larger keys than 128 bits probably do not contribute significantly to the over-all security of a system of data protection, because of some other attacks on data security that are possible. If you do want to use a much larger key, increasing the number of rounds to greater than 10 is recommended. Another form of brute force attack that is available with block ciphers is the precomputed dictionary attack. The idea here is to create a database of one very probable plain text block encrypted under all possible keys. Sort the resulting cipher text/key pairs by cipher text value and store it in a table. Then to attack a piece of cipher text, look up the possible key in the table and try it on the rest of the message. This may take several iterations, but would be likely to succeed if you could store so much data. The problems here are, of course (1) time to generate the table, and (2) sorting and storing the table. Note that the defined lower bound on Diamond key size of 40 bits is not very secure against a brute force attack. Analytical Attack with Chosen Plain Text An analytical attack involves solving for the contents of at least one of the substitution arrays. If one array could be isolated by selecting carefully chosen inputs and outputs, then its contents could be solved for. Once this was done, this knowledge could be used to solve for additional substitution arrays. The problem with this attack is that because every output byte is a function of every input byte and at least 112 substitution arrays, this decomposition is difficult. I conjecture that a loose lower bound on the complexity of this kind of attack is that it would take more operations than 256 2x, where x is the number of arrays in the dependency chain for each byte. For a 10 round Diamond implementation, this would be an approximate total of 256 16(2112 + 296 + 280 + 264 + 248 + 233 + 218 + 210 +22) 2 1037 operations. This is about as hard as solving for a 124 bit key with a precomputed plain text attack, but even less practical. Differential Cryptanalysis Attacking Diamond with a form of differential cryptanalysis as described in [6] does not work directly. A similar approach using differences in known plain text values would have some value in reduced Diamond variants with 3 or fewer rounds, but would be no better than the analytical attack discussed above for 10 or more rounds. Solving for the Key from an Array If you could solve for one substitution array, and knew (or guessed) the length of the key, a method might be constructed to directly solve for the key from the contents of one substitution array. This reduces the strength lower bound estimate for an analytical attack on a 10 round Diamond system to about 256 2112 1036, or about as strong as a 120 bit key. Bypassing the Algorithm In any security system, care must be taken to avoid the possibility that the hardware and/or software doing the encryption and decryption is not tampered with or replaced. For very high security applications, a hardware device that is implemented on a tamper resistant chip in a tamper resistant enclosure is preferable to a pure software implementation. Care must also be taken to ensure that the sensitive data is physically secure when in plain text form. Key management, although it is beyond the scope of this paper, is also a critical concern. DIAMOND LITE Where software speed and table space are critical, a variant of Diamond that has a block size of 8 bytes (64 bits) and a minimum of four rounds (8 recommended) is a reasonable compromise. This variant has the advantage that every bit of the output is a function of every bit of the input and every bit of the key after only two rounds. At least four rounds are needed, however, to ensure that the algorithm is strong enough to justify keys of about 64 bits. This only requires 8192 bytes of table space and offers faster speed in software than the full Diamond with a 16 bit block size. It is conjectured that Diamond Lite with 8 rounds and a key length of 128 bits is at least equivalent in security to the IDEATM cipher, and more secure than the aging DES algorithm. LEGAL ISSUES The Diamond and Diamond Lite Encryption Algorithms may be used for any legal purpose without payment of royalties to the inventor or his employer, however the names "Diamond Encryption Algorithm" and "Diamond Lite Encryption Algorithm" are Trade Marks owned by the inventor, and may not be used in connection with any algorithm that does not comply with the specifications given herein and in the sample programs written by the inventor. The Diamond Encryption Algorithm is the same as the MPJ and MPJ2 Encryption Algorithms, with the exception of some of the key expansion algorithm. Some governments may restrict the use, publication, or export of strong encryption technology. CONCLUSION Diamond and Diamond Lite are two of several alternatives to the aging and now relatively insecure DES algorithm. Source code for a software implementation of Diamond in C is available from the author for a minimal shipping and handling charge or for free electronic transfer where allowed by law. Contact the author at one of the addresses below for details. Comments, questions, and reports of possible weaknesses should be sent to the author at: Mike Johnson PO BOX 1151 LONGMONT CO 80502-1151 USA BBS: 303-938-9654 Internet mail: mpj@csn.org Internet ftp:csn.org//mpj/README.MPJ CompuServe: 71331,2332 REFERENCES [1] Michael J. Wiener, "Efficient DES Key Search," Bell-Northern Research, PO Box 3511 Station C, Ottawa, Ontario, K1Y 4H7, Canada, 20 August 1993. [2] Theodor Brggemann and Hoger Brk, "Der Verschlsselungsalgorithmus IDEATM," Ascom Tech AG, Fachbereich Kryptologie, Ziegelmattstrasse 1, CH - 4503 Solothurn, Switzerland. [3] Xuejia Lai and James L. Massey, "Markov Ciphers and Differential Cryptanalysis," in Advances in Cryptology EUROCRYPT '91, Springer-Verlag, pp 17-38. [4] Xuejia Lai "Detailed Description and a Software Implementation of the IPES Cipher," Institut fr Signal- und Informationsverarbeitung, ETH Zrich. [5] Michael Paul Johnson, "Beyond DES: Data Compression and the MPJ Encryption Algorithm," Master's Thesis at the University of Colorado at Colorado Springs, 1989. Available by anonymous ftp to csn.org in /mpj or on the author's BBS at 303-938-9654. [6] Eli Biham and Adi Shamir, Differential Cryptanalysis of the Data Encryption Standard. New York: Springer-Verlag, 1993. [7] C Programmers Guide to NetBIOS, Howard W. Sams & Co., Inc. [8] Rick Sternbach and Michael Okuda, Star Trek the Next Generation Technical Manual, New York: Pocket Books, 1991. [9] Xuejia Lai and James L. Massey, "A Proposal for a New Block Encryption Standard," in Advances in Cryptology -- EUROCRYPT '90, Springer-Verlag, pp 389-404., 1990. Copyright (C) 1994 Michael Paul Johnson. All rights reserved.