Attacks against the BEST encryption algorithm
Chaos is definitely not randomness
papers
Papers
13 September 1999
by +Spath.
+cracker
Courtesy of Fravia's page of reverse engineering
fra_00xx
980913
+Spath
0110
NA
PC
Magistral essay by +Spath. Tackling an +HCU challenge. Definitely a MUST reading for all reversers. +Spath has written a cryptoanalystycal masterpiece.
advanced
Advanced
There is a crack, a crack in everything That's how the light gets in
Rating
( )Beginner (X)Intermediate (X)Advanced ( )Expert

A cryptanalysis of a bitstream cipher, in answer to a challenge proposed to the +HCU.
Attacks against the BEST encryption algorithm
Chaos is definitely not randomness
Written by +Spath.


Introduction
Some time ago, Jeremy Likness proposed a cryptographic challenge to the +HCU, to test the strength of his BEST encryption algorithm. This essay is not a solution to his challenge, but a cryptanalysis of his cipher that proves that the BEST algorithm is not secure. This paper first describes weaknesses of the BEST algorithm, particularly a flaw in the chaotic function. These weaknesses are then exploited for a chosen-plaintext attack. At last, I show that these flaws can also compromise the security of the encryption in ciphertext-only attacks. I conclude that the BEST encryption algorithm is not secure.

Tools required
None

Target's URL/FTP
The BEST (Bitstream Expert Statistical Translation) algorithm described at http://www.mindspring.com/~jayke/Bitstream.htm. The challenge was published at http://fravia.org/jeremy1.htm. An implementation of BEST can be found at http://fravia.org/jn_essay.htm.

Program History
None.

Essay

ABOUT THE BEST CHALLENGE

THE BEST CIPHER

I)   FROM CHAOS TO ORDER  

II)  THE CHOSEN-PLAINTEXT ATTACK  

III) THE FIRST BYTE

IV)  ALL FOR ONE  

V)   A REAL-LIFE ATTACK





ABOUT THE BEST CHALLENGE

A few words before we start : as it is described on Fravia+'s site, 

the BEST challenge is absurd. As pointed out on Fravia+'s board by 

Ghiribizzo and others, no serious evaluation can be done with only a 

few plaintext-ciphertext pairs and without any description of the 

algorithm. 



With the rules that Jeremy chose, even if nobody breaks his challenge 

in 100 years, it does not prove a single thing about the strength of 

his algorithm. On the contrary, a cryptanalysis can much more easily 

expose weaknesses of the algorithm and describe various possible attacks. 

Fortunately, all the informations needed for a cryptanalysis were available 

on Jeremy's homepage, including a description of the algorithm and some C 

source code. 





THE BEST CIPHER

To fully understand this paper, I highly recommend to get a copy 

of the original definition of the BEST cipher on Jeremy's homepage. 

However, here's a description of the BEST encryption algorithm, 

where :



Ix is the bit number x of the input file (plaintext).

Ky is the bit number y of the key file.

Ox is the bit number x of the output file (ciphertext).

f(n) = 3.56999999*n*(1-n), the chaotic function.

maxy is the size of the key file (in bits).

^ is the XOR operator.

~ is the NOT operator (bit inverse).

% is the modulus operator.

Norm is a normalizing function that keep the decimal

     part of a number, i.e. between 0 and 1. 



A0 = Encryption algorithm

  1. n = seeder value 

  2. r = 3.56999999

  3. For each x

  4. Ox = Ix ^ Ky

  5. Ky = Ox

  6. If n > 0.5 then Ox = ~Ox

  7. If Ix = 0 then n = n + 0.33333333

  8. Else n = n + 0.66666666

  9. n = Norm(f(f(n)))

 10. If Ox = 1 then

 11. y = (y +1) % maxy

 12. If y % 8 = 0 then

 13. { Ky = Ky ^ Ky-8, . . ., Ky+7 = Ky+7 ^ Ky-1 }

 14. End of for loop



The seeder value is computed by looping through the key file for the target 

file length. Later, I will use the notation Kx(y) when I refer to bit y of 

key byte x (x and y lowest values are 0). I will use the same notation for 

the plaintext bytes (I) and the ciphertext bytes (O).



Now let's write the dependency equations of this algorithm :



Ox = F(Ix, Ky, n)     (A)

Ky = F(Ox, Ky-8)      (B)

n(k+1) = F(Ix, n(k))  (C)

seed = F(K, size(I))  (D)



I think that equations B, C and D show weaknesses of the algorithm, I will 

use B and C for my attack.





PART ONE : FROM CHAOS TO ORDER

Let's make a little experiment : fire Hiew and create a plaintext file 

filled of '00' bytes ; choose any file on your hard disk as key file 

and encrypt the plaintext with BestWin. Now look at the ciphertext :

it's almost completely filled by '00' bytes.



What happened ?   



If we submit an 'all 0' input file, we can simplify steps 4 and 7 and 

remove steps 5 and 8 from the algorithm, so that we have :



  3. For each x

  4. Ox = Ky

  6. If n > 0.5 then Ox = ~Ox 

  7. n = n + 0.33333333

  9. n = f(f(n))

 10. If Ox = 1 then

 11. y = (y +1) % maxy

 12. If y % 8 = 0 then

 13. { Ky = Ky ^ Ky-8, . . ., Ky+7 = Ky+7 ^ Ky-1 }

 14. End of for loop



Now if we study Xn+1 = f(f(Xn)), we see that something very strange 

happens : whatever initial value you chose, after some time, the 

normalized chaotic function get stuck and converges to 2 values. 

Look at these succesive values of n after step 7 ; the left column 

lists the values for even bits and the right column the values for 

odd bits :



     even bits           odd bits



0.8923637046805643   0.8377088795512472  

0.8923637046808309   0.8377088795507856   

0.8923637046808787   0.8377088795507024   

0.8923637046808873   0.8377088795506879   

0.8923637046808889   0.8377088795506854  

0.8923637046808890   0.8377088795506848   

0.8923637046808892   0.8377088795506848  

0.8923637046808892   0.8377088795506848

      ...                    ...



To have a good idea of what happens, you can draw f(f(n)) and g(n)=n.

The f(f(n)) curve has a symetry axis at 0.5 (since f(1-n)=f(n)), it has 

1 minimum and 2 maxima, one of them being (0.8320, 0.8923), which explains 

the convergence. We see also that the second extremum is very close from 

the curve of g(n), and that the distance between the two maxima is very 

close from 0.33 (modulo 1).   



So after enough '00', the chaotic function will just give out two values, 

0.8923637046808892 for even bits and 0.8377088795506848 for odd bits (these 

values may be switched for other keys). 



Both normalized values are greater than 0.5, so we can simplify step 6, and 

after enough rounds we will have this algorithm : 



  3. For each x

4+6. Ox = Ky^1 = ~Ky

  7. n = n + 0.33333333

  9. n = f(f(n))

 10. If Ox = 1 then

 11. y = (y +1) % maxy

 12. If y % 8 = 0 then

 13. { Ky = Ky ^ Ky-8, . . ., Ky+7 = Ky+7 ^ Ky-1 }

 14. End of for loop



Now suppose we reach a '1' bit of the key, i.e. Ky = 1, we have therefore 

Ox = 0, the y index is not incremented and since n will remain stuck, y

will not change any more : all following bits will be assigned to this ~Ky 

null bit and the rest of the output file will be all "00". 



This is particularly striking with a 'all 00' file, but this also happens 

with shorter patterns. So the first conclusion is : depending on the key, 

any long enough "00" pattern in the plaintext will create a "00" pattern 

in the ciphertext (or in other words, we can guess sequences of null bytes 

of the plaintext).

  

  

PART TWO : THE CHOSEN-PLAINTEXT ATTACK 

To illustrate this chosen-plaintext attack, I will take as example a 

simple 25 bytes long key ; here's its hex dump :



54 65 73 74 6f 6E 73 20 75 6E 20 6E 6F 75 76 65 

6C 6C 65 20 66 6F 69 73 20                                 



The plaintext that will be encrypted is a file defined as follows :



In = 01h  if n=16+8*k, k being any integer

In = 00h  elsewhere



(in other words, all the plaintext bytes are null, except I16, I24, 

I32, ... that are equal to 01h).



All the "00" bytes are used to unbalance the algorithm in the previously 

described state. The file also contains a repetitive pattern of 8 bytes 

containing only one set bit : this bit is used to get next sequence of 

'1' bits of the key, as you will see later. For this example, I took a 

8016 bytes long plaintext ; however, this method should work for any 

plaintext that is at least 30 times bigger than the key.



Whatever key you chose, some patterns appear in the output file, 

i.e. after enough rounds we can identify a periodic byte sequence. 

Jeremy tried to prevent patterns from appearing in the output file 

by modifying the key at each used byte (at step 13). But since the 

original key is restored after all its bytes have been used, an input 

file with repetitive patterns and fixed n values will also cause 

patterns to appear in the output file.



Since we know that the key index y is incremented only if the output 

bit is '1' (at step 10), the conclusion is : the size of the key is 

equal to the number of '1' bits in a period of the output file. 



With our example, the smallest pattern is 66 bytes long, starting at 

offset 21, and it contains 200 bits with the value '1', so we know that 

the key is 200 bits long (25 bytes).





PART THREE : THE FIRST BYTE    

Let's focus now on the first bytes of the ciphertext ("00" bytes have 

been skipped, the bits that have an underscore on them were already '1' 

in the plaintext) :



byte nb :       0              16              24



value   :      07h             05h  _          3Fh  _

            0000 0111b      0000 0101b      0011 1111b

                                

Before we try to recover some bits of the key, we must be aware of an 

undocumented "feature" of the algorithm : the initial n value (seed) is 

calculated from the bytes of the key, but the pointer in the keyfile 

is not reset after this job. Therefore, the first byte of the plaintext 

is (most of the time) not encrypted by the first byte of the key. So our 

first job is to find which byte of the key was used to compute these 3 

ciphertext bytes.   



We know that the seed calculation is done by looping through the key file 

for the target file length, and at this moment we know both file sizes, 

so we can find this information. In our testcase, I encrypted a 8016 bytes 

plaintext and I found out that the key is 25 bytes long so the first key 

byte used for encryption will be : (16016 % 25) = 16. So the first plaintext

byte (I0) is encrypted by K16 (the 17th byte of the key file, since we 

start from K0). 



Now let's come back to equation (C) : n(k+1) = F(Ix, n(k))  

 

That means that if I know the input file (this is the case here) and one 

value of n, then I also know all future values of n. Hey, but we do know 

some value of n, since after some '00' patterns, n is equal to one of the 

two values we saw before. So when the first '1' bit of the plaintext is 

processed (that is I16(0)), n is equal to 0.8923637046 or 0.8377088795. 

Let's assume for now that the value of n when the first bit of I16 is 

processed is the second one.



The first three bits of K16 are unknown, since they are used to encrypt

the first byte of the key, and for this one we don't know anything about n.

The fourth key bit is necessarily a '1', since it's the requested value to 

unbalance the chaotic function in the state where it continuously gives out

'0' ; so K16(3) = 1.



Now we can recover all other bits of K16, since we can calculate all 

future values of n ; that's what we're doing now :



Let's focus on byte 16, we have :



For bit 0 :

   4. O16(0) = I16(0) ^ K16(3) = 1 ^ 1 = 0   

   6. n = 0.8377 => O16(0) = ~O16(0) = 1

   8. n = 0.8377 + 0.666666666

   9. n = Norm(f(f(n))) = 0.3427

  11. y = y +1



For bit 1 :

   4. O16(1) = K16(4) = 0   ===>  K16(4) = 0

   7. n = n + 0.33333333 

   9. n = Norm(f(f(n))) = 0.6088



  (note that this ciphertext bit is always a pure key bit)



For bit 2 :

   4. O16(2) = K16(4) = 0

   6. n = 0.6088 => O16(2) = ~O16(2) = 1

   7. n = n + 0.333333

   9. n = Norm(f(f(n))) = 0.5590

  11. y = y + 1 

   

No other output bit is at '1', so we are sure that the next key bit is 

'1' : so K16(5) = 1. Now we can focus on byte 24 : for bit 0 and bit 1, 

it's the same story as for byte 16, and bit 1 is also a pure key bit : 

K16(6) = O24(1) = 1



For bit 2 :

   4. O24(2) = K16(7) 

   6. n = 0.6088 => O24(2) = ~O24(2) = 1   ===>   K16(7) = 0

   7. n = n + 0.333333

   9. n = Norm(f(f(n))) = 0.5590



Let's summarize what we found : the first key byte used to encrypt the 

plaintext is K16, and its binary value is 0110_1??? ; therefore, there 

are only 8 candidates for this 17th key byte (from 68h to 6Fh).





PART FOUR : ALL FOR ONE

I will now show you how to recover the complete key if you know one of 

its bytes. As example, I will recover K17 from K16, any other byte of 

the key can be found from the previous one with the same method. 



Let's suppose we test 6Ch as candidate for K16 (which is the correct 

value) ; so we know the plaintext, the corresponding ciphertext, the first 

key byte and some values of n. The first thing to remember is that each key 

bit is modified after it has been used at step 5 (Ky = Ox). Therefore, 

when we're about to use K17, K16 has been modified into another value :

this new value is the first information to find.



Let's see how this happens :



  4. Ox = Ix ^ Ky                ; here we know Ix and Ky

  5. Ky = Ox                     ; here the key bit can be modified

  6. If n > 0.5 then Ox = ~Ox    ; we also know Ox after this step

                 

We know Ix and Ky at step 4, so we can calculate Ox at step 4, and therefore 

Ky at step 5. We also know Ox after step 6, it is the real output value, so we 

can deduce if n was greater than 0.5 at step 6.

 

Here's a summary of what we have : in the next table, the two first columns 

and the last one are known, the other ones can be deduced from the 3 previous 

equations : 



 plaintext      K16      temp. output   new key value     n        real output

(0h, 1h, 1h)   (6Ch)       (step 4)       (step 5)     (step 6)   (7h, 5h, 3Fh)

---------------------------------------------------------------------------

 I0(0)=0      K16(0)=0  =>  O0(0)=0   =>  K16(0)=0      n>0.5   =>  O0(0)=1 

 I0(1)=0      K16(1)=0  =>  O0(1)=0   =>  K16(1)=0      n>0.5   =>  O0(1)=1 

 I0(2)=0      K16(2)=1  =>  O0(2)=1   =>  K16(2)=1      n<0.5   =>  O0(2)=1 

 I0(3)=0      K16(3)=1  =>  O0(3)=1   =>  K16(3)=1      n>0.5   =>  O0(3)=0 



 I16(0)=1     K16(3)=1  =>  O16(0)=0  =>  K16(3)=0      n>0.5   =>  O16(0)=1 

 I16(1)=0     K16(4)=0  =>  O16(1)=0  =>  K16(4)=0      n<0.5   =>  O16(1)=0 

 I16(2)=0     K16(4)=0  =>  O16(2)=0  =>  K16(4)=0      n>0.5   =>  O16(2)=1 

 I16(3)=0     K16(5)=1  =>  O16(3)=1  =>  K16(5)=1      n>0.5   =>  O16(3)=0 



 I24(0)=1     K16(5)=1  =>  O24(0)=0  =>  K16(5)=0      n>0.5   =>  O24(0)=1 

 I24(1)=0     K16(6)=1  =>  O24(1)=1  =>  K16(6)=1      n<0.5   =>  O24(1)=1 

 I24(2)=0     K16(7)=0  =>  O24(2)=0  =>  K16(7)=0      n>0.5   =>  O24(2)=1 



Before we used K16 for our encryption, its value was 6Ch. From what we obtain 

in the fourth column, after we used K16, its new value is 44h. Therefore, 

before being used, K17 will be XORed with 44h at step 13. So we just have

to retrieve K17 by the same method we used in part III, and at the end XOR 

its value with 44h to obtain the real K17 value. Let's do it :



byte nb :      24              32              40



value   :      3Fh  _          05h  _          0Dh  _      

            0011 1111b      0000 0101b      0000 1101b



So for byte 24, we have :



bit 3:

   4. O24(3) = K17(0) 

   6. n = 0.5590 => O24(3) = ~O24(3) = 1   ===>   K17(0) = 0

   7. n = n + 0.333333

   9. n = Norm(f(f(n))) = 0.8043

  11. y = y + 1

 

bit 4:

   4. O24(4) = K17(1) 

   6. n = 0.8043 => O24(4) = ~O24(4) = 1   ===>   K17(1) = 0

   7. n = n + 0.333333

   9. n = Norm(f(f(n))) = 0.8718

  11. y = y + 1 



bit 5:

   4. O24(5) = K17(2) 

   6. n = 0.8718 => O24(5) = ~O24(5) = 1   ===>   K17(2) = 0

   7. n = n + 0.333333

   9. n = Norm(f(f(n))) = 0.8684

  11. y = y + 1 



... and so on until we find K17(7). At last, we obtain K17 = 28h, so that 

the real K17 value is 28h ^ 44h = 6Ch. Now we could find K18 from K17 with 

the same method, and then, one by one, all other bytes of the key.



Since we had only 8 candidates for K16 and 2 candidates for the initial 

n value, we have only 16 possible keys. The correct key among these 16 

candidates will appear by itself, because wrong keys will produce 

inconsistencies as we will decrypt more and more known plaintext. So this 

attack requires at worst 2*2^8= 512 attempts to retrieve the complete key, 

which takes a few minutes on a PC.





PART FIVE : A REAL LIFE ATTACK

You may think that the previous attack is just theoretical, since in real 

life I will never be able to choose what will be encrypted. Well, you're 

quite right, at least for a correct software implementation. But the same 

weaknesses can also be exploited for a ciphertext-only attack, as I will 

demonstrate now.



Before we start, let's think about what we used for the chosen-plaintext 

attack : the known plaintext-ciphertext pair, the key length and one known 

value of n. But the key length is not that necessary, and we could have 

just assumed that the key was the same size as the plaintext : then, some 

repetitive patterns would have appeared in this 'unfolded' key, and a period 

of these patterns would be the real key. Also, the plaintext could have been 

anything, as long as we knew one value of n. So what we actually need for 

this attack is a plaintext-ciphertext pair and a known n value.



Now let's say we found a encrypted .BST file somewhere and we want to decrypt 

it. First, we can look for '00' patterns, because we know that these bytes 

are also in the plaintext. Some file formats contains lots of '00' bytes at 

defined places, and are therefore easy to identify (.EXE, .TAR, .BMP, etc). 

From there, an attack can be prepared.



As an example, let's take a encrypted PE executable file. I have easily 

identified this format by looking at the '00' patterns in the file, in 

the header and also some big spaces that must be used as internal buffers. 

Here's the encrypted header of the file :



0000: 07 72 5d 00 97 93 01 00 34 00 00 00 e7 67 0d 00    .r].....4....g..

0010: d8 1a 00 00 00 00 00 00 40 05 00 00 00 00 00 00    ........@.......

0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................

0030: 00 00 00 00 00 00 00 00 00 00 00 00 80 0e 00 00    ................

0040: 1a 8d 93 0b 00 a4 60 fb 6f d8 0f 04 d9 a3 86 cf    ......`.o.......

0050: 1f 60 67 50 45 ce 25 f0 2e 47 aa ed 15 03 2c 87    .`gPE.%..G....,.

0060: 4e e0 23 df a7 50 31 f9 61 d4 6a a1 ea f6 11 6b    N.#..P1.a.j....k

0070: 66 cd 74 24 6c 42 69 02 d4 00 00 00 00 00 00 00    f.t$lBi.........

0080: 70 43 07 00 3c 6d 3a 00 57 2a b2 ae 3a 93 07 00    pC...m:.W*..:...



If we look at these bytes from a 'PE header' point of view, we can easily 

guess what these bytes actually are :

- bytes 00h-1Fh : the MZ header.  

- bytes 20h-3Fh : a sequence of '00' bytes, between the header itself and 

  the MZ entry point.

- bytes 40h-78h : the MZ stub (the small piece of code that displays the 

  string and quit) and the string itself ("This program needs Win32 blabla").

- bytes 80h-  : the real PE header 



Now if we look at these bytes from a 'BEST cryptanalysis' point of view, 

we have a sequence of '00' long enough to give us a known value for n, so 

if we knew the bytes starting at 40h, then we could start a known-plaintext 

attack. Hey, but all MZ code and data bytes (from 40h to 7Fh) are just 

prepended to the assembled code by the linker during compilation. That 

means that if we know which compiler has been used, then we have 64 bytes 

of known plaintext, and therefore we can recover 64 bytes of the key. 

Actually, we also have only 3 possibilities for the 6 first bytes of the 

PE header, so that we have 70 known bytes. So in a few tens attempts, we 

can retrieve 70 bytes of the key.



From here we can continue in two directions, finding more bytes of the 

plaintext and finding more bytes of the key. If the key is shorter than 70 

bytes, a pattern will appear in the 'unfolded' key we found ; in this case, 

our job is done. If the key is longer, we first could try to find its actual 

length. To test a candidate for the key length, we calculate which pieces 

of the ciphertext will be decrypted by these 70 bytes if the key length we 

try is the correct one. Then, we calculate the according n values and decrypt 

these pieces of ciphertext ; we obtain one or more pieces of 70 bytes long 

plaintext : if all these bytes does not contain any recognizable data resources 

(strings, lookup tables, ...) or some code, then we can assume that the key 

length we tested was not correct. Unless the program is packed or crypted, 

this method can give us some possible key length and some possible pieces 

of plaintext. 



The same kind of investigations can be performed on various file formats, 

and this way some informations can be recovered just from the ciphertexts.


Final Notes

Let's summarize our results : with a single text chosen-plaintext attack, 

we can recover the complete key (as a comparaison, the same attack against 

DES requires 2^47 plaintexts). In the worst case, this attack requires 512 

attempts and can be completed in a few minutes on a PC. Moreover, the same 

weaknesses can be used for ciphertext-only attacks, not only to identify 

the format of the original file, but also to retrieve pieces of the key. 

I conclude that the BEST encryption algorithm is not secure and should not 

be used.



The whole strength of the algorithm was based on the assumption that a 

chaotic function cannot be predicted ; when it proved to be false, the 

full algorithm collapsed. Yet, the risk of chaotic functions and the 

difference between chaos and randomness had already been very clearly 

explained by Scut in an old essay (fravia.org/crymaco.htm). Nowadays, 

many cryptographers are suspicious of chaos, and prefer not to use it 

when designing a new cipher. Maybe Chaos is just too difficult to 

control to be a reliable tool ?





+Spath.  (spath at iname dot com)



--

Interested in cryptanalysis ? read Casimir's essays on the great homepage

of Joe Peschel, at http://members.aol.com/jpeschel/index.htm



Ob Duh
I wont even bother explaining you that you should BUY this target program if you intend to use it for a longer period than the allowed one. Should you want to STEAL this software instead, you don't need to crack its protection scheme at all: you'll find it on most Warez sites, complete and already regged, farewell, don't come back.

You are deep inside fravia's page of reverse engineering, choose your way out:


redhomepage redlinks redsearch_forms red+ORC redhow to protect redacademy database
redreality cracking redhow to search redjavascript wars
redtools redanonymity academy redcocktails redantismut CGI-scripts redmail_fravia+
redIs reverse engineering legal?