
Little essay about the various methods and viewpoints of crunching
~ version June 1998 ~
by Joa

Courtesy of fravia's page of reverse engineering

> If you signal me that this was understandable and want
> more, you will get more - I promise

Well, I do signal you that this was understandable and that I want more!
In fact I hope that you'll deal us some good cards about "their delicate and secured data" and about the "lots of algorithms" used inside the black boxes... because we want to have some more light inside all those black and dark boxes :-)



         Little essay about the various methods and viewpoints of crunching.



                                    Part I: Introduction





By Joa Koester (JoKo2000(at)hotmail(point)com) 

This essay is free to be copied and read and spread as long as
you are decent enough to leave my name in it.
If you have corrections, comments, better examples than the ones used
in this essay, etc. - drop me a line.





But on with the essay...



As I recently visited fravia+'s page on the net, I was amazed at the
knowledge packed together, the philosophy and the attitude contained
in the writings. But I missed the **other side** a little bit.
That is, us programmers, condemned to write software for banks,
insurance companies etc., so they can make a lot of money, ripping
a lot of people off. These companies are often serious about data
hiding and are always eager to have their delicate data secured.

One way of securing data is crunching (packing) it. This has two valid
points:

 - the data is secure (because the crunching algorithm is not
       made public)
 - the data is more compact, meaning it can be transferred more easily;
       the risk of a disk producing a read/write error, vaporising
       personal data, is definitely lowered when the data only
       takes 50 KByte on a disk rather than 200 KByte (of course,
       if a read/write error happens exactly in these 50 KByte,
       the file is also gone ;)



This brings us to the question:



WHAT IS CRUNCHING?





Well, a pseudo-mathematical answer would be:
everything that reduces a certain amount of data in its
SPACE, but not in its INFORMATION (lossless crunching,
that is; there are also some quality-(information-)reducing
algorithms out there - JPEG for example).





So we have some data IN:



AAAAACCCCCDDDDDDCCCCCefg





and we have a black box:



/-----------\
| Holy crap |
| happening |
|   here    |
\-----------/



and we have more or less unreadable data OUT:



@$/)!%A3yfg





So, what's the principle of the black box?
Is there one Super-Algorithm that you are not allowed
to know ("Behold my son, this knowledge is not
for your eyes")?

Of course not. There are principles. And there are lots
of algorithms, not just one.
And you ALREADY know a lot of these principles
from everyday life.
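One of the simplest such principles is run-length encoding: replace each run of identical symbols with a count and the symbol. A minimal sketch of my own, as an illustration - the essay doesn't commit to any particular algorithm for its black box:

```python
def rle_encode(data):
    # Replace each run of identical symbols with "<count><symbol>".
    out = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1                      # extend the current run
        out.append(f"{j - i}{data[i]}")  # emit count + symbol
        i = j
    return "".join(out)

# The example input from above: 24 chars shrink to 14.
print(rle_encode("AAAAACCCCCDDDDDDCCCCCefg"))  # 5A5C6D5C1e1f1g
```

Note how single characters ('e', 'f', 'g') actually got LONGER - a point we will return to at the end.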





When you have to cross a busy road you more or
less wait at the traffic light for it to become green. You stand
there and see a red light. Maybe the light is shaped like
a standing person or a cross or something else. But it is red
and you already know what this means: hey, stop, don't go.
If you go, you are violating at least three different traffic laws,
obstructing the cars and provoking the next world war. Besides,
you put your life in danger ;)
And all this information is contained in just the little red light on
the other side of the street.
Are all red lights this way? No.
If you, for instance, are a musician and you are about to
record a song, you will press record & play on your
recorder and a RED light will show up, telling you that
you had better not make thousands of mistakes and record
what you are doing properly. The red light can be a circle,
a square, even a text, it doesn't matter. But it will be
red!!!





Dr. Watson, what do you think?



Well...

What we have here is a case of crunching:
the various pieces of information are condensed into as few different
symbols as possible. In both examples, only one symbol
(the red light) is needed to get the MEANING (the information)
transmitted.



Right, Watson, right. And could we swap the information
contained in the symbols? That is, could the red light on
the recorder tell us when to stop at a traffic light?



No. They are both red lights, that's true. But the red light
on the recorder has nothing to do with crossing a road and
the traffic light has nothing to do with us recording a song.
The CONTEXT is different. The only thing that is similar is
the color.



Hm, hm.

Condensing information (many source symbols -> fewer destination
symbols) and keeping the CONTEXT in mind. Sounds pretty
good to me.



Kind of 

	switch (context)

	{

		case traffic:

			if (red_Light) {...} else {...}

			break;



		case recording_music:

			if (red_Light) {...} else {...}

			break;



		default:

			No_Condensed_Symbols();

			break;

	}

ain't it?
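Spelled out as running code, that could look like this (a toy sketch; the contexts and meanings are invented for illustration):

```python
# The SAME symbol ("red") carries different information in different
# contexts - without the shared context it cannot be expanded.
MODEL = {
    "traffic":         {"red": "stop", "green": "go"},
    "recording_music": {"red": "recording in progress"},
}

def interpret(context, symbol):
    # Look up the symbol under the current context; an unknown
    # context or symbol has no condensed meaning defined.
    return MODEL.get(context, {}).get(symbol, "no condensed symbol defined")

print(interpret("traffic", "red"))           # stop
print(interpret("recording_music", "red"))   # recording in progress
```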



In all crunching we will always have something that tells
us which context we have to switch to, and because
of this we will know how the following symbol(s) is/are
to be interpreted.





Dr. Watson, are all interpretations dependent on only one symbol?



Hm, I would say no. There may be cases where this is true,
but in most cases there is more than one symbol defining
exactly what's going on. There are crossroads with
streets leading straight ahead and right, and there are
crossroads where cars may drive left or straight ahead or
right. Which applies depends on which part of the crossroad the
car stands on, so that the traffic going straight ahead can go
while the traffic turning right still has to wait for THEIR
specific traffic light to switch from red to green. Another
example would be the position of the light. If the positions
of the red light and the green light were switched,
there would be some chaos, I bet.



Sounds reasonable, Watson. You say that there are symbols
for a general context which are fine-tuned through other symbols
defining the exact context to be interpreted?



Exactly.



But how do you think it is possible that all people
know that they have to stop on a red sign and go on a green
one?



Well, I would say that they know because someone told them.
The parents, perhaps. Or they are taught so in school.





In fact, to crunch and decrunch information correctly, both
the sender and the receiver have to use the same way of
interpreting the data. Society has ruled that a red traffic
light means STOP. And so traffic lights will switch to red
when it is time to stop the traffic for one direction. And
on the other side YOU get taught by society that you
had better not run across the street when you have a red, or else
you play 'Frogger' for real...
So, put in one sentence: both the sender and the receiver
use the same MODEL of data interpretation.
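In code, the "same model on both sides" idea might look like this (a minimal sketch; the codebook is invented for illustration):

```python
# Sender and receiver can only communicate because they share one MODEL:
# here, a codebook mapping full meanings to single condensed symbols.
CODEBOOK   = {"stop": "r", "caution": "y", "go": "g"}
DECODEBOOK = {symbol: meaning for meaning, symbol in CODEBOOK.items()}

def send(meaning):
    return CODEBOOK[meaning]      # condense: meaning -> one symbol

def receive(symbol):
    return DECODEBOOK[symbol]     # expand: one symbol -> full meaning

# The round trip only works because both sides use the same model.
print(receive(send("stop")))      # stop
```

A receiver with a different codebook would expand the same symbol into a different (wrong) meaning - exactly the Frogger scenario.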





Dr. Watson, what if I would like to crunch the information
transmitted in the red traffic light?

This would be nearly impossible. The whole meaning of
the traffic light is already emitted in only
one symbol (the red light, I mean now). There is a point
where the amount of information can't be reduced any
more without getting complications elsewhere.

Imagine one would pack all three lights (red, green
and yellow) into one light that would change its color
depending on its actual state. OK, you would have less
metal, only one light to look at and less glass to clean.
The normal case would be condensed - not in interpretation
but in material. The routine of Green - Yellow - Red -
Yellow - Green... would stay. So far so good. But traffic
lights have the ability to send more complex signals too.
When, for example, there is a traffic jam ahead and the
police notice it, they can (at least where I live) arrange
that the green and yellow lights blink together
to signal an upcoming jam, so that the drivers can react to
this signal. If all lights were built into one case,
one would have to think of a special flashing / flashing
at a special speed or something like that. Not very
good. Especially for older drivers whose reaction times may
be slower - they would concentrate more on interpreting the
flashing signal than on the traffic itself, increasing the
risk of producing an accident. One other point would be the
shape of the light. A standing man in red and a walking man
in green would mean a complex shape of glass illuminated by
complex circuitry. This would mean that if one part activated
falsely, you would have, for example, a red man standing with
one green leg walking. Very confusing, eh? So condensing one thing
beyond the point of its maximum information content (also known as
ENTROPY) leads to enlarging other parts, giving
them biiiig overhead. How do we know that this process is worth
doing at all?



Well, a certain student once came up with exactly this question
and he answered it himself: it depends on the probability of
the individual symbols. If some symbols occur statistically so often
in our stream of perception (analyzing, reading buffer data, etc.)
that we can condense them enough that, even with the enlargement
of the other symbols (which are not so frequent), we get an
overall crunching, then it's worth it. The name of the student
was Huffman...
For example, you have:
aaaaaaaaaabbbbbcccdd (10 x a, 5 x b, 3 x c, 2 x d) = 20 chars



then you would have 20 x 8 bits = 160 bits.

If you now transmit the 'a' with 7 bits, keep the 'b' at 8 bits,
and transmit the 'c' and 'd' chars with 9 bits, you would have:

	10 x 7 =  70
	 5 x 8 =  40
	 3 x 9 =  27
	 2 x 9 =  18
	         ---
	         155

So we would save 5 bits. Not much, but better than nothing.

Of course you have to come up with an optimized coding for
these values, as you wouldn't want to calculate by hand which
char you should encode with which number of bits without interfering
with the handling of the other chars. But Huffman found a perfect
algorithm giving you the optimized code table for your chars.
(But this is for the next part ;)
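The arithmetic above is easy to check in a few lines (a sketch only; real Huffman coding, which derives the bit lengths automatically, is for the next part):

```python
from collections import Counter

def coded_size(data, bits_per_symbol):
    # Sum frequency(symbol) * code_length(symbol) over all symbols.
    freq = Counter(data)
    return sum(freq[sym] * bits for sym, bits in bits_per_symbol.items())

data   = "a" * 10 + "b" * 5 + "c" * 3 + "d" * 2  # the example above
plain  = len(data) * 8                           # 20 chars x 8 bits = 160
packed = coded_size(data, {"a": 7, "b": 8, "c": 9, "d": 9})
print(plain, packed, plain - packed)  # 160 155 5
```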







To condense the intro a little bit:



- To crunch is a way to transmit a certain amount of information
    in fewer symbols than normally needed.
- How you crunch/decrunch depends on the actual context of
	the data actually received.
- Both the sender and the receiver build up the same
	way of interpreting the data, building up the same model.
- When transforming long information symbols into shorter packages of
	symbols and thus reducing the output, we will face the case
	that some (hopefully seldom) symbols get transformed
	into LONGER symbols. If we have totally random data, crunching
	happens also totally at random, making our efforts nil.
	That is, BTW, the reason why packing an already packed zip or rar
	file is in almost all cases useless - the data written by those
	packers is nearly perfectly random.





I hope you enjoyed this intro. If you signal me that this was
understandable and want more, you will get more - I promise.



Greetings from a programmer



Joa



