Pseudo-Random Number Generator for Cryptography
In the High Dimensionality Pseudo Random Number Generators article I showed a simple pseudo random number generator (PRNG), that could produce very long sequence with very high randomness complexity. While these generators have very good statistical properties, it is a fairly easy task to reverse engineer the PRNG internal state by examining the output sequence. As such it is useless for use in encryption, such as being used directly as a stream cipher to encrypt a message. Here I investigate how to make these generators secure in such a way that they can be used for as a stream cipher. I am not attempting to address the issue of randomness requirements for state initialization of cyptographic systems in general, which generally require an external physical random source of some sort. Here I only examine how PRNG outputs can be made to be unpredictable so that they can not be reverse engineered without doing a brute-force search of all parameter possibilities.
Why are linear PRNGs so insecure?
It is well know that linear shift feedback register (LSFR) and linear congruential generators (LCG) are insecure. Given only a small sample of sequence values, it is possible to determine the underlying parameters and then predict future sequence values. The fundamental problem is that the only obscuring factor of PRNGs is the modulus. For example, if we take a look at the simplest PRNG:
$X_(i+1)=a*X_i mod p$
We see that this can produce random looking output, however it is a simple process to deduce the parameter $a$ and then be able to predict all future output. For this simple case shown here $a$ is simply $a = X_(i+1)*X_i^(p-2)$ While the problem becomes more complicated as we add more parameters and delays, it is still a relatively easy process to recover the original parameters. Here is one paper that outlines an algorithm to do this Shift-Register Synthesis and BCH Decoding - James L. Massey.
The main problem is that all operations with linear PRNGs are easily invertible. In fact, for the matrix based PRNG discussed previously, we go out of our way to make PRNG generating matrix invertible. It has to be invertible in order to produce maximal length sequences guaranteed to cover every possible vector combination. The advantage of our linear system is that it is easy to find maximal length generating matrices. The disadvantage is that we can reverse engineer the generating matrix given a fairly small number of sequence values.
In order to make our PRNG secure, we need to add a process to it that is not invertible. Since all linear operations are invertible our new process will also necessarily be a non-linear process. We can combine two or more linear PRNGs in a non-linear way to produce a secure PRNG. Here we investigate possible ways to do that.
The best approach is to use the PRNG to generate a sequence and to encrypt the output using a separate key and a secure algorithm such as AES. This way, even if the PRNG output is predictable, the encrypted output will be secure as long as the key remains secret. In this way we can build a secure stream cipher that allows us to access large encrypted files in a random access manner.