Let's imagine there are two users working in our system: a good one and a bad one. The good user enters invoice data. Then the bad one changes it. Just one line. For example:
something 110 pcs. X 101$ = 11110$
is transformed into
something 101 pcs. X 110$ = 11110$
It's very difficult to notice what's changed, isn't it? However, now the bad user can steal nine pieces without attracting much attention.
How can the good user protect themselves from the actions of the bad user? Usually at this moment people say:
Well, we have a version control system. We can see everything that's happened to this document.
Where is the mistake here? In the word "can". In order to see something, we need to know where to look. Let's imagine we have data on 10 000 documents in 15 000 different versions. Suppose one document has 10 versions. It looks suspicious, but the bad user didn't make 10 corrections, they made just one. So we have to check every document with at least one correction, and we have many such documents. We could narrow the search to the documents where the quantity and the price changed places as a result of a correction. Still, we wouldn't be able to distinguish between the actions of the bad user and an ordinary human error. Did they fix a mistake so that now everything is correct? Or did they make a new mistake on purpose? There is no way to find out what really happened other than to check the original document.
Let's suppose the good user decides to check documents not only when entering data, but also whenever a correction is made. What does it mean for the bad user? Just that they have to be a little more careful. They have already found a way to change the documents without being asked questions by the supplier. Now they need to find a way to change the documents so that nobody inside their company notices the changes.
It is not as difficult as it seems. In fact, the bad user works with one or several records in the database. At some moment, one value is changed to another and there is no trace of this. The way modern databases work makes it possible. The only obstacle, and a weak one, is the authorization system. It doesn't allow some users to make changes. It allows others to do so, but it records the changes in one way or another. In the simplest case, it just marks the document as changed. This is what we call passive security.
What is its weakness? There is always at least one user, more often several of them, with exclusive rights. Moreover, in any system there are several ways to get exclusive rights. That is to say, a passive security system always has some back doors. Some of them are open because not many are aware of their existence (our bad user can belong to this small circle). Others are known, but aren't being closed for various reasons. The most important thing here is that there is no reason to hope that the bad user will let us know our passive security doesn't work.
Passive security should work together with active security. It means someone has to ensure that the database is constantly in touch with reality. Documents are checked when they are entered and rechecked each time they are corrected. What is the fundamental difficulty? Let's suppose we have a document marked as changed. The good user checks it and... removes the "changed" tag. Of course they have to do this. Otherwise they will have to check it again and again. But if the good user can remove the "changed" tag, why can't the bad user do the same? So, we can't rely on "changed" tags and we have only one option left. Each time we have to check every single document in the database, which is impossible. Where is the mistake? In the word "impossible".
For quite some time there have been relatively simple and reliable technologies which, when integrated correctly, make it possible to control any amount of data.
The first ingredient of our "meal" is a hash function. A hash function transforms data of arbitrary size into a string of fixed length. Sounds simple, but hash functions themselves aren't that simple. In our case we have to turn gigabytes into a relatively short string which can be checked with the naked eye. How do you turn a billion bytes into 32? Approaching it simply, we could just keep 32 randomly chosen bytes and cast aside the rest. A hash function works in a more elaborate way. Before casting aside the unnecessary, the data is carefully mixed and transformed into a real hash. What's more, this hash is not just perfect. It is absolutely, fantastically perfect. Imagine a high-quality photo. You change just one bit in this photo. One single pixel has slightly changed its color. Now you have two virtually indistinguishable photos. If you use the simple approach, you are highly likely to get the same 32 bytes as a result. However, a hash function, with a likelihood that can be considered equal to 1, will give you different 32-byte sets. Significantly different. So different that it can be seen with the naked eye.
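The photo comparison can be tried out in a few lines. This is only a sketch under assumptions: the text doesn't name a hash function, so SHA-256 (which happens to produce 32 bytes) is used here, a megabyte of generated bytes stands in for the photo, and the "simple approach" is simplified to picking 32 bytes at fixed, evenly spaced positions. The names naive_digest and hamming_distance are made up for the illustration.

```python
import hashlib

def naive_digest(data: bytes) -> bytes:
    """The 'simple approach': keep 32 evenly spaced bytes, drop the rest."""
    return data[:: len(data) // 32][:32]

def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Stand-in for a large photo: 1 MiB of generated bytes.
original = bytes(range(256)) * 4096

# Flip a single bit somewhere in the middle of the "photo".
changed = bytearray(original)
changed[500_000] ^= 0b0000_0001
modified = bytes(changed)

# The simple approach almost certainly misses the change.
print("naive digests equal:", naive_digest(original) == naive_digest(modified))

# A real hash function (SHA-256 is an assumption here) does not.
digest_original = hashlib.sha256(original).digest()
digest_modified = hashlib.sha256(modified).digest()
print(digest_original.hex())
print(digest_modified.hex())
print(hamming_distance(digest_original, digest_modified), "of 256 bits differ")
```

With a good hash function, roughly half of the 256 output bits are expected to differ, which is why the two hex strings look unrelated even at a glance.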
Now we can get the result of the hash function (hereafter just the hash) for our entire database. After some time we can compute the hash again and see whether any changes have been made. For now it doesn't help us much. The majority of data changes quickly. We know that something has changed in our database even without a hash. Let's add the second ingredient.
Let's get back to our invoices. Let's imagine we have just started to fill our database. We take the first invoice, check it and record it. Then we compute the hash of the document and add a record to a journal containing a link to the document and the hash. Technically, the journal can be a part of our database or a separate database. It isn't important for security. We take the second document, check it and record it. Then we join this document to the hash of the previous one and compute the hash of the two together. We record the result in our journal. We do the same with every new document. As a result, we get what is called a blockchain.
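Filling the journal could look roughly like this. It is a minimal sketch, not a finished design: SHA-256, the JournalRecord type, the append_record function, the document ids and the second invoice's text are all assumptions made for the illustration, and the "link to the document" is reduced to a plain string id.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class JournalRecord:
    doc_id: str      # link to the document (here just an id)
    chain_hash: str  # hex SHA-256 of previous record's hash + document content

def chain_hash(previous_hash: str, content: bytes) -> str:
    """Join the previous hash to the document and hash the two together."""
    return hashlib.sha256(previous_hash.encode("ascii") + content).hexdigest()

def append_record(journal: list, doc_id: str, content: bytes) -> JournalRecord:
    """Add a journal record for a document that has just been checked."""
    # The very first record has no predecessor, so it is the plain document hash.
    previous = journal[-1].chain_hash if journal else ""
    record = JournalRecord(doc_id, chain_hash(previous, content))
    journal.append(record)
    return record

journal = []
append_record(journal, "invoice-1", b"something 110 pcs. X 101$ = 11110$")
append_record(journal, "invoice-2", b"something else 5 pcs. X 20$ = 100$")

for record in journal:
    print(record.doc_id, record.chain_hash)
```

Each record depends on the one before it, so the last hash in the journal summarizes every document entered so far.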
Blockchain is a technology which became widely known after the appearance of cryptocurrencies, even though it had existed long before that. How does a blockchain journal help us? Each hash corresponds to its document. We can check this correspondence at any moment. At the same time, each hash depends on all the previous documents. This means that it is impossible to simply change a single record in the journal. It would be necessary to recompute all the hashes from the changed record to the end, and this is easy to discover.
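To see why one changed record drags all the later ones with it, here is a small sketch. It assumes SHA-256 and the chaining rule described above (each hash covers the previous hash plus the document); build_chain and first_divergence are hypothetical helper names, and the invoices other than the "something" line are invented.

```python
import hashlib

def build_chain(documents: list) -> list:
    """Return the journal hashes for a sequence of documents."""
    hashes = []
    previous = ""
    for content in documents:
        previous = hashlib.sha256(previous.encode("ascii") + content).hexdigest()
        hashes.append(previous)
    return hashes

def first_divergence(a: list, b: list):
    """Index of the first position where two hash lists differ, or None."""
    for index, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return index
    return None

documents = [
    b"paper 20 pcs. X 3$ = 60$",
    b"something 110 pcs. X 101$ = 11110$",
    b"ink 4 pcs. X 25$ = 100$",
    b"toner 2 pcs. X 80$ = 160$",
]
original_hashes = build_chain(documents)

# The bad user swaps the quantity and the price in the second invoice.
tampered = list(documents)
tampered[1] = b"something 101 pcs. X 110$ = 11110$"
recomputed_hashes = build_chain(tampered)

print(first_divergence(original_hashes, recomputed_hashes))  # 1
print(original_hashes[-1] == recomputed_hashes[-1])          # False
```

Every hash from the changed record onward is different, including the very last one, so a single comparison at the end of the journal is enough to notice that something was rewritten.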
We have only one problem left to solve. What do we do when a document which is already a part of the database is changed? The solution is pretty simple. We check all the documents: for each one we recompute the hash and compare it with the latest hash recorded for that document in the blockchain. A discrepancy means that the document has been changed. In this case we don't change the record in the blockchain, we add a new one at the end. This record will contain a link to the same document and a newly computed hash.
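The check described above might be sketched as follows. Everything here is an assumption for illustration: the journal is a list of (document id, hash) tuples, the documents live in a dictionary, SHA-256 is the hash function, and audit and append_record are hypothetical names. In a real system a person would review each reported document against the original before the new record is accepted.

```python
import hashlib

def chain_hash(previous_hash: str, content: bytes) -> str:
    """Hash the previous journal hash joined with the document content."""
    return hashlib.sha256(previous_hash.encode("ascii") + content).hexdigest()

def append_record(journal: list, doc_id: str, content: bytes) -> None:
    """Add a (doc_id, hash) record at the end of the journal."""
    previous = journal[-1][1] if journal else ""
    journal.append((doc_id, chain_hash(previous, content)))

def audit(journal: list, documents: dict) -> list:
    """Report documents that differ from their latest journal record.

    For each reported document a new record is appended at the end;
    existing records are never modified.
    """
    # Position of the most recent record for each document (last one wins).
    latest = {doc_id: index for index, (doc_id, _) in enumerate(journal)}
    changed = []
    for doc_id, content in documents.items():
        index = latest.get(doc_id)
        if index is not None:
            previous = journal[index - 1][1] if index > 0 else ""
            if chain_hash(previous, content) == journal[index][1]:
                continue  # document still matches its record
        changed.append(doc_id)
        append_record(journal, doc_id, content)
    return changed

documents = {
    "invoice-1": b"something 110 pcs. X 101$ = 11110$",
    "invoice-2": b"ink 4 pcs. X 25$ = 100$",
}
journal = []
for doc_id, content in documents.items():
    append_record(journal, doc_id, content)

documents["invoice-1"] = b"something 101 pcs. X 110$ = 11110$"
print(audit(journal, documents))  # ['invoice-1']
print(len(journal))               # 3
```

Running the audit a second time reports nothing, because the changed document now matches its newest record, while the older record stays in the journal as evidence of the earlier version.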
What did the good user get? Exactly what they wanted. They have a journal where all new documents and all changed documents are recorded. It is impossible to change a document without the good user noticing. Any changes will be visible in the journal. The only option left for the bad user is to substitute the journal itself. However, this won't be successful either, as the good user always has the latest hash, and, hence, they will notice the substitution immediately.
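The last point, that a substituted journal gives itself away, can be illustrated like this. The sketch assumes the good user keeps a private copy of the latest hash somewhere the bad user can't reach; SHA-256, the function names and the sample invoices are assumptions too.

```python
import hashlib
import hmac

def build_journal(documents: list) -> list:
    """Compute the chained journal hashes for a sequence of documents."""
    hashes = []
    previous = ""
    for content in documents:
        previous = hashlib.sha256(previous.encode("ascii") + content).hexdigest()
        hashes.append(previous)
    return hashes

def journal_is_authentic(journal_hashes: list, trusted_latest_hash: str) -> bool:
    """Compare the journal's last hash with the copy the good user kept."""
    if not journal_hashes:
        return False
    return hmac.compare_digest(journal_hashes[-1], trusted_latest_hash)

documents = [
    b"something 110 pcs. X 101$ = 11110$",
    b"ink 4 pcs. X 25$ = 100$",
]
journal = build_journal(documents)
trusted_latest_hash = journal[-1]  # kept privately by the good user

# The bad user changes a document and rebuilds the whole journal to match.
forged_journal = build_journal([
    b"something 101 pcs. X 110$ = 11110$",
    b"ink 4 pcs. X 25$ = 100$",
])

print(journal_is_authentic(journal, trusted_latest_hash))         # True
print(journal_is_authentic(forged_journal, trusted_latest_hash))  # False
```

The forged journal is perfectly consistent with the changed documents, yet its last hash can't match the one the good user remembers.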
Summing it up, a journal based on blockchain makes it possible to organise visual human control over any amount of data. It can be one person controlling the whole database or a group of auditors with each member controlling their part. Active security of a database ensures a constant, reliable connection between your data and reality, and it should definitely be used together with passive security.