SHA1 Log File Details

I have files on my computer system which carry legal significance.  I am working on a project which has patent applications, and I may have to prove in a court of law that the files I have deposited, portraying my work, were produced by me.  Files can be stored in file depositories which carry date stamps.  When legal disagreements occur, can I show that they are mine?  A log file of hashes stored in a bank vault can be matched against the files in a file depository.


What do hashes represent?  A hash is taken of a file.  If the file gets altered there is a very high probability (near certainty) that the hash of the file will change.  Thus a hash can check with a high probability whether a file is different to the file that is expected.
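
As a minimal sketch of the point above, the following Python uses the standard hashlib module to hash two byte strings which differ in a single character.  The helper name sha1_hex and the sample contents are made up for illustration only.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as 40 hexadecimal characters."""
    return hashlib.sha1(data).hexdigest()

# Illustrative contents only; a real file would be read from disk.
original = b"Patent draft, revision 1"
altered = b"Patent draft, revision 2"   # a single character differs

print(sha1_hex(original))
print(sha1_hex(altered))
print(sha1_hex(original) == sha1_hex(altered))  # False: the hashes differ
```

A one character change gives a completely different 20 byte (40 hexadecimal digit) value, which is what makes the hash useful as a change detector.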

When hashes are used to identify a file, the hash alone cannot prove which file it came from, because more than one file can produce the same hash.  An SHA1 hash is 20 bytes in size, so there are 1.461501637330903e+048 possible hashes.  A 20 byte file also has 1.461501637330903e+048 possible contents, and every extra byte of file size multiplies the number of possible files by 256.  Some arrangements have a greater probability of occurring and some a lower one.  Files come in far more sizes and arrangements than there are hashes, hence many different files must share each hash, and the number sharing it increases as the file size gets bigger.  A hash needs to be such:

  1. That it is not obvious which file it belongs to.
  2. That there is a very much reduced probability that another file with the same hash exists and can be found.
  This is the purpose of the Double SHA1: it has a reduced probability that another file has the same hash.

The double SHA1 hash consists of two hashes.  Multiple files can have the first hash.  Multiple files can have the second hash.  But how many files can have the first hash and the second hash both at the same time?  A negligible number.

Single SHA1 log file (possibility of multiple files having same hash):

Double SHA1 log file (zero or very low possibility of multiple files having the same hash):

This requires the taking of two SHA1 hashes.

  1. The first SHA1 hash is the same as the hash taken for the "Single SHA1 log file".
  2. The second SHA1 hash is taken after the contents of the file are subjected to processing in memory.
  3. The file is loaded into memory and the value of every byte in that file is increased by one.
  4. An SHA1 hash is taken of the result.  This is the second SHA1 hash.
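
The steps above can be sketched in Python as follows.  This is an illustration only, not the program's actual code.  The function name double_sha1 is hypothetical, and it is an assumption that a byte of value 255 wraps round to 0 when increased by one, since the text does not say what happens in that case.

```python
import hashlib

# Lookup table mapping each byte value b to (b + 1) mod 256.
# Assumption: 0xFF wraps round to 0x00; the text above does not specify this.
_PLUS_ONE = bytes((b + 1) % 256 for b in range(256))

def double_sha1(data: bytes) -> tuple[str, str]:
    """Return (first, second) SHA-1 hex digests for the given file contents."""
    # Step 1: the ordinary SHA-1 of the file, as in the single SHA1 log.
    first = hashlib.sha1(data).hexdigest()
    # Steps 2 and 3: increase every byte of the in-memory copy by one.
    shifted = data.translate(_PLUS_ONE)
    # Step 4: hash the processed copy to obtain the second SHA-1.
    second = hashlib.sha1(shifted).hexdigest()
    return first, second

first, second = double_sha1(b"abc")
print(first)
print(second)
```

For a real file the contents would be read with open(path, "rb").read() before being passed in.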

The SHA1 hash used here is the NIST SHA-1 hash.  The basis of the hash, as worked out by the NSA, is that it is not possible to identify the contents of a file from its hash.  This is a very important facility in the making and use of passwords in encryption.  Thus if you cannot identify File A from its hash, then you cannot identify File B, which has the same contents as File A with every byte increased by one.

You can have multiple files with the first SHA1 hash.  You can have multiple files with the second SHA1 hash.  The number of files where both the first and the second SHA1 hash are the same is small or non-existent.

Hard disks are not infallible.  They can partially fail, modifying what is stored on them.  An operating system, or a program running on it, may have a software glitch due to a power supply spike, which in turn may modify a file.  There are all sorts of programs on the internet which will deliberately and sneakily modify files without the computer user knowing this is happening.  Some of these programs will modify the file and link the file details to another part of the disk with false file particulars, so that a user looking at these particulars is unaware that anything has changed.  A log file which allows file contents to be compared with a recorded hash value will indicate whether a change has taken place.
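
A minimal sketch of such a check is given below in Python.  It is not the program's own comparison function.  The function name changed_files is hypothetical, and for simplicity the recorded hashes are assumed to be held in a dictionary of path to hexadecimal SHA-1 rather than read from a log file.

```python
import hashlib
from pathlib import Path

def changed_files(recorded: dict[str, str]) -> list[str]:
    """Return the paths whose current SHA-1 differs from the recorded one.

    A file which no longer exists is also reported as changed.
    """
    changed = []
    for path, expected_hex in recorded.items():
        p = Path(path)
        if not p.is_file():
            changed.append(path)          # missing file counts as a change
            continue
        actual_hex = hashlib.sha1(p.read_bytes()).hexdigest()
        if actual_hex != expected_hex.lower():
            changed.append(path)          # contents no longer match the log
    return changed
```

An empty list returned means every file still matches its recorded hash.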

Example of contents of "1log" file
1log structure: directory\filename, tab, file size (4 or 8 bytes), 20 byte SHA1, 0D0A (carriage return, line feed)
Example of contents of "d1log" file
d1log structure: directory\filename, tab, file size (4 or 8 bytes), 40 byte Double SHA1, 0D0A (carriage return, line feed)
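
To make the "1log" layout concrete, here is a hedged Python sketch which builds one record.  Several details are assumptions, because the description above does not settle them: the file size is written as 8 bytes in little-endian order, the SHA1 is stored as 20 raw bytes rather than as hexadecimal text, and the file name is plain ASCII.  The function name make_1log_record is hypothetical.

```python
import hashlib
import struct

def make_1log_record(path: str, content: bytes) -> bytes:
    """Build one record: path, tab, size, 20 byte SHA1, CR LF."""
    name = path.encode("ascii")                   # assumption: ASCII file names
    size_field = struct.pack("<Q", len(content))  # assumption: 8 bytes, little-endian
    digest = hashlib.sha1(content).digest()       # assumption: 20 raw bytes
    return name + b"\t" + size_field + digest + b"\r\n"   # 0D0A terminator

record = make_1log_record("docs\\draft.txt", b"abc")
print(len(record))      # 14 name + 1 tab + 8 size + 20 hash + 2 CR LF = 45
print(record.hex())
```

A "d1log" record would follow the same pattern with the two SHA1 values placed one after the other to give 40 bytes.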


Log files created can be checked against the directory at a later date by the "Compare log File and Directory" functions.  The log files are for files with non-Unicode names.

You can have many files with the same hash.  Suppose I have a "pdf" file and its SHA-1 hash, and I want to find another "pdf" file with the same pdf structure but different wording and the same SHA-1 hash.  The nature of the SHA-1 hash algorithm makes the probability of finding such a file within a reasonable time scale very low.  However, if I do not care what the file contents are (they can be totally random and nonsensical), then the probability of finding another file with the same SHA-1 hash within a reasonable time frame is higher.

https://en.wikipedia.org/wiki/Probability states: "Probability is the measure of the likelihood that an event will occur.[1] See glossary of probability and statistics. Probability is quantified as a number between 0 and 1, where, loosely speaking,[2] 0 indicates impossibility and 1 indicates certainty.[3][4] The higher the probability of an event, the more likely it is that the event will occur. A simple example is the tossing of a fair (unbiased) coin. Since the coin is fair, the two outcomes ("heads" and "tails") are both equally probable; the probability of "heads" equals the probability of "tails"; and since no other outcomes are possible, the probability of either "heads" or "tails" is 1/2 (which could also be written as 0.5 or 50%)."

note: numbers within [] are references on the web page.

As computers get more powerful and more skillful algorithms are developed, it becomes easier to find two files with the same hash.  It is for this reason that the NSA gives recommendations as to which hashes need replacing in computer usage.

MD4 was developed in 1990.  This hash is severely compromised.  It is still used on the KAD network.

http://www.bishopfox.com/resources/tools/other-free-tools/md4md5-collision-code/ says:

"Create MD4 and MD5 hash collisions using groundbreaking new code that improves upon the techniques originally developed by Xiaoyun Wang.  Using a 1.6 GHz Pentium 4, MD5 collisions can be generated in an average of 45 minutes, and MD4 collisions can be generated in an average of 5 seconds. Originally released on 22Jun2006."

SHA-0 was a replacement developed by the NSA.

https://link.springer.com/chapter/10.1007%2F978-3-540-71039-4_2 says:

"Collisions on SHA-0 in One Hour - [Stéphane Manuel Thomas Peyrin in a conference paper]
At Crypto 2007, Joux and Peyrin showed that the boomerang attack, a classical tool in block cipher cryptanalysis, can also be very useful when analyzing hash functions. They applied their new theoretical results to SHA and provided new improvements for the cryptanalysis of this algorithm. In this paper, we concentrate on the case of SHA-0. First, we show that the previous perturbation vectors used in all known attacks are not optimal and we provide a new 2-block one. The problem of the possible existence of message modifications for this vector is tackled by the utilization of auxiliary differentials from the boomerang attack, relatively simple to use. Finally, we are able to produce the best collision attack against SHA-0 so far, with a measured complexity of 2^33.6 hash function calls. Finding one collision for SHA-0 takes us approximatively one hour of computation on an average PC."


SHA-1 was the NSA's replacement for SHA-0.

https://www.theregister.co.uk/2017/02/23/google_first_sha1_collision/ says:

"Now researchers at CWI Amsterdam and bods at Google have managed to alter a PDF without changing its SHA-1 hash value. That makes it a lot easier to pass off the meddled-with version as the legit copy. You could alter the contents of, say, a contract, and make its hash match that of the original. Now you can trick someone into thinking the tampered copy is the original. The hashes are completely the same."

The above took a large amount of computing power.  It specifically targeted a particular file structure.  In this case the altered file could still be read as a pdf, and no fault was found in the pdf when reading it.

If the attacker does not care whether the file can be read or not, then the amount of computing power needed to produce a file with the same SHA-1 is reduced, and as computers become more powerful finding a duplicate SHA-1 may come within their ability.  This can enable an attacker to cast doubt on what the original file was in the case of a legal dispute.  It can also enable an attacker to break a peer to peer file sharing system.

The double SHA1 is designed to resist the above attack.  A single SHA1 is vulnerable to doubt being cast on what the original file was, and certainly vulnerable to having another file with the same SHA1 substituted for it.  The possibility of finding two different files with the same Double SHA1 is zero or very low.

Modern computer systems which use hashes to verify a password have the problem that multiple passwords can have the same calculated hash.  Encryption which uses a hash to encrypt and decrypt can likewise have multiple passwords with the same hash.

A double SHA1 may make the original file easier to find and verify compared to SHA-1.  The double SHA1 is designed to help ensure that a file is verifiable.  Encryption which uses a double SHA1 hash to encrypt and decrypt is far less likely to have more than one password with the same hash.  SHA-1 is 20 bytes in size.  Double SHA1 is 40 bytes in size.  40 byte brute force decryption is many orders of magnitude more difficult than 20 byte brute force decryption.
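
The size difference can be put into numbers with a few lines of Python.  This only counts how many distinct 20 byte and 40 byte values exist; it is a rough illustration of the search space and assumes nothing about how hard either hash is to attack in practice.  The variable names are made up for this sketch.

```python
SHA1_BYTES = 20          # size of one SHA-1 hash
DOUBLE_SHA1_BYTES = 40   # size of the two hashes stored together

# Each byte has 256 possible values, i.e. 8 bits.
single_values = 2 ** (8 * SHA1_BYTES)          # 2^160
double_values = 2 ** (8 * DOUBLE_SHA1_BYTES)   # 2^320

print(f"{single_values:.15e}")               # 1.461501637330903e+48
print(double_values == single_values ** 2)   # True
```

The number of possible 40 byte values is the square of the number of possible 20 byte values.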