Confusion about message digests

Recently I ran into a small problem while learning web development. I still don't understand message digest algorithms, even after reading up on them. Here is my question:

Baidu Baike has this passage:

Generally speaking, as long as the input messages differ, the digests produced from them will also differ, while the same input always produces the same output. This is the property of a good message digest algorithm: when the input changes, so does the output; the digests of two similar messages are not similar, and may even be very different.

What I wonder is this: since the same input always gives the same output, doesn't each password correspond to exactly one digest under a given algorithm? For example:

import hashlib
pwd = "abc".encode("ascii")
s = hashlib.sha1(pwd)
s.hexdigest()

# -> "a9993e364706816aba3e25717850c26c9cd0d89d"

So if an attacker knows that the algorithm is SHA-1 and has the resulting digest, can he infer that the password is abc? For example, many people's passwords may be simple birthdays (like 900101), so couldn't he find that person's password by guessing, hashing each guess, and comparing?

This is my first contact with this kind of knowledge, so my understanding may be quite far off. I hope you can answer and give me some advice!


First, a hash algorithm is not an encryption algorithm; it has no inverse operation.
Second, the role of a hash is to map arbitrary input to a fixed-length result. That is:

abcas... (omitting 2000 characters) might produce the result 123456789 under some hash algorithm;
abc might also produce 123456789 under the same hash algorithm.

This is entirely possible, so given only 123456789 and the hash algorithm used, you cannot even determine the length of the original text, let alone its content.
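To make the fixed-length point concrete, here is a minimal sketch. `tiny_hash` is a hypothetical toy digest I made up for illustration (the first 8 bits of SHA-1), so that a collision shows up after a few hundred inputs at most; with the full 160-bit output the same argument holds, but a collision is far too expensive to find this way.

```python
import hashlib

def tiny_hash(data: bytes) -> str:
    """Hypothetical toy digest: only the first 2 hex chars (8 bits) of SHA-1."""
    return hashlib.sha1(data).hexdigest()[:2]

def find_collision():
    """Hash 257 distinct inputs into 256 possible outputs; two must collide."""
    seen = {}
    for i in range(257):
        msg = str(i).encode("ascii")
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg, h
        seen[h] = msg
    raise AssertionError("unreachable: 257 inputs cannot fit in 256 outputs")

# The digest length does not depend on the input length.
short_digest = hashlib.sha1(b"abc").hexdigest()
long_digest = hashlib.sha1(b"abc" * 2000).hexdigest()
print(len(short_digest), len(long_digest))

first, second, shared = find_collision()
print(first, second, shared)
```

Both digests have the same length, and the two colliding inputs share one output, so the output alone cannot tell you which input it came from.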


@krun explains the basics of hash algorithms very clearly. To this day, some people still call MD5 and SHA1 "encryption" algorithms.

Let me add some useful information here.

In the early days, many websites stored passwords in their databases as MD5 digests. When outsiders obtained the database through intrusion (such as SQL injection attacks), they still did not get the plaintext passwords.
Around that time, MD5 lookup sites appeared on the Internet.

These sites try to find the original text for the MD5 digest you provide.
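This is also the answer to the birthday-password question above. A rough sketch of the idea, assuming the passwords are six-digit YYMMDD strings and the site stored a plain, unsalted MD5 digest; `birthday_candidates` and `build_lookup` are names I invented for this example, not anything a real lookup site exposes.

```python
import hashlib

def birthday_candidates():
    """Yield every YYMMDD-style string (including impossible dates, for simplicity)."""
    for yy in range(100):
        for mm in range(1, 13):
            for dd in range(1, 32):
                yield f"{yy:02d}{mm:02d}{dd:02d}"

def build_lookup():
    """Precompute digest -> plaintext for every candidate, like a small lookup site."""
    return {
        hashlib.md5(c.encode("ascii")).hexdigest(): c
        for c in birthday_candidates()
    }

lookup = build_lookup()

# Pretend this digest came from a leaked database.
leaked = hashlib.md5(b"900101").hexdigest()
recovered = lookup.get(leaked)
print(recovered)
```

Nothing is being reversed here: the attacker only hashes guesses forward and compares. That is why a weak password falls quickly, while a long random one is not in anyone's table.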

Thanks to the substantial improvement in computing power, together with the later rainbow table technique, recovering the original text from an MD5 digest has become faster and faster. There is even dedicated hardware built on FPGAs.
Therefore, storing passwords with only a single MD5 operation is very insecure.
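The answer above does not say what to use instead. One common mitigation, which I am adding here as an assumption rather than as the answerer's recommendation, is a per-user random salt plus a deliberately slow key-derivation function. A minimal sketch with the standard library's `hashlib.pbkdf2_hmac`; the iteration count and the function names are illustrative choices.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune it for your own hardware

def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest

def verify_password(password, salt, expected):
    """Re-derive the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

salt, stored = hash_password("900101")
print(verify_password("900101", salt, stored))
print(verify_password("900102", salt, stored))
```

Because each user gets a different salt, one precomputed table no longer covers every account, and the iteration count makes each guess more expensive. A weak password is still weak, though; this only raises the cost of guessing.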


A hash algorithm is not an encryption algorithm: it has no key and generally cannot be reversed.

A simple example. Suppose I give you:

def hash(s):
    return s[-1]

and the hash value 'a'. You cannot figure out what the original s was, so you cannot reason backwards.

But does that mean it cannot be cracked? Not necessarily.
Here you can immediately see that any string ending with a produces the same hash value.
Even without knowing the algorithm, you could notice this after seeing the same result a few times, because the algorithm is so simple.
Real hash algorithms are far more complex than this, and it is generally infeasible to get a valid result in a limited time. Still, for some previously established standards, collisions can now be found as computing power improves, and quantum computers may also play a part.
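The toy example can be run as written. In this sketch I renamed the function to `last_char_hash` (my own name) so it does not shadow Python's built-in `hash`, and the sample strings are arbitrary.

```python
def last_char_hash(s):
    """Toy hash from the example above: keep only the last character."""
    return s[-1]

# Arbitrary sample inputs that all happen to end with "a".
samples = ["a", "data", "banana", "sha"]
digests = [last_char_hash(s) for s in samples]
print(digests)

# Producing *some* input with a given hash value is trivial for this function.
def forge(target):
    """Return a string (not the original one) whose toy hash equals target."""
    return "anything-" + target

forged = forge("a")
print(forged, last_char_hash(forged))
```

You still cannot tell which of these strings was the original, yet forging a matching input takes no effort at all. Real digest algorithms are designed so that this forging step is infeasible, not just the recovery step.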
