The point of salting and hashing passwords, demonstrated with Python¶

Story here: https://medium.com/p/4879a156d99a

When I first heard about this, I thought, “hash browns?” but nope!

The concept of salting and hashing has to do with securing databases that contain login credentials, so that even if the hackers somehow gained access to the database, they still can’t do much with the information.

To demonstrate this, let’s create a pandas dataframe, depicting an actual database, and add random login credentials of a few users.

[1]:

import pandas as pd

usernames = ["apple", "banana", "cherry"]
passwords = ["apples_are_THE_be5t!", "password123", "password123"]

df = pd.DataFrame({"username": usernames, "password": passwords})
display(df)

	username	password
0	apple	apples_are_THE_be5t!
1	banana	password123
2	cherry	password123

Here, the password is stored insecurely as plain-text. If a hacker was able to access this database, they can see the actual password and readily use the credentials to login.

One way to slowdown the hacker is to hash the password before storing it. Hashing converts some input into a fixed-size string. Providing the same input to the hashing algorithm will yield the same output, but hashing only goes one-way, meaning someone can’t use the hashed output to back out the original input.

[2]:

import hashlib

import pandas as pd

usernames = ["apple", "banana", "cherry"]
passwords = ["apples_are_THE_be5t!", "password123", "password123"]

# hash the passwords
hashes = [
    hashlib.sha256(password.encode("utf-8")).hexdigest()
    for password in passwords
]

df = pd.DataFrame({"username": usernames, "hash": hashes})
display(df)

	username	hash
0	apple	f3520dba0533d4249de493cb54106a41dd03837fd056a3...
1	banana	ef92b778bafe771e89245b89ecbc08a44a4e166c066599...
2	cherry	ef92b778bafe771e89245b89ecbc08a44a4e166c066599...

So how do authentication systems with hashed passwords work? Since these authentication systems no longer store original passwords, they can’t just compare the user’s input to the stored hash value to validate.

They can, however, use the same hashing algorithm on the user’s input–then compare whether the output hash matches the stored hash, because remember, the hashing algorithms are deterministic: the same input results in the same output. That is also why the banana and cherry users have the same hash.

Now, if the hacker gained access to this database, they can’t immediately use the hash to login because the authentication system expects plain-text passwords and if the hacker provides the hash, that would rehash the hash, resulting in a different hash, and ultimately an invalid login.

The hacker could try guessing random common passwords through the login page. However, if the authentication system detects too many failed attempts, the hacker could be locked out, and perhaps, even alert the security team.

Instaed, hackers pre-compute hashes of common passwords and use that to back out the original password, as depicted below.

[3]:

common_passwords = ["123456789", "password123", "qwerty123"]
common_hashes = [
    hashlib.sha256(password.encode("utf-8")).hexdigest()
    for password in common_passwords
]
precomputed_df = pd.DataFrame({"password": common_passwords, "hash": common_hashes})
display("precomputed", precomputed_df)

compromised_df = pd.merge(df, precomputed_df, on="hash")
display("compromised", compromised_df)

'precomputed'

	password	hash
0	123456789	15e2b0d3c33891ebb0f1ef609ec419420c20e320ce94c6...
1	password123	ef92b778bafe771e89245b89ecbc08a44a4e166c066599...
2	qwerty123	daaad6e5604e8e17bd9f108d91e26afe6281dac8fda009...

'compromised'

	username	hash	password
0	banana	ef92b778bafe771e89245b89ecbc08a44a4e166c066599...	password123
1	cherry	ef92b778bafe771e89245b89ecbc08a44a4e166c066599...	password123

With just a few lines of code, the hacker has now easily compromised two of the users’ credentials because they both used a common password!

So how do authentication systems slowdown the hacker further and prevent multiple users from being compromised at once, even if the users used the same, common passwords?

That is where salting comes to play. Salting simply means adding extra characters to the provided password before hashing.

[4]:

import secrets
import hashlib

import pandas as pd

usernames = ["apple", "banana", "cherry"]
passwords = ["apples_are_THE_be5t!", "password123", "password123"]
salts = [secrets.token_hex(8) for i in range(len(passwords))]

# salt the passwords
salted_passwords = [
    password + salt for password, salt in zip(passwords, salts)
]
display("salted_passwords", salted_passwords)

# hash the passwords
hashes = [
    hashlib.sha256(password.encode("utf-8")).hexdigest()
    for password in salted_passwords
]

df = pd.DataFrame({"username": usernames, "salt": salts, "hash": hashes})
display(df)

'salted_passwords'

['apples_are_THE_be5t!9a2da2ab69bd17c1',
 'password12304e7485237cabba4',
 'password12376579477069b44c0']

	username	salt	hash
0	apple	9a2da2ab69bd17c1	ddd17d2faebd8d0f1ba1598be050ee60e60193dfcfd414...
1	banana	04e7485237cabba4	ac7656830c64631a6e8f15b4690a1257096bcc0e52f58a...
2	cherry	76579477069b44c0	662b6e6ef167cae168c7ad423b0843abc452e536fbceec...

With this included, to authenticate, authentication systems now also would have to salt the the user’s input before hashing and comparing.

Also, notice, although the banana and cherry users used the same password, the stored hashes are now different, and that means, the hacker can’t compromise both their credentials in one go!

However salting isn’t foolproof; it simply slows down the hacker. With sufficient time, the hacker can still loop through the common passwords, add the salt, hash the salted password, and back out the original password.

[5]:

common_passwords = ["123456789", "password123", "qwerty123"]
common_salted_passwords = {
    password + salt: password for password in common_passwords for salt in df["salt"]
}
common_hashes = [
    hashlib.sha256(password.encode("utf-8")).hexdigest()
    for password in common_salted_passwords.keys()
]

precomputed_df = pd.DataFrame(
    {
        "password": common_salted_passwords.values(),
        "salted_password": common_salted_passwords.keys(),
        "hash": common_hashes,
    }
)
display("precomputed", precomputed_df)

compromised_df = pd.merge(df, precomputed_df, on="hash")
display("compromised", compromised_df)

'precomputed'

	password	salted_password	hash
0	123456789	1234567899a2da2ab69bd17c1	4d7c7e9f923bf0275e1d9c4ab10eaefaf4462ee6dead13...
1	123456789	12345678904e7485237cabba4	5a8fdee3760c6246fa4cc63f69f154da895fd7da42f712...
2	123456789	12345678976579477069b44c0	d324fb639362b9e797068ff910dfd21f8aa19260e5a8fa...
3	password123	password1239a2da2ab69bd17c1	2e0fb1aefd60d95efd48ad6b2e76fed1186aa7167aef65...
4	password123	password12304e7485237cabba4	ac7656830c64631a6e8f15b4690a1257096bcc0e52f58a...
5	password123	password12376579477069b44c0	662b6e6ef167cae168c7ad423b0843abc452e536fbceec...
6	qwerty123	qwerty1239a2da2ab69bd17c1	f245aa171123276958bcaf0a1cc97222c083455bf1fed4...
7	qwerty123	qwerty12304e7485237cabba4	038863cb22fd1574409cc7a7464d935ac7dd57a655998a...
8	qwerty123	qwerty12376579477069b44c0	3027110bafde96e31fe4594f93ecf28a91d9b91f9f92ea...

'compromised'

	username	salt	hash	password	salted_password
0	banana	04e7485237cabba4	ac7656830c64631a6e8f15b4690a1257096bcc0e52f58a...	password123	password12304e7485237cabba4
1	cherry	76579477069b44c0	662b6e6ef167cae168c7ad423b0843abc452e536fbceec...	password123	password12376579477069b44c0

However, a caveat to this hacking method is if the hacker knew how the salt was added! The salt could have been appended, as in the case here, or prepended. The salt could also have been surrounded by other characters before being merged.

In addition, the salting process could also have been repeated several, or more likely, thousands of times, to yield the final salted and hashed password, like below.

[6]:

import secrets
import hashlib

import pandas as pd

usernames = ["apple", "banana", "cherry"]
passwords = ["apples_are_THE_be5t!", "password123", "password123"]
salts = [secrets.token_hex(8) for i in range(len(passwords))]

# salt the passwords with a special recipe
salted_passwords = [
    f"!{password}...{salt}" * 3 for password, salt in zip(passwords, salts)
]
display("salted_passwords", salted_passwords)

# hash the passwords
hashes = [
    hashlib.sha256(password.encode("utf-8")).hexdigest()
    for password in salted_passwords
]

df = pd.DataFrame({"username": usernames, "salt": salts, "hash": hashes})
display(df)

'salted_passwords'

['!apples_are_THE_be5t!...38b79f10f59ec5b4!apples_are_THE_be5t!...38b79f10f59ec5b4!apples_are_THE_be5t!...38b79f10f59ec5b4',
 '!password123...a3a530e7fb19b972!password123...a3a530e7fb19b972!password123...a3a530e7fb19b972',
 '!password123...42b320058343b990!password123...42b320058343b990!password123...42b320058343b990']

	username	salt	hash
0	apple	38b79f10f59ec5b4	9b4821c3df1b8eb99ca7a259990d48e198ec3a3721971e...
1	banana	a3a530e7fb19b972	3028d394732e4e6831d8dc7c1bf79d4ce7f368d41088fb...
2	cherry	42b320058343b990	c2fa3c9d10cf1de228f4e7e0593cc34fdb8cc1a2feb9f9...

With this implemented, the hacker now needs to know how the salt was added, and also how many times!

But I’m sure if the hacker had enough time and commitment, they would find a way to crack this!

To conclude: - hashing converts the input password into a fixed-size string - hashes are deterministic; same input returns the same output - hashes are one-way; the hash cannot be used directly to back out the original input

salting adds extra characters to the input password
salts can be added in unique ways; appended, prepended, surrounded, etc
salts can be added many times; likely thousands of times

TLDR: - hashes help prevent the hacker from immediately knowing the plain-text password - salts help prevent the hacker from knowing that two or more users use the same password - both are techniques used to slowdown the hacker from compromising credentials

And this is the extent of my knowledge. I only recently learned about these concepts, and have much more to learn about it, so please feel free to share your knowledge in the comments!

Thanks for reading!

pYdeas 0.0.0 documentation

The point of salting and hashing passwords, demonstrated with Python¶