{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The point of salting and hashing passwords, demonstrated with Python\n",
"\n",
"Story here: https://medium.com/p/4879a156d99a"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When I first heard about this, I thought, \"hash browns?\" but nope!\n",
"\n",
"The concept of salting and hashing has to do with securing databases that contain login credentials, so that even *if* the hackers somehow gained access to the database, they still can't do much with the information.\n",
"\n",
"To demonstrate this, let's create a pandas dataframe, depicting an actual database, and add random login credentials of a few users."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2021-08-07T20:32:29.885735Z",
"start_time": "2021-08-07T20:32:29.595318Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" username | \n",
" password | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" apple | \n",
" apples_are_THE_be5t! | \n",
"
\n",
" \n",
" 1 | \n",
" banana | \n",
" password123 | \n",
"
\n",
" \n",
" 2 | \n",
" cherry | \n",
" password123 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" username password\n",
"0 apple apples_are_THE_be5t!\n",
"1 banana password123\n",
"2 cherry password123"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import pandas as pd\n",
"\n",
"usernames = [\"apple\", \"banana\", \"cherry\"]\n",
"passwords = [\"apples_are_THE_be5t!\", \"password123\", \"password123\"]\n",
"\n",
"df = pd.DataFrame({\"username\": usernames, \"password\": passwords})\n",
"display(df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, the password is stored insecurely as plain-text. If a hacker was able to access this database, they can see the actual password and readily use the credentials to login.\n",
"\n",
"One way to slowdown the hacker is to **hash** the password before storing it. Hashing converts some input into a fixed-size string. Providing the same input to the hashing algorithm will yield the same output, but hashing only goes one-way, meaning someone can't use the hashed output to back out the original input."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2021-08-07T20:32:29.898035Z",
"start_time": "2021-08-07T20:32:29.888142Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" username | \n",
" hash | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" apple | \n",
" f3520dba0533d4249de493cb54106a41dd03837fd056a3... | \n",
"
\n",
" \n",
" 1 | \n",
" banana | \n",
" ef92b778bafe771e89245b89ecbc08a44a4e166c066599... | \n",
"
\n",
" \n",
" 2 | \n",
" cherry | \n",
" ef92b778bafe771e89245b89ecbc08a44a4e166c066599... | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" username hash\n",
"0 apple f3520dba0533d4249de493cb54106a41dd03837fd056a3...\n",
"1 banana ef92b778bafe771e89245b89ecbc08a44a4e166c066599...\n",
"2 cherry ef92b778bafe771e89245b89ecbc08a44a4e166c066599..."
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import hashlib\n",
"\n",
"import pandas as pd\n",
"\n",
"usernames = [\"apple\", \"banana\", \"cherry\"]\n",
"passwords = [\"apples_are_THE_be5t!\", \"password123\", \"password123\"]\n",
"\n",
"# hash the passwords\n",
"hashes = [\n",
" hashlib.sha256(password.encode(\"utf-8\")).hexdigest()\n",
" for password in passwords\n",
"]\n",
"\n",
"df = pd.DataFrame({\"username\": usernames, \"hash\": hashes})\n",
"display(df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So how do authentication systems with hashed passwords work? Since these authentication systems no longer store original passwords, they can't just compare the user's input to the stored hash value to validate.\n",
"\n",
"They *can*, however, use the same hashing algorithm on the user's input--then compare whether the output hash matches the stored hash, because remember, the hashing algorithms are deterministic: the same input results in the same output. That is also why the banana and cherry users have the same hash.\n",
"\n",
"Now, if the hacker gained access to this database, they can't *immediately* use the hash to login because the \n",
"authentication system expects plain-text passwords and if the hacker provides the hash, that would rehash the hash, resulting in a different hash, and ultimately an invalid login.\n",
"\n",
"The hacker could try guessing random common passwords through the login page. However, if the authentication system detects too many failed attempts, the hacker could be locked out, and perhaps, even alert the security team.\n",
"\n",
"Instaed, hackers pre-compute hashes of common passwords and use that to back out the original password, as depicted below."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2021-08-07T20:32:29.932292Z",
"start_time": "2021-08-07T20:32:29.899775Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'precomputed'"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" password | \n",
" hash | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 123456789 | \n",
" 15e2b0d3c33891ebb0f1ef609ec419420c20e320ce94c6... | \n",
"
\n",
" \n",
" 1 | \n",
" password123 | \n",
" ef92b778bafe771e89245b89ecbc08a44a4e166c066599... | \n",
"
\n",
" \n",
" 2 | \n",
" qwerty123 | \n",
" daaad6e5604e8e17bd9f108d91e26afe6281dac8fda009... | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" password hash\n",
"0 123456789 15e2b0d3c33891ebb0f1ef609ec419420c20e320ce94c6...\n",
"1 password123 ef92b778bafe771e89245b89ecbc08a44a4e166c066599...\n",
"2 qwerty123 daaad6e5604e8e17bd9f108d91e26afe6281dac8fda009..."
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"'compromised'"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" username | \n",
" hash | \n",
" password | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" banana | \n",
" ef92b778bafe771e89245b89ecbc08a44a4e166c066599... | \n",
" password123 | \n",
"
\n",
" \n",
" 1 | \n",
" cherry | \n",
" ef92b778bafe771e89245b89ecbc08a44a4e166c066599... | \n",
" password123 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" username hash password\n",
"0 banana ef92b778bafe771e89245b89ecbc08a44a4e166c066599... password123\n",
"1 cherry ef92b778bafe771e89245b89ecbc08a44a4e166c066599... password123"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"common_passwords = [\"123456789\", \"password123\", \"qwerty123\"]\n",
"common_hashes = [\n",
" hashlib.sha256(password.encode(\"utf-8\")).hexdigest()\n",
" for password in common_passwords\n",
"]\n",
"precomputed_df = pd.DataFrame({\"password\": common_passwords, \"hash\": common_hashes})\n",
"display(\"precomputed\", precomputed_df)\n",
"\n",
"compromised_df = pd.merge(df, precomputed_df, on=\"hash\")\n",
"display(\"compromised\", compromised_df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With just a few lines of code, the hacker has now easily compromised two of the users' credentials because they both used a common password!\n",
"\n",
"So how do authentication systems slowdown the hacker further and prevent multiple users from being compromised at once, even if the users used the same, common passwords?\n",
"\n",
"That is where salting comes to play. Salting simply means adding extra characters to the provided password before hashing."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2021-08-07T20:32:29.959308Z",
"start_time": "2021-08-07T20:32:29.934426Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'salted_passwords'"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"['apples_are_THE_be5t!9a2da2ab69bd17c1',\n",
" 'password12304e7485237cabba4',\n",
" 'password12376579477069b44c0']"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" username | \n",
" salt | \n",
" hash | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" apple | \n",
" 9a2da2ab69bd17c1 | \n",
" ddd17d2faebd8d0f1ba1598be050ee60e60193dfcfd414... | \n",
"
\n",
" \n",
" 1 | \n",
" banana | \n",
" 04e7485237cabba4 | \n",
" ac7656830c64631a6e8f15b4690a1257096bcc0e52f58a... | \n",
"
\n",
" \n",
" 2 | \n",
" cherry | \n",
" 76579477069b44c0 | \n",
" 662b6e6ef167cae168c7ad423b0843abc452e536fbceec... | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" username salt \\\n",
"0 apple 9a2da2ab69bd17c1 \n",
"1 banana 04e7485237cabba4 \n",
"2 cherry 76579477069b44c0 \n",
"\n",
" hash \n",
"0 ddd17d2faebd8d0f1ba1598be050ee60e60193dfcfd414... \n",
"1 ac7656830c64631a6e8f15b4690a1257096bcc0e52f58a... \n",
"2 662b6e6ef167cae168c7ad423b0843abc452e536fbceec... "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import secrets\n",
"import hashlib\n",
"\n",
"import pandas as pd\n",
"\n",
"usernames = [\"apple\", \"banana\", \"cherry\"]\n",
"passwords = [\"apples_are_THE_be5t!\", \"password123\", \"password123\"]\n",
"salts = [secrets.token_hex(8) for i in range(len(passwords))]\n",
"\n",
"# salt the passwords\n",
"salted_passwords = [\n",
" password + salt for password, salt in zip(passwords, salts)\n",
"]\n",
"display(\"salted_passwords\", salted_passwords)\n",
"\n",
"# hash the passwords\n",
"hashes = [\n",
" hashlib.sha256(password.encode(\"utf-8\")).hexdigest()\n",
" for password in salted_passwords\n",
"]\n",
"\n",
"df = pd.DataFrame({\"username\": usernames, \"salt\": salts, \"hash\": hashes})\n",
"display(df)"
]
},
{
"cell_type": "markdown",
"metadata": {
"ExecuteTime": {
"end_time": "2021-08-07T19:50:32.373243Z",
"start_time": "2021-08-07T19:50:32.369000Z"
}
},
"source": [
"With this included, to authenticate, authentication systems now also would have to salt the the user's input before hashing and comparing.\n",
"\n",
"Also, notice, although the banana and cherry users used the same password, the stored hashes are now different, and that means, the hacker can't compromise both their credentials in one go!\n",
"\n",
"However salting isn't foolproof; it simply slows down the hacker. With sufficient time, the hacker can still loop through the common passwords, add the salt, hash the salted password, and back out the original password."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2021-08-07T20:32:29.992494Z",
"start_time": "2021-08-07T20:32:29.960931Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'precomputed'"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" password | \n",
" salted_password | \n",
" hash | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 123456789 | \n",
" 1234567899a2da2ab69bd17c1 | \n",
" 4d7c7e9f923bf0275e1d9c4ab10eaefaf4462ee6dead13... | \n",
"
\n",
" \n",
" 1 | \n",
" 123456789 | \n",
" 12345678904e7485237cabba4 | \n",
" 5a8fdee3760c6246fa4cc63f69f154da895fd7da42f712... | \n",
"
\n",
" \n",
" 2 | \n",
" 123456789 | \n",
" 12345678976579477069b44c0 | \n",
" d324fb639362b9e797068ff910dfd21f8aa19260e5a8fa... | \n",
"
\n",
" \n",
" 3 | \n",
" password123 | \n",
" password1239a2da2ab69bd17c1 | \n",
" 2e0fb1aefd60d95efd48ad6b2e76fed1186aa7167aef65... | \n",
"
\n",
" \n",
" 4 | \n",
" password123 | \n",
" password12304e7485237cabba4 | \n",
" ac7656830c64631a6e8f15b4690a1257096bcc0e52f58a... | \n",
"
\n",
" \n",
" 5 | \n",
" password123 | \n",
" password12376579477069b44c0 | \n",
" 662b6e6ef167cae168c7ad423b0843abc452e536fbceec... | \n",
"
\n",
" \n",
" 6 | \n",
" qwerty123 | \n",
" qwerty1239a2da2ab69bd17c1 | \n",
" f245aa171123276958bcaf0a1cc97222c083455bf1fed4... | \n",
"
\n",
" \n",
" 7 | \n",
" qwerty123 | \n",
" qwerty12304e7485237cabba4 | \n",
" 038863cb22fd1574409cc7a7464d935ac7dd57a655998a... | \n",
"
\n",
" \n",
" 8 | \n",
" qwerty123 | \n",
" qwerty12376579477069b44c0 | \n",
" 3027110bafde96e31fe4594f93ecf28a91d9b91f9f92ea... | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" password salted_password \\\n",
"0 123456789 1234567899a2da2ab69bd17c1 \n",
"1 123456789 12345678904e7485237cabba4 \n",
"2 123456789 12345678976579477069b44c0 \n",
"3 password123 password1239a2da2ab69bd17c1 \n",
"4 password123 password12304e7485237cabba4 \n",
"5 password123 password12376579477069b44c0 \n",
"6 qwerty123 qwerty1239a2da2ab69bd17c1 \n",
"7 qwerty123 qwerty12304e7485237cabba4 \n",
"8 qwerty123 qwerty12376579477069b44c0 \n",
"\n",
" hash \n",
"0 4d7c7e9f923bf0275e1d9c4ab10eaefaf4462ee6dead13... \n",
"1 5a8fdee3760c6246fa4cc63f69f154da895fd7da42f712... \n",
"2 d324fb639362b9e797068ff910dfd21f8aa19260e5a8fa... \n",
"3 2e0fb1aefd60d95efd48ad6b2e76fed1186aa7167aef65... \n",
"4 ac7656830c64631a6e8f15b4690a1257096bcc0e52f58a... \n",
"5 662b6e6ef167cae168c7ad423b0843abc452e536fbceec... \n",
"6 f245aa171123276958bcaf0a1cc97222c083455bf1fed4... \n",
"7 038863cb22fd1574409cc7a7464d935ac7dd57a655998a... \n",
"8 3027110bafde96e31fe4594f93ecf28a91d9b91f9f92ea... "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"'compromised'"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" username | \n",
" salt | \n",
" hash | \n",
" password | \n",
" salted_password | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" banana | \n",
" 04e7485237cabba4 | \n",
" ac7656830c64631a6e8f15b4690a1257096bcc0e52f58a... | \n",
" password123 | \n",
" password12304e7485237cabba4 | \n",
"
\n",
" \n",
" 1 | \n",
" cherry | \n",
" 76579477069b44c0 | \n",
" 662b6e6ef167cae168c7ad423b0843abc452e536fbceec... | \n",
" password123 | \n",
" password12376579477069b44c0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" username salt \\\n",
"0 banana 04e7485237cabba4 \n",
"1 cherry 76579477069b44c0 \n",
"\n",
" hash password \\\n",
"0 ac7656830c64631a6e8f15b4690a1257096bcc0e52f58a... password123 \n",
"1 662b6e6ef167cae168c7ad423b0843abc452e536fbceec... password123 \n",
"\n",
" salted_password \n",
"0 password12304e7485237cabba4 \n",
"1 password12376579477069b44c0 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"common_passwords = [\"123456789\", \"password123\", \"qwerty123\"]\n",
"common_salted_passwords = {\n",
" password + salt: password for password in common_passwords for salt in df[\"salt\"]\n",
"}\n",
"common_hashes = [\n",
" hashlib.sha256(password.encode(\"utf-8\")).hexdigest()\n",
" for password in common_salted_passwords.keys()\n",
"]\n",
"\n",
"precomputed_df = pd.DataFrame(\n",
" {\n",
" \"password\": common_salted_passwords.values(),\n",
" \"salted_password\": common_salted_passwords.keys(),\n",
" \"hash\": common_hashes,\n",
" }\n",
")\n",
"display(\"precomputed\", precomputed_df)\n",
"\n",
"compromised_df = pd.merge(df, precomputed_df, on=\"hash\")\n",
"display(\"compromised\", compromised_df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, a caveat to this hacking method is if the hacker knew how the salt was added! The salt could have been appended, as in the case here, or prepended. The salt could also have been surrounded by other characters before being merged.\n",
"\n",
"In addition, the salting process could also have been repeated several, or more likely, thousands of times, to yield the final salted and hashed password, like below."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2021-08-07T20:32:30.013615Z",
"start_time": "2021-08-07T20:32:29.994387Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'salted_passwords'"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"['!apples_are_THE_be5t!...38b79f10f59ec5b4!apples_are_THE_be5t!...38b79f10f59ec5b4!apples_are_THE_be5t!...38b79f10f59ec5b4',\n",
" '!password123...a3a530e7fb19b972!password123...a3a530e7fb19b972!password123...a3a530e7fb19b972',\n",
" '!password123...42b320058343b990!password123...42b320058343b990!password123...42b320058343b990']"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" username | \n",
" salt | \n",
" hash | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" apple | \n",
" 38b79f10f59ec5b4 | \n",
" 9b4821c3df1b8eb99ca7a259990d48e198ec3a3721971e... | \n",
"
\n",
" \n",
" 1 | \n",
" banana | \n",
" a3a530e7fb19b972 | \n",
" 3028d394732e4e6831d8dc7c1bf79d4ce7f368d41088fb... | \n",
"
\n",
" \n",
" 2 | \n",
" cherry | \n",
" 42b320058343b990 | \n",
" c2fa3c9d10cf1de228f4e7e0593cc34fdb8cc1a2feb9f9... | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" username salt \\\n",
"0 apple 38b79f10f59ec5b4 \n",
"1 banana a3a530e7fb19b972 \n",
"2 cherry 42b320058343b990 \n",
"\n",
" hash \n",
"0 9b4821c3df1b8eb99ca7a259990d48e198ec3a3721971e... \n",
"1 3028d394732e4e6831d8dc7c1bf79d4ce7f368d41088fb... \n",
"2 c2fa3c9d10cf1de228f4e7e0593cc34fdb8cc1a2feb9f9... "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import secrets\n",
"import hashlib\n",
"\n",
"import pandas as pd\n",
"\n",
"usernames = [\"apple\", \"banana\", \"cherry\"]\n",
"passwords = [\"apples_are_THE_be5t!\", \"password123\", \"password123\"]\n",
"salts = [secrets.token_hex(8) for i in range(len(passwords))]\n",
"\n",
"# salt the passwords with a special recipe\n",
"salted_passwords = [\n",
" f\"!{password}...{salt}\" * 3 for password, salt in zip(passwords, salts)\n",
"]\n",
"display(\"salted_passwords\", salted_passwords)\n",
"\n",
"# hash the passwords\n",
"hashes = [\n",
" hashlib.sha256(password.encode(\"utf-8\")).hexdigest()\n",
" for password in salted_passwords\n",
"]\n",
"\n",
"df = pd.DataFrame({\"username\": usernames, \"salt\": salts, \"hash\": hashes})\n",
"display(df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With this implemented, the hacker now needs to know how the salt was added, and also how many times!\n",
"\n",
"But I'm sure if the hacker had enough time and commitment, they would find a way to crack this!\n",
"\n",
"To conclude:\n",
"- hashing converts the input password into a fixed-size string\n",
"- hashes are deterministic; same input returns the same output\n",
"- hashes are one-way; the hash cannot be used directly to back out the original input\n",
"\n",
"- salting adds extra characters to the input password\n",
"- salts can be added in unique ways; appended, prepended, surrounded, etc\n",
"- salts can be added many times; likely thousands of times\n",
"\n",
"TLDR:\n",
"- hashes help prevent the hacker from immediately knowing the plain-text password\n",
"- salts help prevent the hacker from knowing that two or more users use the same password\n",
"- both are techniques used to slowdown the hacker from compromising credentials\n",
"\n",
"And this is the extent of my knowledge. I only recently learned about these concepts, and have much more to learn about it, so please feel free to share your knowledge in the comments!\n",
"\n",
"Thanks for reading!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}