{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# The point of salting and hashing passwords, demonstrated with Python\n", "\n", "Story here: https://medium.com/p/4879a156d99a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When I first heard about this, I thought, \"hash browns?\" but nope!\n", "\n", "The concept of salting and hashing has to do with securing databases that contain login credentials, so that even *if* the hackers somehow gained access to the database, they still can't do much with the information.\n", "\n", "To demonstrate this, let's create a pandas dataframe, depicting an actual database, and add random login credentials of a few users." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2021-08-07T20:32:29.885735Z", "start_time": "2021-08-07T20:32:29.595318Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
usernamepassword
0appleapples_are_THE_be5t!
1bananapassword123
2cherrypassword123
\n", "
" ], "text/plain": [ " username password\n", "0 apple apples_are_THE_be5t!\n", "1 banana password123\n", "2 cherry password123" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pandas as pd\n", "\n", "usernames = [\"apple\", \"banana\", \"cherry\"]\n", "passwords = [\"apples_are_THE_be5t!\", \"password123\", \"password123\"]\n", "\n", "df = pd.DataFrame({\"username\": usernames, \"password\": passwords})\n", "display(df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, the password is stored insecurely as plain-text. If a hacker was able to access this database, they can see the actual password and readily use the credentials to login.\n", "\n", "One way to slowdown the hacker is to **hash** the password before storing it. Hashing converts some input into a fixed-size string. Providing the same input to the hashing algorithm will yield the same output, but hashing only goes one-way, meaning someone can't use the hashed output to back out the original input." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2021-08-07T20:32:29.898035Z", "start_time": "2021-08-07T20:32:29.888142Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
usernamehash
0applef3520dba0533d4249de493cb54106a41dd03837fd056a3...
1bananaef92b778bafe771e89245b89ecbc08a44a4e166c066599...
2cherryef92b778bafe771e89245b89ecbc08a44a4e166c066599...
\n", "
" ], "text/plain": [ " username hash\n", "0 apple f3520dba0533d4249de493cb54106a41dd03837fd056a3...\n", "1 banana ef92b778bafe771e89245b89ecbc08a44a4e166c066599...\n", "2 cherry ef92b778bafe771e89245b89ecbc08a44a4e166c066599..." ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import hashlib\n", "\n", "import pandas as pd\n", "\n", "usernames = [\"apple\", \"banana\", \"cherry\"]\n", "passwords = [\"apples_are_THE_be5t!\", \"password123\", \"password123\"]\n", "\n", "# hash the passwords\n", "hashes = [\n", " hashlib.sha256(password.encode(\"utf-8\")).hexdigest()\n", " for password in passwords\n", "]\n", "\n", "df = pd.DataFrame({\"username\": usernames, \"hash\": hashes})\n", "display(df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So how do authentication systems with hashed passwords work? Since these authentication systems no longer store original passwords, they can't just compare the user's input to the stored hash value to validate.\n", "\n", "They *can*, however, use the same hashing algorithm on the user's input--then compare whether the output hash matches the stored hash, because remember, the hashing algorithms are deterministic: the same input results in the same output. That is also why the banana and cherry users have the same hash.\n", "\n", "Now, if the hacker gained access to this database, they can't *immediately* use the hash to login because the \n", "authentication system expects plain-text passwords and if the hacker provides the hash, that would rehash the hash, resulting in a different hash, and ultimately an invalid login.\n", "\n", "The hacker could try guessing random common passwords through the login page. However, if the authentication system detects too many failed attempts, the hacker could be locked out, and perhaps, even alert the security team.\n", "\n", "Instaed, hackers pre-compute hashes of common passwords and use that to back out the original password, as depicted below." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2021-08-07T20:32:29.932292Z", "start_time": "2021-08-07T20:32:29.899775Z" } }, "outputs": [ { "data": { "text/plain": [ "'precomputed'" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
passwordhash
012345678915e2b0d3c33891ebb0f1ef609ec419420c20e320ce94c6...
1password123ef92b778bafe771e89245b89ecbc08a44a4e166c066599...
2qwerty123daaad6e5604e8e17bd9f108d91e26afe6281dac8fda009...
\n", "
" ], "text/plain": [ " password hash\n", "0 123456789 15e2b0d3c33891ebb0f1ef609ec419420c20e320ce94c6...\n", "1 password123 ef92b778bafe771e89245b89ecbc08a44a4e166c066599...\n", "2 qwerty123 daaad6e5604e8e17bd9f108d91e26afe6281dac8fda009..." ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "'compromised'" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
usernamehashpassword
0bananaef92b778bafe771e89245b89ecbc08a44a4e166c066599...password123
1cherryef92b778bafe771e89245b89ecbc08a44a4e166c066599...password123
\n", "
" ], "text/plain": [ " username hash password\n", "0 banana ef92b778bafe771e89245b89ecbc08a44a4e166c066599... password123\n", "1 cherry ef92b778bafe771e89245b89ecbc08a44a4e166c066599... password123" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "common_passwords = [\"123456789\", \"password123\", \"qwerty123\"]\n", "common_hashes = [\n", " hashlib.sha256(password.encode(\"utf-8\")).hexdigest()\n", " for password in common_passwords\n", "]\n", "precomputed_df = pd.DataFrame({\"password\": common_passwords, \"hash\": common_hashes})\n", "display(\"precomputed\", precomputed_df)\n", "\n", "compromised_df = pd.merge(df, precomputed_df, on=\"hash\")\n", "display(\"compromised\", compromised_df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With just a few lines of code, the hacker has now easily compromised two of the users' credentials because they both used a common password!\n", "\n", "So how do authentication systems slowdown the hacker further and prevent multiple users from being compromised at once, even if the users used the same, common passwords?\n", "\n", "That is where salting comes to play. Salting simply means adding extra characters to the provided password before hashing." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2021-08-07T20:32:29.959308Z", "start_time": "2021-08-07T20:32:29.934426Z" } }, "outputs": [ { "data": { "text/plain": [ "'salted_passwords'" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "['apples_are_THE_be5t!9a2da2ab69bd17c1',\n", " 'password12304e7485237cabba4',\n", " 'password12376579477069b44c0']" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
usernamesalthash
0apple9a2da2ab69bd17c1ddd17d2faebd8d0f1ba1598be050ee60e60193dfcfd414...
1banana04e7485237cabba4ac7656830c64631a6e8f15b4690a1257096bcc0e52f58a...
2cherry76579477069b44c0662b6e6ef167cae168c7ad423b0843abc452e536fbceec...
\n", "
" ], "text/plain": [ " username salt \\\n", "0 apple 9a2da2ab69bd17c1 \n", "1 banana 04e7485237cabba4 \n", "2 cherry 76579477069b44c0 \n", "\n", " hash \n", "0 ddd17d2faebd8d0f1ba1598be050ee60e60193dfcfd414... \n", "1 ac7656830c64631a6e8f15b4690a1257096bcc0e52f58a... \n", "2 662b6e6ef167cae168c7ad423b0843abc452e536fbceec... " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import secrets\n", "import hashlib\n", "\n", "import pandas as pd\n", "\n", "usernames = [\"apple\", \"banana\", \"cherry\"]\n", "passwords = [\"apples_are_THE_be5t!\", \"password123\", \"password123\"]\n", "salts = [secrets.token_hex(8) for i in range(len(passwords))]\n", "\n", "# salt the passwords\n", "salted_passwords = [\n", " password + salt for password, salt in zip(passwords, salts)\n", "]\n", "display(\"salted_passwords\", salted_passwords)\n", "\n", "# hash the passwords\n", "hashes = [\n", " hashlib.sha256(password.encode(\"utf-8\")).hexdigest()\n", " for password in salted_passwords\n", "]\n", "\n", "df = pd.DataFrame({\"username\": usernames, \"salt\": salts, \"hash\": hashes})\n", "display(df)" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2021-08-07T19:50:32.373243Z", "start_time": "2021-08-07T19:50:32.369000Z" } }, "source": [ "With this included, to authenticate, authentication systems now also would have to salt the the user's input before hashing and comparing.\n", "\n", "Also, notice, although the banana and cherry users used the same password, the stored hashes are now different, and that means, the hacker can't compromise both their credentials in one go!\n", "\n", "However salting isn't foolproof; it simply slows down the hacker. With sufficient time, the hacker can still loop through the common passwords, add the salt, hash the salted password, and back out the original password." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2021-08-07T20:32:29.992494Z", "start_time": "2021-08-07T20:32:29.960931Z" } }, "outputs": [ { "data": { "text/plain": [ "'precomputed'" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
passwordsalted_passwordhash
01234567891234567899a2da2ab69bd17c14d7c7e9f923bf0275e1d9c4ab10eaefaf4462ee6dead13...
112345678912345678904e7485237cabba45a8fdee3760c6246fa4cc63f69f154da895fd7da42f712...
212345678912345678976579477069b44c0d324fb639362b9e797068ff910dfd21f8aa19260e5a8fa...
3password123password1239a2da2ab69bd17c12e0fb1aefd60d95efd48ad6b2e76fed1186aa7167aef65...
4password123password12304e7485237cabba4ac7656830c64631a6e8f15b4690a1257096bcc0e52f58a...
5password123password12376579477069b44c0662b6e6ef167cae168c7ad423b0843abc452e536fbceec...
6qwerty123qwerty1239a2da2ab69bd17c1f245aa171123276958bcaf0a1cc97222c083455bf1fed4...
7qwerty123qwerty12304e7485237cabba4038863cb22fd1574409cc7a7464d935ac7dd57a655998a...
8qwerty123qwerty12376579477069b44c03027110bafde96e31fe4594f93ecf28a91d9b91f9f92ea...
\n", "
" ], "text/plain": [ " password salted_password \\\n", "0 123456789 1234567899a2da2ab69bd17c1 \n", "1 123456789 12345678904e7485237cabba4 \n", "2 123456789 12345678976579477069b44c0 \n", "3 password123 password1239a2da2ab69bd17c1 \n", "4 password123 password12304e7485237cabba4 \n", "5 password123 password12376579477069b44c0 \n", "6 qwerty123 qwerty1239a2da2ab69bd17c1 \n", "7 qwerty123 qwerty12304e7485237cabba4 \n", "8 qwerty123 qwerty12376579477069b44c0 \n", "\n", " hash \n", "0 4d7c7e9f923bf0275e1d9c4ab10eaefaf4462ee6dead13... \n", "1 5a8fdee3760c6246fa4cc63f69f154da895fd7da42f712... \n", "2 d324fb639362b9e797068ff910dfd21f8aa19260e5a8fa... \n", "3 2e0fb1aefd60d95efd48ad6b2e76fed1186aa7167aef65... \n", "4 ac7656830c64631a6e8f15b4690a1257096bcc0e52f58a... \n", "5 662b6e6ef167cae168c7ad423b0843abc452e536fbceec... \n", "6 f245aa171123276958bcaf0a1cc97222c083455bf1fed4... \n", "7 038863cb22fd1574409cc7a7464d935ac7dd57a655998a... \n", "8 3027110bafde96e31fe4594f93ecf28a91d9b91f9f92ea... " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "'compromised'" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
usernamesalthashpasswordsalted_password
0banana04e7485237cabba4ac7656830c64631a6e8f15b4690a1257096bcc0e52f58a...password123password12304e7485237cabba4
1cherry76579477069b44c0662b6e6ef167cae168c7ad423b0843abc452e536fbceec...password123password12376579477069b44c0
\n", "
" ], "text/plain": [ " username salt \\\n", "0 banana 04e7485237cabba4 \n", "1 cherry 76579477069b44c0 \n", "\n", " hash password \\\n", "0 ac7656830c64631a6e8f15b4690a1257096bcc0e52f58a... password123 \n", "1 662b6e6ef167cae168c7ad423b0843abc452e536fbceec... password123 \n", "\n", " salted_password \n", "0 password12304e7485237cabba4 \n", "1 password12376579477069b44c0 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "common_passwords = [\"123456789\", \"password123\", \"qwerty123\"]\n", "common_salted_passwords = {\n", " password + salt: password for password in common_passwords for salt in df[\"salt\"]\n", "}\n", "common_hashes = [\n", " hashlib.sha256(password.encode(\"utf-8\")).hexdigest()\n", " for password in common_salted_passwords.keys()\n", "]\n", "\n", "precomputed_df = pd.DataFrame(\n", " {\n", " \"password\": common_salted_passwords.values(),\n", " \"salted_password\": common_salted_passwords.keys(),\n", " \"hash\": common_hashes,\n", " }\n", ")\n", "display(\"precomputed\", precomputed_df)\n", "\n", "compromised_df = pd.merge(df, precomputed_df, on=\"hash\")\n", "display(\"compromised\", compromised_df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, a caveat to this hacking method is if the hacker knew how the salt was added! The salt could have been appended, as in the case here, or prepended. The salt could also have been surrounded by other characters before being merged.\n", "\n", "In addition, the salting process could also have been repeated several, or more likely, thousands of times, to yield the final salted and hashed password, like below." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2021-08-07T20:32:30.013615Z", "start_time": "2021-08-07T20:32:29.994387Z" } }, "outputs": [ { "data": { "text/plain": [ "'salted_passwords'" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "['!apples_are_THE_be5t!...38b79f10f59ec5b4!apples_are_THE_be5t!...38b79f10f59ec5b4!apples_are_THE_be5t!...38b79f10f59ec5b4',\n", " '!password123...a3a530e7fb19b972!password123...a3a530e7fb19b972!password123...a3a530e7fb19b972',\n", " '!password123...42b320058343b990!password123...42b320058343b990!password123...42b320058343b990']" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
usernamesalthash
0apple38b79f10f59ec5b49b4821c3df1b8eb99ca7a259990d48e198ec3a3721971e...
1bananaa3a530e7fb19b9723028d394732e4e6831d8dc7c1bf79d4ce7f368d41088fb...
2cherry42b320058343b990c2fa3c9d10cf1de228f4e7e0593cc34fdb8cc1a2feb9f9...
\n", "
" ], "text/plain": [ " username salt \\\n", "0 apple 38b79f10f59ec5b4 \n", "1 banana a3a530e7fb19b972 \n", "2 cherry 42b320058343b990 \n", "\n", " hash \n", "0 9b4821c3df1b8eb99ca7a259990d48e198ec3a3721971e... \n", "1 3028d394732e4e6831d8dc7c1bf79d4ce7f368d41088fb... \n", "2 c2fa3c9d10cf1de228f4e7e0593cc34fdb8cc1a2feb9f9... " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import secrets\n", "import hashlib\n", "\n", "import pandas as pd\n", "\n", "usernames = [\"apple\", \"banana\", \"cherry\"]\n", "passwords = [\"apples_are_THE_be5t!\", \"password123\", \"password123\"]\n", "salts = [secrets.token_hex(8) for i in range(len(passwords))]\n", "\n", "# salt the passwords with a special recipe\n", "salted_passwords = [\n", " f\"!{password}...{salt}\" * 3 for password, salt in zip(passwords, salts)\n", "]\n", "display(\"salted_passwords\", salted_passwords)\n", "\n", "# hash the passwords\n", "hashes = [\n", " hashlib.sha256(password.encode(\"utf-8\")).hexdigest()\n", " for password in salted_passwords\n", "]\n", "\n", "df = pd.DataFrame({\"username\": usernames, \"salt\": salts, \"hash\": hashes})\n", "display(df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With this implemented, the hacker now needs to know how the salt was added, and also how many times!\n", "\n", "But I'm sure if the hacker had enough time and commitment, they would find a way to crack this!\n", "\n", "To conclude:\n", "- hashing converts the input password into a fixed-size string\n", "- hashes are deterministic; same input returns the same output\n", "- hashes are one-way; the hash cannot be used directly to back out the original input\n", "\n", "- salting adds extra characters to the input password\n", "- salts can be added in unique ways; appended, prepended, surrounded, etc\n", "- salts can be added many times; likely thousands of times\n", "\n", "TLDR:\n", "- hashes help prevent the hacker from immediately knowing the plain-text password\n", "- salts help prevent the hacker from knowing that two or more users use the same password\n", "- both are techniques used to slowdown the hacker from compromising credentials\n", "\n", "And this is the extent of my knowledge. I only recently learned about these concepts, and have much more to learn about it, so please feel free to share your knowledge in the comments!\n", "\n", "Thanks for reading!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }