Don’t want to read the theory? Just want to see if your password has been leaked? Scroll down to the last two sections.
How websites store data
When you create an account on a website, the website stores your registration details in its SQL databases. Very few people, even within the company running the website, have direct access to those databases.
In a naive world, the database would contain your password in plain text. However, since hackers use SQL injection attacks to dump database contents, it’s safer to store the password in hashed form. Then, even if someone gets access to the table, he would see your username, email address and hashed password, but not the plain-text password.
Those who don’t know about hashing may wonder how the website checks that you are typing the correct password during login, if the site itself doesn’t know your password. To understand that, you need to understand what hashing is. You can read up on it on Wikipedia for a technical treatment, but I’ll (grossly over-)simplify it for you.
Hashing is an operation which is easy to perform in one direction and very difficult to reverse. For example, mixing two colors is easy, while working out the constituent colors of a mixture isn’t nearly as easy. Similarly, multiplying two large prime numbers is easy, but given only their huge product, it is hard to find the two prime factors that were multiplied to produce it.
Hashing example
Let’s say your password is “pass”, and there’s a hashing function f(x). Then,
f(“pass”) = d@A2qAawqq21109 (say).
Going the forward way is quite simple. On the other hand, figuring out the plain-text password from the hash (d@A2qAawqq21109) is almost impossible.
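To make f(x) a little more concrete, here is a minimal Python sketch that uses SHA-256 from the standard library as the hashing function. Choosing SHA-256 is an assumption for illustration only; its real output is a 64-character hex string rather than the made-up d@A2qAawqq21109.

```python
import hashlib

def f(password: str) -> str:
    """Hash a password with SHA-256 and return the hex digest.

    SHA-256 is only an illustrative stand-in for the article's f(x).
    """
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

digest = f("pass")
print(digest)                # 64 hex characters
print(f("pass") == digest)   # True: same input, same hash
```

Computing f("pass") always gives the same digest, which is what lets a server compare hashes at login; there is no shortcut from the digest back to "pass".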
So, when you create an account and type the password “pass”, d@A2qAawqq21109 is stored in the database. When you log in and type the password “pass”, the server hashes it, gets “d@A2qAawqq21109”, and compares that with the hash stored in the database. If you typed some other password, say “ssap”, the generated hash would be different, and you wouldn’t be able to log in. Note that while a hashing function gives different outputs for almost all inputs, there can occasionally be collisions (two strings with the same hash). For a good hashing function this is extremely rare and shouldn’t concern us.
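The registration and login flow just described can be sketched in Python as follows. The dictionary standing in for the SQL table and the names register and login are hypothetical; SHA-256 again plays the role of f(x), and no salt is used yet.

```python
import hashlib
import hmac

# Hypothetical in-memory "users table": username -> stored password hash.
users: dict[str, str] = {}

def f(password: str) -> str:
    # SHA-256 stands in for the article's hashing function f(x).
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def register(username: str, password: str) -> None:
    # Only the hash is stored; the plain-text password is never saved.
    users[username] = f(password)

def login(username: str, password: str) -> bool:
    stored = users.get(username)
    if stored is None:
        return False
    # Hash what the user typed and compare it with the stored hash.
    # hmac.compare_digest compares in constant time.
    return hmac.compare_digest(f(password), stored)

register("alice", "pass")
print(login("alice", "pass"))  # True
print(login("alice", "ssap"))  # False
```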
Forgot Your Password – Ever wondered why almost all websites give you a new password when you forget your old one, instead of just telling you what it was? Now you know: they don’t know your password themselves, so they can’t tell you. When they let you change your password, they simply replace the stored hash with the hash of your new password, and from then on your new password works.
How hashes are cracked – I wrote earlier that hash functions are easy to compute one way but almost impossible to reverse. The reverse direction can, however, be attacked by brute force. Suppose someone has the password “pass”. A hacker who only has access to the hashes can hash every possible password in alphabetical order and check which hash matches (assume the hacker knows the password is exactly four lowercase letters long).
He tries ‘aaaa’, ‘aaab’, ‘aaac’, ..., ‘aaba’, ‘aabb’, ‘aabc’, ..., ‘aazz’, ‘abaa’, ..., ‘paaa’, ‘paab’, ..., ‘pass’. When he tries ‘aaaa’, the hash is not d@A2qAawqq21109; it is something else. Every guess before ‘pass’ produces a hash that doesn’t match d@A2qAawqq21109, but for ‘pass’ the hash matches. So the hacker now knows your password.
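The brute-force search above can be sketched in Python like this, assuming (as in the example) that the password is exactly four lowercase letters and that SHA-256 is the hashing function. The name brute_force is hypothetical.

```python
import hashlib
import itertools
import string
from typing import Optional

def f(password: str) -> str:
    # SHA-256 stands in for the article's hashing function f(x).
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def brute_force(target_hash: str, length: int = 4) -> Optional[str]:
    """Try every lowercase string of the given length in alphabetical order."""
    # itertools.product yields 'aaaa', 'aaab', ..., 'zzzz' in that order.
    for letters in itertools.product(string.ascii_lowercase, repeat=length):
        guess = "".join(letters)
        if f(guess) == target_hash:
            return guess
    return None  # no lowercase password of this length matches

leaked_hash = f("pass")           # the only thing the attacker has
print(brute_force(leaked_hash))   # pass
```

There are only 26^4 = 456,976 four-letter lowercase strings, so this finishes quickly. Each extra character multiplies the search space by 26 (and by more if digits and symbols are allowed), which is why longer, more varied passwords take far longer to crack.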
Website leaks
Because of this, website leaks are bad, but not catastrophic. If the passwords are sufficiently complex, the hashing algorithm is secure, and a salt (explained later) is used, then it’s quite unlikely that hackers will recover many passwords from the database dump. So even if Facebook’s database were leaked, your password would most probably be safe. Unfortunately, “most probably” is not something one can rely on, especially when you have so much to lose if the small (say 0.1%) chance of your password being compromised is the one that materializes. So after a database leak, the website usually asks all its users to change their passwords (e.g. the Dropbox, LinkedIn and Myspace leaks). And since you might be using the same password on other websites, it’s important to change it everywhere.
This isn’t even the worst part, though. Some websites don’t hash your passwords and store them in plain text instead. If their database is leaked, the hacker immediately gets access to millions of accounts on that website, plus possibly tens of millions of accounts on other websites where people reuse the same email/username and password combination. For example, the 000webhost database stored plain-text passwords, and it was leaked. I once hosted a site there myself, and my account was compromised as well.
But this still isn’t the worst part. Hackers often dump the databases publicly. The responsible ones first let the website know that its security sucks and ask it to inform its customers about the leak and get their passwords changed. After giving the website sufficient time to act, the hacker will often publish the database. To see the extent of this, take 000webhost’s example: the first search result for “000webhost leak” gives you the database, which you can download to see the passwords. The password I was using 3-4 years ago is in that database. That very password is probably still set on some of the websites I signed up for 3-4 years ago but haven’t used since (and hence never updated the password on).
Problem 1: Suppose there’s a hashing scheme X, under which “pass” becomes d@A2qAawqq21109. Suppose it’s a very secure scheme and every website uses it. Now someone with a lot of computational power computes the hashes of all possible letter combinations under scheme X. Given a hashed value, he can simply look it up in his table and see which password it corresponds to. If he makes this password-to-hash table available online, it becomes quite easy for anyone to recover passwords from a database dump.
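A precomputed table of the kind Problem 1 describes can be sketched as a Python dictionary from hash to password. To keep it small, this hypothetical build_table only covers lowercase strings of up to three letters, with SHA-256 standing in for scheme X. Published rainbow tables use a more compact chain-based structure to save storage, but the idea of doing the hashing work once, ahead of time, is the same.

```python
import hashlib
import itertools
import string
from typing import Optional

def f(password: str) -> str:
    # SHA-256 stands in for "scheme X".
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def build_table(max_len: int = 3) -> dict[str, str]:
    """Precompute hash -> password for every lowercase string up to max_len."""
    table = {}
    for length in range(1, max_len + 1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            word = "".join(letters)
            table[f(word)] = word
    return table

# The expensive part is done once, ahead of time, and could be shared online.
TABLE = build_table()

def crack(leaked_hash: str) -> Optional[str]:
    # Cracking a leaked hash is now a single dictionary lookup.
    return TABLE.get(leaked_hash)

print(crack(f("cat")))   # cat
print(crack(f("pass")))  # None: four letters, not in this small table
```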
Problem 2: Alternatively, even if the scheme isn’t common, an attacker can take a common password, say “password”, hash it, and then check every user in a 100-million-user password dump for a matching hash. If one matches, that user’s password is “password”. Using a list of 1 million common passwords, he might recover the passwords of around 10% of those 100 million users.
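Problem 2 is a dictionary attack. In this minimal Python sketch, the dump, the usernames and the short common_passwords list are all made up, and SHA-256 stands in for the site’s unsalted scheme. The key point is that each candidate is hashed only once and then compared against every user’s hash at the same time.

```python
import hashlib

def f(password: str) -> str:
    # SHA-256 stands in for the site's unsalted hashing scheme.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical leaked dump: username -> unsalted password hash.
dump = {
    "alice": f("password"),
    "bob": f("correct horse battery staple"),
    "carol": f("hello"),
    "dave": f("password"),
}

# Hypothetical list of common passwords.
common_passwords = ["123456", "password", "hello", "qwerty"]

def dictionary_attack(dump: dict[str, str], candidates: list[str]) -> dict[str, str]:
    # Hash each candidate ONCE; without salt, that hash is valid for every user.
    hash_to_password = {f(p): p for p in candidates}
    cracked = {}
    for user, stored_hash in dump.items():
        if stored_hash in hash_to_password:
            cracked[user] = hash_to_password[stored_hash]
    return cracked

print(dictionary_attack(dump, common_passwords))
# {'alice': 'password', 'carol': 'hello', 'dave': 'password'}
```

Note that alice and dave have identical hashes in the dump, so anyone reading it can tell they share a password even before cracking it.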
Solution: Hashing Salt – To prevent this, when a user chooses a password, the site also generates a random string for that user, called the salt, and stores it alongside the hash. The hashing function operates on both the password and the salt. So if two users have the same password but different salts, they’ll have different hashes. This largely defeats both of the attacks above. To get the correct hash, the hacker now has to feed both the correct password and the correct salt into the hashing function. This means that –
The first problem, where someone precomputes a password-hash table, is solved, since that person would now need a password-salt-hash table (the hash for every combination of password and salt), which has far too many entries. If there are 10 million possible passwords and 10 million possible salts, there would be 100 million million (100 trillion, or 10^14) combinations. However, if 10 particular salts are used very often, the person can build a table of all 10 million passwords hashed with those 10 common salts. Alternatively, he can hash the 10 most common passwords with all 10 million possible salts. Thus, it’s important to have both strong passwords and random salts.
The second problem is also mostly solved, since the attacker would have to compute the hash of each common password with each salt in the dump (note that he doesn’t have to do it for all 10 million possible salts, only the ones actually present in the dump). Again, not using easy, generic passwords like “password” or “hello” avoids this issue.
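Salting can be sketched in Python as below. This is a minimal illustration under stated assumptions: the users table, the names register, login and dictionary_attack, and the 16-byte salt from os.urandom are hypothetical choices, and the hash is simply SHA-256 over salt plus password. It shows that identical passwords now produce different hashes, and that a dictionary attack has to redo its hashing separately for every user, although weak passwords are still found.

```python
import hashlib
import hmac
import os

def salted_hash(password: str, salt: bytes) -> str:
    # The hash covers both the salt and the password.
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

# Hypothetical users table: username -> (salt, hash). The salt is stored in
# the clear next to the hash, because the site needs it again at login.
users: dict[str, tuple[bytes, str]] = {}

def register(username: str, password: str) -> None:
    salt = os.urandom(16)  # fresh random salt for every user
    users[username] = (salt, salted_hash(password, salt))

def login(username: str, password: str) -> bool:
    if username not in users:
        return False
    salt, stored = users[username]
    return hmac.compare_digest(salted_hash(password, salt), stored)

def dictionary_attack(candidates: list[str]) -> dict[str, str]:
    # With salts, every candidate must be re-hashed separately for every user.
    cracked = {}
    for user, (salt, stored) in users.items():
        for p in candidates:
            if salted_hash(p, salt) == stored:
                cracked[user] = p
                break
    return cracked

register("alice", "password")
register("dave", "password")
print(users["alice"][1] == users["dave"][1])     # False: same password, different hashes
print(login("alice", "password"))                # True
print(dictionary_attack(["hello", "password"]))  # weak passwords still fall
```

A single SHA-256 call is used here only to keep the idea visible. Real sites should use a deliberately slow, salted password-hashing function such as PBKDF2 (hashlib.pbkdf2_hmac in Python), bcrypt, scrypt or Argon2.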
Weak salts? One weakness of salting is that the salts themselves can be weak (predictable). WPA/WPA2 is quite robust, but since it uses the network’s SSID as the salt, routers that keep a default SSID (“linksys”, “netgear”, etc.) are more vulnerable than others: rainbow tables exist with precomputed hashes for the most common passwords and the most common SSIDs. That said, I’d like to reiterate that WPA/WPA2 is still damn secure, and I point this out only as a relevant example.
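In WPA/WPA2 personal (pre-shared key) mode, the 256-bit pairwise master key is derived with PBKDF2-HMAC-SHA1, using the passphrase as the password, the SSID as the salt, and 4096 iterations. Python’s hashlib.pbkdf2_hmac can reproduce that derivation; the SSIDs and passphrases below are made up, and wpa2_pmk is a hypothetical helper name. A real attack tests candidate keys against a captured handshake rather than looking a key up directly, so the final lookup is only there to show that work done for “linksys” is reusable against every network still using that name.

```python
import hashlib

def wpa2_pmk(passphrase: str, ssid: str) -> bytes:
    """Derive the WPA/WPA2-PSK pairwise master key (PBKDF2-HMAC-SHA1, 4096 rounds, 32 bytes)."""
    return hashlib.pbkdf2_hmac("sha1", passphrase.encode("utf-8"), ssid.encode("utf-8"), 4096, 32)

# Same passphrase, different SSIDs (salts) -> different keys.
k1 = wpa2_pmk("password123", "linksys")
k2 = wpa2_pmk("password123", "MyHomeNet42")
print(k1 != k2)  # True

# Keys for common passphrases under a default SSID can be computed once and
# reused for every network that kept that name.
common = ["password123", "12345678", "letmein1"]
table_for_linksys = {wpa2_pmk(p, "linksys"): p for p in common}
print(table_for_linksys.get(wpa2_pmk("12345678", "linksys")))  # 12345678
```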
Out of all the leaks so far, I had accounts in four: the Myspace leak, the LinkedIn leak, the Dropbox leak and the 000webhost leak. I had to change my password on multiple sites on multiple occasions.
One way to find out whether you’re compromised is to collect all the dumps and check manually whether you’re in them. However, that’s practically impossible (not all dumps are public, and searching for your name or email in a huge file takes the computer longer than you’d guess). Fortunately, there’s a website that exists specifically for this purpose, called LeakedSource. You can search by email free of cost, and they offer some extra functionality at fairly affordable rates ($4 via PayPal, $2 via Bitcoin).
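For a single dump file you already have, the search can at least be done without loading the whole file into memory: stream it line by line. This Python sketch assumes a plain-text dump with one record per line; the function name find_in_dump and the email:hash format in the demo are hypothetical, and real dumps come in many different formats.

```python
import os
import tempfile
from typing import Iterator

def find_in_dump(path: str, email: str) -> Iterator[tuple[int, str]]:
    """Yield (line number, line) for every line of a dump that mentions email.

    Reads one line at a time, so a multi-gigabyte dump never has to fit in
    memory. Matching is case-insensitive, since email addresses are usually
    treated that way in practice.
    """
    needle = email.lower()
    # errors="replace" keeps going if the dump contains badly encoded bytes.
    with open(path, encoding="utf-8", errors="replace") as dump:
        for number, line in enumerate(dump, start=1):
            if needle in line.lower():
                yield number, line.rstrip("\n")

# Demo with a tiny made-up dump in a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("someone@example.com:hash1\nYou@Example.com:hash2\n")
matches = list(find_in_dump(tmp.name, "you@example.com"))
print(matches)  # [(2, 'You@Example.com:hash2')]
os.remove(tmp.name)
```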
I am compromised
If you find out that your account is indeed compromised, I suggest you quickly change your password on every service where you used the same password. Better yet, change all your passwords. It’s good practice to change your passwords regularly anyway. Also, if a website offers two-step authentication, you should turn it on.