In one of our practice papers we are asked the question in the title.
Most of the articles I've read say that indexing improves performance of joins, but does not tell me how.
Maybe it's so obvious that it doesn't need to be stated. Indexing is essentially ordering a column right? So I guess having a column in order makes it easier to operate on. Is there more to it? Or am I overthinking it?
Most used are B-tree based indexes. From Oracle Database Online Documentation 12c:
B-trees, short for balanced trees, are the most common type of database index. A B-tree index is an ordered list of values divided into ranges. By associating a key with a row or range of rows, B-trees provide excellent retrieval performance for a wide range of queries, including exact match and range searches.
Here is a simplified discussion of the answer.
In most Relational database implementations, the physical order of the rows assumes the order those rows were inserted in. So if you have a products table and you insert products with keys 1, 8, 2, 3, 12, chances are that the records will physically be stored in that order. When you run a SQL query to get the rows, you may get the rows in yet possibly a different order unless you specify ORDER BY keyName Ascending/Descending.
When you create an index on a column, the database creates a physically separate store for the indexed values. This store, has the entries placed in sorted order (either ascending or descending) as @marc_s commented.
In the above example, the index will contain entries in this physical order: 1,2,3,8 and 12.
This index structure provides several benefits for queries: 1-The structure is much smaller than the corresponding data structure, this makes scanning the entire index less demanding on storage. 2-The entries are sorted, so responding to requests with ORDER BY returning multiple rows is straight forward and does nor require further sorting. 3-The entries are sorted again, this helps finding a specific key value in the index structure. If the records are not sorted you would need an average of N/2 comparisons before you get a hit, whereas if the the index is used you would need about log(N) comparisons only depending on the algorithm utilized (see for example: Wiki-Binary Search.
When you perform a query on the database involving a column that has an index, the database engine employs an optimization algorithm that tells it if is good to use the index structure or not and then selects the best approach to get your data accordingly.
Indexes are not all good. Some drawbacks: 1-Index structure take space 2-Index maintenance takes processing time. 3-Index entries need to be created with every insert, update or delete. 4-The more indexes you create, the less speedy the insert processing becomes. 5-In some databases, index structure can get corrupted. 6-Index performance depends on key data type and length. Indexing a 5000 character field is not very good idea, whereas integers are very efficient. 7-Not all queries can be served well by indexes.
Some references for you: