Data structures at their core are simply ways to store and manipulate data in a computer. It provides a framework to organise the 1s and 0s into something humans can understand. In this issue, we’re going to explore how data is stored and the 2 fundamental data structures that are needed to build the rest — the array and the linked list.
What is a Data structure?
Systems nowadays are usually 32 or 64-bit, meaning there are 32–64 individual 1s and 0s to represent each character. For simplicity in this article, we’ll call each group of 32/64 bits a memory cell.
To access a stored character, we’ll need a reference to their address in physical memory. This is a very fast process because the computer knows exactly where to go to find the data.
Unfortunately, individual memory cells are quite limited in storage, and to reference anything bigger, we’ll need to group these cells together. Data structures are these memory cells stored together in a particular order so that they can solve real-world problems.
Contiguous Memory
The easiest way to solve the storage problem is to have the neighbouring cells also be part of our data (so the computer can access them as a group). This allows us to store lists of numbers, characters or objects and is what we would conventionally call an array.
Access is incredibly fast on any element in the list, as the computer knows how far along the address is from the reference point. This is why in code, when we want to get an element from an array, we also pass in the offset from that memory address.
However inserting and removing elements from an array can be an issue, as it takes many operations to make room or fill the hole made when inserting and deleting elements.
In the same vein if you are running out of memory, extending an array can be difficult as it requires all cells to be next to each other!
The only way to do this is to copy the entire array into a new area with a larger size.
Arrays are very beneficial when needing to access and navigate memory cells due to their simple structure and isolated memory blocks. However more work is required when creating or modifying the array, due to the need for contiguous memory. That is the trade-off we make for exceptional read performance.
Non-Contiguous Memory
The other way to store data is by associating the next memory address after every element so that we always know where the next cell is. This similarly allows us to store lists of elements but with different characteristics for access and modification. We call this data structure the linked list.
This data structure allows memory to be stored in smaller chunks around the neighbourhood. Accessing a specific element in a linked list is a little harder, as we still only have a reference to the starting element. A computer has to crawl through each element from the start until it finds the element it is looking for.
The benefit of this structure is that adding and removing elements is very easy once you have an element, as you can simply change where the memory addresses point to.
Similarly, if we need to extend the length of our linked list we can do so without much issue - just add a reference to the new memory address at the end of the list! As linked lists don’t require a large set of contiguous memory to be set aside, there is never a need to copy arrays or rejig elements to maintain the data structure.
Building on top of these data structures
Data structures are simply ways we organise and store data, you may have seen other structures such as stacks, queues, dictionaries and trees. Under the hood, they are all combinations and specific implementations of arrays and linked lists.
I encourage you to imagine how this is possible. Can you recognise how other data structures implement themselves with only arrays and linked lists?
I hope you enjoyed this article, if you have any questions or comments please leave it below and follow to see more content like this.