Python Data structures form the backbone of computer programs and enable efficient data organization and processing. Python, a versatile and dynamically typed language, has a variety of built-in Python data structures.
Table of Contents
This article aims to provide a comprehensive guide to Python data structures, from the basics to the advanced ones, and highlight their properties and practical applications.
List of Python Data Structures
Python data structures are ways of storing and organizing data in Python. Depending on your use case, they allow you to access and manipulate data efficiently. The following is are list of Python data structures:
1). Python Specific Data Structures
1. Lists:
Lists are dynamic arrays, allowing for the storage of elements of different data types. They are ordered and mutable, meaning you can change their contents after creation.
Lists are created with square brackets []
or the list()
constructor.
You can add, remove, or change items in a list using methods like append()
, pop()
, or insert()
. Lists are useful for storing ordered collections of data that may change over time.
Operations on lists include indexing, slicing, appending, extending, and more. For example:
my_list = [1, 2, 3, 4, 5] print(my_list[2]) # Output: 3
Lists are versatile and widely used in Python for tasks ranging from simple data storage to complex algorithm implementations.
2. Tuples
Tuples are similar to lists but are immutable, making them suitable for situations where the data should not be modified.
Tuples are created with parentheses ()
or the tuple()
constructor. You cannot modify items in a tuple, but you can access them using indexing or slicing.
Tuples are useful for storing fixed collections of data that need to be protected from accidental changes, such as coordinates or RGB color values.
my_tuple = (1, 2, 3)
Tuples are often used in situations where data should remain constant throughout the program execution.
3. Sets
Sets are unordered collections of unique elements. They are useful for tasks that involve testing membership and eliminating duplicate entries.
A mutable collection of unique values that can be of any type. Sets are created with curly braces {}
or the set()
constructor.
You can add, remove, or change items in a set using methods like add()
, discard()
, or pop()
. You can also perform mathematical operations on sets, such as union, intersection, or difference.
Sets are useful for storing unordered collections of data that need to be deduplicated or compared.
Sets support various set operations like union, intersection, difference, and symmetric difference.
set1 = {1, 2, 3} set2 = {3, 4, 5} union_set = set1 | set2 # Output: {1, 2, 3, 4, 5}
Sets are efficient for data manipulation tasks that involve distinct elements.
4. Dictionaries
Dictionaries are key-value pairs, allowing for fast retrieval of values based on keys. They are often used for data representation, configuration settings, and caching.
A mutable mapping of keys to values that can be of any type. Dictionaries are created with curly braces {}
or the dict()
constructor.
You can add, remove, or change key-value pairs in a dictionary using methods like update()
, pop()
, or setdefault()
. You can access values in a dictionary using square brackets []
or the get()
method.
Dictionaries are useful for storing associative data that can be looked up by a unique key.
my_dict = {'name': 'John', 'age': 30, 'city': 'New York'} print(my_dict['age']) # Output: 30
Dictionaries provide a flexible and efficient way to organize and access data, especially in scenarios where a quick lookup is essential.
2). General Data Structures
Above are Python Data Structures. Now, let’s learn about some General Data Structures.
General data structures include linear ones like arrays, linked lists, stacks, and queues, as well as non-linear structures like trees, graphs, heaps, hash tables, and tries, each serving specific purposes in organizing and manipulating data efficiently.
1. Stacks and Queues
Stacks and queues are abstract data types commonly used in algorithms and data processing.
A stack is a linear data structure that follows the LIFO (last-in, first-out) principle. It means that the last element added to the stack is the first one to be removed.
You can think of a stack as a pile of books, where you can only add or remove books from the top of the pile.
Stacks are useful for implementing algorithms that involve backtracking, such as depth-first search, recursion, or undo/redo operations.
A queue is a linear data structure that follows the FIFO (first-in, first-out) principle. It means that the first element added to the queue is the first one to be removed.
You can think of a queue as a line of people waiting for a service, where new people join the end of the line and the person at the front gets served. Q
queues are useful for implementing algorithms that involve processing data in the order they arrive, such as breadth-first search, buffering, or scheduling.
A stack follows the Last In, First Out (LIFO) principle, while a queue follows the First In, First Out (FIFO) principle.
Python’s list can be used to implement both:
stack = [1, 2, 3] stack.append(4) # Stack: [1, 2, 3, 4]
In Python, lists can be utilized to implement both stacks and queues, making them versatile for a variety of tasks.
2. Linked Lists
Linked lists consist of nodes, each containing data and a reference to the next node.
A linked list is a linear data structure that consists of nodes that are connected by pointers. Each node contains a data element and a pointer to the next node in the list. The first node is called the head and the last node is called the tail.
A linked list allows for efficient insertion and deletion of elements at any position, but it requires extra space for storing the pointers and it does not support random access.
They can be singly or doubly linked. Linked lists are valuable in scenarios where dynamic memory allocation is crucial. Example of a singly linked list in Python:
class Node: def __init__(self, data): self.data = data self.next = None # Usage node1 = Node(1) node2 = Node(2) node1.next = node2
Linked lists are fundamental in understanding more complex Python data structures and algorithms.
3. Trees
Trees are hierarchical Python data structures with nodes connected by edges. A node is a data element that can have one or more children nodes. An edge is a link that connects a parent node to a child node.
Binary trees, binary search trees, and AVL trees are common types.
A tree has a special node called the root, which has no parent. A node that has no children is called a leaf.
A tree can be used to represent various kinds of data, such as file systems, XML documents, arithmetic expressions, or decision processes.
Trees are often used for efficient searching and retrieval operations.
class TreeNode: def __init__(self, value): self.value = value self.left = None self.right = None # Usage root = TreeNode(1) root.left = TreeNode(2) root.right = TreeNode(3)
Trees play a crucial role in various algorithms and are a fundamental concept in computer science.
4. Heaps
Heaps are specialized tree-based Python data structures categorized as min heaps and max heaps.
A heap is a tree-based python data structures that satisfies the heap property: the value of each node is greater than or equal to the value of its children (max-heap) or less than or equal to the value of its children (min-heap).
A heap can be used to implement a priority queue, which is a data structure that allows for efficient retrieval and removal of the highest or lowest priority element.
A heap can also be used to perform heap sort, which is a sorting algorithm that sorts an array by repeatedly building a heap and extracting the root element.
They are often used for implementing priority queues, where the highest (or lowest) priority element is readily accessible.
import heapq heap = [3, 1, 4, 1, 5, 9, 2] heapq.heapify(heap) # Min Heap: [1, 1, 2, 4, 5, 9, 3]
Heaps ensure that the most important element is efficiently accessible, making them valuable in algorithms like Dijkstra’s shortest path algorithm.
5. Hash Tables
A hash table is a mapping python data structures that stores key-value pairs in an array. It uses a hash function to map a key to an index in the array, where the corresponding value is stored.
A hash table allows for fast insertion, retrieval, and deletion of elements by their keys, but it may suffer from collisions, which occur when two or more keys map to the same index.
Python’s built-in dict
is an example of a hash table. Hash tables are particularly useful when quick data lookup is crucial.
my_dict = {‘name’: ‘Alice’, ‘age’: 25, ‘city’: ‘London’}
Understanding the underlying principles of hash tables is essential for efficient data storage and retrieval in Python.
Credits:
- Photo by Christina Morillo: https://www.pexels.com/photo/person-using-macbook-pro-1181373/
References:
- Hash Table Data Structure (programiz.com)
- 5. Data Structures — Python 3.12.1 documentation
- Python Data Structures (devopedia.org)
Conclusion
Understanding Python’s diverse set of data structures is crucial for effective programming. Depending on the task at hand, choosing the right Python data structures can significantly impact the efficiency of your code.
This guide has provided a thorough exploration of Python data structures, offering a foundation for further exploration and experimentation.
So, go forth and build! Let your Python programs be well-structured houses, with Python data structures as the sturdy foundations upon which they stand.