Knowledge Required: Moderate
Tools Required: Python
Working with Python dictionaries doesn’t have to be boring! For those who are new here, dictionary objects store their data as key-value pairs. Getting data from a dictionary is fairly simple, and most tutorials will show you something like:
```python
my_dict = {"mykey": "myvalue"}
# to get the value of "mykey" you'd do the following
value = my_dict["mykey"]
```
The above only works if you know that the key named “mykey” is in your dictionary. If you tried to request a key which wasn’t present in the dictionary, you’d get something that looked like the following:
```python
my_dict = {}
print(my_dict["nokey"])
```
```
Traceback (most recent call last):
  File "test.py", line 2, in <module>
    print(my_dict["nokey"])
KeyError: 'nokey'
```
Functionalising getting keys
If you are doing repetitive tasks that need to get values from a dictionary, you may wish to write a function to do this. For robustness, we need to make sure that requesting a key that isn’t present doesn’t break the code. Because Python raises a KeyError when we request a missing key, we can catch that error in a try / except block and respond to the failure safely.
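```python
def by_except(data, key):
    """Return data[key], or None if the key isn't present."""
    try:
        value = data[key]
        return value
    except KeyError:
        # the key isn't in the dictionary, so return None instead of crashing
        return None
```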
The function by_except only returns a value if the requested key is actually in the dictionary; otherwise it returns None.
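A quick way to check this behaviour is to call the function once with a key that exists and once with a key that doesn’t. This minimal sketch repeats by_except so it runs on its own; the sample dictionary is just an illustration:

```python
def by_except(data, key):
    """Return data[key], or None if the key isn't present."""
    try:
        return data[key]
    except KeyError:
        # the key is missing, so signal "no value" with None
        return None


my_dict = {"mykey": "myvalue"}

print(by_except(data=my_dict, key="mykey"))  # myvalue
print(by_except(data=my_dict, key="nokey"))  # None
```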
Another way?
Because the data we’re working with is a dict, Python offers another way of getting a value from it: the dictionary’s built-in get method. Compared to the above, the syntax is a bit different:
```python
value = data.get(key, None)
```
The main advantage of get (amongst others we’ll see later) is that you can provide a default value to use when the key doesn’t exist. This is great if you’re working with dictionaries that may not always contain your key. Here, if our key doesn’t exist, value is set to None (which is also what get returns if you leave out the second argument).
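The default doesn’t have to be None. In this sketch (the inventory dictionary is a made-up example), get returns whatever fallback you pass when a key is missing, and returns the stored value untouched when the key exists:

```python
inventory = {"apples": 3, "pears": 0}

# no default given: get() falls back to None for a missing key
print(inventory.get("bananas"))     # None

# an explicit default is returned only when the key is missing
print(inventory.get("bananas", 0))  # 0

# an existing key returns its stored value, even a falsy one like 0
print(inventory.get("pears", 99))   # 0
```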
We can build this into a function, like we did before:
```python
def by_get(data, key):
    """Return the value found with the get method, or None."""
    value = data.get(key, None)
    if value:
        return value
    return None
```
We’ve added an if statement to our function. When Python evaluates a variable in an if statement, None, zero and empty values (such as an empty string or list) all count as false. So if the key doesn’t exist, value is None, the if statement is false, and the function returns None. If the key does exist, value holds the stored data and the function returns it. One caveat: a key whose stored value is itself falsy (for example 0 or an empty string) is treated as missing too.
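Here is a minimal sketch of that caveat, using a made-up counts dictionary. It shows by_get hiding a stored 0, and one common alternative: comparing against None with `is not None`, which only treats a missing key as “no value” (it still can’t tell a missing key from a key stored with the value None):

```python
def by_get(data, key):
    """The article's approach: return the value only if it is truthy."""
    value = data.get(key, None)
    if value:
        return value
    return None


counts = {"errors": 0, "warnings": 4}

# the key exists, but 0 is falsy, so by_get reports "no value"
print(by_get(counts, "errors"))  # None

# comparing against None keeps falsy values that are really stored
value = counts.get("errors")
if value is not None:
    print(f"errors = {value}")   # errors = 0
```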
Tip: Passing a variable to an if statement is a quick and easy way to check whether it holds any data without having to know exactly what data you’re looking for. It works for most data types in Python! You can use this to make your code more efficient by only processing variables that actually contain data.
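To see which values an if statement treats as false, this short sketch runs a handful of common values through one. Note that a string containing a space, the string "0" and a list containing 0 are all truthy: only None, False, zero numbers and empty containers are falsy.

```python
samples = [None, False, 0, 0.0, "", [], {}, (), " ", "0", [0], -1]

for value in samples:
    # an if statement calls bool() on the value behind the scenes
    if value:
        print(f"{value!r:>6} -> truthy")
    else:
        print(f"{value!r:>6} -> falsy")
```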
So which is better?
I’m going to decide which is better simply by measuring how fast each function runs. To do that, we’ll generate a list of random strings, each 10 characters long, and look up every one of them in our dictionary.
First, fill up a list with 200,000 random entries:
```python
import random
import string

keylookup = []
random_objects = 200000
letters = string.ascii_lowercase
for _ in range(random_objects):
    # build a random 10-character lowercase string
    result_str = "".join(random.choice(letters) for _ in range(10))
    keylookup.append(result_str)
```
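If you prefer, the same list can be built with random.choices, which picks several letters in one call. This is a sketch of an alternative, not the method used for the timings below; make_random_keys is a hypothetical helper, and the optional seed is an assumption added so runs can be repeated:

```python
import random
import string


def make_random_keys(count, length=10, seed=None):
    """Hypothetical helper: build `count` random lowercase strings of `length` chars."""
    rng = random.Random(seed)  # a seeded generator makes runs repeatable
    letters = string.ascii_lowercase
    # random.choices picks `k` letters (with replacement) in a single call
    return ["".join(rng.choices(letters, k=length)) for _ in range(count)]


keylookup = make_random_keys(200_000, seed=42)
print(len(keylookup))  # 200000
```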
Then define our dictionary:
```python
my_dict = {"realkey": "realdata"}
```
Lastly, to make sure we actually have a key value in the list we’re going to iterate through, we’ll add it in at the end. That means our function will exhaust every other random value before trying one we actually know is in the dictionary.
```python
keylookup.append("realkey")
```
This means that after trying all 200,000 random strings, attempt number 200,001 requests a key that actually exists in the dictionary. If we want to change how many random entries are tried first, we can just change the value of random_objects.
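A quick sanity check confirms the setup. Every random string is 10 characters long and "realkey" has only 7, so no random entry can collide with the real key, and the only hit is the final entry (index 200,000 counting from zero, i.e. attempt 200,001). This sketch rebuilds the list so it runs on its own:

```python
import random
import string

random_objects = 200_000
letters = string.ascii_lowercase
keylookup = ["".join(random.choice(letters) for _ in range(10))
             for _ in range(random_objects)]

my_dict = {"realkey": "realdata"}
keylookup.append("realkey")

# positions of every entry that is actually a key in my_dict
hits = [index for index, item in enumerate(keylookup) if item in my_dict]

print(len(keylookup))  # 200001
print(hits)            # [200000]
```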
Finally, we’ll import perf_counter_ns from the standard library’s time module so we can measure how long each function takes. The final code looks like this:
```python
from time import perf_counter_ns
import random
import string

keylookup = []
random_objects = 200000
letters = string.ascii_lowercase
for _ in range(random_objects):
    result_str = "".join(random.choice(letters) for _ in range(10))
    keylookup.append(result_str)

# actually make sure the real key is in the list (as the very last entry)
my_dict = {"realkey": "realdata"}
keylookup.append("realkey")


def by_get(data, key):
    """Return the value found with the get method, or None."""
    value = data.get(key, None)
    if value:
        return value
    return None


def by_except(data, key):
    """Return data[key], or None if the key isn't present."""
    try:
        value = data[key]
        return value
    except KeyError:
        return None


# by get function
by_get_start = perf_counter_ns()
for item in keylookup:
    value_from_dict = by_get(data=my_dict, key=item)
    if value_from_dict:
        # only the last entry ("realkey") is found, so the clock stops at the end
        by_get_stop = perf_counter_ns()
        print(f"by_get took: {by_get_stop - by_get_start} nanoseconds")

# by exception
by_except_start = perf_counter_ns()
for item in keylookup:
    value_from_dict = by_except(data=my_dict, key=item)
    if value_from_dict:
        by_except_stop = perf_counter_ns()
        print(f"by_except took: {by_except_stop - by_except_start} nanoseconds")
```
Don’t worry if you don’t fully understand it. This is just for testing purposes, so let’s press on to the results.
The results:
The test runs 200,000 random entries through each function so that the timings are large enough to measure reliably. Looking up a high volume of random keys isn’t a real-world workload, but it will tell us whether one function is faster than the other.
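Because almost every lookup in this test misses, the benchmark mostly measures the cost of raising and catching KeyError. If you want to repeat the comparison with the standard-library timeit module, and also see how the two functions behave when the key is present, a sketch might look like this. time_lookup is a hypothetical helper, and your numbers will vary from machine to machine:

```python
import timeit


def by_get(data, key):
    value = data.get(key, None)
    if value:
        return value
    return None


def by_except(data, key):
    try:
        return data[key]
    except KeyError:
        return None


my_dict = {"realkey": "realdata"}


def time_lookup(func, key, number=200_000):
    """Hypothetical helper: total seconds for `number` calls of func(my_dict, key)."""
    return timeit.timeit(lambda: func(my_dict, key), number=number)


for label, key in [("missing key", "nokey"), ("existing key", "realkey")]:
    get_time = time_lookup(by_get, key)
    except_time = time_lookup(by_except, key)
    print(f"{label}: by_get {get_time:.4f}s, by_except {except_time:.4f}s")
```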
Timing one function against the other, we get the following results in nanoseconds:
| Run | Function using ‘get’ (ns) | Function using try / except (ns) |
|---|---|---|
| 1 | 25929542 | 36222125 |
| 2 | 25160541 | 35281334 |
| 3 | 26401209 | 36203292 |
| 4 | 25824000 | 36560167 |
| 5 | 25397875 | 35083208 |
| 6 | 25021042 | 35349417 |
| 7 | 25306333 | 35422667 |
| 8 | 25154083 | 35038375 |
| Average | 25524328 | 35645073 |
To check that the first result wasn’t an outlier, the test was run eight times and the results averaged. Using Python’s built-in get method is consistently and considerably faster than a try / except.
Result analysis:
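Compared with try / except, Python’s built-in get method takes about 28% less time in this test. This follows the general trend that Python’s built-in functions and methods usually yield faster results because they are better optimized. Bear in mind that nearly every lookup here is for a missing key, so the try / except version raises and catches 200,000 KeyErrors. In your code, use Python’s get method when you need to access dictionary keys that may not be present.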
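The averages and the percentage can be reproduced from the table with a few lines of arithmetic. This sketch uses the standard-library statistics module and the eight runs reported above:

```python
from statistics import mean

# timings in nanoseconds, copied from the results table
get_runs = [25929542, 25160541, 26401209, 25824000,
            25397875, 25021042, 25306333, 25154083]
except_runs = [36222125, 35281334, 36203292, 36560167,
               35083208, 35349417, 35422667, 35038375]

get_avg = mean(get_runs)
except_avg = mean(except_runs)

# fraction of the try / except time saved by using get
saving = (except_avg - get_avg) / except_avg

print(f"get average:    {get_avg:,.0f} ns")     # get average:    25,524,328 ns
print(f"except average: {except_avg:,.0f} ns")  # except average: 35,645,073 ns
print(f"time saved:     {saving:.1%}")          # time saved:     28.4%
```

Put the other way round, the try / except version takes roughly 40% longer than the get version.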
This article was inspired by an interesting YouTube video, “The Fastest Way to Loop in Python - An Unfortunate Truth”.
EOF