Simplifying Long Nested If-Else Ladders
A method to write long if-else ladders in a more structured and Pythonic manner...
In many situations, we have to give the user a different result depending on a particular event or option. Such situations are programmed as conditional statements, and Python, like many other languages, offers the if-else construct for this purpose. However, when a lot of choices are involved, a long if-else ladder makes the code hard to read and maintain. In these cases, one can combine functional programming, a dictionary, and numerical expressions to obtain the same result in a more structured way. In this article, I will illustrate this approach on a basic data transformation process.
Use Case - Loading and Processing Data Files
Let us consider a basic use case for our implementation. Assume that we are given data files with student marks from different sections of a class. Each file name ends with a suffix that identifies the section the file belongs to. Moreover, the file format varies between sections, e.g. one section's class advisor has stored the data in .csv format, another in .txt format, another in .xlsx format, and so on. For our implementation, we restrict the data to 4 sections, namely A, B, C, and D, with the following file names:
Section A: student_marksA.csv
Section B: student_marksB.txt
Section C: student_marksC.csv
Section D: student_marksD.xlsx
Our task is to take the data stored in each file and generate a single result file containing each student's name and obtained grade. For simplicity, assume that the files contain only two attributes, name and marks. Now that our use case has been defined, let us dive into the implementation.
Implementation using If-Else Ladder
The first approach that comes to mind is to accept the input file for each section and check its extension. Based on the extension, we call the relevant method to read the data, and then store the results in the defined output format. Let us look at its implementation using the if-else ladder:
#Importing necessary modules
import pandas as pd

#Utility methods for processing files
def process_text(filename):
    data = pd.read_csv(filename, sep=" ")
    return data

def process_csv(filename):
    data = pd.read_csv(filename)
    return data

def process_xlsx(filename):
    data = pd.read_excel(filename)
    return data

#Utility method to assign grade based on obtained marks
def assign_grade(marks):
    if marks >= 90:
        return 'A+'
    elif marks >= 80:
        return 'A'
    elif marks >= 70:
        return 'B'
    elif marks >= 60:
        return 'C'
    elif marks >= 50:
        return 'D'
    else:
        return 'F'

#Main method to process marks and get results
def main():
    #Taking input files from user
    input_files = input("Enter list of files you want to process: ")
    #Generating a resultant dataframe to store extracted data
    out_df = pd.DataFrame(columns=['name', 'marks'])
    #Iterating over each file and adding its data to resultant dataframe
    for file in input_files.split():
        file_ext = file.split('.')[-1]
        if file_ext == "txt":
            data = process_text(file)
        elif file_ext == "csv":
            data = process_csv(file)
        elif file_ext == "xlsx":
            data = process_xlsx(file)
        else:
            #Stop on an unknown extension rather than reuse data from a previous file
            raise ValueError(f"Unsupported file extension: {file}")
        #Append this file's rows to the resultant dataframe
        out_df = pd.concat([out_df, data])
    #Apply utility method on resultant dataframe to get corresponding grades
    out_df["grade"] = out_df["marks"].apply(assign_grade)
    #Sort the dataframe rows on the basis of descending grade order
    grades_order = ['A+','A','B','C','D','F']
    out_df["grade"] = pd.Categorical(out_df["grade"], ordered=True, categories=grades_order)
    out_df = out_df.sort_values(by=['grade','name'])
    #Store the obtained result in result.csv file for future use
    out_df.to_csv("result.csv")

#Invoke the main method for execution of the script
if __name__ == "__main__":
    main()
The above code performs the desired task. The result, stored in the result.csv file, can be used for visualization and/or for comparison with other classes or previous years. (Note: I have not described how the data is laid out in each file; assume the simplest layout, with just the two attributes name and marks, so that no preprocessing is required before adding the data to the resultant data frame. This is unlikely to hold in a real-world scenario.) Now that the traditional approach is complete, let's move on to the use of expressions.
Implementation using Numerical Conditional Expressions:
Before we dive into the code, let's first outline how the approach works. First, we create a dictionary whose keys are the numbers to which each condition will be mapped. For example, if we are given a filename with three possible formats (.txt, .csv, or .xlsx), we can map each of these choices to a number as shown in the code block:
#Note: We will replace the value for each key in a later step
#This is just to illustrate the assigned key for each file extension
ext_mapper = {
    0: '.csv', #Usually the default option is assigned 0
    1: '.txt',
    2: '.xlsx'
}
Now that we have a mapper in mind, we can write a numerical conditional expression. The expression multiplies each mapped number by its corresponding condition (e.g. for the .csv format, we multiply its mapped number 0 by the condition (ext == '.csv') to get its term) and then sums the terms. Since a true condition counts as 1 and a false one as 0, only the term whose condition holds contributes, so the sum equals the key of that condition. This is illustrated in the code block below:
#Numerical expression for determining the file extension
import os
#os.path.splitext keeps the leading dot, e.g. '.csv'
ext = os.path.splitext(filename)[1]
ext_expression = (
    (0 * (ext == '.csv')) + #Format: key_value * logical_condition
    (1 * (ext == '.txt')) +
    (2 * (ext == '.xlsx'))
)
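The expression works because Python's bool is a subclass of int, so True behaves as 1 and False as 0 in arithmetic. The following is a minimal sketch of that mechanism; ext_key is a hypothetical helper name used only for illustration, and the sketch assumes the extension is extracted with os.path.splitext so that it keeps its leading dot.

```python
import os


def ext_key(filename):
    """Return the numeric key for a file name (hypothetical helper)."""
    # os.path.splitext keeps the leading dot, e.g. ".csv"
    ext = os.path.splitext(filename)[1]
    return (
        (0 * (ext == ".csv")) +
        (1 * (ext == ".txt")) +
        (2 * (ext == ".xlsx"))
    )


# A comparison result can be multiplied directly because bool is an int subclass
print(2 * True, 2 * False)             # 2 0
print(ext_key("student_marksA.csv"))   # 0
print(ext_key("student_marksB.txt"))   # 1
print(ext_key("student_marksD.xlsx"))  # 2
# An unrecognised extension makes every term zero, so it also yields key 0
print(ext_key("notes.json"))           # 0
```

Note the last line: since 0 doubles as the default key, a file with an unsupported extension would be handled the same way as a .csv file, which is worth keeping in mind when choosing which option gets key 0.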
With the numerical expression defined, we can update the ext_mapper dictionary and replace the value of each key with the function to call for that extension, as shown in the following code block:
#Extension mapper dictionary
ext_mapper = {
    0: (lambda x: pd.read_csv(x)),
    1: (lambda x: pd.read_csv(x, sep=" ")),
    2: (lambda x: pd.read_excel(x))
}
#Here we have used lambda functions to reduce space of defining utility functions again
Now that both the numerical expression and the mapper dictionary are ready, we can implement the code that retrieves the data based on the file extension:
#Code block to add data into dataframe after reading each file
#(assumes os and pandas are imported and input_files holds the user's input string)
out_df = pd.DataFrame(columns=["name", "marks"])
for file in input_files.split():
    #os.path.splitext keeps the leading dot, e.g. '.csv'
    ext = os.path.splitext(file)[1]
    #Compute value for the expression
    ext_expression = ((0 * (ext == '.csv')) + (1 * (ext == '.txt')) +
                      (2 * (ext == '.xlsx')))
    #Look up the reader for this key and call it on the given file
    data = ext_mapper[ext_expression](file)
    #Concatenate the obtained data into resultant dataframe
    out_df = pd.concat([out_df, data])
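To try the lookup-and-call step without pandas or real data files, here is a rough standard-library sketch of the same flow. Everything in it is an assumption made for illustration: read_delimited and load_all are hypothetical helpers, the student names and marks are made up, and the .xlsx case is left out because the standard library has no Excel reader.

```python
import csv
import os
import tempfile


def read_delimited(path, delimiter):
    """Read a delimited text file into a list of dicts (stand-in for pd.read_csv)."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


# Same keys as in the article; the values are the functions to call
ext_mapper = {
    0: lambda path: read_delimited(path, ","),
    1: lambda path: read_delimited(path, " "),
}


def load_all(paths):
    """Read every file through the mapper and collect the rows in one list."""
    rows = []
    for path in paths:
        ext = os.path.splitext(path)[1]
        key = (0 * (ext == ".csv")) + (1 * (ext == ".txt"))
        rows.extend(ext_mapper[key](path))
    return rows


# Made-up sample data, written to a temporary directory for the demo
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "student_marksA.csv")
    txt_path = os.path.join(tmp, "student_marksB.txt")
    with open(csv_path, "w", newline="") as f:
        f.write("name,marks\nAsha,91\nBen,67\n")
    with open(txt_path, "w", newline="") as f:
        f.write("name marks\nCara 78\n")
    rows = load_all([csv_path, txt_path])

for row in rows:
    print(row["name"], row["marks"])
```

Unlike pandas, csv.DictReader returns every value as a string, so the marks would need an int() conversion before any grade is computed from them.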
The advantage of this approach is that the conditions live in one place (the expression) and the actions in another (the dictionary), so we can add as many conditional entries as we want without mixing everything up. This allows easy maintenance in case of complex conditional executions. The same approach can be used for the grade assignment method. The final code for this approach looks as follows:
#Importing necessary modules
import os
import pandas as pd

#Utility method for grade assignment
def assign_grade(marks):
    #Mapper for Grade Assignment
    grade_mapper = {0: 'F', 1: 'D', 2: 'C', 3: 'B', 4: 'A', 5: 'A+'}
    #Grade expression: exactly one range holds for marks between 0 and 100
    grade_exp = (0 * (marks < 50) +
                 1 * ((marks >= 50) and (marks < 60)) +
                 2 * ((marks >= 60) and (marks < 70)) +
                 3 * ((marks >= 70) and (marks < 80)) +
                 4 * ((marks >= 80) and (marks < 90)) +
                 5 * ((marks >= 90) and (marks <= 100)))
    #Return grade based on expression value
    return grade_mapper[grade_exp]

#Main method to process marks and get results
def main():
    #Taking input files from user
    input_files = input("Enter list of files you want to process: ")
    #Generating a resultant dataframe to store extracted data
    out_df = pd.DataFrame(columns=['name', 'marks'])
    #Mapper for File Extensions
    ext_mapper = {
        0: (lambda x: pd.read_csv(x)),
        1: (lambda x: pd.read_csv(x, sep=" ")),
        2: (lambda x: pd.read_excel(x))
    }
    #Iterating over each file and adding its data to resultant dataframe
    for file in input_files.split():
        #os.path.splitext keeps the leading dot, e.g. '.csv'
        ext = os.path.splitext(file)[1]
        ext_expression = ((0 * (ext == '.csv')) + (1 * (ext == '.txt')) +
                          (2 * (ext == '.xlsx')))
        data = ext_mapper[ext_expression](file)
        out_df = pd.concat([out_df, data])
    #Applying utility method to obtain corresponding grade
    out_df["grade"] = out_df["marks"].apply(assign_grade)
    #Sort the dataframe rows on the basis of descending grade order
    grades_order = ['A+','A','B','C','D','F']
    out_df["grade"] = pd.Categorical(out_df["grade"], ordered=True, categories=grades_order)
    out_df = out_df.sort_values(by=['grade','name'])
    #Store the obtained result in result.csv file for future use
    out_df.to_csv("result.csv")

#Invoke the main method for execution of the script
if __name__ == "__main__":
    main()
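As a quick sanity check, the grade expression can be compared against the if-else ladder for every whole mark from 0 to 100. This is only a sketch: ladder_grade and expression_grade are hypothetical names for the two versions of assign_grade, copied here so the snippet runs on its own without pandas.

```python
def ladder_grade(marks):
    """Grade assignment using the if-else ladder."""
    if marks >= 90:
        return 'A+'
    elif marks >= 80:
        return 'A'
    elif marks >= 70:
        return 'B'
    elif marks >= 60:
        return 'C'
    elif marks >= 50:
        return 'D'
    else:
        return 'F'


def expression_grade(marks):
    """Grade assignment using the numerical expression and mapper."""
    grade_mapper = {0: 'F', 1: 'D', 2: 'C', 3: 'B', 4: 'A', 5: 'A+'}
    grade_exp = (0 * (marks < 50) +
                 1 * (50 <= marks < 60) +
                 2 * (60 <= marks < 70) +
                 3 * (70 <= marks < 80) +
                 4 * (80 <= marks < 90) +
                 5 * (90 <= marks <= 100))
    return grade_mapper[grade_exp]


# Collect every mark in the valid range on which the two versions disagree
mismatches = [m for m in range(0, 101) if ladder_grade(m) != expression_grade(m)]
print(mismatches)  # []

# Outside the valid range the two versions differ
print(ladder_grade(105), expression_grade(105))  # A+ F
```

The two versions agree on the whole 0 to 100 range. For a mark above 100, no range in the expression holds, so it evaluates to 0 and returns 'F', whereas the ladder returns 'A+'; validating the marks beforehand avoids the discrepancy.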
And we are done. The approach may seem a bit odd at first glance, but it helps readability and maintenance because the conditions and the function calls are kept in separate places. A simple example of such maintenance is shown in the following code block, where we add another file extension, .xls, for processing. Since .xls is also an extension for Excel files, we map it to the same key, and therefore the same method, as .xlsx:
def main():
    #---Code snippet here---
    #Mapper for File Extensions
    ext_mapper = {
        0: (lambda x: pd.read_csv(x)),
        1: (lambda x: pd.read_csv(x, sep=" ")),
        2: (lambda x: pd.read_excel(x))
    }
    #---Code snippet here---
    for file in input_files.split():
        #Assumes "import os" at the top of the script
        ext = os.path.splitext(file)[1]
        #.xls shares key 2 with .xlsx, so both are read with pd.read_excel
        ext_expression = ((0 * (ext == '.csv')) + (1 * (ext == '.txt')) +
                          (2 * ((ext == '.xlsx') or (ext == '.xls'))))
        data = ext_mapper[ext_expression](file)
        out_df = pd.concat([out_df, data])
    #---Code snippet here---
Just like that, we can support more alternatives by extending the numerical expression and, where a new action is needed, the mapper.
So... what about its cons?
Well, this approach is obviously not free of cons. There are two main things to consider before applying it.
Make sure the conditional expression is not too simple. Simple conditionals, e.g. checking whether a number is even or odd, are not worth creating such a fuss over.
Make sure the conditions you are using are mutually exclusive. For example, when determining whether a year is a leap year, we check divisibility by 4 as well as by 400, and every year divisible by 400 is also divisible by 4. With overlapping conditions like these, more than one term contributes to the sum, so the expression may evaluate to the wrong key (or to a key that is not in the mapper at all) and needs extra work to get right.
Other than these two cases, if you have many conditions to check and many methods to execute for them, the expression method is very useful, as it lets you define your conditions and your code in separate, well-defined places.
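To make the second caveat concrete, here is a small sketch of the leap-year case. The names naive_key, exclusive_key, and leap_mapper are hypothetical and used only for illustration; the first function uses the overlapping conditions as they are, while the second rewrites them so that at most one term is non-zero for any year.

```python
leap_mapper = {0: "common year", 1: "leap year"}


def naive_key(year):
    """Overlapping conditions: every multiple of 400 is also a multiple of 4."""
    return 1 * (year % 4 == 0) + 1 * (year % 400 == 0)


def exclusive_key(year):
    """Mutually exclusive conditions: exactly one of them holds for any year."""
    return (0 * (year % 4 != 0) +
            1 * (year % 4 == 0 and year % 100 != 0) +
            0 * (year % 100 == 0 and year % 400 != 0) +
            1 * (year % 400 == 0))


# 2000 satisfies both naive conditions, so the terms add up to a key
# that does not exist in the mapper
print(naive_key(2000))  # 2
# 1900 is not a leap year, but the naive expression maps it to key 1
print(leap_mapper[naive_key(1900)])  # leap year

# The exclusive version gives the expected answers
for year in (1900, 2000, 2023, 2024):
    print(year, leap_mapper[exclusive_key(year)])
```

Making the conditions exclusive fixes the result, but the expression is now longer than a plain if-else would be, which is the kind of complication the caveat warns about.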
Conclusion:
So today, we covered numerical conditional expressions as an alternative to long if-else ladders. The approach relies on the dictionary data structure and on two properties of Python: functions are objects that can be stored as dictionary values, and Boolean results can be used in arithmetic. Together they give the same result as the ladder in a more structured form. Although the approach is best avoided for simple cases and for conditions that are not mutually exclusive, it can be quite helpful in the rest of the cases.
That's it for today! Hope you enjoyed the article and got to learn something from it. Don't forget to comment with your feedback on the approach. Do you think this approach can be useful in any of your tasks? If so, do share them in the comments.
Thanks for reading! Hope you have a great day!