2023-01-11
Today I had one of those, aha! moments with Pandas which has made my life a lot simpler.
Here’s the story.
I have an excel file which contains a list of discount codes, some have been used, and some have not. I need to go through the list of codes and find out the following:
- Have we assigned a code to anyone who has now left the business?
To find out who the ‘leavers’ were, I passed a list of employees and a list of used codes to a function and then iterated over the list of codes, putting any employee who is not in the used code list into a levers list
def FindLeaversWithCodes(people, used_codes) :
#Find people we've sent a code to who are no longer in the HC Report
_leavers = []
#Get the Employee IDs of all the people we've sent codes to.
_people = used_codes['personNumber'].tolist()
#loop through each employee id
for person in _people:
#if this employee id is in teh current HC report then save as Current Employee
if person not in people['Person Number'].values: #if this employee id is NOT in the current HC report then save as Leaver
_leavers.append(person)
return _leavers
However, because I move things from a Dataframe to a list, I then had to filter the original Dataframe by the list of employee numbers, _leavers
in the list to generate a new XSLX file in the original format. Incredibly inefficient.
try:
#Filter the used codes list to show the leavers
df = used[used['personNumber'].isin(Leavers)]
df.to_excel('../leavers.xlsx', index=False)
except Exception as error:
print(error)
That first line of code df = used[used['personNumber'].isin(Leavers)]
should have given me a clue as to what the FindLeaversWithCodes
function should actually be doing.
Rather than convert to a list and use a for…in loop, I should have simply filtered the Dataframe to get the records I need.
Now the code is much simpler
def LeaversWithCodes(people, used_codes) :
#This person has been assigned a code
#This person is not an employee
return used_codes[~used_codes["personNumber"].isin(people["Person Number"].values.tolist())]
the ~
character before the used_codes["personNumber"]
is a ‘not’ symbol. I’m doing the same as before but this time filtering the original Dataframe where the personNumber is not in the People Dataframe.
Wordle: 570 4/6