Today I had one of those, aha! moments with Pandas which has made my life a lot simpler.
Here’s the story.
I have an excel file which contains a list of discount codes, some have been used, and some have not. I need to go through the list of codes and find out the following:
- Have we assigned a code to anyone who has now left the business?
To find out who the ‘leavers’ were, I passed a list of employees and a list of used codes to a function and then iterated over the list of codes, putting any employee who is not in the used code list into a levers list
def FindLeaversWithCodes(people, used_codes) : #Find people we've sent a code to who are no longer in the HC Report _leavers =  #Get the Employee IDs of all the people we've sent codes to. _people = used_codes['personNumber'].tolist() #loop through each employee id for person in _people: #if this employee id is in teh current HC report then save as Current Employee if person not in people['Person Number'].values: #if this employee id is NOT in the current HC report then save as Leaver _leavers.append(person) return _leavers
However, because I move things from a Dataframe to a list, I then had to filter the original Dataframe by the list of employee numbers,
_leavers in the list to generate a new XSLX file in the original format. Incredibly inefficient.
try: #Filter the used codes list to show the leavers df = used[used['personNumber'].isin(Leavers)] df.to_excel('../leavers.xlsx', index=False) except Exception as error: print(error)
That first line of code
df = used[used['personNumber'].isin(Leavers)] should have given me a clue as to what the
FindLeaversWithCodes function should actually be doing.
Rather than convert to a list and use a for…in loop, I should have simply filtered the Dataframe to get the records I need.
Now the code is much simpler
def LeaversWithCodes(people, used_codes) : #This person has been assigned a code #This person is not an employee return used_codes[~used_codes["personNumber"].isin(people["Person Number"].values.tolist())]
~ character before the
used_codes["personNumber"] is a ‘not’ symbol. I’m doing the same as before but this time filtering the original Dataframe where the personNumber is not in the People Dataframe.
Wordle: 570 4/6