57: Pandas – Filtering Records


Today I had one of those, aha! moments with Pandas which has made my life a lot simpler.

Here’s the story.

I have an excel file which contains a list of discount codes, some have been used, and some have not. I need to go through the list of codes and find out the following:

  • Have we assigned a code to anyone who has now left the business?

To find out who the ‘leavers’ were, I passed a list of employees and a list of used codes to a function and then iterated over the list of codes, putting any employee who is not in the used code list into a levers list

def FindLeaversWithCodes(people, used_codes) :
    #Find people we've sent a code to who are no longer in the HC Report
    _leavers = []

    #Get the Employee IDs of all the people we've sent codes to.
    _people = used_codes['personNumber'].tolist()
    #loop through each employee id
    for person in _people:
        #if this employee id is in teh current HC report then save as Current Employee
        if person not in people['Person Number'].values: #if this employee id is NOT in the current HC report then save as Leaver
    return _leavers

However, because I move things from a Dataframe to a list, I then had to filter the original Dataframe by the list of employee numbers, _leavers in the list to generate a new XSLX file in the original format. Incredibly inefficient.

    #Filter the used codes list to show the leavers
    df = used[used['personNumber'].isin(Leavers)]
    df.to_excel('../leavers.xlsx', index=False)
except Exception as error:

That first line of code df = used[used['personNumber'].isin(Leavers)] should have given me a clue as to what the FindLeaversWithCodes function should actually be doing.

Rather than convert to a list and use a for…in loop, I should have simply filtered the Dataframe to get the records I need.

Now the code is much simpler

def LeaversWithCodes(people, used_codes) :
    #This person has been assigned a code
    #This person is not an employee
    return used_codes[~used_codes["personNumber"].isin(people["Person Number"].values.tolist())]

the ~ character before the used_codes["personNumber"] is a ‘not’ symbol. I’m doing the same as before but this time filtering the original Dataframe where the personNumber is not in the People Dataframe.

Wordle: 570 4/6

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s