Tuesday, February 3, 2015

Remap Values in Pandas DataFrame Column with a Dictionary & Transform Pandas GroupBy Object to Pandas DataFrame

A few days ago I advanced to Semi-final stage of The Data Incubator fellowship application. The next step for me was to  solve some Challenge Problems and submit the solutions (If my solutions are correct next stage will be interview process, so keeping my fingers crossed). While solving the challenge problems (Of course, I won't be discussing challenge problems or solutions), I learned some Pandas Data Wrangling tricks. Here are two tricks to "Remap values in Pandas DataFrame column with a Dictionary" and "Transform Pandas GroupBy Object to Pandas DataFrame". I am using an example data set from Kaggle's competition to "Predict if a car purchased in an auction is a Lemon". The data is available here.

####################################################################

# Upgrading a Python Module (Mac Terminal)
python -m pip install --upgrade seaborn

# special IPython command to prepare the notebook for matplotlib
%matplotlib inline 

# Import necessary libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load data
cars = pd.read_csv('training.csv')
cars.head( )
Out: DataFrame column "VNST" has U.S. State Abbreviations, e.g. CA for California

# Mapping between State abbreviations and State Name
state_abv = {
        'AK': 'Alaska',
        'AL': 'Alabama',
        'AR': 'Arkansas',
        'AS': 'American Samoa',
        'AZ': 'Arizona',
        'CA': 'California',
        'CO': 'Colorado',
        'CT': 'Connecticut',
        'DC': 'District of Columbia',
        'DE': 'Delaware',
        'FL': 'Florida',
        'GA': 'Georgia',
        'GU': 'Guam',
        'HI': 'Hawaii',
        'IA': 'Iowa',
        'ID': 'Idaho',
        'IL': 'Illinois',
        'IN': 'Indiana',
        'KS': 'Kansas',
        'KY': 'Kentucky',
        'LA': 'Louisiana',
        'MA': 'Massachusetts',
        'MD': 'Maryland',
        'ME': 'Maine',
        'MI': 'Michigan',
        'MN': 'Minnesota',
        'MO': 'Missouri',
        'MP': 'Northern Mariana Islands',
        'MS': 'Mississippi',
        'MT': 'Montana',
        'NA': 'National',
        'NC': 'North Carolina',
        'ND': 'North Dakota',
        'NE': 'Nebraska',
        'NH': 'New Hampshire',
        'NJ': 'New Jersey',
        'NM': 'New Mexico',
        'NV': 'Nevada',
        'NY': 'New York',
        'OH': 'Ohio',
        'OK': 'Oklahoma',
        'OR': 'Oregon',
        'PA': 'Pennsylvania',
        'PR': 'Puerto Rico',
        'RI': 'Rhode Island',
        'SC': 'South Carolina',
        'SD': 'South Dakota',
        'TN': 'Tennessee',
        'TX': 'Texas',
        'UT': 'Utah',
        'VA': 'Virginia',
        'VI': 'Virgin Islands',
        'VT': 'Vermont',
        'WA': 'Washington',
        'WI': 'Wisconsin',
        'WV': 'West Virginia',
        'WY': 'Wyoming'


# Get unique 'State Abbreviations'
states = cars['VNST'].unique( )

# Calculate total number of lemon titled cars by state and store in a DataFrame
temp_1 = [ ]
temp_2 = [ ]
for i, state in enumerate(states):
    df = cars[cars['VNST'] == states[i]]
    temp_1.append(sum(df['IsBadBuy']))
    temp_2.append(state)

       
dfNew = pd.DataFrame(temp_1)
dfNew.columns = ['Lemon']
dfNew['StAbv'] = temp_2
dfNew = dfNew.sort(['Lemon'])
dfNew.head( )
Out:


Remap values in Pandas DataFrame column with a Dictionary
dfNew['State'] = dfNew['StAbv'].map(state_abv.get)
dfNew.head( )

Out: Remapped values from dictionary's values




## Transform Pandas GroupBy Object to Pandas DataFrame
# Group by 'State' and 'Vehicle Manufacturer' then to Pandas DataFrame
carsNew = cars.groupby(by = ['VNST', 'Make']).agg({'IsBadBuy':'sum'})

# Group by 'Vehicle Manufacturer' then to Pandas DataFrame
stateNew = cars.groupby(['VNST']).agg({'IsBadBuy':'sum'})

# Calculate % and then transform by resetting index
carsNew = (carsNew.div(stateNew, level = 'VNST') * 100).reset_index( )

Remap values in Pandas DataFrame column with a Dictionary
carsNew['State'] = carsNew['VNST'].map(state_abv.get)
carsNew.head()

Out:


## Heatmap using Seaborn library
Create Matrix form of DataFrame using pivot
carsMatrix = carsNew.pivot('State', 'Make', 'IsBadBuy')
carsMatrix.head( )

Out:

# Fill all 'NaN' with 0
carsMatrix.fillna(0, axis = 0, inplace = True)
carsMatrix.head( )

Out: 


# Plotting Heatmap
sns.set()
plt.figure(figsize=(8, 8))
sns.heatmap(carsMatrix)

Out:


1 comment:


  1. The development of artificial intelligence (AI) has propelled more programming architects, information scientists, and different experts to investigate the plausibility of a vocation in machine learning. Notwithstanding, a few newcomers will in general spotlight a lot on hypothesis and insufficient on commonsense application. IEEE final year projects on machine learning In case you will succeed, you have to begin building machine learning projects in the near future.

    Projects assist you with improving your applied ML skills rapidly while allowing you to investigate an intriguing point. Furthermore, you can include projects into your portfolio, making it simpler to get a vocation, discover cool profession openings, and Final Year Project Centers in Chennai even arrange a more significant compensation.


    Data analytics is the study of dissecting crude data so as to make decisions about that data. Data analytics advances and procedures are generally utilized in business ventures to empower associations to settle on progressively Python Training in Chennai educated business choices. In the present worldwide commercial center, it isn't sufficient to assemble data and do the math; you should realize how to apply that data to genuine situations such that will affect conduct. In the program you will initially gain proficiency with the specialized skills, including R and Python dialects most usually utilized in data analytics programming and usage; Python Training in Chennai at that point center around the commonsense application, in view of genuine business issues in a scope of industry segments, for example, wellbeing, promoting and account.

    ReplyDelete