5 Little-Known Pandas Features 🤓
Introduction
In this blog, I will dive into five lesser-known features of the Pandas library that can significantly enhance your data analysis workflows. These features are designed to make data handling easier and more efficient.
Getting the data
To start off, I will import the Pandas library and load a demo dataset. This dataset includes information about total bills, tips, customer details, and the day and time of their visits. We will utilize this dataset throughout the discussion.
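The loading step itself is not shown here, so as a stand-in the sketch below builds a tiny hand-typed DataFrame. The column names (total_bill, tip, sex, smoker, day, time, size) are my assumption about how the demo data is laid out, and the values are purely illustrative, not the real 244-row dataset.

```python
import pandas as pd

# Hypothetical stand-in for the demo dataset: column names are assumed,
# values are made up for illustration only.
df = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88],
        "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12],
        "sex": ["Female", "Male", "Male", "Male", "Female", "Male", "Male", "Male"],
        "smoker": ["No", "No", "No", "No", "No", "Yes", "No", "Yes"],
        "day": ["Sun", "Sun", "Sun", "Sat", "Sat", "Fri", "Thur", "Thur"],
        "time": ["Dinner"] * 6 + ["Lunch"] * 2,
        "size": [2, 3, 3, 2, 4, 4, 2, 4],
    }
)

# Quick look at the first rows and the overall shape.
print(df.head())
print(df.shape)
```

With the real data you would replace the dictionary with whatever loader you normally use; the rest of the calls stay the same.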
Describe non-numeric values too
One of the first things I do when examining a new dataset is to use the describe method to get a quick overview. Typically, this method provides statistics for numeric values. However, did you know you can include non-numeric values as well? By using include='all', Pandas will include non-numerical values in its summary, giving you insights into the number of unique values and the most frequent ones. For example, in our dataset, out of 244 customers, 151 were non-smoking.
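Here is a minimal sketch of the difference, using a small made-up DataFrame (the column names and values are assumptions, not the real demo data). For text columns the summary gains the rows unique, top and freq.

```python
import pandas as pd

# Illustrative stand-in data; not the real dataset.
df = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
        "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
        "smoker": ["No", "No", "Yes", "No", "Yes", "No"],
        "day": ["Sun", "Sun", "Sat", "Sat", "Sat", "Fri"],
    }
)

# Default behaviour: only the numeric columns are summarised.
numeric_summary = df.describe()

# include="all" adds the non-numeric columns, with unique/top/freq rows.
full_summary = df.describe(include="all")

print(full_summary)
# Most frequent smoker value and how often it occurs in this toy data.
print(full_summary.loc["top", "smoker"], full_summary.loc["freq", "smoker"])
```

The numeric statistics show NaN for text columns and the unique/top/freq rows show NaN for numeric columns, which is expected.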
Count the number of distinct elements
In addition to the describe method, I often use nunique() to return the number of distinct elements in each column. This method provides a quick summary of unique values across the dataset, which can be extremely useful for data analysis.
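A short sketch of nunique() on a made-up DataFrame (names and values are illustrative assumptions). The result is a Series indexed by column name.

```python
import pandas as pd

# Illustrative stand-in data; not the real dataset.
df = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
        "smoker": ["No", "No", "Yes", "No", "Yes", "No"],
        "day": ["Sun", "Sun", "Sat", "Sat", "Sat", "Fri"],
        "size": [2, 3, 3, 2, 4, 4],
    }
)

# Number of distinct values per column.
distinct_counts = df.nunique()
print(distinct_counts)

# The same method also works on a single column.
print(df["day"].nunique())
```

Columns with very few distinct values (like smoker or day here) are usually good candidates for grouping or for a categorical dtype.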
Use plotly for pandas charts
Another feature that might not be obvious is the ability to change the default plotting engine. While Pandas generates plots with Matplotlib by default, I prefer using Plotly for its interactive capabilities. With a single line of code that sets the Pandas plotting.backend option, I can switch the plotting engine to Plotly. This allows me to create interactive charts where I can hover over data points to see exact values, making the analysis much more intuitive and informative.
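The switch could look roughly like this. The sketch assumes the plotly package is installed; the find_spec guard is my addition so the snippet still runs where it is not, and the data is made up for illustration.

```python
import importlib.util

import pandas as pd

# Illustrative stand-in data; not the real dataset.
df = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
        "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
    }
)

# Guard (my addition): only switch if plotly can actually be imported.
if importlib.util.find_spec("plotly") is not None:
    # This is the one line that changes the plotting engine.
    pd.options.plotting.backend = "plotly"
    # .plot() now returns a plotly figure instead of a Matplotlib axes.
    fig = df.plot(kind="scatter", x="total_bill", y="tip")
    # fig.show()  # uncomment in a notebook or script to open the chart
else:
    fig = None

print(pd.options.plotting.backend)
```

Setting the option back to "matplotlib" restores the default behaviour for the rest of the session.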
Pandas and NumPy where function
Next, let’s explore the where function in Pandas. This function allows you to replace values based on a condition. For instance, if I want to replace tip values below 2 with NaN, I can easily do this using where. Furthermore, I can calculate the average tip size and replace values that are below average with a custom string like ‘below average’. Instead of overwriting the original DataFrame, I can create a copy to maintain my initial data.
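A minimal sketch of both replacements on made-up data (the values and the tip_level column name are assumptions for illustration). Keep in mind that where keeps the values for which the condition is True and replaces the rest, while NumPy's np.where(condition, a, b) picks a where the condition is True and b otherwise.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in data; not the real dataset.
df = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 23.68],
        "tip": [1.01, 1.66, 3.50, 3.31],
    }
)

# Keep tips of 2 or more; everything else becomes NaN (the default replacement).
tips_masked = df["tip"].where(df["tip"] >= 2)

# Average tip of this toy data (about 2.37).
avg_tip = df["tip"].mean()

# Work on a copy so the original DataFrame stays untouched.
df_labeled = df.copy()
df_labeled["tip"] = df["tip"].where(df["tip"] >= avg_tip, "below average")

# NumPy variant: choose between two labels based on the condition.
df_labeled["tip_level"] = np.where(
    df["tip"] < avg_tip, "below average", "average or above"
)

print(tips_masked)
print(df_labeled)
```

Mixing numbers and strings in one column turns it into an object column, so the np.where variant with a separate label column is often easier to work with afterwards.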
Style your dataframe
Styling your DataFrame can be crucial when presenting your findings. You can create a function to apply quick styling to your DataFrame, such as setting captions, formatting numbers, or applying heat maps to specific columns. This allows for a visually appealing representation of the data without altering the actual DataFrame. However, it’s important to note that the styling lives on a separate Styler object rather than on the DataFrame, so it will not carry over when you export the DataFrame itself to Excel.
Style your Excel export
If you want to ensure that your Excel exports maintain the styling, you can utilize the StyleFrame package. This package allows for more advanced styling options, such as changing header styles, adding filters, and freezing panes. For instance, I can adjust the column width and font family for better presentation. In the example, I demonstrate how to apply these styles and what the exported Excel file will look like.
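As a rough sketch of what this can look like, assuming the third-party styleframe package (and openpyxl, which it writes with) is installed. The file name, font, widths and data are my own illustrative choices, and the import guard is my addition so the snippet does not fail where the package is missing.

```python
import os
import tempfile

import pandas as pd

try:
    # Third-party package; note that this Styler is StyleFrame's, not Pandas'.
    from styleframe import StyleFrame, Styler
except ImportError:
    StyleFrame = None  # package not installed, the export below is skipped

# Illustrative stand-in data; not the real dataset.
df = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 23.68],
        "tip": [1.01, 1.66, 3.50, 3.31],
        "day": ["Sun", "Sun", "Sat", "Sat"],
    }
)

# Hypothetical output location for this example.
out_path = os.path.join(tempfile.mkdtemp(), "tips_styled.xlsx")

if StyleFrame is not None:
    sf = StyleFrame(df)
    # Bold header row in a different font family.
    sf.apply_headers_style(Styler(bold=True, font="Calibri", font_size=12))
    # Same font family for the numeric columns, plus a wider column.
    sf.apply_column_style(
        cols_to_style=["total_bill", "tip"], styler_obj=Styler(font="Calibri")
    )
    sf.set_column_width(columns=["total_bill", "tip"], width=15)
    with StyleFrame.ExcelWriter(out_path) as writer:
        # Filter on the header row and freeze everything above cell A2.
        sf.to_excel(writer, row_to_add_filters=0, columns_and_rows_to_freeze="A2")
```

Opening the resulting file in Excel should show the header filters and the frozen first row.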
Outro
These five features can greatly enhance your experience with Pandas and make your data analysis tasks more efficient and visually appealing. If you have any questions or want to learn more about specific features, feel free to leave a comment!