How to Count Unique Values in a Column of a pandas DataFrame

In this post, we’ll look at how to count the unique (distinct) values in the columns of a pandas DataFrame.

When working on machine learning, artificial intelligence, or data analysis tasks with pandas, we frequently need to count the unique or distinct values in a single column or in multiple columns.

You can get the number of unique values in a column of a pandas DataFrame in several ways, such as Series.unique().size, Series.nunique(), and Series.drop_duplicates().size. Since a DataFrame column is internally represented as a Series, you can use these functions to perform the operation.
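As a quick sanity check, here is a minimal sketch showing that a selected column is a Series and that the three approaches agree when the column has no missing values. The small `df` below is made up for illustration and is not the DataFrame used later in this post.

```python
import pandas as pd

# Illustrative data only; not the DataFrame used later in this post
df = pd.DataFrame({"Courses": ["Spark", "Python", "Spark", "Pandas"]})

col = df["Courses"]
print(type(col))  # a single column is a pandas Series

# Three ways to count the distinct values in the column
n_unique_size = col.unique().size
n_nunique = col.nunique()
n_dropdup = col.drop_duplicates().size

print(n_unique_size, n_nunique, n_dropdup)
```

With missing values present the results can differ, because nunique() skips NaN by default while the other two count it.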

Typically, each column holds a different feature of the DataFrame. It could be continuous, categorical, or something else entirely, such as free text. If you don’t know the nature of the values you’re dealing with, counting the distinct values can be a good exploratory step. In this tutorial, we’ll look at how to get the count of unique values in each column of a pandas DataFrame.

What is Python Pandas?

Pandas is an open-source data analysis library built primarily for working with relational or labeled data easily and intuitively. It provides various data structures and operations for manipulating numerical data and time series. The library is built on top of NumPy. Pandas is fast and offers high performance and productivity for users.

Count Unique Values of a Column in a pandas DataFrame

  • Quick Examples of Count Unique Values in Column
  • By using Series.unique() – Count Unique Values
  • By using Series.nunique()
  • Count Unique Values in Multiple Columns

1] Quick Examples of Count Unique Values in Column.

Below are quick examples of how to count unique values in columns.

# Get Unique Count using Series.unique()
count = df.Courses.unique().size

# Using Series.nunique()
count = df.Courses.nunique()

# Get frequency of each value
frequency = df.Courses.value_counts()

# By using drop_duplicates()
count = df.Courses.drop_duplicates().size

# Count unique combinations of values across multiple columns
count = df[['Courses','Fee']].drop_duplicates().shape[0]

# Count unique values separately in each of multiple columns
count = df[['Courses','Fee']].nunique()

# Count unique values in each row
count = df.nunique(axis=1)

Let’s create a DataFrame.

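import pandas as pd
# Column values match the printed output below
technologies = {
    'Courses' :["Spark","PySpark","Python","Pandas","Python","Spark","Pandas"],
    'Fee' :[20000,25000,22000,30000,25000,20000,30000],
    'Duration' :['30days','40days','35days','50days','40days','30days','50days'],
    'Discount' :[1000,2300,1200,2000,2300,1000,2000]
}
df = pd.DataFrame(technologies)
print(df)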

This yields the output below.

   Courses    Fee Duration  Discount
0    Spark  20000   30days      1000
1  PySpark  25000   40days      2300
2   Python  22000   35days      1200
3   Pandas  30000   50days      2000
4   Python  25000   40days      2300
5    Spark  20000   30days      1000
6   Pandas  30000   50days      2000
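The quick examples also mention value_counts(), which reports how often each distinct value occurs. The sketch below rebuilds just the Courses column from the table above as a standalone Series (the variable names are only for illustration) and shows that the length of the result is another way to get the unique count.

```python
import pandas as pd

# Same 'Courses' values as in the table above, rebuilt as a standalone Series
courses = pd.Series(
    ["Spark", "PySpark", "Python", "Pandas", "Python", "Spark", "Pandas"],
    name="Courses",
)

# value_counts() returns one entry per distinct value with its frequency
freq = courses.value_counts()
print(freq)

# The number of distinct values is the length of that result
unique_count = len(freq)
print(unique_count)
```

This is handy when you want the count and the distribution at the same time. Note that value_counts() also skips NaN by default.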

2] By using Series.unique() – Count Unique Values.

To count the unique values in a column, first call Series.unique() to get the distinct values with duplicates removed, then read its size attribute to get the count. The unique() function returns an array of the unique values in order of appearance; the result is not sorted.

Syntax: Series.unique()

For example:

# Get Unique Count using Series.unique()
count = df.Courses.unique().size
print("Unique values count : "+ str(count))

# Output
# Unique values count : 4
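To illustrate the ordering behavior described above, here is a small sketch using the same course names as a standalone Series. It assumes you want a sorted list for display, which is an extra step and not something unique() does for you.

```python
import pandas as pd

# Same course names as the example DataFrame, as a standalone Series
courses = pd.Series(
    ["Spark", "PySpark", "Python", "Pandas", "Python", "Spark", "Pandas"]
)

# unique() keeps the order in which each value first appears
values = list(courses.unique())
print(values)

# Sort explicitly if a sorted result is needed
sorted_values = sorted(values)
print(sorted_values)
```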

3] By using Series.nunique().

Alternatively, you can use Series.nunique(), which returns the number of unique elements in the object, excluding NaN values by default. To include NaN values in the count, set the dropna parameter to False.

Syntax: Series.nunique(dropna=True)


# Using Series.nunique()
count = df.Courses.nunique()
print("Unique values count : "+ str(count))

# Outputs
# Unique values count : 4
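The example DataFrame has no missing values, so the effect of dropna is not visible there. The sketch below uses a hypothetical Series with NaN entries to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical column with missing values, for illustration only
s = pd.Series(["Spark", "Python", np.nan, "Spark", np.nan])

default_count = s.nunique()             # NaN is excluded by default
with_nan = s.nunique(dropna=False)      # NaN counted as one distinct value
via_unique = s.unique().size            # unique() keeps NaN in its result

print(default_count, with_nan, via_unique)
```

Here the default count is 2, while both nunique(dropna=False) and unique().size give 3.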

4] Count Unique Values in Multiple Columns.

To count the unique combinations of values across multiple columns, use pandas DataFrame.drop_duplicates(), which drops duplicate rows from a DataFrame. This eliminates duplicates and returns a DataFrame with only unique rows.

On the result, use the shape property, which returns a tuple of (rows, columns), and use shape[0] to get the row count.

What is a DataFrame?

A DataFrame is a data structure that organizes data into a 2-dimensional table of rows and columns, similar to a spreadsheet. DataFrames are among the most common data structures used in modern data analytics because they are a flexible and intuitive way of storing and working with data.

# Count unique on multiple columns
count = df[['Courses','Fee']].drop_duplicates().shape[0]
print("Unique multiple columns : "+ str(count))

# Outputs
# Unique multiple columns : 5
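Counting unique combinations is different from counting unique values per column. The sketch below rebuilds the two columns used above and contrasts DataFrame.nunique() with drop_duplicates(); the groupby(...).ngroups line is an alternative added here for comparison, not something the steps above rely on.

```python
import pandas as pd

# The 'Courses' and 'Fee' columns from the example DataFrame
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "Pandas", "Python", "Spark", "Pandas"],
    "Fee": [20000, 25000, 22000, 30000, 25000, 20000, 30000],
})

# Distinct values counted separately for each column (returns a Series)
per_column = df[["Courses", "Fee"]].nunique()
print(per_column)

# Distinct (Courses, Fee) combinations, counted two equivalent ways
combos = df[["Courses", "Fee"]].drop_duplicates().shape[0]
combos_groupby = df.groupby(["Courses", "Fee"]).ngroups
print(combos, combos_groupby)
```

Each column has 4 distinct values on its own, but there are 5 distinct (Courses, Fee) pairs because Python appears with two different fees.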

I hope this post is helpful. Thank you, and have a good day.

