Cluster Analysis  

Grouping data points helps understand relationships and makes data interpretation easier. Cluster Analysis is a method used in data analysis to group similar data points based on certain factors or similarities across multiple measures and dimensions. Each cluster contains data points that are more similar to each other than to those in other clusters. 

Zoho Analytics leverages advanced machine learning models like K-means, K-modes, and K-prototype for cluster analysis.

Business Use Cases

Market Segmentation: Cluster analysis helps businesses understand their customer base by segmenting it based on different characteristics, such as purchasing behavior and demographics, allowing them to create targeted marketing strategies and personalize recommendations.

Social Network Analysis: By analyzing web logs, cluster analysis can reveal common behavior patterns among users, such as navigation paths, popular content sequences, or exit pages. These insights help optimize site structure and improve user experience.

Credit Risk Analysis: Cluster analysis can help financial institutions like banks group their customers based on their credit scores, income levels, and age to assess the risk levels and determine the appropriate loan limits.

Points to Note 

  • A minimum of five data points is required for cluster analysis.
  • Cluster Analysis is supported for scatter plots, bar charts (horizontal and vertical), and bubble charts.
  • Cluster Analysis is not supported with Forecast, Trend line, and Anomaly analysis.

How to Perform Cluster Analysis in Zoho Analytics?

  1. Drag and drop the required columns into the report builder.
  2. Click the Analysis icon in the toolbar and select Cluster Analysis from the drop-down menu.
     (or)
     Click the Settings icon on the top right and open the Analysis tab.
  3. Click Add Clusters.
  4. The Model is auto selected based on the columns used in the report. (You can also choose the model that should be used for clustering.)
  5. Factors are the columns or conditions based on which the data points are grouped. At least one factor is required for cluster analysis, and up to 20 factors can be used.
  6. Factors are listed based on the model selected for analysis. Refer to the Clustering Models section to learn more.
    • All the Measure columns and the Dimension columns with the Count function will be listed for the K-means model, as the data points are grouped based on the Euclidean distance.
    • All the Dimension columns are listed for the K-modes model, as the data points are grouped based on the binary distance.
    • Both Measure and Dimension columns are listed for the K-prototype model as the data points are grouped based on both Euclidean distance and binary distance.
  7. Choose the Number of Clusters into which the data points should be grouped. The default is three, and you can specify a minimum of 2 and a maximum of 30 clusters. Take the number of data points into account when choosing this value: the number of clusters must be at least one less than the number of data points.
  8. Choose the Normalization method used to transform the data before clustering. Normalization rescales the data to a common scale, which prevents factors with large ranges or different scales from dominating the results and ensures that all factors contribute equally to the analysis.
    • Min-Max Scale - All the data points are rescaled to the range 0 to 1.

      $$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}$$

      Where, X is the current value
             Xmin is the minimum value in the data
             Xmax is the maximum value in the data
    • Z-Score - Represents the number of standard deviations a data point is from the mean of the dataset. This transformation centers the data around a mean of 0 and standardizes it so that the standard deviation is 1 for the entire dataset.

      $$Z = \frac{X - \mu}{\sigma}$$

      Where, X is the current value
             µ is the mean
             σ is the standard deviation

  9. For the K-prototype model, specify the Weightage to be given to numerical and categorical factors. The weightage ranges from 0.1 to 2. By default, Zoho Analytics gives equal weightage (1.00) to both numerical and categorical factors. A value closer to 0.1 gives numerical columns more weightage, and a value closer to 2 gives categorical columns more weightage.
  10. Click Apply.

    Note: Rows with missing values are not considered for cluster analysis. Such data points are indicated in gray as Not Clustered in the legend.
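The two normalization methods from step 8 can be sketched in a few lines of Python. This is only an illustration of the formulas, not Zoho Analytics code: the function names and sample values are made up, and the use of the population standard deviation for the Z-Score is an assumption, since the page does not say which variant is applied.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values to the range 0..1 using (X - Xmin) / (Xmax - Xmin)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # No spread in the data: map every value to 0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


def z_score(values):
    """Standardize values using (X - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


# Hypothetical measure column
revenue = [100.0, 200.0, 300.0, 400.0, 500.0]
scaled = min_max_scale(revenue)
standardized = z_score(revenue)
```

After either transformation, a factor measured in thousands and a factor measured in single digits contribute on a comparable scale to the distance calculation.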

In the below image, the data points are clustered based on the K-means algorithm.

Cluster Analysis Information

This section provides the model used for cluster analysis and statistical information to evaluate the quality of the clusters. The Clusters Info option is enabled once the cluster analysis is applied to a report.

Summary 

The Summary section lists the input details that were used to create the clusters:

  • The Model used for clustering.
  • The total Number of Clusters created and the Number of data points.
  • The Stop Criteria, which gives the reason the clustering process was completed. Cluster Change is the default method used in this process.
  • The Distance formula that was used for clustering.
    • Euclidean distance is used for the K-means algorithm.
    • Binary distance is used for the K-modes algorithm.
    • A combination of Euclidean and binary distance is used for the K-prototype algorithm.
  • The Factors used for cluster analysis.
  • The Normalization method applied for scaling the data.

Performance or Quality Indicators

What are the indicators of a high quality cluster analysis?

High Intra-cluster Similarity: Data points within each cluster are very similar to each other based on the chosen distance metric, resulting in tight, well-defined clusters.

Low Inter-cluster Similarity: Clusters are well-separated, with significant differences between data points in different clusters, indicating clear boundaries.

Zoho Analytics uses various statistical methods to evaluate the quality of clusters. 

CH Index - The Calinski-Harabasz index (CHI) is a metric used to evaluate the quality of a clustering. The CHI is the ratio of between-cluster variance to within-cluster variance. A higher CHI value denotes that the clusters are well grouped.

$$CHI = \frac{SS_B / (k - 1)}{SS_W / (n - k)}$$

Where:

  • SS_B is the sum of squares between clusters
  • SS_W is the sum of squares within clusters
  • n is the total number of data points
  • k is the number of clusters
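As a rough illustration of the CH Index, the sketch below computes it from a set of points and their cluster labels in plain Python. The function names and the sample data are hypothetical, and this is not the Zoho Analytics implementation.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def ch_index(points, labels):
    """Calinski-Harabasz index: (SS_B / (k - 1)) / (SS_W / (n - k))."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    ss_b = 0.0  # sum of squares between clusters
    ss_w = 0.0  # sum of squares within clusters
    for members in clusters.values():
        c = centroid(members)
        ss_b += len(members) * squared_distance(c, overall)
        ss_w += sum(squared_distance(p, c) for p in members)
    return (ss_b / (k - 1)) / (ss_w / (n - k))


# Two tight, well-separated clusters give a high index.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
score = ch_index(points, labels)  # SS_B = 100, SS_W = 4, so 100 / (4 / 2) = 50
```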

DB Index - The Davies-Bouldin index (DBI) is another metric to assess the quality of clustering. For each cluster, it takes the ratio of within-cluster scatter to between-cluster separation with respect to its most similar cluster, and then averages this ratio over all the clusters. A lower DBI value denotes that the clusters are compact and well separated.

The CH Index and DB Index are calculated for the K-means and K-prototype clustering models.
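A minimal sketch of the DB Index in its standard form follows: for every cluster, find the other cluster that maximizes (scatter_i + scatter_j) / distance between centroids, then average those maxima. The function names and data are illustrative assumptions, not Zoho Analytics code.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def db_index(points, labels):
    """Davies-Bouldin index; lower values mean tighter, better separated clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    centers = {label: centroid(members) for label, members in clusters.items()}
    # Scatter: average distance of a cluster's points to its own centroid.
    scatter = {
        label: sum(math.dist(p, centers[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
score = db_index(points, labels)  # scatter 1 + 1, centroids 10 apart: 0.2
```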

Purity - In K-modes clustering, purity is a measure used to evaluate the quality of the clustering results. It assesses how well-defined and internally consistent the clusters are by comparing the dominant class labels within each cluster to the actual class labels in the dataset.

The purity score is given in percentage. A higher purity percentage strongly indicates that the clusters are well-defined.

The Purity indicator is calculated for K-modes and K-prototype clustering models.
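The usual definition of purity can be sketched as follows: for each cluster, count the data points that belong to its dominant class, sum these counts, and divide by the total number of data points. How Zoho Analytics derives the class labels is not described on this page, so the labels, names, and data below are purely hypothetical.

```python
from collections import Counter


def purity(cluster_labels, class_labels):
    """Percentage of data points that belong to the dominant class of their cluster."""
    per_cluster = {}
    for cluster, cls in zip(cluster_labels, class_labels):
        per_cluster.setdefault(cluster, Counter())[cls] += 1
    dominant = sum(counts.most_common(1)[0][1] for counts in per_cluster.values())
    return 100.0 * dominant / len(cluster_labels)


clusters = [0, 0, 0, 1, 1, 1]
classes = ["a", "a", "b", "b", "b", "b"]
score = purity(clusters, classes)  # (2 + 3) / 6, about 83.3 percent
```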

Analysis of Variance (ANOVA)

Analysis of variance is calculated only for the K-means algorithm. ANOVA is used to evaluate whether the centroids (or means) of the clusters are significantly different from each other in terms of the values of the factors used for clustering. It is also a statistical significance test that is used to check whether the null hypothesis can be rejected or not during hypothesis testing.

  • Within the Sum of Squares - Measures how much the individual data points within each cluster differ from the mean of that cluster. Dividing it by the within-cluster degrees of freedom gives the Mean Square Within the clusters (MSW).
  • Between the Sum of Squares - Measures how much the mean values of the different clusters differ from the overall mean value. Dividing it by the between-cluster degrees of freedom gives the Mean Square Between the clusters (MSB).
  • F-Statistic Value - The ratio of the Mean Square Between (MSB) the clusters to the Mean Square Within (MSW) the clusters. If the F-Statistic is greater than the critical value, we can conclude that the data points are well clustered.
  • P-Value - Helps to decide whether the differences between the clusters are likely to have occurred by chance or are statistically significant.

| Column | Description |
|---|---|
| Factors | Columns used for clustering. |
| F-Statistic | MSB/MSW. |
| Between the Sum of Squares | Calculates the difference between the means across different clusters. A large value indicates that the data points are well clustered and there is no overlapping. |
| Degrees of Freedom (between) | k-1. The between-group degrees of freedom is calculated based on the number of clusters (groups) being compared. |
| Within the Sum of Squares | Calculates the difference between the means within each cluster. |
| Degrees of Freedom (within) | N-k. The within-cluster degrees of freedom is calculated based on the number of observations within each cluster and the number of clusters. |
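The ANOVA quantities for a single factor can be sketched in plain Python as below. The function name and data are hypothetical, and the P-Value is left out because it requires the F-distribution, which is not available in the Python standard library.

```python
from statistics import mean


def anova_f_statistic(values, labels):
    """Return (F, SSB, df_between, SSW, df_within) for one factor."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n_total, k = len(values), len(groups)
    grand_mean = mean(values)
    # Between the Sum of Squares: cluster means against the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Within the Sum of Squares: data points against their own cluster mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    df_between = k - 1
    df_within = n_total - k
    msb = ss_between / df_between
    msw = ss_within / df_within
    return msb / msw, ss_between, df_between, ss_within, df_within


# Hypothetical factor values and their cluster assignments
revenue = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
clusters = [0, 0, 0, 1, 1, 1]
f_stat, ssb, df_b, ssw, df_w = anova_f_statistic(revenue, clusters)
# SSB = 54 with 1 degree of freedom, SSW = 4 with 4 degrees of freedom, F = 54
```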

Clustering Models 

Cluster Analysis partitions the data points into clusters or groups using specific models or algorithms. Each cluster contains data points that are more similar to each other than to those in other clusters. 

The model used for clustering depends on the factors or the conditions used for the analysis. K-means, K-modes, and K-prototype are the models used for clustering in Zoho Analytics.

The objectives of the centroid-based models are:

  • To group data points that have the same characteristics.
  • To minimize the intra-cluster distances, that is to reduce the distance between the individual points in a cluster and its centroid.
  • To maximize the inter-cluster distances so that the data points from different clusters are far apart, making the clusters distinct from each other.

The table below lists the models, the distance method each one uses, and the columns it applies to.

| | K-means | K-modes | K-prototype |
|---|---|---|---|
| Clustering Methodology | Centroid-based clustering | Centroid-based clustering | Centroid-based clustering |
| Distance method used | Euclidean distance | Binary distance | Combination of both Euclidean distance and binary distance |
| Applicable for | Measure or metric columns | Dimension or categorical columns | Both measure and dimension columns |

What is a Centroid?

In cluster analysis, a centroid is the central point of a cluster, representing the average position of all the data points within that cluster.

  • K-means - The centroid is the arithmetic mean of all data points in the cluster, calculated by averaging the coordinates of the points.
  • K-modes - The centroid is the mode (most frequent value) for each dimensional attribute within the cluster.
  • K-prototype - The centroid combines both measure and dimensional factors, using the mean for measure factors and the mode for dimensional factors.
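The difference between a mean-based and a mode-based centroid can be shown with a small Python sketch. The function names and sample rows are made up for illustration.

```python
from statistics import mean, mode


def mean_centroid(rows):
    """K-means style centroid: the mean of each measure column."""
    return [mean(col) for col in zip(*rows)]


def mode_centroid(rows):
    """K-modes style centroid: the most frequent value of each dimension column."""
    return [mode(col) for col in zip(*rows)]


measures = [(1.0, 2.0), (3.0, 4.0)]
dimensions = [("red", "S"), ("red", "M"), ("blue", "M")]

numeric_center = mean_centroid(measures)        # [2.0, 3.0]
categorical_center = mode_centroid(dimensions)  # ['red', 'M']
```

A K-prototype centroid would simply hold both parts side by side: the means for the measure factors and the modes for the dimension factors.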

Understanding the K-means Model

Zoho Analytics uses the K-means model for clustering measures or metric factors. The K-means model clusters data points into the pre-defined number of clusters based on the centroid. It locates centers through an iterative procedure.

How does the K-means model work?

Initialization

The model starts by picking k random data points and defining them as the centroids of the clusters.

Calculate the distance and assign data points to the nearest cluster

The Euclidean distance between each data point and each centroid is calculated.

The Euclidean distance between a data point x and a centroid c over m factors is given by,

$$d(x, c) = \sqrt{\sum_{i=1}^{m} (x_i - c_i)^2}$$

The data points are then assigned to the nearest cluster centroid, that is, the centroid at the minimum distance from the data point.

Re-defining the Centroids

Once all the data points have been assigned to a cluster, the centroid of each cluster is recomputed by taking the mean of all the points in that cluster. The Euclidean distance between each data point and the new centroids is calculated again, and each data point is re-assigned to its closest centroid.

 

Cluster Stabilization

The above process is repeated until the clusters stabilize, that is, the centroids no longer change and no data points are re-assigned.
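The four steps above (initialization, assignment, re-defining the centroids, stabilization) can be sketched in plain Python as follows. This is a textbook-style illustration, not the Zoho Analytics implementation: the function name, the `seed` and `max_iter` parameters, and the sample points are all assumptions.

```python
import math
import random


def k_means(points, k, seed=0, max_iter=100):
    """Cluster points into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialization: pick k random data points as the starting centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment: each point goes to the centroid at minimum Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Stabilization: stop once no data point changes its cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Re-defining the centroids: mean of the points assigned to each cluster.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
labels, centroids = k_means(points, k=2)
```

With these sample points, the first three end up in one cluster and the last three in the other. Which cluster receives the index 0 depends on the random starting centroids.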

Understanding the K-modes model

Zoho Analytics uses the K-modes model for clustering dimension or categorical factors. The K-modes model clusters data points into the pre-defined number of clusters based on the centroid. Unlike the K-means algorithm, which calculates distance using the Euclidean distance, K-modes uses the binary distance method for assigning data points to the clusters. The K-modes model uses the mode, the most frequently occurring value, as the centroid.

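The binary distance between a data point x and a mode c over m factors counts the factors on which the two differ, and is given by

$$d(x, c) = \sum_{j=1}^{m} \delta(x_j, c_j), \qquad \delta(x_j, c_j) = \begin{cases} 0 & \text{if } x_j = c_j \\ 1 & \text{if } x_j \neq c_j \end{cases}$$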

How does the K-modes model work?

Initialization

The model starts by picking k random data points and defining them as the centroids of the clusters.

Calculate the Dissimilarities and Assign the Values to the Nearest Cluster

The binary distance between the values of each data point and the selected centroids (modes) is calculated. Each data point is then assigned to the nearest cluster, that is, the one with the least dissimilarity.

Re-defining Modes

Once all the values are assigned, new modes are defined for the clusters based on the most frequent value of each factor. The dissimilarity (binary distance) of each data point to the new modes is then recomputed, and each data point is re-assigned to its nearest cluster.

Cluster Stabilization

The above process is repeated until the clusters stabilize, that is, the centroids no longer change and no data points are re-assigned.
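A minimal K-modes sketch in plain Python follows the same loop as K-means, with the binary distance in place of the Euclidean distance and the mode in place of the mean. It assumes that the binary distance counts the factors on which two rows differ. The function names, the `seed` and `max_iter` parameters, and the sample rows are hypothetical.

```python
import random
from collections import Counter


def binary_distance(a, b):
    """Number of factors on which two rows differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(rows, k, seed=0, max_iter=100):
    """Cluster categorical rows into k groups; returns (labels, modes)."""
    rng = random.Random(seed)
    # Initialization: pick k random rows as the starting modes.
    modes = [tuple(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment: each row goes to the mode with the least dissimilarity.
        new_labels = [
            min(range(k), key=lambda c: binary_distance(r, modes[c])) for r in rows
        ]
        # Stabilization: stop once no row changes its cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Re-defining modes: most frequent value of each factor in the cluster.
        for c in range(k):
            members = [r for r, label in zip(rows, labels) if label == c]
            if members:
                modes[c] = tuple(
                    Counter(col).most_common(1)[0][0] for col in zip(*members)
                )
    return labels, modes


rows = [
    ("red", "small", "north"),
    ("red", "small", "south"),
    ("red", "medium", "north"),
    ("blue", "large", "east"),
    ("blue", "large", "west"),
    ("blue", "medium", "east"),
]
labels, modes = k_modes(rows, k=2)
```

Because the starting modes are picked at random, different seeds can produce different groupings, especially when many rows are equally distant from two modes.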

Understanding the K-prototype model

The K-prototype model is used for clustering factors containing both measure and dimensional columns. K-prototype combines both K-means and K-modes models for computing the centroids. Weightage parameters are used to define and control the dominance of the measure and dimensional columns.

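The combined distance is calculated from the Euclidean distance over the measure columns and the binary distance over the dimensional columns, balanced by the weightage.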

How does the K-prototype model work?

Initialization

The model starts by picking k random data points and defining them as the centroids of the clusters.

Calculate the combined distance and assign values to the nearest cluster

The combined distance between each data point and each centroid is calculated, and each data point is assigned to the nearest cluster.

Re-defining the centroids

Once all the data points have been assigned to a cluster, the centroids are recomputed, using the mean for measure factors and the mode for dimensional factors. The combined distance between each data point and the new centroids is calculated again, and each data point is re-assigned to its closest centroid.

Cluster Stabilization

The above process is repeated until the clusters stabilize, that is, the centroids no longer change and no data points are re-assigned.
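One possible shape of the combined distance is sketched below: the Euclidean distance over the measure factors plus a weighted binary distance over the dimension factors. How Zoho Analytics actually combines the two parts and maps the Weightage setting onto them is not documented on this page, so the `gamma` multiplier, the function names, and the sample data are assumptions made for illustration only.

```python
import math


def combined_distance(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Euclidean distance on measures plus gamma times the binary distance on dimensions."""
    euclidean = math.dist(num_a, num_b)
    binary = sum(x != y for x, y in zip(cat_a, cat_b))
    return euclidean + gamma * binary


def nearest_prototype(num, cat, prototypes, gamma=1.0):
    """Index of the prototype (numeric part, categorical part) closest to a data point."""
    return min(
        range(len(prototypes)),
        key=lambda i: combined_distance(
            num, cat, prototypes[i][0], prototypes[i][1], gamma
        ),
    )


# Hypothetical prototypes: (measure means, dimension modes)
prototypes = [((0.0, 0.0), ("red",)), ((3.0, 4.0), ("blue",))]

d = combined_distance((0.0, 0.0), ("red",), (3.0, 4.0), ("blue",))  # 5 + 1 = 6
closest = nearest_prototype((0.5, 0.5), ("red",), prototypes)        # index 0
```

In this sketch, a smaller `gamma` lets the measure factors dominate and a larger `gamma` lets the dimension factors dominate, which matches the direction described for the Weightage setting.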

Export Cluster Analysis Reports

Once the data points are clustered, you can export the clustered data and use it to perform ad hoc analysis and compare with different dimensions and metrics. This helps to understand the distribution of data points in each cluster.

Zoho Analytics supports various file formats for exporting data. Each data point in a cluster analysis report receives a cluster assignment during export. You can export it to any text file format to conduct further analysis on the clustered data.
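As an example of ad hoc analysis on exported data, the sketch below summarizes a CSV export per cluster using only the Python standard library. The column names (Customer, Revenue, Cluster) and the cluster labels are hypothetical, and whether unclustered rows appear in an export with a Not Clustered label is also an assumption. Check the header of your own export and adjust the names accordingly.

```python
import csv
import io
from collections import defaultdict

# Stand-in for an exported file; replace with open("export.csv", newline="").
exported = io.StringIO(
    "Customer,Revenue,Cluster\n"
    "A,100,Cluster 1\n"
    "B,200,Cluster 1\n"
    "C,900,Cluster 2\n"
    "D,,Not Clustered\n"
)

totals = defaultdict(lambda: [0, 0.0])  # cluster -> [row count, revenue sum]
for row in csv.DictReader(exported):
    if row["Cluster"] == "Not Clustered":
        continue  # rows with missing values are not part of any cluster
    entry = totals[row["Cluster"]]
    entry[0] += 1
    entry[1] += float(row["Revenue"])

# Number of data points and average revenue per cluster
summary = {name: (count, total / count) for name, (count, total) in totals.items()}
```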
