Definition of Customer Segments
Customer segmentation has been one of the most widely implemented applications in data analytics since the advent of customer intelligence and CRM data.
The concept is simple: group your customers based on some criteria, such as revenue creation, loyalty, demographics, buying behavior, or any combination of these.
The group (or segment) can be defined in many ways, depending on the data scientist’s degree of expertise and domain knowledge.
- Grouping by rules. Somebody in the company already knows how the system works and how the customers should be grouped with respect to a given task, e.g. a campaign. A Rule Engine node would suffice to implement this set of experience-based rules. This approach is highly interpretable, but not very portable to new analyses. In the presence of a new goal, new knowledge, or new data, the whole rule system needs to be redesigned.
- Grouping as binning. Sometimes the goal is clear and not negotiable. One of the many features describing our customers is selected as the representative one, be it revenue, loyalty, demographics, or anything else. In this case, segmenting the customers into groups reduces to a pure binning operation: customer segments are built along the selected attribute (or a few attributes) by means of bins. This task can be implemented easily using one of the many binner nodes available in KNIME Analytics Platform.
- Grouping with zero knowledge. The data scientist frequently does not know enough about the business at hand to build their own customer segmentation rules. In this case, if no business analyst is around to help, they should resort to a plain blind clustering procedure. The follow-up work of interpreting the clusters belongs to a business analyst, who is (or should be) the domain expert.
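To make the first two approaches concrete outside of KNIME, here is a minimal Python sketch. It is not the Rule Engine or a binner node; the customer records, field names, thresholds, and segment labels are all hypothetical and only illustrate the idea of rule-based grouping versus binning along one attribute.

```python
from bisect import bisect_right

# Hypothetical customer records; field names and values are illustrative only.
customers = [
    {"id": "C1", "revenue": 120.0, "orders": 2},
    {"id": "C2", "revenue": 950.0, "orders": 14},
    {"id": "C3", "revenue": 4300.0, "orders": 3},
]

def segment_by_rules(customer):
    """Experience-based rules, evaluated top to bottom; the first match wins."""
    if customer["revenue"] >= 1000 and customer["orders"] >= 10:
        return "key account"
    if customer["orders"] >= 10:
        return "frequent buyer"
    if customer["revenue"] >= 1000:
        return "big spender"
    return "occasional"

# Binning along a single attribute (assumed thresholds).
# A value equal to an edge falls into the higher bin.
REVENUE_EDGES = [500.0, 2000.0]
REVENUE_LABELS = ["low", "medium", "high"]

def segment_by_bin(customer):
    """Map the revenue value to the label of the bin it falls into."""
    return REVENUE_LABELS[bisect_right(REVENUE_EDGES, customer["revenue"])]

rule_segments = {c["id"]: segment_by_rules(c) for c in customers}
bin_segments = {c["id"]: segment_by_bin(c) for c in customers}
```

Note how the rules encode a specific goal: changing the campaign means rewriting the conditions, whereas the binning variant only needs new edges.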
With the goal of making this workflow suitable for a number of different use cases, we chose the third option.
Many clustering procedures exist, and KNIME Analytics Platform makes them available in the Node Repository panel under the category Analytics/Mining/Clustering, e.g. k-Means, nearest neighbors, DBSCAN, hierarchical clustering, SOTA, etc. We went for the most commonly used one: the k-Means algorithm.
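For readers who want to see what k-Means does under the hood, the following is a rough plain-Python sketch of the standard assign-and-update loop. It is not the KNIME k-Means node: the function names, the sample points, and the deterministic farthest-first initialization (chosen here only to make the example reproducible) are assumptions. In practice, features on very different scales would typically be normalized before clustering.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, max_iter=100):
    """Cluster points into k groups; returns (assignment, centroids)."""
    # Assumed initialization: start from the first point, then repeatedly
    # add the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the procedure has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids

# Hypothetical customers described by two already-scaled features.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = k_means(points, k=2)
```

The returned labels are only cluster indices; giving them a business meaning is the interpretation step left to the domain expert.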