Sankey Diagrams: A Better Way to Visualize Decision Trees

Sankey diagrams are perfect for displaying decision trees

by Tim Bock

I used to think that Sankey diagrams were just one of those cool visualizations that look amazing at first, but then don’t turn out to be useful for any real-world problems. I am happy to report I had this wrong. They are perfect for displaying decision trees (e.g., CART, CHAID).

Perhaps you have not come across Sankey diagrams before? The most famous of them all, created by Charles Joseph Minard in 1869, shows the ill-fated march of Napoleon to Moscow and back. The width of the tree-branch-like band that goes across the visualization is proportional to the size of Napoleon’s army. Brown represents the advance of Napoleon, with the army shrinking the closer he gets to Moscow. Black shows the retreat from his Pyrrhic victory.

Figure: Minard’s map of Napoleon’s march to Moscow and back

A more typical Sankey diagram

A more typical example is the load energy flow Sankey diagram below, which shows UK energy sources and applications.

Cool? Yes. However, I tried to apply these to a whole host of problems, and I kept getting results completely devoid of insight. Then, in an epiphany, which no doubt means that I have stolen the idea from somebody else (perhaps Kent Russell?), it occurred to me that Sankey diagrams are the perfect solution to an age-old visualization problem: how best to represent data from a classification tree.

You can inspect the code, and play with the examples used in this post, by clicking here.

The standard, difficult-to-read, tree output

The tree below is the standard output from the R tree package. This example shows the predictors of whether or not children’s spines were deformed after surgery. The tree predicts the Presence or Absence of deformation based on three predictors:

Start: The number of the topmost vertebra operated upon.
Age: The age in months of the patient.
Number: The number of vertebrae operated upon.

With a bit of effort you can discern from the tree above that it has identified two segments of children for whom the probability of deformation is 50% or more:

Start ≤ 12 and Age ≥ 128 and Number ≤ 4
Start ≤ 8 and 35 ≤ Age and Number ≥ 5
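Reading segments like these off a tree amounts to walking every path from the root to a leaf and keeping the leaves whose probability clears a threshold. A minimal sketch of that idea follows. The `Node` class, the predictor names X and Y, and all counts are hypothetical and made up for illustration; they are not the kyphosis data and not the output of the R tree package.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Node:
    """One node of a classification tree (hypothetical structure)."""
    n: int                                # cases reaching this node
    n_present: int                        # cases where the outcome is present
    rule: str = ""                        # condition on the branch leading here
    children: Tuple["Node", ...] = ()     # empty tuple means this is a leaf


def segments(node, path=(), threshold=0.5):
    """Return (rule path, probability, n) for leaves at or above threshold."""
    path = path + ((node.rule,) if node.rule else ())
    if not node.children:
        p = node.n_present / node.n
        return [(" and ".join(path), p, node.n)] if p >= threshold else []
    found = []
    for child in node.children:
        found.extend(segments(child, path, threshold))
    return found


# Toy tree with made-up counts (not the kyphosis data).
toy = Node(100, 30, children=(
    Node(60, 6, "X >= 10"),
    Node(40, 24, "X < 10", children=(
        Node(25, 20, "Y >= 5"),
        Node(15, 4, "Y < 5"),
    )),
))

for rule, p, n in segments(toy):
    print(f"{rule}: {p:.0%} of {n} cases")
```

With the toy counts above, only the leaf reached by X < 10 and Y >= 5 clears the 50% threshold.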

Compare the meagerness of these findings with what we obtain from the Sankey tree below.

A Sankey tree

The branches are color-coded on a continuum from blue to red via grey. Blue means a 100% chance of a deformity, grey a 50% chance, and red a 0% chance. The width of each branch reflects the number of children in it. Thus, we can readily draw conclusions that could not be drawn from the traditional tree. For example:

As most of the visualization is red, most children do not experience a deformity after surgery.
The best indicator of deformity is Start. If Start is 13 or more, the chance of a deformity is comparatively low, except for the small segment for whom Start is either vertebra 13 or 14, and Age is from 60 to 157 months. If you hover your mouse pointer over this node (the second from the top, on the far right), you will see that only 7 children fit this category, and of these 2 (29%) had a deformity.
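The two ingredients described above, branch width and branch color, can be sketched in a few lines. This is an illustration of the idea only, not the code behind the Sankey tree in this post: the function names, the dictionary layout, the exact RGB values, and the toy counts are all assumptions.

```python
def probability_color(p):
    """Map a probability to a hex color: 0 -> red, 0.5 -> grey, 1 -> blue."""
    red, grey, blue = (255, 0, 0), (128, 128, 128), (0, 0, 255)
    if p <= 0.5:
        lo, hi, t = red, grey, p / 0.5
    else:
        lo, hi, t = grey, blue, (p - 0.5) / 0.5
    # Linear blend between the two anchor colors, channel by channel.
    rgb = [round(a + (b - a) * t) for a, b in zip(lo, hi)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def sankey_links(node):
    """Flatten a nested tree into Sankey links, one per parent-child branch."""
    links = []
    for child in node.get("children", []):
        share = child["present"] / child["n"]
        links.append({
            "source": node["label"],
            "target": child["label"],
            "value": child["n"],            # branch width: number of cases
            "color": probability_color(share),
        })
        links.extend(sankey_links(child))
    return links


# Toy tree with made-up counts (not the kyphosis data).
toy_tree = {
    "label": "All", "n": 100, "present": 30,
    "children": [
        {"label": "X >= 10", "n": 60, "present": 6},
        {"label": "X < 10", "n": 40, "present": 24, "children": [
            {"label": "Y >= 5", "n": 25, "present": 20},
            {"label": "Y < 5", "n": 15, "present": 4},
        ]},
    ],
}

for link in sankey_links(toy_tree):
    print(link)
```

Each link carries a source, a target, a width, and a color, which is the general shape of input that Sankey plotting libraries tend to expect; the exact format varies by library.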

TRY IT OUT
You can inspect the code, and play with this example using Displayr.

Acknowledgements

The data used in the Sankey tree is kyphosis from the rpart package. The Sankey tree code was a collaborative effort involving Kent Russell, Michael Wang, Justin Yap, and myself, based on a fork of networkD3, which is itself an HTMLwidget version of Mike Bostock’s D3 Sankey diagram code, which is inspired by Tom Counsell’s Sankey library. The load energy flow example is from networkD3, which is a reworking of a Sankey library example, using data from the UK’s Department of Energy & Climate Change.

Originally posted here.
