By Sasha Musat
Business intelligence (BI) tools such as Tableau are growing in popularity and changing the way data is delivered. Now actuaries can publish interactive dashboards that automatically update in real time. These dashboards are increasing data literacy and encouraging business users to explore the data themselves.
These new tools provide a fantastic opportunity, but they also represent a new frontier. As actuaries, I think we have a professional responsibility to add order to the Wild West. In particular, Precept 8 of the Code of Professional Conduct says that we must take all reasonable steps to ensure that our work products are not used to mislead others. This can be a very difficult task once a dashboard is published, but there are many guardrails that can be put in place to ensure that our dashboards are not misinterpreted and do not misinform.
Additionally, it may not be possible for an actuary to review all of the data in an automatically updating dashboard. If the data is on a 15-minute update schedule or even a live feed, it is not possible for a person to manually check all of it. Because we are responsible for the data and reports we publish, we need to take steps to ensure the data is continuously valid. This article will explore some practices for ensuring data integrity and for designing dashboards well. All of the examples are demonstrated in Tableau, but the concepts apply to any BI tool.
Check for Data Integrity
Business intelligence tools allow us to publish a dashboard that can automatically update on its own schedule. While this is great functionality that provides users with timely data, it is also a little scary for the publishers. A dashboard can be fully reviewed, but once it is published we are depending on all of the processes to work perfectly. Even if the BI tool updates perfectly every time, there is still the possibility that the data it’s pulling is not correct—a scheduled database update might not happen on time, for example. To mitigate the risks, publishers can build a set of data integrity tests into the dashboards. As an example, if we are building a dashboard with rolling 12-month data, we know what reasonable ranges for plan membership, claims dollars, and utilization should be—any data that falls outside these ranges would raise a red flag.
So we can write checks that make sure the totals in our data set are reasonable. A membership check could be simply:
Membership Test = if sum([Members]) < 98000 then "Fail" elseif sum([Members]) > 99500 then "Fail" else "Pass" end
In this test, total membership must fall within a reasonable expected range (98,000 to 99,500 in this case). The right range depends on your data's variability; it should be tight enough to catch data duplication or the loss of a major data set but loose enough that it will not constantly fail.
Another useful check of data completeness is an element count: If you do business in four states, then the data set should include all four states. The format is also straightforward:
Check for State Data = if COUNTD([State]) = 4 then "Pass" else "Fail" end
These checks are useful, but we cannot expect dashboard users to look through a lengthy list of data integrity checks and confirm that the data is good before using the dashboard. We need to aggregate all of the checks and make it really obvious if any of the checks fail.
The checks can be aggregated by writing a formula that sums to 0 if the checks all pass or some other number if there is a failure:
Aggregate the Tests =
if [Check for State Data] = "Fail" then 1 else 0 end
+ if [Check Plan Types] = "Fail" then 1 else 0 end
+ if [Claims Amount Test] = "Fail" then 1 else 0 end
+ if [Membership Test] = "Fail" then 1 else 0 end
This test aggregation formula can then be used to change the title of the dashboard. This can be done by making a dashboard title calculation:
Generate Title = if [Aggregate the Tests] = 0 then "Summary of Loss Ratios by Plan Type" else "ERROR: Data Validation has failed. Please contact Jane Smith for assistance." end
This formula can be placed on a single sheet and used in place of a text box for the dashboard title. A large ERROR instead of a dashboard title will draw the user’s attention and tell them not to use the dashboard.
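If your data also passes through a scripted extract or ETL step before it reaches the BI tool, the same pattern can be applied there. The following Python sketch is illustrative only; the thresholds, field names, and sample rows mirror the Tableau formulas above and are not part of any real pipeline:

```python
# Illustrative data-integrity checks mirroring the Tableau formulas above.
# All thresholds and sample data are hypothetical.

def membership_test(total_members):
    """Total membership must fall within the expected range."""
    return "Pass" if 98000 <= total_members <= 99500 else "Fail"

def state_count_test(states, expected=4):
    """The data set should contain every state we do business in."""
    return "Pass" if len(set(states)) == expected else "Fail"

def aggregate_tests(results):
    """Sums to 0 only when every individual check passes."""
    return sum(1 for r in results if r == "Fail")

def generate_title(failure_count):
    """Swap the dashboard title for a loud error when any check fails."""
    if failure_count == 0:
        return "Summary of Loss Ratios by Plan Type"
    return ("ERROR: Data Validation has failed. "
            "Please contact Jane Smith for assistance.")

# Example run with healthy (hypothetical) data: four states, 99,000 members.
rows = [("WA", 40000), ("MT", 20000), ("OR", 25000), ("ID", 14000)]
results = [
    membership_test(sum(m for _, m in rows)),
    state_count_test(s for s, _ in rows),
]
title = generate_title(aggregate_tests(results))
```

Running the checks upstream like this lets you refuse to refresh the extract at all when validation fails, rather than publishing bad data with a warning title.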
Build In Visual Failsafes
Data validation is a process that can easily be automated, so now on to the harder part. How do we ensure that users do not misinterpret the dashboard or draw incorrect conclusions? It may be impossible to fully prevent someone from drawing an incorrect conclusion, but there are many dashboard design principles that can be used to prevent this. Some of these practices are:
Use only key performance indicators (KPIs) and language that your users understand. KPIs that your users are familiar with are perfect for self-service tools, but obscure or overly technical metrics such as ANOVA tables should be avoided. Additionally, definitions and links can be very useful: the definition of any KPI should be included in the tool tip, and in Tableau you can add links to any external website.
Here is a summary of Hospital Compare data. When a user clicks on a measure, they will be taken to the Centers for Medicare &amp; Medicaid Services' published documentation for that measure.
Axes and color scales should always be locked. For example, if you change the state filter from Washington to Montana, you would see the following:
At first glance it appears that Montana has more members and its gold and platinum products have poor loss ratios. Upon further analysis, though, we see that Tableau automatically rescaled the axes and colors to what was most visually appealing. These changes are subtle and are difficult to notice if a user is not familiar with how Tableau works. Alternatively, we can manually set the y-axis to a range that makes sense for our data and change the color scale to something meaningful:
Here the y-axis is the same on both graphs, and the color scale is set so that it is white at the loss ratio goal, blue if the ratios are below the goal, and red if the ratios are above the goal. Because the color red automatically draws negative attention, it is a natural choice for a loss ratio above the goal. Now the picture looks quite different: we can clearly see that Montana has many fewer members than Washington and that its plans are not meeting profitability goals.
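The fixed color rule described above can be written down explicitly. This hypothetical Python helper assumes a loss-ratio goal of 0.85 and a small tolerance band around it, neither of which appears in the article:

```python
def loss_ratio_color(loss_ratio, goal=0.85, tolerance=0.01):
    """Fixed, meaningful color scale: white at the loss ratio goal,
    blue below it, red above it. The 0.85 goal and the tolerance
    band are illustrative assumptions, not values from the article."""
    if abs(loss_ratio - goal) <= tolerance:
        return "white"
    return "blue" if loss_ratio < goal else "red"
```

Because the mapping is anchored to the goal rather than to whatever range happens to be in the filtered view, the same loss ratio always gets the same color no matter which state is selected.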
Use visualizations that are familiar and easy to understand quickly. For example, below is a spiral chart created by Neil Richards; it shows data over time in a spiral instead of a straight line, with the first time period at the center of the spiral and the last at the outer edge. This chart is visually intriguing, but it takes a while to understand, and I believe it is harder to see trends in than in a regular line graph.
Complicated visualization types are great for presentations when you can explain them and you really want to wow your audience—but for a self-service tool, I think users will get a lot more out of a simpler chart.
All dashboards should have documentation that helps explain how to use the dashboard. This can be as simple as adding a “click here” image next to a chart that is a filter. Additionally, any filters should apply to the entire dashboard. This is the user’s default assumption. Beyond just documenting your dashboard, you should set up time to walk users through it. A quick training session can clear up any misconceptions.
Data credibility checks should be added to dashboards. When users apply multiple filters to a data set, they can often drill down to only a couple of members. For example:
Here the platinum plan appears to have a really high loss ratio. However, it has only 21 members, so a single outlier could be driving the results. A credibility limit can be added in multiple ways. One is to set the color to only appear dark when full credibility is reached.
Here the platinum plan is very light, calling out the fact that it has only a small number of members and is below our credibility threshold of 100.
Additionally we can write a credibility check formula that can be added to the tool tip. The formula would be:
if sum([Members]) < 100 then "not credible" else "credible" end
This will help users home in on key drivers and avoid being distracted by non-credible data.
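The same tool-tip formula can be sketched in Python if credibility flags are computed upstream of the BI tool. This is a minimal sketch; the threshold of 100 comes from the formula above, and the function name is my own:

```python
def credibility_label(member_count, threshold=100):
    """Mirrors the tool-tip formula: a filtered group with fewer
    members than the credibility threshold is flagged so users
    do not over-interpret its loss ratio."""
    return "not credible" if member_count < threshold else "credible"

# Example: the filtered platinum plan from the article has 21 members.
label = credibility_label(21)  # "not credible"
```

In practice you would attach this label to every filtered grouping so it travels with the data into any tool tip or exported extract.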
Actuaries have an opportunity to be proactive in this new world of dashboards. I think we should encourage everyone in the company to use data, but I also think we must be shepherds of the dashboards and make sure that they are being used correctly.
Sasha Musat, MAAA, ASA, is a health care economics actuarial analyst with Premera.