Too Much Data—Why analytic competition is a problem … and what we can do about it

By Kurt J. Wrobel

We truly live in the information age. We have access to much more information than at any other time in human history—whether it takes the form of videos, pictures, emails, text, or just raw data. We now have access to a vast volume of knowledge by simply typing words into a Google search. Questions that would have been impossible or time-consuming to answer can now be addressed in a few seconds. It is now easy to maintain connections with friends and share the events of our lives through social networking websites. We can make video calls to almost anywhere in the world. Remarkably, this information can be accessed for almost no cost.

As in many endeavors, competition has driven companies and organizations to make full use of the building blocks of the information age, including access to more information and faster computing power, to produce better products. We see this power of competition all around us. The iPhone 7 is better than the iPhone 6. The Samsung Galaxy phones have improved right along with the iPhone, while Nokia has lost market share to Apple and Samsung. Cars, televisions, and almost every other consumer device have improved as companies have seized the opportunities brought about by the information age.

The advances created by this competition should be celebrated. We can now live lives that few people could have imagined a few generations ago. With access to cheap information, we can enrich our lives through inexpensive entertainment and a wide variety of educational content. Beyond personal access to information, these advances have reduced the cost of physical goods, as efficiency gains can now be incorporated into these products. Walmart and other retailers can sell products less expensively by better managing the supply chain through technology. Oil is now less expensive thanks to our ability to explore for new oil reserves by using more information. Similar advances have been made in food production, transportation, and several other important physical-goods sectors.

But the advances created by the ongoing competition to produce better products have come at a cost. As information-based products have improved, we have become increasingly distracted by the myriad options available to us. Multitasking has become rampant. Children and adults alike are choosing video games over physical activity. Online relationships are taking the place of community and in-person relationships. Television and internet usage have resulted in total screen time that goes well beyond what was viewed a generation ago. People are far more disconnected from their communities as they seek entertainment that appeals to their unique interests. Social media sites have prompted people to engage in branding campaigns that highlight the positive aspects of their lives and omit the less-than-perfect. For some, the advances of the information age have adversely affected their health, relationships, and even their willingness to improve their lives.

With these technological advances, consumers and our surrounding culture have begun developing approaches to enjoy the benefits of the information age while also working hard to minimize the downside of too much information. We have seen many techniques to control this information overload, including “technology sabbaths,” software applications that limit online access, and mindfulness practice. Many people are actively trying to manage the information deluge and the unintended consequences of living a life with so much information. As we all know, this management is not easy.

Application to Business Analytics and Quantitative Research

Consistent with the broader societal trends in using more information and greater computing power, we have seen profound change in analytical work and research. Data and information have become widely available across companies, organizations, universities, and individuals. The data sets are richer and much easier to access. Data has become democratized, with almost any private individual or employee having access to large data sets.

Along with more data, we now have the computing power to engage in increasingly sophisticated analysis of that data. With the days of IBM punch cards long gone, we can now show correlations among an increasing number of variables with relative ease using massive data sets.

With the continued improvement in data and computing power, companies and governments have raised their expectations of what can be done with this data. Managers are demanding improved customer targeting programs and more accurate predictions of the future. Governments are creating increasingly complex public programs. University researchers are using data to analyze complex government policies and the actions of companies and individuals.

Stories about the power of data abound in the media and in popular books. Companies have been able to better segment customers with website views and promotional coupons; baseball players are more accurately valued; and voters are more effectively targeted. The power of data is seemingly limitless.

Analytic Competition

With the building blocks in place to conduct more extensive research through more data or more sophisticated modeling, the competition to meet these demands has also increased. As with many of the other advances that have taken advantage of more data and greater computing power, this competition has produced meaningful results, particularly when it yields useful real-time data or information that can be used to predict the outcome of a simple system.

Information that is immediately useful: In this particular case, the data gathering incorporates disparate sources of data and then highlights the opportunity with no forecasting required. A simple example would be a website that aggregates price data from several sources and then highlights the best possible price. GPS location information is another such example. In these cases, the output of the data is simple and immediately useful to the end-user.
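As a minimal sketch of the price-aggregation example above, the snippet below picks the lowest quote from several sources. The retailer names, the prices, and the `best_offer` helper are hypothetical and exist only for illustration; the point is that nothing is forecast, and the output is simply a lookup over data already in hand.

```python
def best_offer(quotes):
    """Return (source, price) for the lowest quoted price.

    `quotes` maps a source name to its current price. Both the sources
    and the prices used below are made-up illustrative values.
    """
    if not quotes:
        raise ValueError("no quotes to compare")
    source = min(quotes, key=quotes.get)  # source with the smallest price
    return source, quotes[source]


# Hypothetical quotes gathered from three sources for the same product.
quotes = {"retailer_a": 24.99, "retailer_b": 22.49, "retailer_c": 23.75}
source, price = best_offer(quotes)
print(f"Best price: {price:.2f} at {source}")
```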

A special and well-known case is the oft-cited book Moneyball. In the book, the author highlights how many teams overvalued batting average and undervalued on-base percentage (a metric that includes walks). Exploiting this arbitrage opportunity, the Oakland A’s were able to sign and draft players who were undervalued by the rest of the market. While this story is often cited in the Big Data literature, the success and insight were largely driven by a simple and obvious oversight in how baseball teams evaluated the importance of walks, an oversight that was discussed in sabermetrics circles well before the A’s adopted the strategy. As a result, the actual Big Data analysis involved simply gathering information on players’ on-base percentage and then developing a strategy to exploit the arbitrage opportunity.
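To illustrate how simple the underlying arithmetic is, the sketch below compares two hypothetical hitters using the standard definitions of batting average (hits divided by at-bats) and on-base percentage (hits, walks, and hit-by-pitches divided by at-bats, walks, hit-by-pitches, and sacrifice flies). The two stat lines are invented for illustration and do not describe any real player or anything reported in the book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BattingLine:
    """Season totals for one hitter (illustrative values only)."""
    at_bats: int
    hits: int
    walks: int
    hit_by_pitch: int = 0
    sacrifice_flies: int = 0


def batting_average(line):
    # Hits per at-bat; walks do not count as at-bats, so they are ignored.
    return line.hits / line.at_bats


def on_base_percentage(line):
    # Times on base per plate appearance that counts toward OBP.
    reached = line.hits + line.walks + line.hit_by_pitch
    chances = (line.at_bats + line.walks
               + line.hit_by_pitch + line.sacrifice_flies)
    return reached / chances


# Two hypothetical hitters: one makes more contact, one draws more walks.
contact_hitter = BattingLine(at_bats=500, hits=150, walks=20)
patient_hitter = BattingLine(at_bats=500, hits=130, walks=100)

for name, line in [("contact", contact_hitter), ("patient", patient_hitter)]:
    print(f"{name}: AVG {batting_average(line):.3f}, "
          f"OBP {on_base_percentage(line):.3f}")
```

Under these made-up numbers, the contact hitter has the higher batting average while the patient hitter reaches base more often, which is the kind of gap a team that looked only at batting average would miss.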

Predicting simple systems: Beyond real-time information that is immediately useful, analysts have used more sophisticated models and more extensive data sets to predict the future of simple systems, usually the buying or voting behavior of individuals facing a simple choice to purchase a product or vote for a candidate. In the classic cases, a data scientist will optimize the presentation of a webpage or promotion to better target an individual’s taste based on a wide range of factors. In this approach, the system being predicted is simple and the cost of any one mistake is small: an incremental reduction in the likelihood that an individual purchases a product or votes for a candidate. As a result, data analysts can usually rely on mere correlation as a driver of a decision rather than consider whether one variable caused another.

In the above cases, the analytic competition to create more sophisticated models and more data has the potential to produce better outcomes. These cases also often serve as the case studies highlighting the power of Big Data. That said, while the success stories often emphasize the accuracy of these new approaches, they rarely highlight the incremental cost of adding more data into the analysis, nor the risk management and IT costs associated with obtaining and analyzing this data.

The Downside to Analytic Competition: Predicting and Explaining Complex Systems

The problem with analytic competition arises when analysts attempt to predict the future or explain the results of a complex system, a task that goes well beyond predicting simple systems or surfacing valuable real-time information. These complex systems could include the economy, stock prices, housing prices, and health care costs in a rapidly changing insurance environment.

In these cases, additional information and computing power have the potential to improve strategic decision-making, but the process is not consistent and will likely require combining technical skill with many qualitative considerations. This inherent subjectivity in how to conduct an analysis allows individuals to use large data sets and computing power to develop conclusions that benefit their preferred narrative and improve their career prospects. These narratives could include stories that take credit or ascribe blame for past decisions, highlight preferred future strategies, or emphasize an intellectually appealing modeling technique. No matter the rationale, the internal competition among individuals to differentiate themselves often leads to biased analysis that accomplishes little to improve an organization’s decision-making process.

While an analyst could actively influence the conclusions of a complex modeling project, in many cases, one could also simply believe the results because it is in one’s best interest to promote a particular narrative. In fact, as suggested in the behavioral economics literature, it is in our very nature to either skew results to confirm a long-held belief (confirmation bias) or disbelieve facts that threaten an individual’s career interest (illusion of skill).

Specific challenges that could lead to less-than-optimal decision-making include:

Rote memorization. The simplest, but often the most effective, approach is rote memorization. The ability to memorize information and then fit it into a simplified story is often the best approach for articulating a business story involving a complex system. It gives the appearance of mastery and distinguishes those people who have learned the art of storytelling with large data sets, where numerous pieces of information can be woven into a broader story or used to answer a succinct business question. The problem is that rote memorization often does not include a deep understanding of the underlying system and the potential variability in an observed result.

Cherry-picked data. An alternative but nearly as simple approach is “cherry-picking” the data to present an incomplete, biased picture. This tried-and-true method has been used for as long as data has been available, but the opportunities are now even more plentiful because of the large increase in the volume of data and the ability of individuals to manipulate the data to highlight their preferred narrative.

Technically sophisticated, but biased predictions. While rote memorization and cherry-picking are available to most people, the strategies available to the more technically minded include much more sophisticated modeling approaches. The list is long, but one of the most widely used practices is to calibrate a regression model with handpicked explanatory variables that achieve the desired statistical conclusion for the model. The strategies can differ, but the approaches suffer from many of the same problems associated with cherry-picking. A skilled technical analyst can select and shape the data so that the results support his or her preferred narrative.

Complexity promoters. With the increase in the availability of data, many people have furthered their careers and advanced their consulting opportunities by advocating for complex, analytically driven solutions. In many cases, these solutions add precious little to the predictive power, but instead allow individuals to further their career interests by becoming the only person with the knowledge to adequately understand a complex model.

Observation overload without a consideration for the management of an organization. With the increase in the volume of data, analysts can increase the number of observations and suggest a wide range of potential solutions or improvements. While these suggestions can be helpful, in many cases, this approach can lead an organization to lack focus as it attempts to address all the opportunities rather than the most impactful initiative.
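The handpicked-variable problem described under “Technically sophisticated, but biased predictions” can be made concrete with a small simulation. The sketch below is an assumption-laden illustration, not a method from this article: it generates an outcome and 200 candidate explanatory variables that are all pure random noise, then compares the correlation of a variable chosen in advance with the correlation of the variable picked after looking at the results. The sample size, number of candidates, and seed are arbitrary choices.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def candidate_correlations(n_obs=30, n_candidates=200, seed=0):
    """Absolute correlation of each noise variable with a noise outcome.

    Every series is independent standard-normal noise, so any apparent
    relationship is an artifact of searching, not a real effect.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    results = []
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        results.append(abs(pearson(candidate, outcome)))
    return results


correlations = candidate_correlations()
prespecified = correlations[0]   # variable chosen before seeing the data
handpicked = max(correlations)   # variable chosen because it "worked"

print(f"Pre-specified variable: |r| = {prespecified:.2f}")
print(f"Handpicked variable:    |r| = {handpicked:.2f}")
```

Because the handpicked variable is the best of many tries, its correlation looks far stronger than a single honest test would typically show, even though no variable here has any real relationship to the outcome. A reviewer who sees only the final model has no way to know how many alternatives were tried, which is one reason a team-based review of the full analysis matters.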

Taken in total, the competition to use large data sets and sophisticated computing power has led, in many cases, to biased analysis and ultimately less-effective decision-making. The structural challenge is that most quantitative work in complex systems can be conducted with a variety of techniques such that individuals can differentiate themselves based on the conclusions of their models or their analytical techniques, rather than the quality of the decision-making process.

The Solution to Analytic Competition: Analytic Fundamentals and Teams

The deluge of information and computing power, combined with analytic competition, has blinded many leaders to qualitative factors that are important in making decisions and has distracted them from the true task at hand: reaching well-reasoned decisions that incorporate all the available data.

The attention paid to computationally sophisticated models and Big Data has shifted the focus away from the most fundamental aspects of analytic decision-making: structuring relevant business questions, understanding the source and collection process of the data, determining when a model is useful and when additional complexity is worth the cost, understanding the incentive structure of those providing data, and holistically analyzing risk and uncertainty using qualitative factors beyond additional data and modeling.

In addition to these fundamental aspects of analytical decisions, the power of technical teams to check individual incentives is often not explicitly considered as a strategy to avoid biased analysis. When a work product comes from a group, it is much more likely to represent an honest conclusion rather than one shaped by the individual incentives of any one person.

The steps necessary to ensure effective decision-making in a team environment are no different from the foundations of good management: active listening, ensuring an environment of collaboration, constant questioning, guarding against groupthink and excessive courtesy, and allowing sufficient time to complete a project.

For many organizations where conclusions must be reached on complex systems, in the absence of sound analytic fundamentals and strong, well-reasoned management, the promise of Big Data has led to poor decision-making and an enormous waste of resources as individuals use data to foster their careers at the expense of improving the entire organization.

In many respects, organizations must begin making the changes that many people are trying to make in their personal lives: learning how to use information to improve their situation without falling into the many pitfalls of too much information.

KURT J. WROBEL is chief financial officer and chief actuary at Geisinger Health Plan.
