Tuesday, July 1, 2025

Prescriptive Modeling Makes Causal Bets – Whether You Know It or Not!


Prescriptive modeling is the pinnacle of analytics value. It doesn’t focus on what happened, or even on what will happen. It takes analytics further by telling us what we should do to change what will happen. To harness this extra prescriptive power, however, we must take on an additional assumption: a causal assumption. The naive practitioner may not be aware that moving from predictive to prescriptive comes with the baggage of this lurking assumption. I Googled ‘prescriptive analytics’ and searched the first ten articles for the word ‘causal.’ Not to my surprise (but to my disappointment), I didn’t get a single hit. I loosened the specificity of my search by trying ‘assumption.’ This one did surprise me: not a single hit either! It’s clear to me that this is an under-taught component of prescriptive modeling. Let’s fix that!

When you use prescriptive modeling, you are making causal bets, whether you know it or not. And from what I’ve seen, this is a terribly under-emphasized point given its importance.

By the end of this article, you will have a clear understanding of why prescriptive modeling has causal assumptions and how you can determine whether your model or approach meets them. We’ll get there by covering the topics below:

  1. Brief overview of prescriptive modeling
  2. Why does prescriptive modeling have a causal assumption?
  3. How do we know if we have met the causal assumption?

What’s Prescriptive Modeling?

Before we get too far, I want to say that this is not an article on prescriptive analytics itself; there is plenty of information about that elsewhere. This portion is a quick overview to serve as a refresher for readers who are already at least somewhat familiar with the topic.

There is a widely known hierarchy of three analytics types: (1) descriptive analytics, (2) predictive analytics, and (3) prescriptive analytics.

Descriptive analytics looks at attributes and qualities in the data. It calculates trends, averages, medians, standard deviations, etc. Descriptive analytics doesn’t attempt to say anything more about the data than is empirically observable. Often, descriptive analytics are found in dashboards and reports. The value they provide is in informing the user of the key statistics in the data.

Predictive analytics goes a step beyond descriptive analytics. Instead of summarizing data, predictive analytics finds relationships within the data. It attempts to separate the noise from the signal in these relationships to find underlying, generalizable patterns. From those patterns, it can make predictions on unseen data. It goes further than descriptive analytics because it provides insights on unseen data, rather than just the data that are directly observed.

Prescriptive analytics goes an additional step beyond predictive analytics. Prescriptive analytics uses models created through predictive analytics to recommend smart or optimal actions. Often, prescriptive analytics will run simulations through predictive models and recommend the strategy with the most desirable outcome.

Let’s consider an example to better illustrate the difference between predictive and prescriptive analytics. Imagine you are a data scientist at a company that sells subscriptions to online publications. You have developed a model that predicts the probability that a customer will cancel their subscription in a given month. The model has multiple inputs, including promotions sent to the customer. So far, you’ve only engaged in predictive modeling. One day, you get the bright idea that you should input different discounts into your predictive model, observe the impact of the discounts on customer churn, and recommend the discounts that best balance the cost of the discount with the benefit of increased customer retention. With your shift in focus from prediction to intervention, you have graduated to prescriptive analytics!
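
To make the shift from prediction to prescription concrete, here is a minimal sketch of that discount exercise. The function `predict_churn_probability` is a hypothetical stand-in for the trained churn model, and its coefficients, the monthly price and the candidate discounts are all invented for illustration. Notice that choosing the ‘best’ discount this way quietly treats the model’s response to the discount as causal.

```python
import math

MONTHLY_PRICE = 20.0  # assumed subscription price, for illustration only


def predict_churn_probability(discount_pct):
    """Hypothetical stand-in for a trained churn model (made-up logistic curve)."""
    logit = 0.5 - 0.15 * discount_pct
    return 1.0 / (1.0 + math.exp(-logit))


def expected_revenue(discount_pct):
    """Expected revenue next month if this discount is offered."""
    p_stay = 1.0 - predict_churn_probability(discount_pct)
    return p_stay * MONTHLY_PRICE * (1.0 - discount_pct / 100.0)


def recommend_discount(candidates=(0, 5, 10, 20, 30)):
    """Prescriptive step: simulate each discount and keep the best trade-off."""
    return max(candidates, key=expected_revenue)


best = recommend_discount()
for d in (0, 5, 10, 20, 30):
    print(f"{d:>2}% discount -> expected revenue {expected_revenue(d):.2f}")
print("recommended discount:", best)
```

With these made-up numbers, the sketch recommends the 20% discount, because deeper discounts cost more than the retention they buy.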

Below are examples of possible analyses for the customer churn model at each level of analytics:

Examples of analytical approaches in customer churn – image by author

Now that we’ve been refreshed on the three types of analytics, let’s get into the causal assumption that is unique to prescriptive analytics.

The Causal Assumption in Prescriptive Analytics

Moving from predictive to prescriptive analytics feels intuitive and natural. You have a model that predicts an important outcome using features, some of which are in your control. It makes sense to then simulate manipulating those features to drive towards a desired outcome. What doesn’t feel intuitive (at least to a junior modeler) is that doing so moves you into a dangerous space if your model hasn’t captured the causal relationships between the target variable and the features you intend to change.

We’ll first show the dangers with a simple example involving a rubber duck, leaves and a pool. We’ll then move on to real-world failures that have come from making causal bets when they weren’t warranted.

Leaves, a pool and a rubber duck

You enjoy spending time outside near your pool. As an astute observer of your environment, you notice that your favorite pool toy, a rubber duck, is usually in the same part of the pool as the leaves that fall from a nearby tree.

Leaves and the pool toy tend to be in the same part of the pool – image by author

Eventually, you decide that it is time to clear the leaves out of the pool. There is a specific corner of the pool that is easiest to access, and you want all of the leaves to be in that area so you can more easily collect and discard them. Given the model you have created (the rubber duck is in the same area as the leaves), you decide that it would be very clever to move the toy to the corner and watch in delight as the leaves follow the duck. Then you will easily scoop them up and proceed with the rest of the day, enjoying your newly cleaned pool.

You make the change and feel like a fool as you stand in the corner of the pool, right over the rubber duck, net in hand, while the leaves stubbornly stay in place. You have made the terrible mistake of using prescriptive analytics when your model doesn’t satisfy the causal assumption!

Moving the duck doesn’t move the leaves – image by author

Perplexed, you look into the pool again. You notice a slight disturbance in the water coming from the pool jets. You then decide to rethink your predictive modeling approach, using the angle of the jets to predict the location of the leaves instead of the rubber duck. With this new model, you estimate how you need to configure the jets to get the leaves to your favorite corner. You move the jets and this time you are successful! The leaves drift to the corner, you remove them and go on with your day a smarter data scientist!

This is a quirky example, but it does illustrate a few points well. Let me call them out.

  • The rubber duck is a classic confounded predictor. It is also affected by the pool jets (the real confounding variable) and has no impact on the location of the leaves.
  • Both the rubber duck and the pool jet models made accurate predictions. If we simply wanted to know where the leaves were, they could be equally good.
  • What breaks the rubber duck model has nothing to do with the model itself and everything to do with how you used the model. The causal assumption wasn’t warranted, but you moved forward anyway!
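
The points above can be checked with a small simulation. This is a minimal sketch under invented assumptions: positions are measured in meters from the easy-to-reach corner, `jet_target` is a hypothetical variable for where the jets push floating objects, and the noise levels are made up. The duck predicts the leaves very well while we only observe, but placing the duck by hand changes nothing about where the leaves end up.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_pool(n_days, rng, move_duck_to=None):
    """Return (duck, leaf) positions in meters from the easy-to-reach corner."""
    ducks, leaves = [], []
    for _ in range(n_days):
        jet_target = rng.uniform(0.0, 10.0)      # common cause of both positions
        duck = jet_target + rng.gauss(0.0, 0.5)  # jets push the duck
        leaf = jet_target + rng.gauss(0.0, 0.5)  # jets push the leaves
        if move_duck_to is not None:
            duck = move_duck_to                  # intervention: place the duck by hand
        ducks.append(duck)
        leaves.append(leaf)
    return ducks, leaves


rng = random.Random(0)

# Observation only: the duck looks like a great predictor of the leaves.
ducks, leaves = simulate_pool(5000, rng)
observed_corr = pearson(ducks, leaves)

# Intervention: put the duck in the corner (position 0) every day.
_, leaves_after_move = simulate_pool(5000, rng, move_duck_to=0.0)
mean_leaf_after_move = mean(leaves_after_move)

print(f"duck/leaf correlation when only observing: {observed_corr:.2f}")
print(f"average leaf position after moving the duck: {mean_leaf_after_move:.1f} m")
```

In this sketch the correlation is close to 1, yet the leaves stay scattered around the middle of the pool after the duck is moved, because only the jets drive their position.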

I hope you enjoyed the whimsical example. Let’s transition to talking about real-world examples.

Shark Tank Pitch

In case you haven’t seen it, Shark Tank is a show where entrepreneurs pitch their business idea to wealthy investors (called ‘sharks’) in the hope of securing investment money.

I was recently watching a Shark Tank re-run (as one does). One of the pitches in the episode (Season 10, Episode 15) was for a company called GoalSetter. GoalSetter is a company that allows parents to open ‘mini’ bank accounts in their child’s name that family and friends can make deposits into. The idea is that instead of giving toys or gift cards to children as presents, people can give deposit certificates and children can save up for things (‘goals’) they want to purchase.

I have no qualms with the business idea, but in the presentation, the entrepreneur made this claim:

…kids who have savings accounts in their name are six times more likely to go to college and four times more likely to own stocks by the time they are young adults…

Assuming this statistic is true, this statement, by itself, is all fine and well. We can look at the data and see that there is a relationship between a child having a bank account in their name and going to college and/or investing (descriptive). We could even develop a model that predicts whether a child will go to college or own stocks using a bank account in their name as a predictor (predictive). But this doesn’t tell us anything about causation! The investment pitch has this subtle prescriptive message: “give your kid a GoalSetter account and they will be more likely to go to college and own stocks.” While semantically similar to the quote above, these two statements are worlds apart! One is a statement of statistical fact that relies on no causal assumptions, and the other is a prescriptive statement that carries a huge causal assumption! I hope that confounding variable alarms are ringing in your head right now. It seems much more likely that things like household income, financial literacy of parents and cultural influences would have a relationship with both the probability of opening a bank account in a child’s name and that child going to college. It doesn’t seem likely that giving a random kid a bank account in their name will increase their chances of going to college. This is like moving the duck in the pool and expecting the leaves to follow!
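
A quick back-of-the-envelope sketch shows how a confounder can manufacture this kind of statistic. The counts below are invented purely for illustration (they are not GoalSetter’s data or anyone else’s), and household income stands in for the whole bundle of likely confounders. In this made-up world, accounts have no effect at all on college attendance.

```python
# Hypothetical counts, invented for illustration only:
# (number of kids, number of those kids who went to college)
counts = {
    ("high_income", "account"): (800, 480),
    ("high_income", "no_account"): (200, 120),
    ("low_income", "account"): (100, 10),
    ("low_income", "no_account"): (900, 90),
}


def college_rate(income_levels, account_status):
    """Share of kids going to college, pooled over the given income levels."""
    kids = sum(counts[(inc, account_status)][0] for inc in income_levels)
    college = sum(counts[(inc, account_status)][1] for inc in income_levels)
    return college / kids


all_incomes = ("high_income", "low_income")

# Descriptive statistic: compare account holders to everyone else.
crude_ratio = (college_rate(all_incomes, "account")
               / college_rate(all_incomes, "no_account"))

# Same comparison, but only among kids with the same household income.
ratio_by_income = {
    inc: college_rate((inc,), "account") / college_rate((inc,), "no_account")
    for inc in all_incomes
}

print(f"crude 'times more likely': {crude_ratio:.2f}")
for inc, ratio in ratio_by_income.items():
    print(f"{inc}: {ratio:.2f}")
```

Pooled together, kids with accounts look almost three times more likely to go to college. Within each income group the ratio is 1, so handing out accounts would change nothing here.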

Reading Is Fundamental Program

In the 1960s, there was a government-funded program called ‘Reading is Fundamental (RIF).’ Part of this program focused on putting books in the homes of low-income children. The goal was to increase literacy in those households. The strategy was partially based on the idea that homes with more books in them had more literate children. You might know where I’m going with this one based on the Shark Tank example we just discussed. Observing that homes with a lot of books have more literate children is descriptive. There is nothing wrong with that. But when you start making recommendations, you step out of descriptive space and leap into the prescriptive world, and as we’ve established, that comes with the causal assumption. Putting books in homes assumes that the books cause the literacy! Research by Susan Neuman found that putting books in homes was not sufficient to increase literacy without additional resources [1].

Of course, giving books to kids who can’t afford them is a good thing. You don’t need a causal assumption to do good things 😊. But if you have the specific goal of increasing literacy, you would be well-advised to assess the validity of the causal assumption behind your actions to realize your desired outcomes!

How do we know if we satisfy the causal assumption?

We’ve established that prescriptive modeling requires a causal assumption (so much so that you are probably exhausted!). But how can we know if the assumption is met by our model? When thinking about causality and data, I find it helpful to separate my thoughts between experimental and observational data. Let’s go through how we can feel good (or maybe at least ‘okay’) about causal assumptions with these two types of data.

Experimental Data

If you have access to good experimental data for your prescriptive modeling, you are very lucky! Experimental data is the gold standard for establishing causal relationships. The details of why this is the case are out of scope for this article, but I will say that the randomized assignment of treatments in a well-designed experiment deals with confounders, so you don’t have to worry about them ruining your causal assumptions.

We can train predictive models on the output of a good experiment, i.e., good experimental data. In this case, the data-generating process meets causal identification conditions between the target variable and the variables that were randomly assigned as treatments. I want to emphasize that only variables that are randomly assigned in the experiment qualify for the causal claim on the basis of the experiment alone. The causal effect of other variables (called covariates) may or may not be correctly captured. For example, imagine that we ran an experiment that randomly provided multiple plants with various levels of nitrogen, phosphorus and potassium, and we measured the plant growth. From this experimental data, we created the model below:

Example model from plant experiment – image by author

Because nitrogen, phosphorus and potassium were treatments that were randomly assigned in the experiment, we can conclude that betas 1 through 3 estimate causal effects on plant growth. Sun exposure was not randomly assigned, which prevents us from claiming a causal relationship through the power of experimental data. This is not to say that a causal claim can never be justified for covariates, but the claim will require additional assumptions that we will cover in the observational data section coming up.
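
A small simulation can illustrate the difference between the randomized treatments and the covariate. Everything here is an assumption made for the sketch: the linear form of the model (an intercept, betas 1 through 3 for the nutrients and a fourth beta for sun exposure), the ‘true’ effect sizes, and a hypothetical unobserved soil-quality variable that drives both sun exposure and growth. The `ols` helper is a plain least-squares fit written with the standard library, not the author’s method.

```python
import random


def ols(rows, y):
    """Ordinary least squares via the normal equations (Gauss-Jordan elimination)."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot_row = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot_row] = a[pivot_row], a[col]
        pivot = a[col][col]
        a[col] = [v / pivot for v in a[col]]
        for r in range(k):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


# Invented "true" effects used to generate the data.
TRUE = {"nitrogen": 2.0, "phosphorus": 1.0, "potassium": 0.5, "sun": 0.3}

rng = random.Random(1)
rows, growth = [], []
for _ in range(4000):
    nitrogen = rng.uniform(0.0, 10.0)    # randomly assigned treatment
    phosphorus = rng.uniform(0.0, 10.0)  # randomly assigned treatment
    potassium = rng.uniform(0.0, 10.0)   # randomly assigned treatment
    soil = rng.gauss(0.0, 1.0)           # unobserved, never enters the model
    sun = soil + rng.gauss(0.0, 1.0)     # NOT randomized: tied to soil quality
    y = (5.0 + TRUE["nitrogen"] * nitrogen + TRUE["phosphorus"] * phosphorus
         + TRUE["potassium"] * potassium + TRUE["sun"] * sun
         + 3.0 * soil + rng.gauss(0.0, 1.0))
    rows.append([1.0, nitrogen, phosphorus, potassium, sun])
    growth.append(y)

b0, b1, b2, b3, b4 = ols(rows, growth)
print(f"nitrogen   true {TRUE['nitrogen']:.1f}  estimated {b1:.2f}")
print(f"phosphorus true {TRUE['phosphorus']:.1f}  estimated {b2:.2f}")
print(f"potassium  true {TRUE['potassium']:.1f}  estimated {b3:.2f}")
print(f"sun        true {TRUE['sun']:.1f}  estimated {b4:.2f}")
```

In this sketch the three randomized nutrients come back close to their true effects, while the sun exposure coefficient is inflated several times over because it absorbs the effect of the unobserved soil quality.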

I’ve used the qualifier good when talking about experimental data several times now. What is a good experiment? I’ll go over two common issues I’ve seen that prevent an experiment from creating good data, but there is a lot more that can go wrong. You should read up on experimental design if you would like to go deeper.

Execution mistakes: This is one of the most common issues with experiments. A few years ago, I was assigned to a project where an experiment had been run, but some of the data were mixed up regarding which subjects got which treatments. The data was not usable! If there were significant execution mistakes, you may not be able to draw valid causal conclusions from the experimental data.

Underpowered experiments: This can happen for several reasons. For example, there may not be enough signal coming from the treatment, or there may have been too few experimental units. Even with perfect execution, an underpowered study may fail to uncover real effects, which could prevent you from reaching the causal conclusions required for prescriptive modeling.
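
To get a feel for what ‘underpowered’ means, here is a rough sketch of a power calculation. It assumes a simple two-arm experiment analyzed with a two-sided z-test and a known standard deviation, which is a simplification chosen for the sketch; the effect size and sample sizes are invented.

```python
from statistics import NormalDist


def approx_power(effect, sd, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test (normal approximation)."""
    z = NormalDist()
    standard_error = sd * (2.0 / n_per_arm) ** 0.5
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(effect) / standard_error
    # Probability that the test statistic lands in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


small_study = approx_power(effect=0.2, sd=1.0, n_per_arm=20)
large_study = approx_power(effect=0.2, sd=1.0, n_per_arm=400)

print(f"power with  20 units per arm: {small_study:.2f}")
print(f"power with 400 units per arm: {large_study:.2f}")
```

Under these assumptions, the small study would detect a real effect of this size only about one time in ten, while the larger one would detect it roughly four times in five.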

Observational Data

Satisfying the causal assumption with observational data is much more difficult, risky and controversial than with experimental data. The randomization that is a key part of creating experimental data is powerful because it removes the problems caused by all confounding variables: known and unknown, observed and unobserved. With observational data, we don’t have access to this extremely useful power.

Theoretically, if we can correctly control for all confounding variables, we can still make causal claims with observational data. While some may disagree with this statement, it is widely accepted in principle. The real challenge lies in the application.

To correctly control for a confounding variable, we need to (1) have high-quality data for the variable and (2) correctly model the relationship between the confounder and our target variable. Doing this for each known confounder is difficult, but it isn’t the worst part. The worst part is that you can never know with certainty that you have accounted for all confounders. Even with strong domain knowledge, the possibility that there is an unknown confounder “out there” remains. The best we can do is include every confounder we can think of and then rely on what is called the ‘no unmeasured confounder’ assumption to estimate causal relationships.

Modeling with observational data can still add a lot of value in prescriptive analytics, even though we can never know with certainty that we accounted for all confounding variables. With observational data, I think of the causal assumption as being met in degrees instead of in a binary fashion. As we account for more confounders, we capture the causal effect better and better. Even if we miss a few confounders, the model may still add value. As long as the missed confounders don’t have too large an impact on the estimated causal relationships, we may be able to add more value making decisions with a slightly biased causal model than using the process we had before we used prescriptive modeling (e.g., rules or intuition-based decisions).
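
The idea of meeting the causal assumption in degrees can be sketched with simulated observational data. All of this is assumed for illustration: three hypothetical confounders `c1`, `c2` and `c3`, a true causal effect of 1.0, a linear data-generating process, and a hand-rolled `ols` helper for the least-squares fits. The regression is refit while controlling for zero, one, two and then all three confounders.

```python
import random


def ols(rows, y):
    """Ordinary least squares via the normal equations (Gauss-Jordan elimination)."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot_row = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot_row] = a[pivot_row], a[col]
        pivot = a[col][col]
        a[col] = [v / pivot for v in a[col]]
        for r in range(k):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


TRUE_EFFECT = 1.0  # invented causal effect of the action x on the outcome y

rng = random.Random(2)
data = []
for _ in range(4000):
    c1, c2, c3 = (rng.gauss(0.0, 1.0) for _ in range(3))
    x = c1 + c2 + c3 + rng.gauss(0.0, 1.0)  # confounders influence the action
    y = TRUE_EFFECT * x + 2.0 * c1 + 1.0 * c2 + 0.5 * c3 + rng.gauss(0.0, 1.0)
    data.append((x, y, c1, c2, c3))

outcomes = [d[1] for d in data]
estimates = []
for n_controls in range(4):
    # Columns: intercept, x, then the first n_controls confounders.
    rows = [[1.0, d[0], *d[2:2 + n_controls]] for d in data]
    estimates.append(ols(rows, outcomes)[1])

for n_controls, est in enumerate(estimates):
    print(f"confounders controlled: {n_controls}  estimated effect of x: {est:.2f}")
```

In this setup the estimate starts far above the true effect and moves steadily toward 1.0 as each confounder is added, so a model that misses only the weakest confounder is biased but much closer to the truth than one that ignores them all.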

Having a pragmatic mindset with observational data can be important since (1) observational data is cheaper and much more common than experimental data and (2) if we insist on airtight causal conclusions (which we can’t get with observational data), we may be leaving value on the table by ruling out causal models that are ‘good enough’, though not perfect. You and your business partners have to decide how lenient to be about meeting the causal assumption. A model built on observational data could still add major value!

Wrapping it up

While prescriptive analytics is powerful and has the potential to add a lot of value, it relies on causal assumptions, whereas descriptive and predictive analytics do not. It is important to understand the causal assumption and to meet it as well as possible.

Experimental data is the gold standard for estimating causal relationships. A model built on good experimental data is in a strong position to meet the causal assumptions required by prescriptive modeling.

Establishing causal relationships with observational data can be more difficult because of the possibility of unknown or unobserved confounding variables. We should balance rigor and pragmatism when using observational data for prescriptive modeling: rigor to think of and attempt to control for every confounder possible, and pragmatism to understand that while the causal effects may not be perfectly captured, the model may add more value than the current decision-making process.

I hope that this article has helped you gain a better understanding of why prescriptive modeling relies on causal assumptions and how to go about meeting those assumptions. Happy modeling!

  1. Neuman, S. B. (2017). Principled Adversaries: Literacy Research for Political Action. Teachers College Record, 119(6), 1–32.
