Trying to make sense of EconML

John Pentakalos
2 min read · May 21, 2020

In its authors' own words, EconML “is a toolkit designed to measure the causal effect of some treatment variable (T) on an outcome variable (Y), controlling for a set of features (X).” I'm trying to make sense of how it could be applied to my personalization problem: how to estimate the revenue gain generated by a click. Given that the very first use case listed in the docs is Customer Targeting, I'm guessing there's got to be something there. The reward model problem fits fairly neatly within the EconML problem setup: we aim to measure the causal effect of personalized ads on average revenue per user, controlling for user features. I found the paper linked below to try to make some headway on the subject.
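
To make that concrete, here's roughly how I imagine wiring it up with a recent version of the econml package. The LinearDML estimator choice, the synthetic data, and all the variable names are my own guesses at a minimal setup, not anything prescribed by the docs.

```python
# Hedged sketch: effect of a personalized ad (T) on revenue (Y), controlling for user features (X).
import numpy as np
from econml.dml import LinearDML
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 5_000
X = rng.normal(size=(n, 5))           # user features
T = rng.binomial(1, 0.5, size=n)      # 1 = saw the personalized ad, 0 = control
Y = 2.0 * T * X[:, 0] + X[:, 1] + rng.normal(size=n)  # synthetic revenue per user

est = LinearDML(
    model_y=GradientBoostingRegressor(),   # nuisance model for E[Y | X]
    model_t=GradientBoostingClassifier(),  # nuisance model for E[T | X]
    discrete_treatment=True,
)
est.fit(Y, T, X=X)
print(est.effect(X[:10]))  # per-user estimated revenue lift from the ad
```

If that mapping holds, est.effect(X) is exactly the per-user "reward of a click" quantity I care about.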

Project ALICE: The parent project of EconML

Reinforcement Learning and Causal Models

http://gershmanlab.webfactional.com/pubs/RL_causal.pdf

Reinforcement learning systems are intrinsically linked to causal relationships. These relationships can be represented either with a simpler, computationally cheap model-free approach or with a formalized model-based causal model. Within the RL framework there are three key causal relationships that we attempt to learn (sketched as function signatures after the list):

  1. Reward: state, action -> reward
  2. Transition: state, action -> state
  3. Hidden State: state -> observation
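
To keep the three mappings straight in my head, here they are as hypothetical function signatures; the placeholder types are mine, the paper treats them abstractly.

```python
from typing import Callable

# Placeholder types of my own choosing.
State, Action, Observation, Reward = str, str, str, float

# 1. Reward:       (state, action) -> reward
RewardFn = Callable[[State, Action], Reward]

# 2. Transition:   (state, action) -> next state
TransitionFn = Callable[[State, Action], State]

# 3. Hidden state: state -> observation
ObservationFn = Callable[[State], Observation]
```
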
Bellman equation: the cumulative reward function under the optimal decision policy

The value function that expresses maximized cumulative reward can be read as: the immediate reward (first term) plus the expected future reward (second term). Taking the argmax over the next action a' recovers the optimal policy. A model-based approach to learning Q(s, a) grinds directly through the recursive Bellman equation above; most sophisticated approaches add some form of tree search to keep this viable in a large action space.
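
Written out, the Bellman optimality equation I'm referring to looks roughly like this; the discount factor γ and the exact expectation notation are my own filling-in:

```latex
Q^{*}(s, a) \;=\; \underbrace{\mathbb{E}\left[ r(s, a) \right]}_{\text{immediate reward}}
\;+\; \gamma \, \underbrace{\mathbb{E}_{s'}\left[ \max_{a'} Q^{*}(s', a') \right]}_{\text{expected future reward}},
\qquad
\pi^{*}(s) \;=\; \arg\max_{a} Q^{*}(s, a)
```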

The simpler model-free approach estimates by experience: instead of planning out the actual path of actions and rewards, it samples many previous rewards from previous states to make a prediction of future reward. Temporal Difference (TD) learning can be applied here, gradually improving the prediction by nudging the estimate in the direction of the prediction error.
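
To make that update concrete, here is a minimal tabular Q-learning sketch; the ε-greedy exploration, the learning rate α, and the toy env.reset/env.step/env.actions interface are my own assumptions, not anything from the paper.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Model-free TD control: nudge Q(s, a) toward the observed reward
    plus the discounted value of the best next action."""
    Q = defaultdict(float)  # Q[(state, action)] -> estimated cumulative reward

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy: mostly exploit the current estimate, sometimes explore.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # TD error: (observed reward + discounted best future estimate) - current estimate.
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in env.actions)
            td_error = reward + gamma * best_next - Q[(state, action)]

            # Move the estimate a small step (alpha) in the direction of the error.
            Q[(state, action)] += alpha * td_error
            state = next_state

    return Q
```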

There is also a hybrid arrangement in which the model-free and model-based systems work collaboratively. The model-based system identifies the states the agent is likely to reach and feeds this information to the model-free system, which then predicts an expected reward for that set of likely states using its previous samples of reward.
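
The paper doesn't spell out an algorithm for this hand-off, but a rough sketch of the division of labor might look like the following; the function names, the model.predict_next interface, and the shallow rollout used to propose likely states are all mine.

```python
def propose_likely_states(model, state, actions, depth=3):
    """Model-based half (hypothetical interface): roll the learned transition
    model forward a few steps to collect states the agent is likely to reach."""
    likely, frontier = set(), {state}
    for _ in range(depth):
        frontier = {model.predict_next(s, a) for s in frontier for a in actions}
        likely |= frontier
    return likely

def evaluate_likely_states(Q, likely_states, actions):
    """Model-free half: score each proposed state using the sample-based
    value estimates (e.g. the Q table learned above) from past experience."""
    return {s: max(Q[(s, a)] for a in actions) for s in likely_states}
```
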
