
<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<meta charset="utf-8">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="pandoc" />

<meta name="author" content="Francois Vieille" />

<meta name="date" content="2015-07-27" />

<title>Virtual Twins Examples</title>


<style type="text/css">code{white-space: pre;}</style>
<style type="text/css">
table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
  margin: 0; padding: 0; vertical-align: baseline; border: none; }
table.sourceCode { width: 100%; line-height: 100%; }
td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
td.sourceCode { padding-left: 5px; }
code > span.kw { color: #007020; font-weight: bold; }
code > span.dt { color: #902000; }
code > span.dv { color: #40a070; }
code > span.bn { color: #40a070; }
code > span.fl { color: #40a070; }
code > span.ch { color: #4070a0; }
code > span.st { color: #4070a0; }
code > span.co { color: #60a0b0; font-style: italic; }
code > span.ot { color: #007020; }
code > span.al { color: #ff0000; font-weight: bold; }
code > span.fu { color: #06287e; }
code > span.er { color: #ff0000; font-weight: bold; }
</style>
<style type="text/css">
  pre:not([class]) {
    background-color: white;
  }
</style>


<link href="data:text/css,body%20%7B%0A%20%20background%2Dcolor%3A%20%23fff%3B%0A%20%20margin%3A%201em%20auto%3B%0A%20%20max%2Dwidth%3A%20700px%3B%0A%20%20overflow%3A%20visible%3B%0A%20%20padding%2Dleft%3A%202em%3B%0A%20%20padding%2Dright%3A%202em%3B%0A%20%20font%2Dfamily%3A%20%22Open%20Sans%22%2C%20%22Helvetica%20Neue%22%2C%20Helvetica%2C%20Arial%2C%20sans%2Dserif%3B%0A%20%20font%2Dsize%3A%2014px%3B%0A%20%20line%2Dheight%3A%201%2E35%3B%0A%7D%0A%0A%23header%20%7B%0A%20%20text%2Dalign%3A%20center%3B%0A%7D%0A%0A%23TOC%20%7B%0A%20%20clear%3A%20both%3B%0A%20%20margin%3A%200%200%2010px%2010px%3B%0A%20%20padding%3A%204px%3B%0A%20%20width%3A%20400px%3B%0A%20%20border%3A%201px%20solid%20%23CCCCCC%3B%0A%20%20border%2Dradius%3A%205px%3B%0A%0A%20%20background%2Dcolor%3A%20%23f6f6f6%3B%0A%20%20font%2Dsize%3A%2013px%3B%0A%20%20line%2Dheight%3A%201%2E3%3B%0A%7D%0A%20%20%23TOC%20%2Etoctitle%20%7B%0A%20%20%20%20font%2Dweight%3A%20bold%3B%0A%20%20%20%20font%2Dsize%3A%2015px%3B%0A%20%20%20%20margin%2Dleft%3A%205px%3B%0A%20%20%7D%0A%0A%20%20%23TOC%20ul%20%7B%0A%20%20%20%20padding%2Dleft%3A%2040px%3B%0A%20%20%20%20margin%2Dleft%3A%20%2D1%2E5em%3B%0A%20%20%20%20margin%2Dtop%3A%205px%3B%0A%20%20%20%20margin%2Dbottom%3A%205px%3B%0A%20%20%7D%0A%20%20%23TOC%20ul%20ul%20%7B%0A%20%20%20%20margin%2Dleft%3A%20%2D2em%3B%0A%20%20%7D%0A%20%20%23TOC%20li%20%7B%0A%20%20%20%20line%2Dheight%3A%2016px%3B%0A%20%20%7D%0A%0Atable%20%7B%0A%20%20margin%3A%201em%20auto%3B%0A%20%20border%2Dwidth%3A%201px%3B%0A%20%20border%2Dcolor%3A%20%23DDDDDD%3B%0A%20%20border%2Dstyle%3A%20outset%3B%0A%20%20border%2Dcollapse%3A%20collapse%3B%0A%7D%0Atable%20th%20%7B%0A%20%20border%2Dwidth%3A%202px%3B%0A%20%20padding%3A%205px%3B%0A%20%20border%2Dstyle%3A%20inset%3B%0A%7D%0Atable%20td%20%7B%0A%20%20border%2Dwidth%3A%201px%3B%0A%20%20border%2Dstyle%3A%20inset%3B%0A%20%20line%2Dheight%3A%2018px%3B%0A%20%20padding%3A%205px%205px%3B%0A%7D%0Atable%2C%20table%20th%2C%20table%20td%20%7B%0A%20%20border%2Dleft%2Dstyle%3A%20none%3B%0A%
20%20border%2Dright%2Dstyle%3A%20none%3B%0A%7D%0Atable%20thead%2C%20table%20tr%2Eeven%20%7B%0A%20%20background%2Dcolor%3A%20%23f7f7f7%3B%0A%7D%0A%0Ap%20%7B%0A%20%20margin%3A%200%2E5em%200%3B%0A%7D%0A%0Ablockquote%20%7B%0A%20%20background%2Dcolor%3A%20%23f6f6f6%3B%0A%20%20padding%3A%200%2E25em%200%2E75em%3B%0A%7D%0A%0Ahr%20%7B%0A%20%20border%2Dstyle%3A%20solid%3B%0A%20%20border%3A%20none%3B%0A%20%20border%2Dtop%3A%201px%20solid%20%23777%3B%0A%20%20margin%3A%2028px%200%3B%0A%7D%0A%0Adl%20%7B%0A%20%20margin%2Dleft%3A%200%3B%0A%7D%0A%20%20dl%20dd%20%7B%0A%20%20%20%20margin%2Dbottom%3A%2013px%3B%0A%20%20%20%20margin%2Dleft%3A%2013px%3B%0A%20%20%7D%0A%20%20dl%20dt%20%7B%0A%20%20%20%20font%2Dweight%3A%20bold%3B%0A%20%20%7D%0A%0Aul%20%7B%0A%20%20margin%2Dtop%3A%200%3B%0A%7D%0A%20%20ul%20li%20%7B%0A%20%20%20%20list%2Dstyle%3A%20circle%20outside%3B%0A%20%20%7D%0A%20%20ul%20ul%20%7B%0A%20%20%20%20margin%2Dbottom%3A%200%3B%0A%20%20%7D%0A%0Apre%2C%20code%20%7B%0A%20%20background%2Dcolor%3A%20%23f7f7f7%3B%0A%20%20border%2Dradius%3A%203px%3B%0A%20%20color%3A%20%23333%3B%0A%7D%0Apre%20%7B%0A%20%20white%2Dspace%3A%20pre%2Dwrap%3B%20%20%20%20%2F%2A%20Wrap%20long%20lines%20%2A%2F%0A%20%20border%2Dradius%3A%203px%3B%0A%20%20margin%3A%205px%200px%2010px%200px%3B%0A%20%20padding%3A%2010px%3B%0A%7D%0Apre%3Anot%28%5Bclass%5D%29%20%7B%0A%20%20background%2Dcolor%3A%20%23f7f7f7%3B%0A%7D%0A%0Acode%20%7B%0A%20%20font%2Dfamily%3A%20Consolas%2C%20Monaco%2C%20%27Courier%20New%27%2C%20monospace%3B%0A%20%20font%2Dsize%3A%2085%25%3B%0A%7D%0Ap%20%3E%20code%2C%20li%20%3E%20code%20%7B%0A%20%20padding%3A%202px%200px%3B%0A%7D%0A%0Adiv%2Efigure%20%7B%0A%20%20text%2Dalign%3A%20center%3B%0A%7D%0Aimg%20%7B%0A%20%20background%2Dcolor%3A%20%23FFFFFF%3B%0A%20%20padding%3A%202px%3B%0A%20%20border%3A%201px%20solid%20%23DDDDDD%3B%0A%20%20border%2Dradius%3A%203px%3B%0A%20%20border%3A%201px%20solid%20%23CCCCCC%3B%0A%20%20margin%3A%200%205px%3B%0A%7D%0A%0Ah1%20%7B%0A%20%20margin%2Dtop%3A%200%3B%0A%20%20font%2Dsize%3A%
2035px%3B%0A%20%20line%2Dheight%3A%2040px%3B%0A%7D%0A%0Ah2%20%7B%0A%20%20border%2Dbottom%3A%204" rel="stylesheet" type="text/css" />

</head>

<body>


<div id="header">
<h1 class="title">Virtual Twins Examples</h1>
<h4 class="author"><em>Francois Vieille</em></h4>
<h4 class="date"><em>2015-07-27</em></h4>
</div>


<div id="introduction" class="section level1">
<h1>Introduction</h1>
<p>The goal of this vignette is to show most of the possibilities offered by the <em>aVT</em> package (short for <em>aVirtualTwins</em>, an <em>a</em>daptation of the <em>Virtual Twins</em> method).</p>
<p>The <em>VT</em> method (Jared Foster et al., 2011) was designed to find a subgroup of patients with an enhanced treatment effect, if such a subgroup exists. In theory, the method applies to both binary and continuous outcomes. This package only handles a binary outcome in a two-arm clinical trial.</p>
<p><em>VT</em> method is based on random forests and regression/classification trees.</p>
<p>I use a simulated dataset called <em>sepsis</em> to show how the <em>aVT</em> package can be used. Type <code>?sepsis</code> to learn more about this dataset. Its true subgroup is <code>PRAPACHE &lt;= 26 &amp; AGE &lt;= 49.80</code>.</p>
<p><strong>NOTE:</strong> This true subgroup is defined by the <em>lower</em> event rate (<code>survival = 1</code>) in the treatment arm. In the following examples we therefore search for the subgroup with the <em>highest</em> event rate, which we know is <code>PRAPACHE &gt; 26 &amp; AGE &gt; 49.80</code>.</p>
<hr />
</div>
<div id="quick-preview" class="section level1">
<h1>Quick preview</h1>
<div id="dataset" class="section level2">
<h2>Dataset</h2>
<p>The data used in <em>VT</em> are modeled as <span class="math">\(\left\{Y, T, X_1, \ldots, X_{p-2}\right\}\)</span>, where <span class="math">\(p\)</span> is the number of variables.</p>
<ul>
<li><span class="math">\(Y\)</span> is a binary outcome. In R, <span class="math">\(Y\)</span> is a <code>factor</code>. The second level of this factor is the desirable event (<span class="math">\(Y=1\)</span>).</li>
<li><span class="math">\(T\)</span> is the treatment variable: <span class="math">\(T=1\)</span> means <em>active treatment</em>, <span class="math">\(T=0\)</span> means <em>control treatment</em>. In R, <span class="math">\(T\)</span> is numeric.</li>
<li>The <span class="math">\(X_i\)</span> are covariates; each can be categorical, continuous or binary.</li>
</ul>
<p><strong>NOTE:</strong> if you run <em>VT</em> with interactions, categorical covariates must be transformed into binary variables.</p>
<p>Type <code>?formatRCTDataset</code> for details.</p>
<p>Related functions/classes in the aVirtualTwins package: <code>VT.object()</code>, <code>vt.data()</code>, <code>formatRCTDataset()</code>.</p>
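<p>As a rough illustration of the layout described above, the sketch below builds a tiny made-up data frame in base R. The column names <code>outcome</code>, <code>trt</code>, <code>age</code> and <code>score</code> are hypothetical and are not part of the package.</p>
<pre class="sourceCode r"><code class="sourceCode r"># Hypothetical toy trial: 6 patients, binary outcome, two arms
toy &lt;- data.frame(
  outcome = c(0, 1, 1, 0, 1, 0),
  trt     = c(1, 1, 0, 0, 1, 0),
  age     = c(54, 61, 47, 70, 38, 66),
  score   = c(21, 30, 25, 28, 19, 33)
)
# Y must be a factor whose second level is the event (Y = 1)
toy$outcome &lt;- factor(toy$outcome, levels = c(0, 1))
# T must be numeric, with 0 for control and 1 for active treatment
stopifnot(is.numeric(toy$trt), all(toy$trt %in% c(0, 1)))
# The second level is the one treated as the event
levels(toy$outcome)[2]</code></pre>
<p>Setting <code>levels</code> explicitly avoids relying on the default alphabetical ordering of factor levels.</p>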
</div>
<div id="method" class="section level2">
<h2>Method</h2>
<p><em>VT</em> is a two-step method, with several options at each step.</p>
<p>Let <span class="math">\(\hat{P_{1i}} = P(Y_i = 1|T_i = 1, X_i)\)</span><br />Let <span class="math">\(\hat{P_{0i}} = P(Y_i = 1|T_i = 0, X_i)\)</span><br />Let <span class="math">\(X = \left\{X_1, \ldots, X_{p-2}\right\}\)</span></p>
<div id="first-step" class="section level3">
<h3>First Step</h3>
<ul>
<li>Grow a random forest with data <span class="math">\(\left\{Y, T, X \right\}\)</span>.<br /></li>
<li>Grow a random forest with treatment/covariate interactions, i.e. with data <span class="math">\(\left\{Y, T, X, X I(T_i=0), X I(T_i=1)\right\}\)</span></li>
<li>Grow two random forests, one for each treatment:
<ul>
<li>The first with data <span class="math">\(\left\{Y, X \right\}\)</span> where <span class="math">\(T_i = 0\)</span><br /></li>
<li>The second with data <span class="math">\(\left\{Y, X \right\}\)</span> where <span class="math">\(T_i = 1\)</span><br /></li>
</ul></li>
<li>Build your own model</li>
</ul>
<p>Any one of these approaches yields estimates of <span class="math">\(\hat{P_{1i}}\)</span> and <span class="math">\(\hat{P_{0i}}\)</span>.</p>
<p>Related functions/classes in the aVirtualTwins package: <code>VT.difft()</code>, <code>vt.forest()</code>.</p>
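<p>To make the first option concrete, here is a minimal sketch that calls <code>randomForest</code> directly on simulated data instead of going through the package classes. The simulated variables (<code>x1</code>, <code>x2</code>, <code>trt</code>) and the effect size are assumptions made for the illustration. Each subject is predicted twice, once with the treatment forced to 1 and once with it forced to 0.</p>
<pre class="sourceCode r"><code class="sourceCode r">library(randomForest)
set.seed(1)
# Simulated two-arm trial (all names and effect sizes are illustrative)
n   &lt;- 200
x1  &lt;- rnorm(n)
x2  &lt;- rnorm(n)
trt &lt;- rbinom(n, 1, 0.5)
# In this simulation the treatment only has an effect when x1 &gt; 0
p   &lt;- plogis(-0.5 + 1.5 * trt * (x1 &gt; 0))
y   &lt;- factor(rbinom(n, 1, p), levels = c(0, 1))
X   &lt;- data.frame(trt = trt, x1 = x1, x2 = x2)

# One forest grown on {Y, T, X}
rf &lt;- randomForest(x = X, y = y, ntree = 200)

# "Twin" data sets: same covariates, treatment forced to 1 or to 0
X1 &lt;- X
X1$trt &lt;- 1
X0 &lt;- X
X0$trt &lt;- 0
P1 &lt;- predict(rf, newdata = X1, type = &quot;prob&quot;)[, &quot;1&quot;]
P0 &lt;- predict(rf, newdata = X0, type = &quot;prob&quot;)[, &quot;1&quot;]
head(cbind(P1, P0))</code></pre>
<p>Note that this sketch predicts in-sample for the arm each subject actually received, which is likely to be optimistic. Out-of-bag predictions are one way to limit that.</p>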
</div>
<div id="second-step" class="section level3">
<h3>Second Step</h3>
<p>Define <span class="math">\(Z_i = \hat{P_{1i}} - \hat{P_{0i}}\)</span></p>
<ul>
<li>Use a regression tree to explain <span class="math">\(Z\)</span> by the covariates <span class="math">\(X\)</span>. Subjects whose predicted <span class="math">\(Z_i\)</span> is greater than some threshold <span class="math">\(c\)</span> are then taken to define the subgroup.</li>
<li>Use a classification tree on a new variable <span class="math">\(Z^{*}\)</span> defined by <span class="math">\(Z^{*}_i=1\)</span> if <span class="math">\(Z_i &gt; c\)</span> and <span class="math">\(Z^{*}_i=0\)</span> otherwise.</li>
</ul>
<p>The idea is to identify which covariates in <span class="math">\(X\)</span> explain the variation of <span class="math">\(Z\)</span>.</p>
<p>Related function in aVirtualTwins package : <code>vt.tree()</code>.</p>
<hr />
</div>
</div>
</div>
<div id="sepsis-dataset" class="section level1">
<h1>Sepsis dataset</h1>
<p>See <strong>Introduction</strong>.</p>
<p><em>Sepsis</em> is a simulated two-arm clinical trial on sepsis. This dataset is taken from the <a href="http://biopharmnet.com/wiki/Software_for_subgroup_identification_and_analysis">SIDES method</a>.</p>
<p><em>Sepsis</em> contains simulated data on 470 subjects with a binary outcome, <code>survival</code>, which stores the survival status of each patient after 28 days of treatment: 1 for subjects who died after 28 days and 0 otherwise. There are 11 covariates, listed below, all of which are numerical variables.</p>
<p>Note that, contrary to the original dataset used in SIDES, missing values have been imputed by random forest with <code>randomForest::rfImpute()</code>. See file <em>data-raw/sepsis.R</em> for more details.</p>
<p>True subgroup is <code>PRAPACHE &lt;= 26 &amp; AGE &lt;= 49.80</code>. <strong>NOTE:</strong> This subgroup is defined by the <em>lower</em> event rate (survival = 1) in the treatment arm.</p>
<p>470 patients and 13 variables:</p>
<ul>
<li><code>survival</code> : binary outcome</li>
<li><code>THERAPY</code> : 1 for active treatment, 0 for control treatment</li>
<li><code>TIMFIRST</code> : Time from first sepsis-organ fail to start drug</li>
<li><code>AGE</code> : Patient age in years</li>
<li><code>BLLPLAT</code> : Baseline local platelets</li>
<li><code>blSOFA</code> : Sum of baseline SOFA scores (cardiovascular, hematology, hepaticrenal, and respiration scores)</li>
<li><code>BLLCREAT</code> : Baseline creatinine</li>
<li><code>ORGANNUM</code> : Number of baseline organ failures</li>
<li><code>PRAPACHE</code> : Pre-infusion APACHE II score</li>
<li><code>BLGCS</code> : Baseline Glasgow Coma Scale score</li>
<li><code>BLIL6</code> : Baseline serum IL-6 concentration</li>
<li><code>BLADL</code> : Baseline activity of daily living score</li>
<li><code>BLLBILI</code> : Baseline local bilirubin</li>
</ul>
<p><strong>Source:</strong> <a href="http://biopharmnet.com/wiki/Software_for_subgroup_identification_and_analysis" class="uri">http://biopharmnet.com/wiki/Software_for_subgroup_identification_and_analysis</a></p>
<hr />
</div>
<div id="create-object-virtualtwins" class="section level1">
<h1>Create object VirtualTwins</h1>
<p>Before running the two steps of the <em>VT</em> method, the aVirtualTwins package must be initialized with the <code>vt.data()</code> function. Type <code>?vt.data</code> for more details.</p>
<p><strong>NOTE:</strong> if running VT with interactions between <span class="math">\(T\)</span> and <span class="math">\(X\)</span>, set <code>interactions = TRUE</code>.</p>
<p>Code of <code>vt.data()</code>:</p>
<pre class="sourceCode r"><code class="sourceCode r">vt.data &lt;-<span class="st"> </span>function(dataset, outcome.field, treatment.field, <span class="dt">interactions =</span> <span class="ot">TRUE</span>, ...){
  data &lt;-<span class="st"> </span><span class="kw">formatRCTDataset</span>(dataset, outcome.field, treatment.field, <span class="dt">interactions =</span> interactions)
  <span class="kw">VT.object</span>(<span class="dt">data =</span> data, ...)
}</code></pre>
<p><strong>Example with Sepsis</strong></p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="co"># load library VT</span>
<span class="kw">library</span>(aVirtualTwins)
<span class="co"># load data sepsis</span>
<span class="kw">data</span>(sepsis)
<span class="co"># initialize VT.object</span>
vt.o &lt;-<span class="st"> </span><span class="kw">vt.data</span>(sepsis, <span class="st">&quot;survival&quot;</span>, <span class="st">&quot;THERAPY&quot;</span>, <span class="ot">TRUE</span>)</code></pre>
<pre><code>## &quot;1&quot; will be the favorable outcome</code></pre>
<p>1 will be the favorable outcome because 1 is the second level of the <code>&quot;survival&quot;</code> column. This means that <span class="math">\(P(Y=1)\)</span> is the probability of interest. It is still possible to compute <span class="math">\(P(Y=0)\)</span>.</p>
<p><strong>Quick example</strong></p>
<p><em>Sepsis</em> does not have any categorical variable. The following example shows how <code>vt.data()</code> deals with categorical variables, depending on the <code>interactions</code> parameter.</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Creation of categorical variable</span>
cat.x &lt;-<span class="st"> </span><span class="kw">rep</span>(<span class="dv">1</span>:<span class="dv">5</span>, (<span class="kw">nrow</span>(sepsis))/<span class="dv">5</span>)
cat.x &lt;-<span class="st"> </span><span class="kw">as.factor</span>(cat.x)
sepsis.tmp &lt;-<span class="st"> </span><span class="kw">cbind</span>(sepsis, cat.x)
vt.o.tmp &lt;-<span class="st"> </span><span class="kw">vt.data</span>(sepsis.tmp, <span class="st">&quot;survival&quot;</span>, <span class="st">&quot;THERAPY&quot;</span>, <span class="ot">TRUE</span>)</code></pre>
<pre><code>## &quot;1&quot; will be the favorable outcome 
## Creation of dummy variables for cat.x 
## Dummy variable cat.x_1 created 
## Dummy variable cat.x_2 created 
## Dummy variable cat.x_3 created 
## Dummy variable cat.x_4 created 
## Dummy variable cat.x_5 created</code></pre>
<p>Dummy variables are created for each category of the <code>cat.x</code> variable, and <code>cat.x</code> itself is removed from the dataset.</p>
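<p>For readers who want to see what such an encoding looks like, the base R sketch below builds one 0/1 column per level of a small made-up factor, following the <code>variable_level</code> naming visible in the messages above. It is only an illustration, not the package's internal code.</p>
<pre class="sourceCode r"><code class="sourceCode r"># Hypothetical categorical covariate with 3 levels
cat.x &lt;- factor(c(&quot;a&quot;, &quot;b&quot;, &quot;c&quot;, &quot;a&quot;, &quot;c&quot;))
# One 0/1 indicator column per level
dummies &lt;- sapply(levels(cat.x), function(l) as.numeric(cat.x == l))
# Name the columns variable_level
colnames(dummies) &lt;- paste(&quot;cat.x&quot;, levels(cat.x), sep = &quot;_&quot;)
dummies</code></pre>
<p>Each row has exactly one indicator equal to 1, so the original factor can be dropped without losing information.</p>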
<hr />
</div>
<div id="step-1-compute-hatp_1i-and-hatp_0i" class="section level1">
<h1>Step 1 : compute <span class="math">\(\hat{P_{1i}}\)</span> and <span class="math">\(\hat{P_{0i}}\)</span></h1>
<p>As described earlier, step 1 can be done in different ways.</p>
<div id="simple-random-forest" class="section level2">
<h2>Simple Random Forest</h2>
<p>The following example uses the <em>sepsis</em> data created in the previous part.</p>
<p>To fit a simple random forest on a <code>VT.object</code>, the <code>randomForest</code>, <code>caret</code> and <code>party</code> packages can be used.</p>
<p>The function <code>vt.forest(&quot;one&quot;, ...)</code> is used. It takes the following arguments:</p>
<ul>
<li><code>forest.type</code> : must be set to <code>&quot;one&quot;</code></li>
<li><code>vt.data</code> : the object returned by <code>vt.data()</code></li>
<li><code>model</code> : a random forest model</li>
<li><code>interactions</code> : logical, <code>TRUE</code> by default</li>
<li><code>...</code> : options passed to <code>randomForest()</code></li>
</ul>
<p><strong>with <code>randomForest</code></strong></p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="co"># use randomForest::randomForest()</span>
<span class="kw">library</span>(randomForest, <span class="dt">verbose =</span> F)
<span class="co"># Reproducibility</span>
<span class="kw">set.seed</span>(<span class="dv">123</span>)
<span class="co"># Fit rf model </span>
<span class="co"># default params</span>
<span class="co"># set interactions to TRUE if using interaction between T and X</span>
model.rf &lt;-<span class="st"> </span><span class="kw">randomForest</span>(<span class="dt">x =</span> vt.o$<span class="kw">getX</span>(<span class="dt">interactions =</span> T),
                         <span class="dt">y =</span> vt.o$<span class="kw">getY</span>(),
                         <span class="dt">ntree =</span> <span class="dv">500</span>)
<span class="co"># initialize VT.forest.one</span>
vt.f.rf &lt;-<span class="st"> </span><span class="kw">vt.forest</span>(<span class="st">&quot;one&quot;</span>, <span class="dt">vt.data =</span> vt.o, <span class="dt">model =</span> model.rf, <span class="dt">interactions =</span> T)
### or you can use randomForest inside vt.forest()
vt.f.rf &lt;-<span class="st"> </span><span class="kw">vt.forest</span>(<span class="st">&quot;one&quot;</span>, <span class="dt">vt.data =</span> vt.o, <span class="dt">interactions =</span> T, <span class="dt">ntree =</span> <span class="dv">500</span>)</code></pre>
<p><strong>with <code>party</code></strong></p>
<p><code>cforest()</code> can be useful, but its computing time is really long. I think there is an issue when passing a <em>cforest object</em> as a Reference Class parameter. This still needs to be fixed.</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="co"># # use party::cforest()</span>
<span class="co"># library(party, verbose = F)</span>
<span class="co"># # Reproducibility</span>
<span class="co"># set.seed(123)</span>
<span class="co"># # Fit cforest model </span>
<span class="co"># # default params</span>
<span class="co"># # set interactions to TRUE if using interaction between T and X</span>
<span class="co"># model.cf &lt;- cforest(formula = vt.o$getFormula(), data = vt.o$getData(interactions = T))</span>
<span class="co"># # initialize VT.forest.one</span>
<span class="co"># vt.f.cf &lt;- vt.forest(&quot;one&quot;, vt.data = vt.o, model = model.cf)</span></code></pre>
<p><strong>with <code>caret</code></strong></p>
<p>Using <code>caret</code> can be useful, for example to take advantage of parallel computing.</p>
<p><strong>NOTE:</strong> With <code>caret</code>, the outcome levels cannot be named 0 and 1, so I rename the levels to “n”/“y”.</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Copy new object</span>
vt.o.tr &lt;-<span class="st"> </span>vt.o$<span class="kw">copy</span>()
<span class="co"># Change levels</span>
tmp &lt;-<span class="st"> </span><span class="kw">ifelse</span>(vt.o.tr$data$survival ==<span class="st"> </span><span class="dv">1</span>, <span class="st">&quot;y&quot;</span>, <span class="st">&quot;n&quot;</span>)
vt.o.tr$data$survival &lt;-<span class="st"> </span><span class="kw">as.factor</span>(tmp)
<span class="kw">rm</span>(tmp)
<span class="co"># Check new data to be sure</span>
<span class="kw">formatRCTDataset</span>(vt.o.tr$data, <span class="st">&quot;survival&quot;</span>, <span class="st">&quot;THERAPY&quot;</span>)</code></pre>
<pre><code>## &quot;y&quot; will be the favorable outcome</code></pre>
<pre class="sourceCode r"><code class="sourceCode r"><span class="co"># use caret::train()</span>
<span class="kw">library</span>(caret, <span class="dt">verbose =</span> F)
<span class="co"># Reproducibility</span>
<span class="kw">set.seed</span>(<span class="dv">123</span>)
<span class="co"># fit train model</span>
fitControl &lt;-<span class="st"> </span><span class="kw">trainControl</span>(<span class="dt">classProbs =</span> T, <span class="dt">method =</span> <span class="st">&quot;none&quot;</span>)
model.tr &lt;-<span class="st"> </span><span class="kw">train</span>(<span class="dt">x =</span> vt.o.tr$<span class="kw">getX</span>(<span class="dt">interactions =</span> T),
                  <span class="dt">y =</span> vt.o.tr$<span class="kw">getY</span>(),
                  <span class="dt">method =</span> <span class="st">&quot;rf&quot;</span>,
                  <span class="dt">tuneGrid =</span> <span class="kw">data.frame</span>(<span class="dt">mtry =</span> <span class="dv">5</span>),
                  <span class="dt">trControl =</span> fitControl)
<span class="co"># initialize VT.forest.one</span>
vt.f.tr &lt;-<span class="st"> </span><span class="kw">vt.forest</span>(<span class="st">&quot;one&quot;</span>, vt.o.tr, <span class="dt">model =</span> model.tr)</code></pre>
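<p>The renaming is needed because, when class probabilities are requested, <code>caret</code> expects level names that are valid R variable names. A quick way to check this in base R is sketched below. The vector <code>lev</code> and the toy outcome <code>y</code> are made up for the illustration.</p>
<pre class="sourceCode r"><code class="sourceCode r"># Levels &quot;0&quot; and &quot;1&quot; are not syntactically valid R names
lev &lt;- c(&quot;0&quot;, &quot;1&quot;)
lev == make.names(lev)
# Relabel while keeping the event as the second level
y &lt;- factor(c(0, 1, 1, 0), levels = c(0, 1), labels = c(&quot;n&quot;, &quot;y&quot;))
levels(y)
# The new labels pass the same check
all(levels(y) == make.names(levels(y)))</code></pre>
<p>Using <code>labels</code> in <code>factor()</code> keeps the level order, so “y” remains the second level.</p>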
</div>
<div id="double-random-forest" class="section level2">
<h2>Double Random Forest</h2>
<p>To fit a double random forest on a <code>VT.object</code>, the same packages as for the simple random forest can be used.</p>
<p>The function <code>vt.forest(&quot;double&quot;, ...)</code> is used. It takes the following arguments:</p>
<ul>
<li><code>forest.type</code> : must be set to <code>&quot;double&quot;</code></li>
<li><code>vt.data</code> : the object returned by <code>vt.data()</code></li>
<li><code>model_trt1</code> : a random forest model for <span class="math">\(T=1\)</span> (this argument has to be specified)</li>
<li><code>model_trt0</code> : a random forest model for <span class="math">\(T=0\)</span> (this argument has to be specified)</li>
</ul>
<p><strong>NOTE:</strong> use <code>trt</code> parameter in <code>VT.object::getX()</code> or <code>VT.object::getY()</code> methods to obtain part of data depending on treatment. See following example.</p>
<p><strong>with <code>randomForest</code></strong></p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="co"># grow RF for T = 1</span>
model.rf.trt1 &lt;-<span class="st"> </span><span class="kw">randomForest</span>(<span class="dt">x =</span> vt.o$<span class="kw">getX</span>(<span class="dt">trt =</span> <span class="dv">1</span>),
                              <span class="dt">y =</span> vt.o$<span class="kw">getY</span>(<span class="dt">trt =</span> <span class="dv">1</span>))
<span class="co"># grow RF for T = 0</span>
model.rf.trt0 &lt;-<span class="st"> </span><span class="kw">randomForest</span>(<span class="dt">x =</span> vt.o$<span class="kw">getX</span>(<span class="dt">trt =</span> <span class="dv">0</span>),
                              <span class="dt">y =</span> vt.o$<span class="kw">getY</span>(<span class="dt">trt =</span> <span class="dv">0</span>))
<span class="co"># initialize VT.forest.double()</span>
vt.doublef.rf &lt;-<span class="st"> </span><span class="kw">vt.forest</span>(<span class="st">&quot;double&quot;</span>,
                           <span class="dt">vt.data =</span> vt.o, 
                           <span class="dt">model_trt1 =</span> model.rf.trt1, 
                           <span class="dt">model_trt0 =</span> model.rf.trt0)
### Or you can use randomForest() inside
vt.doublef.rf &lt;-<span class="st"> </span><span class="kw">vt.forest</span>(<span class="st">&quot;double&quot;</span>,
                           <span class="dt">vt.data =</span> vt.o,
                           <span class="dt">ntree =</span> <span class="dv">200</span>)</code></pre>
<p>Follow the same structure for <code>caret</code> or <code>cforest</code> models.</p>
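<p>To show what the two forests do with the data, here is a minimal sketch that uses <code>randomForest</code> directly on simulated data. The variables and effect size are assumptions for the illustration, and this is not the package's internal code. Each forest only sees the covariates of its own arm, and every subject is then predicted by both forests.</p>
<pre class="sourceCode r"><code class="sourceCode r">library(randomForest)
set.seed(2)
# Simulated two-arm trial (illustrative names and effect sizes)
n   &lt;- 200
X   &lt;- data.frame(x1 = rnorm(n), x2 = rnorm(n))
trt &lt;- rbinom(n, 1, 0.5)
p   &lt;- plogis(-0.5 + 1.5 * trt * (X$x1 &gt; 0))
y   &lt;- factor(rbinom(n, 1, p), levels = c(0, 1))

# One forest per arm, fitted on the covariates only
rf1 &lt;- randomForest(x = X[trt == 1, ], y = y[trt == 1], ntree = 200)
rf0 &lt;- randomForest(x = X[trt == 0, ], y = y[trt == 0], ntree = 200)

# Every subject gets a prediction from both forests
P1 &lt;- predict(rf1, newdata = X, type = &quot;prob&quot;)[, &quot;1&quot;]
P0 &lt;- predict(rf0, newdata = X, type = &quot;prob&quot;)[, &quot;1&quot;]
summary(P1 - P0)</code></pre>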
</div>
<div id="k-fold-random-forest" class="section level2">
<h2>K Fold Random Forest</h2>
<p>This idea is taken from <em>method 3</em> of Jared Foster’s paper:</p>
<blockquote>
<p>A modification of [previous methods] is to obtain <span class="math">\(\hat{P_{1i}}\)</span> and <span class="math">\(\hat{P_{0i}}\)</span> via cross-validation. In this method the specific data for subject <span class="math">\(i\)</span> is not used to obtain <span class="math">\(\hat{P_{1i}}\)</span> and <span class="math">\(\hat{P_{0i}}\)</span>. Using k-fold cross-validation, we apply random forest regression approach to <span class="math">\(\frac{k-1}{k}\)</span> of the data and use the resulting predictor to obtain estimates of <span class="math">\(P_{1i}\)</span> and <span class="math">\(P_{0i}\)</span> for the remaining <span class="math">\(\frac{1}{k}\)</span> of the observations. This is repeated <span class="math">\(k\)</span> times.</p>
</blockquote>
<p>To use this approach, call <code>vt.forest(&quot;fold&quot;, ...)</code>. It takes the following arguments:</p>
<ul>
<li><code>forest.type</code> : it has to be set to <code>&quot;fold&quot;</code></li>
<li><code>vt.data</code> : return of <code>vt.data()</code> function</li>
<li><code>fold</code> : number of folds (e.g. <span class="math">\(5\)</span>)</li>
<li><code>ratio</code> : controls the <code>sampsize</code> balance. A <code>ratio</code> of <span class="math">\(2\)</span> means that the largest level is sampled 2 times as much as the other one, where the “largest” level is the one with the most observations. This option is still being tested.</li>
<li><code>interactions</code> : logical. If <code>TRUE</code>, interactions between covariates and treatment are used; if <code>FALSE</code>, they are not.</li>
<li><code>...</code> : <code>randomForest()</code> function options</li>
</ul>
<p><strong>NOTE:</strong> This function only uses the <code>randomForest</code> package.</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="co"># initialize k-fold RF</span>
<span class="co"># you can use randomForest options</span>
model.fold &lt;-<span class="st"> </span><span class="kw">vt.forest</span>(<span class="st">&quot;fold&quot;</span>, <span class="dt">vt.data =</span> vt.o, <span class="dt">fold =</span> <span class="dv">5</span>, <span class="dt">ratio =</span> <span class="dv">1</span>, <span class="dt">interactions =</span> T, <span class="dt">ntree =</span> <span class="dv">200</span>)</code></pre>
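<p>The cross-validation scheme quoted above can be sketched by hand as follows, again with <code>randomForest</code> on simulated data. The variable names and the simple random fold assignment are assumptions, and the <code>ratio</code> balancing is not reproduced here. Each subject is predicted by a forest that never saw that subject.</p>
<pre class="sourceCode r"><code class="sourceCode r">library(randomForest)
set.seed(3)
# Simulated two-arm trial (illustrative names and effect sizes)
n   &lt;- 200
dat &lt;- data.frame(trt = rbinom(n, 1, 0.5), x1 = rnorm(n), x2 = rnorm(n))
p   &lt;- plogis(-0.5 + 1.5 * dat$trt * (dat$x1 &gt; 0))
y   &lt;- factor(rbinom(n, 1, p), levels = c(0, 1))

k    &lt;- 5
# Random fold label (1..k) for each subject, folds of equal size
fold &lt;- sample(rep(seq_len(k), length.out = n))
P1 &lt;- P0 &lt;- numeric(n)
for (j in seq_len(k)) {
  test &lt;- fold == j
  # Fit on the other k - 1 folds
  rf &lt;- randomForest(x = dat[!test, ], y = y[!test], ntree = 200)
  # Predict the held-out fold under both treatments
  d1 &lt;- d0 &lt;- dat[test, ]
  d1$trt &lt;- 1
  d0$trt &lt;- 0
  P1[test] &lt;- predict(rf, newdata = d1, type = &quot;prob&quot;)[, &quot;1&quot;]
  P0[test] &lt;- predict(rf, newdata = d0, type = &quot;prob&quot;)[, &quot;1&quot;]
}
summary(P1 - P0)</code></pre>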
</div>
<div id="build-your-own-model" class="section level2">
<h2>Build Your Own Model</h2>
<p>Random forests are not the only models you can use to compute <span class="math">\(\hat{P_{1i}}\)</span> and <span class="math">\(\hat{P_{0i}}\)</span>. Any prediction model can be used, such as logistic regression or boosting.</p>
<p>The aVirtualTwins package can still be used in that case, through the <code>VT.difft()</code> class. Note that this is the parent class of all “forest” classes. It takes the following arguments:</p>
<ul>
<li><code>vt.object</code> : the object returned by <code>vt.data()</code></li>
<li><code>twin1</code> : estimate of <span class="math">\(P(Y_{i} = 1 | T = T_{i})\)</span>, i.e. the response probability under the treatment actually received.</li>
<li><code>twin2</code> : estimate of <span class="math">\(P(Y_{i} = 1 | T = 1-T_{i})\)</span>, i.e. the response probability under the other treatment.</li>
<li><code>method</code> : <em>absolute</em> (default), <em>relative</em> or <em>logit</em>. See <code>?VT.difft</code> for details.</li>
</ul>
<pre class="sourceCode r"><code class="sourceCode r"><span class="co"># you get twin1 and twin2 by your own method</span>
<span class="co"># here, I'll use random numbers between 0 and 1 :</span>
twin1_random &lt;-<span class="st"> </span><span class="kw">runif</span>(<span class="dv">470</span>)
twin2_random &lt;-<span class="st"> </span><span class="kw">runif</span>(<span class="dv">470</span>)

<span class="co"># then you can initialize VT.difft class : </span>
model.difft &lt;-<span class="st"> </span><span class="kw">VT.difft</span>(vt.o, <span class="dt">twin1 =</span> twin1_random, <span class="dt">twin2 =</span> twin2_random, <span class="st">&quot;absolute&quot;</span>)
<span class="co"># compute difference of twins : </span>
model.difft$<span class="kw">computeDifft</span>()
<span class="co"># See results</span>
<span class="kw">head</span>(model.difft$difft)</code></pre>
<pre><code>## [1] -0.03599908 -0.44271883 -0.25458624 -0.64201822  0.29347148 -0.02843780</code></pre>
<pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Graph :</span>
<span class="co"># hist(model.difft$difft)</span></code></pre>
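<p>Since <code>twin1</code> refers to the arm actually received and <code>twin2</code> to the other arm, going back to <span class="math">\(Z_i = \hat{P_{1i}} - \hat{P_{0i}}\)</span> requires looking at each subject's treatment. The base R sketch below works this out for four made-up subjects. It follows the definitions given above and is not taken from the package source.</p>
<pre class="sourceCode r"><code class="sourceCode r"># Hypothetical inputs for four subjects
trt   &lt;- c(1, 0, 1, 0)
twin1 &lt;- c(0.70, 0.40, 0.55, 0.20)  # P(Y = 1) under the arm actually received
twin2 &lt;- c(0.50, 0.65, 0.60, 0.20)  # P(Y = 1) under the other arm
# Recover P1 (under T = 1) and P0 (under T = 0) for each subject
P1 &lt;- ifelse(trt == 1, twin1, twin2)
P0 &lt;- ifelse(trt == 1, twin2, twin1)
# Absolute difference Z = P1 - P0
Z &lt;- P1 - P0
Z</code></pre>
<p>For the second subject, who received the control, <code>twin2</code> is the probability under active treatment, so <span class="math">\(Z_2 = 0.65 - 0.40 = 0.25\)</span>.</p>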
<p><strong>NOTE: You can also clone the repository, write your own child class of <code>VT.difft()</code> and submit it!</strong></p>
<hr />
</div>
</div>
<div id="step-2-estimate-a-regression-or-classification-tree" class="section level1">
<h1>Step 2 : Estimate a Regression or Classification Tree</h1>
<p>As described in the method, we define <span class="math">\(Z_i = \hat{P_{1i}} - \hat{P_{0i}}\)</span>. It is the difference in response between the active treatment and the control treatment. The idea is to explain this difference with a few covariates.</p>
<div id="classification" class="section level2">
<h2>Classification</h2>
<p>We define a new variable <span class="math">\(Z^{*}\)</span>, <span class="math">\(Z^{*}_i=1\)</span> if <span class="math">\(Z_i &gt; c\)</span> and <span class="math">\(Z^{*}_i=0\)</span> otherwise. The goal of the classification tree is to explain the value <span class="math">\(Z^*=1\)</span>. <span class="math">\(c\)</span> is a threshold given by the user: the value above which the difference is considered “interesting”. One idea is to use quantiles of the <em>difft</em> distribution.</p>
<p>To compute a classification tree, <code>vt.tree(&quot;class&quot;, ...)</code> is used. Internally, <code>rpart::rpart()</code> is called. It takes the following arguments:</p>
<ul>
<li><code>tree.type</code> : it has to be set to <code>&quot;class&quot;</code></li>
<li><code>vt.difft</code> : <code>VT.difft</code> object (return of <code>vt.forest()</code> function)</li>
<li><code>sens</code> : <code>c(&quot;&gt;&quot;, &quot;&lt;&quot;)</code>. <code>sens</code> corresponds to the way <span class="math">\(Z^{*}\)</span> is defined.
<ul>
<li><code>&quot;&gt;&quot;</code> (default) : <span class="math">\(Z^{*}\)</span>, <span class="math">\(Z^{*}_i=1\)</span> if <span class="math">\(Z_i &gt; c\)</span> and <span class="math">\(Z^{*}_i=0\)</span> otherwise.</li>
<li><code>&quot;&lt;&quot;</code> : <span class="math">\(Z^{*}\)</span>, <span class="math">\(Z^{*}_i=1\)</span> if <span class="math">\(Z_i &lt; c\)</span> and <span class="math">\(Z^{*}_i=0\)</span> otherwise.<br /></li>
</ul></li>
<li><code>threshold</code> : corresponds to <span class="math">\(c\)</span>; it can be a vector. <code>seq(.5, .8, .1)</code> by default.</li>
<li><code>screening</code> : <code>NULL</code> is the default value. If <code>TRUE</code>, only the covariates listed in the <code>varimp</code> field of <code>vt.data</code> are used.</li>
</ul>
<p>See <code>?VT.tree</code> for details.</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="co"># initialize classification tree</span>
tr.class &lt;-<span class="st"> </span><span class="kw">vt.tree</span>(<span class="st">&quot;class&quot;</span>,
                    <span class="dt">vt.difft =</span> vt.f.rf,
                    <span class="dt">sens =</span> <span class="st">&quot;&gt;&quot;</span>,
                    <span class="dt">threshold =</span> <span class="kw">quantile</span>(vt.f.rf$difft, <span class="kw">seq</span>(.<span class="dv">5</span>, .<span class="dv">8</span>, .<span class="dv">1</span>)))
<span class="co"># tr.class is a list if threshold is a vector</span>
<span class="kw">class</span>(tr.class)</code></pre>
<pre><code>## [1] &quot;list&quot;</code></pre>
<pre class="sourceCode r"><code class="sourceCode r"><span class="co"># access trees with treeXX</span>
<span class="kw">class</span>(tr.class$tree1)</code></pre>
<pre><code>## [1] &quot;VT.tree.class&quot;
## attr(,&quot;package&quot;)
## [1] &quot;aVirtualTwins&quot;</code></pre>
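<p>To make the construction of <span class="math">\(Z^{*}\)</span> concrete, the sketch below calls <code>rpart</code> directly on simulated differences, with one classification tree per quantile threshold. The simulated <code>Z</code> is only a stand-in for the output of step 1, and the <code>tree1</code>, <code>tree2</code>, … names simply mimic the list shown above.</p>
<pre class="sourceCode r"><code class="sourceCode r">library(rpart)
set.seed(4)
n &lt;- 300
X &lt;- data.frame(x1 = rnorm(n), x2 = rnorm(n))
# Stand-in for the step 1 differences: larger when x1 &gt; 0
Z &lt;- 0.2 * (X$x1 &gt; 0) + rnorm(n, sd = 0.05)

# One threshold per quantile of the difference distribution
thresholds &lt;- quantile(Z, seq(0.5, 0.8, 0.1))
trees &lt;- lapply(thresholds, function(c0) {
  # Z* = 1 if Z &gt; c0, 0 otherwise
  d &lt;- cbind(X, Zstar = factor(as.integer(Z &gt; c0), levels = c(0, 1)))
  rpart(Zstar ~ ., data = d, method = &quot;class&quot;)
})
names(trees) &lt;- paste0(&quot;tree&quot;, seq_along(trees))
trees$tree1</code></pre>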
</div>
<div id="regression" class="section level2">
<h2>Regression</h2>
<p>Use a regression tree to explain <span class="math">\(Z\)</span> by the covariates <span class="math">\(X\)</span>. Leaves whose predicted <span class="math">\(Z_i\)</span> is greater than the threshold <span class="math">\(c\)</span> (if <code>sens</code> is “&gt;”) then indicate which covariates explain <span class="math">\(Z\)</span>.</p>
<p>The function to use is <code>vt.tree(&quot;reg&quot;, ...)</code>. It takes the same parameters as the classification method.</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="co"># initialize regression tree</span>
tr.reg &lt;-<span class="st"> </span><span class="kw">vt.tree</span>(<span class="st">&quot;reg&quot;</span>,
                  <span class="dt">vt.difft =</span> vt.f.rf,
                  <span class="dt">sens =</span> <span class="st">&quot;&gt;&quot;</span>,
                  <span class="dt">threshold =</span> <span class="kw">quantile</span>(vt.f.rf$difft, <span class="kw">seq</span>(.<span class="dv">5</span>, .<span class="dv">8</span>, .<span class="dv">1</span>)))
<span class="co"># tr.reg is a list if threshold is a vector</span>
<span class="kw">class</span>(tr.reg)</code></pre>
<pre><code>## [1] &quot;list&quot;</code></pre>
<pre class="sourceCode r"><code class="sourceCode r"><span class="co"># access trees with treeXX</span>
<span class="kw">class</span>(tr.reg$tree1)</code></pre>
<pre><code>## [1] &quot;VT.tree.reg&quot;
## attr(,&quot;package&quot;)
## [1] &quot;aVirtualTwins&quot;</code></pre>
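<p>One possible way to read a subgroup off a regression tree is sketched below with <code>rpart</code> on simulated differences. The data, the single threshold <code>c0</code> and the use of <code>path.rpart()</code> are choices made for the illustration, not the package's own procedure. The sketch keeps the leaves whose predicted <span class="math">\(Z\)</span> exceeds the threshold and prints the splitting rules leading to them.</p>
<pre class="sourceCode r"><code class="sourceCode r">library(rpart)
set.seed(5)
n &lt;- 300
X &lt;- data.frame(x1 = rnorm(n), x2 = rnorm(n))
# Stand-in for the step 1 differences: larger when x1 &gt; 0
Z &lt;- 0.2 * (X$x1 &gt; 0) + rnorm(n, sd = 0.05)

# Regression tree explaining Z by the covariates
tree &lt;- rpart(Z ~ ., data = cbind(X, Z = Z), method = &quot;anova&quot;)
c0   &lt;- quantile(Z, 0.6)

# Terminal nodes whose predicted Z exceeds the threshold
leaves &lt;- tree$frame$var == &quot;&lt;leaf&gt;&quot;
keep   &lt;- rownames(tree$frame)[leaves &amp; tree$frame$yval &gt; c0]
# Splitting rules that lead to each retained leaf
rules  &lt;- path.rpart(tree, nodes = as.numeric(keep), print.it = FALSE)
rules</code></pre>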
</div>
<div id="subgroups-and-results" class="section level2">
<h2>Subgroups and results</h2>
</div>
</div>


<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
  (function () {
    var script = document.createElement("script");
    script.type = "text/javascript";
    script.src  = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
    document.getElementsByTagName("head")[0].appendChild(script);
  })();
</script>

</body>
</html>