Convert a pdf into a conditional pdf such that mean increases and std dev falls Announcing the arrival of Valued Associate #679: Cesar Manara Planned maintenance scheduled April 23, 2019 at 23:30 UTC (7:30pm US/Eastern) 2019 Moderator Election Q&A - Questionnaire 2019 Community Moderator Election ResultsDo I have to standardize my new polynomial features?Understanding Tensorflow LSTM models?CNN tagging such that each input could have multiple tagsWhat is the best way to normalize histogram vectors to get distribution?Using ML to create unique descriptors?Python: Detect if data of a time series stays constant, increases or decreasescorrecting conditional and marginal distribution in transfer learningMaths question on mean squared error being dervied to bias and varianceHow to convert an array of numbers into probability values?Layman's description of PDF and CDF

Does a random sequence of vectors span a Hilbert space?

Why are current probes so expensive?

Can two people see the same photon?

Twin's vs. Twins'

Did John Wesley plagiarize Matthew Henry...?

How do Java 8 default methods hеlp with lambdas?

Do i imagine the linear (straight line) homotopy in a correct way?

An isoperimetric-type inequality inside a cube

Why did Bronn offer to be Tyrion Lannister's champion in trial by combat?

What does 丫 mean? 丫是什么意思?

Did any compiler fully use 80-bit floating point?

How could a hydrazine and N2O4 cloud (or it's reactants) show up in weather radar?

Any stored/leased 737s that could substitute for grounded MAXs?

Why are two-digit numbers in Jonathan Swift's "Gulliver's Travels" (1726) written in "German style"?

How can I prevent/balance waiting and turtling as a response to cooldown mechanics

How do I say "this must not happen"?

Diophantine equation 3^a+1=3^b+5^c

How does TikZ render an arc?

Is the Mordenkainen's Sword spell underpowered?

Pointing to problems without suggesting solutions

How do you write "wild blueberries flavored"?

Besides transaction validation, are there any other uses of the Script language in Bitcoin

Vertical ranges of Column Plots in 12

Sally's older brother



Convert a pdf into a conditional pdf such that mean increases and std dev falls



Announcing the arrival of Valued Associate #679: Cesar Manara
Planned maintenance scheduled April 23, 2019 at 23:30 UTC (7:30pm US/Eastern)
2019 Moderator Election Q&A - Questionnaire
2019 Community Moderator Election ResultsDo I have to standardize my new polynomial features?Understanding Tensorflow LSTM models?CNN tagging such that each input could have multiple tagsWhat is the best way to normalize histogram vectors to get distribution?Using ML to create unique descriptors?Python: Detect if data of a time series stays constant, increases or decreasescorrecting conditional and marginal distribution in transfer learningMaths question on mean squared error being dervied to bias and varianceHow to convert an array of numbers into probability values?Layman's description of PDF and CDF










1












$begingroup$


Let success metric(for some business use case I am working on) be a continuous random variable S.
The mean of pdf defined on S indicates the chance of success. Higher the mean more is the chance of success. Let std dev of pdf defined on S indicates risk. Lower the std deviation lower the risk of failure.



I have data,let's call them X, which affects S. Let X also be modelled as bunch of random variables.



P(S|X) changes based on X.
The problem statement is I want to pick Xs such that P(S|X) has mean higher than P(S) and std deviation lower than P(S).



Just to illustrate my point I have taken X of 1 dimension.
Scatter plot between X(horizontal) and Y(on vertical):



enter image description here



You can see that P(S|X) changes for different values of X as given in the below plot:
enter image description here



For X between 4500 and 10225, mean of S is 3.889 and std dev is 0.041 compared to mean of 3.7 and std dev of 0.112 when there is no constraint on X.



What I am interested in is given S and bunch of Xs... pick range of Xs such that resulting distribution of P(S|X) has higher mean and lower standard deviation... Please help me find a standard technique that would help me achieve this.



Also I don't want to condition on X such that number of samples are too small to generalise.I want to avoid cases such as on the left most side of tree where number of samples is 1.










share|improve this question











$endgroup$











  • $begingroup$
    are you asking how to construct a particular set of distributions, or are you asking how to identify a subset of your data that has a particular property relative to the rest of your data?
    $endgroup$
    – David Marx
    Feb 9 '18 at 7:12










  • $begingroup$
    the latter.... basically i want to condition on X such that distribution changes in a certain sense
    $endgroup$
    – claudius
    Feb 9 '18 at 9:25















1












$begingroup$


Let success metric(for some business use case I am working on) be a continuous random variable S.
The mean of pdf defined on S indicates the chance of success. Higher the mean more is the chance of success. Let std dev of pdf defined on S indicates risk. Lower the std deviation lower the risk of failure.



I have data,let's call them X, which affects S. Let X also be modelled as bunch of random variables.



P(S|X) changes based on X.
The problem statement is I want to pick Xs such that P(S|X) has mean higher than P(S) and std deviation lower than P(S).



Just to illustrate my point I have taken X of 1 dimension.
Scatter plot between X(horizontal) and Y(on vertical):



enter image description here



You can see that P(S|X) changes for different values of X as given in the below plot:
enter image description here



For X between 4500 and 10225, mean of S is 3.889 and std dev is 0.041 compared to mean of 3.7 and std dev of 0.112 when there is no constraint on X.



What I am interested in is given S and bunch of Xs... pick range of Xs such that resulting distribution of P(S|X) has higher mean and lower standard deviation... Please help me find a standard technique that would help me achieve this.



Also I don't want to condition on X such that number of samples are too small to generalise.I want to avoid cases such as on the left most side of tree where number of samples is 1.










share|improve this question











$endgroup$











  • $begingroup$
    are you asking how to construct a particular set of distributions, or are you asking how to identify a subset of your data that has a particular property relative to the rest of your data?
    $endgroup$
    – David Marx
    Feb 9 '18 at 7:12










  • $begingroup$
    the latter.... basically i want to condition on X such that distribution changes in a certain sense
    $endgroup$
    – claudius
    Feb 9 '18 at 9:25













1












1








1





$begingroup$


Let success metric(for some business use case I am working on) be a continuous random variable S.
The mean of pdf defined on S indicates the chance of success. Higher the mean more is the chance of success. Let std dev of pdf defined on S indicates risk. Lower the std deviation lower the risk of failure.



I have data,let's call them X, which affects S. Let X also be modelled as bunch of random variables.



P(S|X) changes based on X.
The problem statement is I want to pick Xs such that P(S|X) has mean higher than P(S) and std deviation lower than P(S).



Just to illustrate my point I have taken X of 1 dimension.
Scatter plot between X(horizontal) and Y(on vertical):



enter image description here



You can see that P(S|X) changes for different values of X as given in the below plot:
enter image description here



For X between 4500 and 10225, mean of S is 3.889 and std dev is 0.041 compared to mean of 3.7 and std dev of 0.112 when there is no constraint on X.



What I am interested in is given S and bunch of Xs... pick range of Xs such that resulting distribution of P(S|X) has higher mean and lower standard deviation... Please help me find a standard technique that would help me achieve this.



Also I don't want to condition on X such that number of samples are too small to generalise.I want to avoid cases such as on the left most side of tree where number of samples is 1.










share|improve this question











$endgroup$




Let success metric(for some business use case I am working on) be a continuous random variable S.
The mean of pdf defined on S indicates the chance of success. Higher the mean more is the chance of success. Let std dev of pdf defined on S indicates risk. Lower the std deviation lower the risk of failure.



I have data,let's call them X, which affects S. Let X also be modelled as bunch of random variables.



P(S|X) changes based on X.
The problem statement is I want to pick Xs such that P(S|X) has mean higher than P(S) and std deviation lower than P(S).



Just to illustrate my point I have taken X of 1 dimension.
Scatter plot between X(horizontal) and Y(on vertical):



enter image description here



You can see that P(S|X) changes for different values of X as given in the below plot:
enter image description here



For X between 4500 and 10225, mean of S is 3.889 and std dev is 0.041 compared to mean of 3.7 and std dev of 0.112 when there is no constraint on X.



What I am interested in is given S and bunch of Xs... pick range of Xs such that resulting distribution of P(S|X) has higher mean and lower standard deviation... Please help me find a standard technique that would help me achieve this.



Also I don't want to condition on X such that number of samples are too small to generalise.I want to avoid cases such as on the left most side of tree where number of samples is 1.







machine-learning probability variance discriminant-analysis






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Feb 9 '18 at 6:38







claudius

















asked Feb 9 '18 at 6:09









claudiusclaudius

687




687











  • $begingroup$
    are you asking how to construct a particular set of distributions, or are you asking how to identify a subset of your data that has a particular property relative to the rest of your data?
    $endgroup$
    – David Marx
    Feb 9 '18 at 7:12










  • $begingroup$
    the latter.... basically i want to condition on X such that distribution changes in a certain sense
    $endgroup$
    – claudius
    Feb 9 '18 at 9:25
















  • $begingroup$
    are you asking how to construct a particular set of distributions, or are you asking how to identify a subset of your data that has a particular property relative to the rest of your data?
    $endgroup$
    – David Marx
    Feb 9 '18 at 7:12










  • $begingroup$
    the latter.... basically i want to condition on X such that distribution changes in a certain sense
    $endgroup$
    – claudius
    Feb 9 '18 at 9:25















$begingroup$
are you asking how to construct a particular set of distributions, or are you asking how to identify a subset of your data that has a particular property relative to the rest of your data?
$endgroup$
– David Marx
Feb 9 '18 at 7:12




$begingroup$
are you asking how to construct a particular set of distributions, or are you asking how to identify a subset of your data that has a particular property relative to the rest of your data?
$endgroup$
– David Marx
Feb 9 '18 at 7:12












$begingroup$
the latter.... basically i want to condition on X such that distribution changes in a certain sense
$endgroup$
– claudius
Feb 9 '18 at 9:25




$begingroup$
the latter.... basically i want to condition on X such that distribution changes in a certain sense
$endgroup$
– claudius
Feb 9 '18 at 9:25










1 Answer
1






active

oldest

votes


















0












$begingroup$

Just apply an optimization to search for the X values that satisfy the criteria you're looking for. Here's a simple demo:



set.seed(123)
mu_x_true = 1e4
mu_y_true = 3.75
n = 1e2

x <- rpois(n, mu_x_true)
y <- rnorm(n, sqrt(mu_y_true))^2

plot(x, y)

# conditions:
# E[Y|X] > E[Y]
# std(Y|x) < std(Y)

mu_y_emp = mean(y)
sd_y_emp = sd(y)

objective <- function(par, alpha=0.5)
if (par[1]>par[2]) par = rev(par)
ix <- which((par[1] < x) & (x < par[2]))
k <- length(ix)
if (k==0) return(1e12)
mu_yx <- mean(y[ix])
sd_yx <- sd(y[ix])

alpha*(mu_y_emp - mu_yx) + (1-alpha)*(sd_yx - sd_y_emp)


init <- mean(x) + c(-sd(x), sd(x))
test <- optim(objective, par=init)

ix <- which((par[1] < x) & (x < par[2]))

mean(y[ix]) > mean(y)
# TRUE

sd(y) > sd(y[ix])
# TRUE





share|improve this answer









$endgroup$












  • $begingroup$
    Thx for the reply david! I did some basic reading and have learnt that discriminative analysis can be used to solve this problem as well... do u have any ideas in that direction??
    $endgroup$
    – claudius
    Feb 12 '18 at 6:31












Your Answer








StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "557"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f27632%2fconvert-a-pdf-into-a-conditional-pdf-such-that-mean-increases-and-std-dev-falls%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









0












$begingroup$

Just apply an optimization to search for the X values that satisfy the criteria you're looking for. Here's a simple demo:



set.seed(123)
mu_x_true = 1e4
mu_y_true = 3.75
n = 1e2

x <- rpois(n, mu_x_true)
y <- rnorm(n, sqrt(mu_y_true))^2

plot(x, y)

# conditions:
# E[Y|X] > E[Y]
# std(Y|x) < std(Y)

mu_y_emp = mean(y)
sd_y_emp = sd(y)

objective <- function(par, alpha=0.5)
if (par[1]>par[2]) par = rev(par)
ix <- which((par[1] < x) & (x < par[2]))
k <- length(ix)
if (k==0) return(1e12)
mu_yx <- mean(y[ix])
sd_yx <- sd(y[ix])

alpha*(mu_y_emp - mu_yx) + (1-alpha)*(sd_yx - sd_y_emp)


init <- mean(x) + c(-sd(x), sd(x))
test <- optim(objective, par=init)

ix <- which((par[1] < x) & (x < par[2]))

mean(y[ix]) > mean(y)
# TRUE

sd(y) > sd(y[ix])
# TRUE





share|improve this answer









$endgroup$












  • $begingroup$
    Thx for the reply david! I did some basic reading and have learnt that discriminative analysis can be used to solve this problem as well... do u have any ideas in that direction??
    $endgroup$
    – claudius
    Feb 12 '18 at 6:31
















0












$begingroup$

Just apply an optimization to search for the X values that satisfy the criteria you're looking for. Here's a simple demo:



set.seed(123)
mu_x_true = 1e4
mu_y_true = 3.75
n = 1e2

x <- rpois(n, mu_x_true)
y <- rnorm(n, sqrt(mu_y_true))^2

plot(x, y)

# conditions:
# E[Y|X] > E[Y]
# std(Y|x) < std(Y)

mu_y_emp = mean(y)
sd_y_emp = sd(y)

objective <- function(par, alpha=0.5)
if (par[1]>par[2]) par = rev(par)
ix <- which((par[1] < x) & (x < par[2]))
k <- length(ix)
if (k==0) return(1e12)
mu_yx <- mean(y[ix])
sd_yx <- sd(y[ix])

alpha*(mu_y_emp - mu_yx) + (1-alpha)*(sd_yx - sd_y_emp)


init <- mean(x) + c(-sd(x), sd(x))
test <- optim(objective, par=init)

ix <- which((par[1] < x) & (x < par[2]))

mean(y[ix]) > mean(y)
# TRUE

sd(y) > sd(y[ix])
# TRUE





share|improve this answer









$endgroup$












  • $begingroup$
    Thx for the reply david! I did some basic reading and have learnt that discriminative analysis can be used to solve this problem as well... do u have any ideas in that direction??
    $endgroup$
    – claudius
    Feb 12 '18 at 6:31














0












0








0





$begingroup$

Just apply an optimization to search for the X values that satisfy the criteria you're looking for. Here's a simple demo:



set.seed(123)
mu_x_true = 1e4
mu_y_true = 3.75
n = 1e2

x <- rpois(n, mu_x_true)
y <- rnorm(n, sqrt(mu_y_true))^2

plot(x, y)

# conditions:
# E[Y|X] > E[Y]
# std(Y|x) < std(Y)

mu_y_emp = mean(y)
sd_y_emp = sd(y)

objective <- function(par, alpha=0.5)
if (par[1]>par[2]) par = rev(par)
ix <- which((par[1] < x) & (x < par[2]))
k <- length(ix)
if (k==0) return(1e12)
mu_yx <- mean(y[ix])
sd_yx <- sd(y[ix])

alpha*(mu_y_emp - mu_yx) + (1-alpha)*(sd_yx - sd_y_emp)


init <- mean(x) + c(-sd(x), sd(x))
test <- optim(objective, par=init)

ix <- which((par[1] < x) & (x < par[2]))

mean(y[ix]) > mean(y)
# TRUE

sd(y) > sd(y[ix])
# TRUE





share|improve this answer









$endgroup$



Just apply an optimization to search for the X values that satisfy the criteria you're looking for. Here's a simple demo:



set.seed(123)
mu_x_true = 1e4
mu_y_true = 3.75
n = 1e2

x <- rpois(n, mu_x_true)
y <- rnorm(n, sqrt(mu_y_true))^2

plot(x, y)

# conditions:
# E[Y|X] > E[Y]
# std(Y|x) < std(Y)

mu_y_emp = mean(y)
sd_y_emp = sd(y)

objective <- function(par, alpha=0.5)
if (par[1]>par[2]) par = rev(par)
ix <- which((par[1] < x) & (x < par[2]))
k <- length(ix)
if (k==0) return(1e12)
mu_yx <- mean(y[ix])
sd_yx <- sd(y[ix])

alpha*(mu_y_emp - mu_yx) + (1-alpha)*(sd_yx - sd_y_emp)


init <- mean(x) + c(-sd(x), sd(x))
test <- optim(objective, par=init)

ix <- which((par[1] < x) & (x < par[2]))

mean(y[ix]) > mean(y)
# TRUE

sd(y) > sd(y[ix])
# TRUE






share|improve this answer












share|improve this answer



share|improve this answer










answered Feb 9 '18 at 11:47









David MarxDavid Marx

2,212412




2,212412











  • $begingroup$
    Thx for the reply david! I did some basic reading and have learnt that discriminative analysis can be used to solve this problem as well... do u have any ideas in that direction??
    $endgroup$
    – claudius
    Feb 12 '18 at 6:31

















  • $begingroup$
    Thx for the reply david! I did some basic reading and have learnt that discriminative analysis can be used to solve this problem as well... do u have any ideas in that direction??
    $endgroup$
    – claudius
    Feb 12 '18 at 6:31
















$begingroup$
Thx for the reply david! I did some basic reading and have learnt that discriminative analysis can be used to solve this problem as well... do u have any ideas in that direction??
$endgroup$
– claudius
Feb 12 '18 at 6:31





$begingroup$
Thx for the reply david! I did some basic reading and have learnt that discriminative analysis can be used to solve this problem as well... do u have any ideas in that direction??
$endgroup$
– claudius
Feb 12 '18 at 6:31


















draft saved

draft discarded
















































Thanks for contributing an answer to Data Science Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f27632%2fconvert-a-pdf-into-a-conditional-pdf-such-that-mean-increases-and-std-dev-falls%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Adding axes to figuresAdding axes labels to LaTeX figuresLaTeX equivalent of ConTeXt buffersRotate a node but not its content: the case of the ellipse decorationHow to define the default vertical distance between nodes?TikZ scaling graphic and adjust node position and keep font sizeNumerical conditional within tikz keys?adding axes to shapesAlign axes across subfiguresAdding figures with a certain orderLine up nested tikz enviroments or how to get rid of themAdding axes labels to LaTeX figures

Luettelo Yhdysvaltain laivaston lentotukialuksista Lähteet | Navigointivalikko

Gary (muusikko) Sisällysluettelo Historia | Rockin' High | Lähteet | Aiheesta muualla | NavigointivalikkoInfobox OKTuomas "Gary" Keskinen Ancaran kitaristiksiProjekti Rockin' High