Convert a pdf into a conditional pdf such that mean increases and std dev falls Announcing the arrival of Valued Associate #679: Cesar Manara Planned maintenance scheduled April 23, 2019 at 23:30 UTC (7:30pm US/Eastern) 2019 Moderator Election Q&A - Questionnaire 2019 Community Moderator Election ResultsDo I have to standardize my new polynomial features?Understanding Tensorflow LSTM models?CNN tagging such that each input could have multiple tagsWhat is the best way to normalize histogram vectors to get distribution?Using ML to create unique descriptors?Python: Detect if data of a time series stays constant, increases or decreasescorrecting conditional and marginal distribution in transfer learningMaths question on mean squared error being dervied to bias and varianceHow to convert an array of numbers into probability values?Layman's description of PDF and CDF

Does a random sequence of vectors span a Hilbert space?

Why are current probes so expensive?

Can two people see the same photon?

Twin's vs. Twins'

Did John Wesley plagiarize Matthew Henry...?

How do Java 8 default methods hеlp with lambdas?

Do i imagine the linear (straight line) homotopy in a correct way?

An isoperimetric-type inequality inside a cube

Why did Bronn offer to be Tyrion Lannister's champion in trial by combat?

What does 丫 mean? 丫是什么意思？

Did any compiler fully use 80-bit floating point?

How could a hydrazine and N2O4 cloud (or it's reactants) show up in weather radar?

Any stored/leased 737s that could substitute for grounded MAXs?

Why are two-digit numbers in Jonathan Swift's "Gulliver's Travels" (1726) written in "German style"?

How can I prevent/balance waiting and turtling as a response to cooldown mechanics

How do I say "this must not happen"?

Diophantine equation 3^a+1=3^b+5^c

How does TikZ render an arc?

Is the Mordenkainen's Sword spell underpowered?

Pointing to problems without suggesting solutions

How do you write "wild blueberries flavored"?

Besides transaction validation, are there any other uses of the Script language in Bitcoin

Vertical ranges of Column Plots in 12

Sally's older brother

Convert a pdf into a conditional pdf such that mean increases and std dev falls

Announcing the arrival of Valued Associate #679: Cesar Manara

Planned maintenance scheduled April 23, 2019 at 23:30 UTC (7:30pm US/Eastern)

2019 Moderator Election Q&A - Questionnaire

2019 Community Moderator Election ResultsDo I have to standardize my new polynomial features?Understanding Tensorflow LSTM models?CNN tagging such that each input could have multiple tagsWhat is the best way to normalize histogram vectors to get distribution?Using ML to create unique descriptors?Python: Detect if data of a time series stays constant, increases or decreasescorrecting conditional and marginal distribution in transfer learningMaths question on mean squared error being dervied to bias and varianceHow to convert an array of numbers into probability values?Layman's description of PDF and CDF

Let success metric(for some business use case I am working on) be a continuous random variable S.
The mean of pdf defined on S indicates the chance of success. Higher the mean more is the chance of success. Let std dev of pdf defined on S indicates risk. Lower the std deviation lower the risk of failure.

I have data,let's call them X, which affects S. Let X also be modelled as bunch of random variables.

P(S|X) changes based on X.
The problem statement is I want to pick Xs such that P(S|X) has mean higher than P(S) and std deviation lower than P(S).

Just to illustrate my point I have taken X of 1 dimension.
Scatter plot between X(horizontal) and Y(on vertical):

enter image description here

You can see that P(S|X) changes for different values of X as given in the below plot:
enter image description here

For X between 4500 and 10225, mean of S is 3.889 and std dev is 0.041 compared to mean of 3.7 and std dev of 0.112 when there is no constraint on X.

What I am interested in is given S and bunch of Xs... pick range of Xs such that resulting distribution of P(S|X) has higher mean and lower standard deviation... Please help me find a standard technique that would help me achieve this.

Also I don't want to condition on X such that number of samples are too small to generalise.I want to avoid cases such as on the left most side of tree where number of samples is 1.

edited Feb 9 '18 at 6:38

asked Feb 9 '18 at 6:09

claudius

687

$begingroup$
are you asking how to construct a particular set of distributions, or are you asking how to identify a subset of your data that has a particular property relative to the rest of your data?
$endgroup$
– David Marx
Feb 9 '18 at 7:12

$begingroup$
the latter.... basically i want to condition on X such that distribution changes in a certain sense
$endgroup$
– claudius
Feb 9 '18 at 9:25

add a comment |

I have data,let's call them X, which affects S. Let X also be modelled as bunch of random variables.

P(S|X) changes based on X.
The problem statement is I want to pick Xs such that P(S|X) has mean higher than P(S) and std deviation lower than P(S).

Just to illustrate my point I have taken X of 1 dimension.
Scatter plot between X(horizontal) and Y(on vertical):

enter image description here

You can see that P(S|X) changes for different values of X as given in the below plot:
enter image description here

For X between 4500 and 10225, mean of S is 3.889 and std dev is 0.041 compared to mean of 3.7 and std dev of 0.112 when there is no constraint on X.

Also I don't want to condition on X such that number of samples are too small to generalise.I want to avoid cases such as on the left most side of tree where number of samples is 1.

edited Feb 9 '18 at 6:38

asked Feb 9 '18 at 6:09

claudius

687

$begingroup$
are you asking how to construct a particular set of distributions, or are you asking how to identify a subset of your data that has a particular property relative to the rest of your data?
$endgroup$
– David Marx
Feb 9 '18 at 7:12

$begingroup$
the latter.... basically i want to condition on X such that distribution changes in a certain sense
$endgroup$
– claudius
Feb 9 '18 at 9:25

add a comment |

I have data,let's call them X, which affects S. Let X also be modelled as bunch of random variables.

P(S|X) changes based on X.
The problem statement is I want to pick Xs such that P(S|X) has mean higher than P(S) and std deviation lower than P(S).

Just to illustrate my point I have taken X of 1 dimension.
Scatter plot between X(horizontal) and Y(on vertical):

enter image description here

You can see that P(S|X) changes for different values of X as given in the below plot:
enter image description here

For X between 4500 and 10225, mean of S is 3.889 and std dev is 0.041 compared to mean of 3.7 and std dev of 0.112 when there is no constraint on X.

Also I don't want to condition on X such that number of samples are too small to generalise.I want to avoid cases such as on the left most side of tree where number of samples is 1.

edited Feb 9 '18 at 6:38

asked Feb 9 '18 at 6:09

claudius

687

I have data,let's call them X, which affects S. Let X also be modelled as bunch of random variables.

P(S|X) changes based on X.
The problem statement is I want to pick Xs such that P(S|X) has mean higher than P(S) and std deviation lower than P(S).

Just to illustrate my point I have taken X of 1 dimension.
Scatter plot between X(horizontal) and Y(on vertical):

enter image description here

You can see that P(S|X) changes for different values of X as given in the below plot:
enter image description here

For X between 4500 and 10225, mean of S is 3.889 and std dev is 0.041 compared to mean of 3.7 and std dev of 0.112 when there is no constraint on X.

Also I don't want to condition on X such that number of samples are too small to generalise.I want to avoid cases such as on the left most side of tree where number of samples is 1.

machine-learning probability variance discriminant-analysis

edited Feb 9 '18 at 6:38

asked Feb 9 '18 at 6:09

claudius

687

edited Feb 9 '18 at 6:38

asked Feb 9 '18 at 6:09

claudius

687

edited Feb 9 '18 at 6:38

asked Feb 9 '18 at 6:09

claudius

687

asked Feb 9 '18 at 6:09

claudius

687

asked Feb 9 '18 at 6:09

claudius

687

$begingroup$
are you asking how to construct a particular set of distributions, or are you asking how to identify a subset of your data that has a particular property relative to the rest of your data?
$endgroup$
– David Marx
Feb 9 '18 at 7:12

$begingroup$
the latter.... basically i want to condition on X such that distribution changes in a certain sense
$endgroup$
– claudius
Feb 9 '18 at 9:25

add a comment |

$begingroup$
are you asking how to construct a particular set of distributions, or are you asking how to identify a subset of your data that has a particular property relative to the rest of your data?
$endgroup$
– David Marx
Feb 9 '18 at 7:12

$begingroup$
the latter.... basically i want to condition on X such that distribution changes in a certain sense
$endgroup$
– claudius
Feb 9 '18 at 9:25

are you asking how to construct a particular set of distributions, or are you asking how to identify a subset of your data that has a particular property relative to the rest of your data?

– David Marx
Feb 9 '18 at 7:12

the latter.... basically i want to condition on X such that distribution changes in a certain sense

– claudius
Feb 9 '18 at 9:25

add a comment |

1 Answer
1

active

oldest

votes

Just apply an optimization to search for the X values that satisfy the criteria you're looking for. Here's a simple demo:

set.seed(123)
mu_x_true = 1e4
mu_y_true = 3.75
n = 1e2

x <- rpois(n, mu_x_true)
y <- rnorm(n, sqrt(mu_y_true))^2

plot(x, y)

# conditions:
# E[Y|X] > E[Y]
# std(Y|x) < std(Y)

mu_y_emp = mean(y)
sd_y_emp = sd(y)

objective <- function(par, alpha=0.5) 
 if (par[1]>par[2]) par = rev(par)
 ix <- which((par[1] < x) & (x < par[2]))
 k <- length(ix)
 if (k==0) return(1e12)
 mu_yx <- mean(y[ix])
 sd_yx <- sd(y[ix])

 alpha*(mu_y_emp - mu_yx) + (1-alpha)*(sd_yx - sd_y_emp)


init <- mean(x) + c(-sd(x), sd(x))
test <- optim(objective, par=init)

ix <- which((par[1] < x) & (x < par[2]))

mean(y[ix]) > mean(y) 
# TRUE

sd(y) > sd(y[ix])
# TRUE

answered Feb 9 '18 at 11:47

David Marx

2,212412

$begingroup$
Thx for the reply david! I did some basic reading and have learnt that discriminative analysis can be used to solve this problem as well... do u have any ideas in that direction??
$endgroup$
– claudius
Feb 12 '18 at 6:31

add a comment |

Your Answer

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "557"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f27632%2fconvert-a-pdf-into-a-conditional-pdf-such-that-mean-increases-and-std-dev-falls%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

Just apply an optimization to search for the X values that satisfy the criteria you're looking for. Here's a simple demo:

set.seed(123)
mu_x_true = 1e4
mu_y_true = 3.75
n = 1e2

x <- rpois(n, mu_x_true)
y <- rnorm(n, sqrt(mu_y_true))^2

plot(x, y)

# conditions:
# E[Y|X] > E[Y]
# std(Y|x) < std(Y)

mu_y_emp = mean(y)
sd_y_emp = sd(y)

objective <- function(par, alpha=0.5) 
 if (par[1]>par[2]) par = rev(par)
 ix <- which((par[1] < x) & (x < par[2]))
 k <- length(ix)
 if (k==0) return(1e12)
 mu_yx <- mean(y[ix])
 sd_yx <- sd(y[ix])

 alpha*(mu_y_emp - mu_yx) + (1-alpha)*(sd_yx - sd_y_emp)


init <- mean(x) + c(-sd(x), sd(x))
test <- optim(objective, par=init)

ix <- which((par[1] < x) & (x < par[2]))

mean(y[ix]) > mean(y) 
# TRUE

sd(y) > sd(y[ix])
# TRUE

answered Feb 9 '18 at 11:47

David Marx

2,212412

$begingroup$
Thx for the reply david! I did some basic reading and have learnt that discriminative analysis can be used to solve this problem as well... do u have any ideas in that direction??
$endgroup$
– claudius
Feb 12 '18 at 6:31

add a comment |

Just apply an optimization to search for the X values that satisfy the criteria you're looking for. Here's a simple demo:

set.seed(123)
mu_x_true = 1e4
mu_y_true = 3.75
n = 1e2

x <- rpois(n, mu_x_true)
y <- rnorm(n, sqrt(mu_y_true))^2

plot(x, y)

# conditions:
# E[Y|X] > E[Y]
# std(Y|x) < std(Y)

mu_y_emp = mean(y)
sd_y_emp = sd(y)

objective <- function(par, alpha=0.5) 
 if (par[1]>par[2]) par = rev(par)
 ix <- which((par[1] < x) & (x < par[2]))
 k <- length(ix)
 if (k==0) return(1e12)
 mu_yx <- mean(y[ix])
 sd_yx <- sd(y[ix])

 alpha*(mu_y_emp - mu_yx) + (1-alpha)*(sd_yx - sd_y_emp)


init <- mean(x) + c(-sd(x), sd(x))
test <- optim(objective, par=init)

ix <- which((par[1] < x) & (x < par[2]))

mean(y[ix]) > mean(y) 
# TRUE

sd(y) > sd(y[ix])
# TRUE

answered Feb 9 '18 at 11:47

David Marx

2,212412

$begingroup$
Thx for the reply david! I did some basic reading and have learnt that discriminative analysis can be used to solve this problem as well... do u have any ideas in that direction??
$endgroup$
– claudius
Feb 12 '18 at 6:31

add a comment |

Just apply an optimization to search for the X values that satisfy the criteria you're looking for. Here's a simple demo:

set.seed(123)
mu_x_true = 1e4
mu_y_true = 3.75
n = 1e2

x <- rpois(n, mu_x_true)
y <- rnorm(n, sqrt(mu_y_true))^2

plot(x, y)

# conditions:
# E[Y|X] > E[Y]
# std(Y|x) < std(Y)

mu_y_emp = mean(y)
sd_y_emp = sd(y)

objective <- function(par, alpha=0.5) 
 if (par[1]>par[2]) par = rev(par)
 ix <- which((par[1] < x) & (x < par[2]))
 k <- length(ix)
 if (k==0) return(1e12)
 mu_yx <- mean(y[ix])
 sd_yx <- sd(y[ix])

 alpha*(mu_y_emp - mu_yx) + (1-alpha)*(sd_yx - sd_y_emp)


init <- mean(x) + c(-sd(x), sd(x))
test <- optim(objective, par=init)

ix <- which((par[1] < x) & (x < par[2]))

mean(y[ix]) > mean(y) 
# TRUE

sd(y) > sd(y[ix])
# TRUE

answered Feb 9 '18 at 11:47

David Marx

2,212412

Just apply an optimization to search for the X values that satisfy the criteria you're looking for. Here's a simple demo:

set.seed(123)
mu_x_true = 1e4
mu_y_true = 3.75
n = 1e2

x <- rpois(n, mu_x_true)
y <- rnorm(n, sqrt(mu_y_true))^2

plot(x, y)

# conditions:
# E[Y|X] > E[Y]
# std(Y|x) < std(Y)

mu_y_emp = mean(y)
sd_y_emp = sd(y)

objective <- function(par, alpha=0.5) 
 if (par[1]>par[2]) par = rev(par)
 ix <- which((par[1] < x) & (x < par[2]))
 k <- length(ix)
 if (k==0) return(1e12)
 mu_yx <- mean(y[ix])
 sd_yx <- sd(y[ix])

 alpha*(mu_y_emp - mu_yx) + (1-alpha)*(sd_yx - sd_y_emp)


init <- mean(x) + c(-sd(x), sd(x))
test <- optim(objective, par=init)

ix <- which((par[1] < x) & (x < par[2]))

mean(y[ix]) > mean(y) 
# TRUE

sd(y) > sd(y[ix])
# TRUE

answered Feb 9 '18 at 11:47

David Marx

2,212412

answered Feb 9 '18 at 11:47

David Marx

2,212412

answered Feb 9 '18 at 11:47

David Marx

2,212412

answered Feb 9 '18 at 11:47

David Marx

2,212412

$begingroup$
Thx for the reply david! I did some basic reading and have learnt that discriminative analysis can be used to solve this problem as well... do u have any ideas in that direction??
$endgroup$
– claudius
Feb 12 '18 at 6:31

add a comment |

$begingroup$
Thx for the reply david! I did some basic reading and have learnt that discriminative analysis can be used to solve this problem as well... do u have any ideas in that direction??
$endgroup$
– claudius
Feb 12 '18 at 6:31

Thx for the reply david! I did some basic reading and have learnt that discriminative analysis can be used to solve this problem as well... do u have any ideas in that direction??

– claudius
Feb 12 '18 at 6:31

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Data Science Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Trjtdtk

1 Answer
1

Your Answer

Post as a guest

1 Answer
1

1 Answer
1

Post as a guest

Popular posts from this blog

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

1 Answer 1

1 Answer 1

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

1 Answer
1

1 Answer
1

1 Answer
1