Convert a pdf into a conditional pdf such that mean increases and std dev falls Announcing the arrival of Valued Associate #679: Cesar Manara Planned maintenance scheduled April 23, 2019 at 23:30 UTC (7:30pm US/Eastern) 2019 Moderator Election Q&A - Questionnaire 2019 Community Moderator Election ResultsDo I have to standardize my new polynomial features?Understanding Tensorflow LSTM models?CNN tagging such that each input could have multiple tagsWhat is the best way to normalize histogram vectors to get distribution?Using ML to create unique descriptors?Python: Detect if data of a time series stays constant, increases or decreasescorrecting conditional and marginal distribution in transfer learningMaths question on mean squared error being dervied to bias and varianceHow to convert an array of numbers into probability values?Layman's description of PDF and CDF

Multi tool use
Does a random sequence of vectors span a Hilbert space?
Why are current probes so expensive?
Can two people see the same photon?
Twin's vs. Twins'
Did John Wesley plagiarize Matthew Henry...?
How do Java 8 default methods hеlp with lambdas?
Do i imagine the linear (straight line) homotopy in a correct way?
An isoperimetric-type inequality inside a cube
Why did Bronn offer to be Tyrion Lannister's champion in trial by combat?
What does 丫 mean? 丫是什么意思?
Did any compiler fully use 80-bit floating point?
How could a hydrazine and N2O4 cloud (or it's reactants) show up in weather radar?
Any stored/leased 737s that could substitute for grounded MAXs?
Why are two-digit numbers in Jonathan Swift's "Gulliver's Travels" (1726) written in "German style"?
How can I prevent/balance waiting and turtling as a response to cooldown mechanics
How do I say "this must not happen"?
Diophantine equation 3^a+1=3^b+5^c
How does TikZ render an arc?
Is the Mordenkainen's Sword spell underpowered?
Pointing to problems without suggesting solutions
How do you write "wild blueberries flavored"?
Besides transaction validation, are there any other uses of the Script language in Bitcoin
Vertical ranges of Column Plots in 12
Sally's older brother
Convert a pdf into a conditional pdf such that mean increases and std dev falls
Announcing the arrival of Valued Associate #679: Cesar Manara
Planned maintenance scheduled April 23, 2019 at 23:30 UTC (7:30pm US/Eastern)
2019 Moderator Election Q&A - Questionnaire
2019 Community Moderator Election ResultsDo I have to standardize my new polynomial features?Understanding Tensorflow LSTM models?CNN tagging such that each input could have multiple tagsWhat is the best way to normalize histogram vectors to get distribution?Using ML to create unique descriptors?Python: Detect if data of a time series stays constant, increases or decreasescorrecting conditional and marginal distribution in transfer learningMaths question on mean squared error being dervied to bias and varianceHow to convert an array of numbers into probability values?Layman's description of PDF and CDF
$begingroup$
Let success metric(for some business use case I am working on) be a continuous random variable S.
The mean of pdf defined on S indicates the chance of success. Higher the mean more is the chance of success. Let std dev of pdf defined on S indicates risk. Lower the std deviation lower the risk of failure.
I have data,let's call them X, which affects S. Let X also be modelled as bunch of random variables.
P(S|X) changes based on X.
The problem statement is I want to pick Xs such that P(S|X) has mean higher than P(S) and std deviation lower than P(S).
Just to illustrate my point I have taken X of 1 dimension.
Scatter plot between X(horizontal) and Y(on vertical):
You can see that P(S|X) changes for different values of X as given in the below plot:
For X between 4500 and 10225, mean of S is 3.889 and std dev is 0.041 compared to mean of 3.7 and std dev of 0.112 when there is no constraint on X.
What I am interested in is given S and bunch of Xs... pick range of Xs such that resulting distribution of P(S|X) has higher mean and lower standard deviation... Please help me find a standard technique that would help me achieve this.
Also I don't want to condition on X such that number of samples are too small to generalise.I want to avoid cases such as on the left most side of tree where number of samples is 1.
machine-learning probability variance discriminant-analysis
$endgroup$
add a comment |
$begingroup$
Let success metric(for some business use case I am working on) be a continuous random variable S.
The mean of pdf defined on S indicates the chance of success. Higher the mean more is the chance of success. Let std dev of pdf defined on S indicates risk. Lower the std deviation lower the risk of failure.
I have data,let's call them X, which affects S. Let X also be modelled as bunch of random variables.
P(S|X) changes based on X.
The problem statement is I want to pick Xs such that P(S|X) has mean higher than P(S) and std deviation lower than P(S).
Just to illustrate my point I have taken X of 1 dimension.
Scatter plot between X(horizontal) and Y(on vertical):
You can see that P(S|X) changes for different values of X as given in the below plot:
For X between 4500 and 10225, mean of S is 3.889 and std dev is 0.041 compared to mean of 3.7 and std dev of 0.112 when there is no constraint on X.
What I am interested in is given S and bunch of Xs... pick range of Xs such that resulting distribution of P(S|X) has higher mean and lower standard deviation... Please help me find a standard technique that would help me achieve this.
Also I don't want to condition on X such that number of samples are too small to generalise.I want to avoid cases such as on the left most side of tree where number of samples is 1.
machine-learning probability variance discriminant-analysis
$endgroup$
$begingroup$
are you asking how to construct a particular set of distributions, or are you asking how to identify a subset of your data that has a particular property relative to the rest of your data?
$endgroup$
– David Marx
Feb 9 '18 at 7:12
$begingroup$
the latter.... basically i want to condition on X such that distribution changes in a certain sense
$endgroup$
– claudius
Feb 9 '18 at 9:25
add a comment |
$begingroup$
Let success metric(for some business use case I am working on) be a continuous random variable S.
The mean of pdf defined on S indicates the chance of success. Higher the mean more is the chance of success. Let std dev of pdf defined on S indicates risk. Lower the std deviation lower the risk of failure.
I have data,let's call them X, which affects S. Let X also be modelled as bunch of random variables.
P(S|X) changes based on X.
The problem statement is I want to pick Xs such that P(S|X) has mean higher than P(S) and std deviation lower than P(S).
Just to illustrate my point I have taken X of 1 dimension.
Scatter plot between X(horizontal) and Y(on vertical):
You can see that P(S|X) changes for different values of X as given in the below plot:
For X between 4500 and 10225, mean of S is 3.889 and std dev is 0.041 compared to mean of 3.7 and std dev of 0.112 when there is no constraint on X.
What I am interested in is given S and bunch of Xs... pick range of Xs such that resulting distribution of P(S|X) has higher mean and lower standard deviation... Please help me find a standard technique that would help me achieve this.
Also I don't want to condition on X such that number of samples are too small to generalise.I want to avoid cases such as on the left most side of tree where number of samples is 1.
machine-learning probability variance discriminant-analysis
$endgroup$
Let success metric(for some business use case I am working on) be a continuous random variable S.
The mean of pdf defined on S indicates the chance of success. Higher the mean more is the chance of success. Let std dev of pdf defined on S indicates risk. Lower the std deviation lower the risk of failure.
I have data,let's call them X, which affects S. Let X also be modelled as bunch of random variables.
P(S|X) changes based on X.
The problem statement is I want to pick Xs such that P(S|X) has mean higher than P(S) and std deviation lower than P(S).
Just to illustrate my point I have taken X of 1 dimension.
Scatter plot between X(horizontal) and Y(on vertical):
You can see that P(S|X) changes for different values of X as given in the below plot:
For X between 4500 and 10225, mean of S is 3.889 and std dev is 0.041 compared to mean of 3.7 and std dev of 0.112 when there is no constraint on X.
What I am interested in is given S and bunch of Xs... pick range of Xs such that resulting distribution of P(S|X) has higher mean and lower standard deviation... Please help me find a standard technique that would help me achieve this.
Also I don't want to condition on X such that number of samples are too small to generalise.I want to avoid cases such as on the left most side of tree where number of samples is 1.
machine-learning probability variance discriminant-analysis
machine-learning probability variance discriminant-analysis
edited Feb 9 '18 at 6:38
claudius
asked Feb 9 '18 at 6:09
claudiusclaudius
687
687
$begingroup$
are you asking how to construct a particular set of distributions, or are you asking how to identify a subset of your data that has a particular property relative to the rest of your data?
$endgroup$
– David Marx
Feb 9 '18 at 7:12
$begingroup$
the latter.... basically i want to condition on X such that distribution changes in a certain sense
$endgroup$
– claudius
Feb 9 '18 at 9:25
add a comment |
$begingroup$
are you asking how to construct a particular set of distributions, or are you asking how to identify a subset of your data that has a particular property relative to the rest of your data?
$endgroup$
– David Marx
Feb 9 '18 at 7:12
$begingroup$
the latter.... basically i want to condition on X such that distribution changes in a certain sense
$endgroup$
– claudius
Feb 9 '18 at 9:25
$begingroup$
are you asking how to construct a particular set of distributions, or are you asking how to identify a subset of your data that has a particular property relative to the rest of your data?
$endgroup$
– David Marx
Feb 9 '18 at 7:12
$begingroup$
are you asking how to construct a particular set of distributions, or are you asking how to identify a subset of your data that has a particular property relative to the rest of your data?
$endgroup$
– David Marx
Feb 9 '18 at 7:12
$begingroup$
the latter.... basically i want to condition on X such that distribution changes in a certain sense
$endgroup$
– claudius
Feb 9 '18 at 9:25
$begingroup$
the latter.... basically i want to condition on X such that distribution changes in a certain sense
$endgroup$
– claudius
Feb 9 '18 at 9:25
add a comment |
1 Answer
1
active
oldest
votes
$begingroup$
Just apply an optimization to search for the X values that satisfy the criteria you're looking for. Here's a simple demo:
set.seed(123)
mu_x_true = 1e4
mu_y_true = 3.75
n = 1e2
x <- rpois(n, mu_x_true)
y <- rnorm(n, sqrt(mu_y_true))^2
plot(x, y)
# conditions:
# E[Y|X] > E[Y]
# std(Y|x) < std(Y)
mu_y_emp = mean(y)
sd_y_emp = sd(y)
objective <- function(par, alpha=0.5)
if (par[1]>par[2]) par = rev(par)
ix <- which((par[1] < x) & (x < par[2]))
k <- length(ix)
if (k==0) return(1e12)
mu_yx <- mean(y[ix])
sd_yx <- sd(y[ix])
alpha*(mu_y_emp - mu_yx) + (1-alpha)*(sd_yx - sd_y_emp)
init <- mean(x) + c(-sd(x), sd(x))
test <- optim(objective, par=init)
ix <- which((par[1] < x) & (x < par[2]))
mean(y[ix]) > mean(y)
# TRUE
sd(y) > sd(y[ix])
# TRUE
$endgroup$
$begingroup$
Thx for the reply david! I did some basic reading and have learnt that discriminative analysis can be used to solve this problem as well... do u have any ideas in that direction??
$endgroup$
– claudius
Feb 12 '18 at 6:31
add a comment |
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "557"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f27632%2fconvert-a-pdf-into-a-conditional-pdf-such-that-mean-increases-and-std-dev-falls%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
Just apply an optimization to search for the X values that satisfy the criteria you're looking for. Here's a simple demo:
set.seed(123)
mu_x_true = 1e4
mu_y_true = 3.75
n = 1e2
x <- rpois(n, mu_x_true)
y <- rnorm(n, sqrt(mu_y_true))^2
plot(x, y)
# conditions:
# E[Y|X] > E[Y]
# std(Y|x) < std(Y)
mu_y_emp = mean(y)
sd_y_emp = sd(y)
objective <- function(par, alpha=0.5)
if (par[1]>par[2]) par = rev(par)
ix <- which((par[1] < x) & (x < par[2]))
k <- length(ix)
if (k==0) return(1e12)
mu_yx <- mean(y[ix])
sd_yx <- sd(y[ix])
alpha*(mu_y_emp - mu_yx) + (1-alpha)*(sd_yx - sd_y_emp)
init <- mean(x) + c(-sd(x), sd(x))
test <- optim(objective, par=init)
ix <- which((par[1] < x) & (x < par[2]))
mean(y[ix]) > mean(y)
# TRUE
sd(y) > sd(y[ix])
# TRUE
$endgroup$
$begingroup$
Thx for the reply david! I did some basic reading and have learnt that discriminative analysis can be used to solve this problem as well... do u have any ideas in that direction??
$endgroup$
– claudius
Feb 12 '18 at 6:31
add a comment |
$begingroup$
Just apply an optimization to search for the X values that satisfy the criteria you're looking for. Here's a simple demo:
set.seed(123)
mu_x_true = 1e4
mu_y_true = 3.75
n = 1e2
x <- rpois(n, mu_x_true)
y <- rnorm(n, sqrt(mu_y_true))^2
plot(x, y)
# conditions:
# E[Y|X] > E[Y]
# std(Y|x) < std(Y)
mu_y_emp = mean(y)
sd_y_emp = sd(y)
objective <- function(par, alpha=0.5)
if (par[1]>par[2]) par = rev(par)
ix <- which((par[1] < x) & (x < par[2]))
k <- length(ix)
if (k==0) return(1e12)
mu_yx <- mean(y[ix])
sd_yx <- sd(y[ix])
alpha*(mu_y_emp - mu_yx) + (1-alpha)*(sd_yx - sd_y_emp)
init <- mean(x) + c(-sd(x), sd(x))
test <- optim(objective, par=init)
ix <- which((par[1] < x) & (x < par[2]))
mean(y[ix]) > mean(y)
# TRUE
sd(y) > sd(y[ix])
# TRUE
$endgroup$
$begingroup$
Thx for the reply david! I did some basic reading and have learnt that discriminative analysis can be used to solve this problem as well... do u have any ideas in that direction??
$endgroup$
– claudius
Feb 12 '18 at 6:31
add a comment |
$begingroup$
Just apply an optimization to search for the X values that satisfy the criteria you're looking for. Here's a simple demo:
set.seed(123)
mu_x_true = 1e4
mu_y_true = 3.75
n = 1e2
x <- rpois(n, mu_x_true)
y <- rnorm(n, sqrt(mu_y_true))^2
plot(x, y)
# conditions:
# E[Y|X] > E[Y]
# std(Y|x) < std(Y)
mu_y_emp = mean(y)
sd_y_emp = sd(y)
objective <- function(par, alpha=0.5)
if (par[1]>par[2]) par = rev(par)
ix <- which((par[1] < x) & (x < par[2]))
k <- length(ix)
if (k==0) return(1e12)
mu_yx <- mean(y[ix])
sd_yx <- sd(y[ix])
alpha*(mu_y_emp - mu_yx) + (1-alpha)*(sd_yx - sd_y_emp)
init <- mean(x) + c(-sd(x), sd(x))
test <- optim(objective, par=init)
ix <- which((par[1] < x) & (x < par[2]))
mean(y[ix]) > mean(y)
# TRUE
sd(y) > sd(y[ix])
# TRUE
$endgroup$
Just apply an optimization to search for the X values that satisfy the criteria you're looking for. Here's a simple demo:
set.seed(123)
mu_x_true = 1e4
mu_y_true = 3.75
n = 1e2
x <- rpois(n, mu_x_true)
y <- rnorm(n, sqrt(mu_y_true))^2
plot(x, y)
# conditions:
# E[Y|X] > E[Y]
# std(Y|x) < std(Y)
mu_y_emp = mean(y)
sd_y_emp = sd(y)
objective <- function(par, alpha=0.5)
if (par[1]>par[2]) par = rev(par)
ix <- which((par[1] < x) & (x < par[2]))
k <- length(ix)
if (k==0) return(1e12)
mu_yx <- mean(y[ix])
sd_yx <- sd(y[ix])
alpha*(mu_y_emp - mu_yx) + (1-alpha)*(sd_yx - sd_y_emp)
init <- mean(x) + c(-sd(x), sd(x))
test <- optim(objective, par=init)
ix <- which((par[1] < x) & (x < par[2]))
mean(y[ix]) > mean(y)
# TRUE
sd(y) > sd(y[ix])
# TRUE
answered Feb 9 '18 at 11:47


David MarxDavid Marx
2,212412
2,212412
$begingroup$
Thx for the reply david! I did some basic reading and have learnt that discriminative analysis can be used to solve this problem as well... do u have any ideas in that direction??
$endgroup$
– claudius
Feb 12 '18 at 6:31
add a comment |
$begingroup$
Thx for the reply david! I did some basic reading and have learnt that discriminative analysis can be used to solve this problem as well... do u have any ideas in that direction??
$endgroup$
– claudius
Feb 12 '18 at 6:31
$begingroup$
Thx for the reply david! I did some basic reading and have learnt that discriminative analysis can be used to solve this problem as well... do u have any ideas in that direction??
$endgroup$
– claudius
Feb 12 '18 at 6:31
$begingroup$
Thx for the reply david! I did some basic reading and have learnt that discriminative analysis can be used to solve this problem as well... do u have any ideas in that direction??
$endgroup$
– claudius
Feb 12 '18 at 6:31
add a comment |
Thanks for contributing an answer to Data Science Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f27632%2fconvert-a-pdf-into-a-conditional-pdf-such-that-mean-increases-and-std-dev-falls%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1nFJjXAb,fjv7J1TaBKS49 xb,nocwkYt tiqN,6vEdmqkAJ JsmlzH5nCDS N 4 vwgRD EkKv nRV4jwDolfKNE
$begingroup$
are you asking how to construct a particular set of distributions, or are you asking how to identify a subset of your data that has a particular property relative to the rest of your data?
$endgroup$
– David Marx
Feb 9 '18 at 7:12
$begingroup$
the latter.... basically i want to condition on X such that distribution changes in a certain sense
$endgroup$
– claudius
Feb 9 '18 at 9:25