Aggregate NumPy array with condition as mask



I have a matrix $b$ with elements:
$$b =
\begin{pmatrix}
0.01 & 0.02 & \cdots & 1 \\
0.01 & 0.02 & \cdots & 1 \\
\vdots & \vdots & \ddots & \vdots \\
0.01 & 0.02 & \cdots & 1
\end{pmatrix}
$$
Through a series of vectorised calculations, $b$ is used to compute $a$, another matrix with the same dimensions/shape as $b$:
$$a =
\begin{pmatrix}
3 & 5 & \cdots & 17 \\
2 & 6 & \cdots & 23 \\
\vdots & \vdots & \ddots & \vdots \\
4 & 3 & \cdots & 19
\end{pmatrix}
$$

At this point it is important to note that the elements of $a$ and $b$ have a one-to-one correspondence. The different values along each row (call them $\sigma$), $0.01, 0.02, \ldots$, are different parameters for a series of simulations that I'm running. Hence, for a fixed value of, say, $\sigma = 0.01$, the length of its column corresponds to the total number of "simulations" I'm running for that particular parameter. If you know Python vectorisation then you'll start to understand what I'm doing.



It is known that the higher the $\sigma$, the more of the simulations for that particular $\sigma$ will have a value higher than 5, i.e. more of the matrix elements along a column will have a value bigger than 5. Essentially what I'm doing is vectorising $N$ (columns) different simulations for $M$ (rows) different parameters. Now I wish to find the value of $\sigma$ for which the number of simulations with a result bigger than 5 exceeds 95% of the total number of simulations.



To put it more concisely, for a $\sigma$ of 0.02, the simulations would have results of $$5, 6, \ldots, 3$$ with, say, $N$ simulations in total. So let $$\kappa = \sum (\text{all the simulations that have values bigger than 5});$$ I wish to find the FIRST $\sigma$ for which
$$\frac{\kappa}{N} > 0.95,$$
i.e. the FIRST $\sigma$ for which more than 95% of the experiments have a value $>5$.
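To make the criterion concrete, here is a tiny hypothetical check (the numbers are made up for illustration, not from my simulations):

import numpy as np

# hypothetical results for one sigma, N = 5 simulations (made-up numbers)
results = np.array([6, 7, 8, 9, 3])
kappa = (results > 5).sum()      # 4 of the 5 results exceed 5
print(kappa / results.size)      # 0.8, so this sigma does NOT reach the 0.95 threshold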



The code that I have written is:



import numpy as np

# say 10000 simulations for a particular sigma
SIMULATION = 10000

# say 100 different values of sigma ranging from 0.01 to 1
# this is equivalent to matrix b in the MathJax above
SIGMA = np.ones((EXPERIMENTS, 100)) * np.linspace(0.01, 1, 100)

def return_sigma(matrix, simulation, sigma):
    """
    My idea here is I put in sigma, the matrix and the total number of
    simulations. Using np.ndenumerate I loop over i and j to compare whether
    the element values are greater than 5. If yes then I add 1 to counter,
    if no then continue. If the number of experiments with a result bigger
    than 5 is bigger than 95% of the total number of experiments then I
    return that particular sigma.
    """
    counter = 0
    for (i, j), value in np.ndenumerate(matrix):
        if value[i, j] > 5:
            counter += 1
        if counter/experiments > 0.95*simulation:
            break
    return sigma[0, j]  # sigma[:, j] should all be the same anyway

# Now this can be run by:
print(return_sigma(a, SIMULATION, SIGMA))


This doesn't seem to quite work, and as I'm not well versed in 2D slicing, this is quite a challenging problem for me. Thanks in advance.



EDIT
I apologise for not giving away my calculation, as it's part of my coursework. I have generated a for 15 different values of $\sigma$ with 15 simulations each, and here they are:



array([[ 6, 2, 12, 12, 14, 14, 11, 11, 9, 23, 15, 3, 10, 12, 10],
[ 7, 7, 6, 9, 13, 8, 11, 17, 13, 8, 10, 16, 11, 16, 8],
[14, 6, 4, 8, 10, 9, 11, 14, 12, 14, 5, 8, 18, 29, 22],
[ 4, 12, 12, 3, 7, 8, 5, 13, 13, 10, 14, 16, 22, 15, 22],
[ 9, 8, 7, 12, 12, 6, 4, 13, 12, 12, 18, 20, 18, 14, 23],
[ 8, 6, 8, 6, 12, 11, 11, 4, 9, 9, 13, 19, 13, 11, 20],
[12, 8, 7, 17, 3, 9, 11, 5, 12, 24, 11, 12, 17, 9, 16],
[ 4, 8, 7, 5, 6, 10, 9, 6, 4, 13, 13, 14, 18, 20, 23],
[ 5, 10, 5, 6, 8, 4, 7, 7, 10, 11, 9, 22, 14, 30, 17],
[ 6, 4, 5, 9, 8, 8, 4, 21, 14, 18, 21, 13, 14, 22, 10],
[ 6, 2, 7, 7, 8, 3, 7, 19, 14, 7, 13, 12, 18, 8, 12],
[ 5, 7, 6, 4, 13, 9, 4, 3, 20, 11, 11, 8, 12, 29, 14],
[ 6, 3, 13, 6, 12, 10, 17, 6, 9, 15, 12, 12, 16, 12, 15],
[ 2, 9, 8, 15, 5, 4, 5, 7, 16, 13, 20, 18, 14, 18, 14],
[14, 10, 7, 11, 8, 13, 14, 13, 12, 19, 9, 10, 11, 17, 13]])


As you can see, as $\sigma$ gets higher, more of the matrix elements in each column are bigger than 5.
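A quick way to sanity-check that claim (this snippet is only an illustration and assumes the array printed above is bound to a):

import numpy as np

# count, for each column (i.e. for each sigma), how many of the 15 results exceed 5
# (assumes `a` holds the 15 x 15 array shown above)
counts = (a > 5).sum(axis=0)
print(counts)   # the counts should broadly increase from left to right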



EDIT 2
So now condition is giving me the right thing, which is an array of booleans.



array([[False, False, False, False, False, False, False, False, True, True],
....................................................................,
[False, False, False, False, False, False, False, True, True, True]])


So now the last row is the important thing here, as it corresponds to the parameters, in this case:



array([[0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.5],
...........................................................,
[0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5]])


Now the last row of condition is telling me that the first True happens at $\sigma = 0.4$, i.e. the first $\sigma$ for which more than 95% of the total simulations for that $\sigma$ have a result > 5. So now I need to return the index of condition where the first True in the last row appears, i.e. [i, j]. Doing b[i, j] should then give me the parameter I want (which I'm not sure your next few lines of code are doing).
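A minimal sketch of that last lookup, using the example rows shown above (my own illustration; np.argmax on a boolean array gives the position of the first True, but note it also returns 0 when nothing is True):

import numpy as np

# last row of condition and the matching row of b, copied from the example above
condition_last = np.array([False, False, False, False, False,
                           False, False, True, True, True])
b_last = np.array([0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50])

j = np.argmax(condition_last)   # index of the first True in the last row
print(b_last[j])                # 0.4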










Tags: python, dataset, data, numpy

asked Mar 31 at 21:40 by user3613025, edited Apr 5 at 14:25 by n1k31t4
  • If you could provide the matrix a (a dummy version at least) it would be helpful to check output against your expectations. – n1k31t4, Mar 31 at 23:22

  • Hi thanks for the reminder, I've added a to my edit. – user3613025, Mar 31 at 23:41

  • Thanks for adding the example. I'd just like to point out that this smaller test matrix will probably never hit your target threshold of 95% of simulations going over 5 ;) – n1k31t4, Mar 31 at 23:47
2 Answers
I think I have understood your problem (mostly from the comments added in your function).



I'll show step by step what the logic is, building upon each previous step to get the final solution.



First we want to find all positions where the matrix is larger than 5:



a > 5 # returns a boolean array with true/false in each position


Now we want to check each row to count whether the proportion of matches (> 5) has reached a certain threshold, $N * 0.95$. We can divide by the number of simulations (number of columns) to essentially normalise by the number of simulations:



(a > 5) / SIMULATION # returns the value of one match


These values are required to sum to your threshold for an experiment to be valid.



Now we cumulatively sum across each row. As the True/False array is ones and zeros, we now have a running total of the number of matches for each experiment (each row).



np.cumsum((a > 5) / SIMULATION, axis=1) # still same shape as b


Now we just need to find out where (in each row) the sum of matches reaches your threshold. We can use np.where:



## EDIT: we only need to check the cumsum is greater than 0.95 and not (0.95 * SIMULATION)
## because we already "normalised" the values within the cumsum.
condition = np.cumsum((a > 5) / SIMULATION, axis=0) > 0.95
mask = np.where(condition)


I broke it down now as the expressions are getting long.



That gave us the i and j coordinates of places where the condition was True. We just want to find the place where we first breached the threshold, so we want to find the indices for the first time in each row:



valid_rows = np.unique(mask[0], return_index=True)[1] # [1] gets the indices themselves


Now we can simply use these indices to get the first index in each valid row, where the threshold was breached:



valid_cols = mask[1][valid_rows]


So now you can get the corresponding values from the parameter matrix using these valid rows/columns:



params = b[valid_rows, valid_cols]



If this is correct, it should be significantly faster than your solution because it avoids looping over the 2D array and instead utilises NumPy's vectorised methods and ufuncs.






answered Mar 31 at 23:43 by n1k31t4, edited Apr 3 at 12:59
  • Hi, I'd really love to try your method but it's really late here and I can barely open my eyes, so I'm going to try it tomorrow and let you know. Cheers. – user3613025, Mar 31 at 23:45

  • Your method seems to be doing fine until I try to print mask, where it'd just keep giving me an empty array, and subsequently valid_rows, valid_cols and params all become empty arrays too. Even if the first $\sigma$ value had already given me over 95% of > 5, your params should still be returning the first $\sigma$ value, right? Happy to send you my data and code in private if you'd like. – user3613025, Apr 2 at 13:51

  • From your comment it sounds like I interpreted the logic incorrectly. If each value must be over (0.95 * 5) then you should just change the condition line to match your needs. My line checks that more than 95% of the experiment's simulations are over 5. That sounds different to your description. – n1k31t4, Apr 2 at 16:08

  • Sorry, let me explain it more clearly. Looking at the example a output that I gave above, each column represents the set of simulations being run with a specific value of $\sigma$, in increasing order (across the rows). I would like to find the first $\sigma$ value for which the number of elements in that set of simulations with values > 5 is bigger than 95% of the number of simulations. So for 100 simulations (each matrix element is one simulation), if 96 of them turn out to be > 5 (bigger than 95% of the total simulations), I want that particular $\sigma$ value. – user3613025, Apr 2 at 16:42

  • So each row in b is identical? And each column in a contains the results of num_rows experiments for the sigma value of that column? Could the solution then be as simple as to alter condition to perform np.cumsum(..., axis=0)... ? – n1k31t4, Apr 2 at 23:56
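A minimal sketch of the axis=0 variant suggested in the last comment above, on a small made-up example (one sigma per column, simulations down the rows; the numbers are invented purely for illustration and are not the asker's data):

import numpy as np

# made-up example: 4 simulations (rows) for 3 sigma values (columns)
a = np.array([[6, 7, 9],
              [4, 6, 8],
              [6, 6, 7],
              [5, 8, 9]])
b = np.tile(np.array([0.1, 0.2, 0.3]), (4, 1))
n_sims = a.shape[0]

# cumulative (normalised) count of results > 5, running down each column
condition = np.cumsum((a > 5) / n_sims, axis=0) > 0.95

# the last row holds the final per-column proportions, so the first True there
# marks the first sigma whose column meets the 95% threshold
if condition[-1].any():
    print(b[0, np.argmax(condition[-1])])   # 0.2 for this made-up data
else:
    print("no sigma meets the threshold")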


















Is this helpful?



import numpy as np, numpy.random as npr

N_sims = 15  # sims per sigma
N_vals = 15  # num sigmas

# Parameters
SIGMA = np.ones((N_sims, N_vals)) * np.linspace(0.01, 1, N_vals)

# Generate "results" :3 (i.e., the matrix a)
RESULTS = npr.random_integers(low=1, high=10, size=SIGMA.shape)
for i in range(N_vals):
    RESULTS[:, i] += npr.random_integers(low=0, high=1, size=(N_sims)) + i // 3

print("SIGMA\n", SIGMA)
print("RESULTS\n", RESULTS)

# Mark the positions > 5
more_than_five = RESULTS > 5
print("more_than_five\n", more_than_five)

# Count how many are greater than five, per column (i.e., per sigma)
counts = more_than_five.sum(axis=0)
print('COUNTS\n', counts)

# Compute the proportions (so, 1 if all exps were > 5)
proportions = counts.astype(float) / N_sims
print('Proportions\n', proportions)

# Find the first time it is larger than 0.95
first_index = np.argmax(proportions > 0.95)
print('---\nFIRST INDEX\n', first_index)
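A possible follow-up (my addition, not part of the original answer): first_index is a column index, so it can be mapped back to the sigma value itself. This continuation assumes the SIGMA, proportions and first_index variables from the snippet above are in scope, and guards against np.argmax returning 0 when nothing passes the threshold:

# continuation of the snippet above: SIGMA, proportions, first_index assumed in scope
if (proportions > 0.95).any():
    # every row of SIGMA is identical, so row 0 is enough to look up the sigma value
    print('first sigma meeting the threshold:', SIGMA[0, first_index])
else:
    print('no sigma meets the 95% threshold')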





share|improve this answer









$endgroup$













    Your Answer








    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "557"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f48316%2faggregate-numpy-array-with-condition-as-mask%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1












    $begingroup$

    I think I have understood your problem (mostly from the comments added in your function).



    I'll show step by step what the logic is, building upon each previous step to get the final solution.



    First we want to find all position where the matrix is larger than 5:



    a > 5 # returns a boolean array with true/false in each position


    Now we want to check each row to count if the proportion of of matches (> 5) has reached a certain threshold; $N * 0.95$. We can divide by the number of simulations (number of columns) to essentially normalise by the number of simulations:



    (a > 5) / SIMULATION # returns the value of one match


    These values are required to sum to your threshold for an experiment to be valid.



    Now we cumulatively sum across each row. As the True/False array is ones and zeros, we now have a running total of the numbers of matches for each experiment (each row).



    np.cumsum((a > 5) / SIMULATION, axis=1) # still same shape as b


    Now we just need to find out where (in each row) the sum of matches reaches your threshold. We can use np.where:



    ## EDIT: we only need to check the cumsum is greater than 0.95 and not (0.95 * SUMLATION)
    ## because we already "normalised" the values within the cumsum.
    condition = np.cumsum((a > 5) / SIMULATION, axis=0) > 0.95
    mask = np.where(condition)


    I broke it down now as the expressions are getting long.



    That gave us the i and j coordinates of places where the condition was True. We just want to find the place where we first breached the threshold, so we want to find the indices for the first time in each row:



    valid_rows = np.unique(mask[0], return_index=True)[1] # [1] gets the indices themselves


    Now we can simply use these indices to get the first index in each valid row, where the threshold was breached:



    valid_cols = mask[1][valid_rows]


    So now you can get the corresponding values from the parameter matrix using these valid rows/columns:



    params = b[valid_rows, valid_cols]



    If this is correct, it should be significantly faster than your solution because it avoids looping over the 2d array and instead utilises NumPy's vectorised method and ufuncs.






    share|improve this answer











    $endgroup$












    • $begingroup$
      Hi I'd really love to try your method but it's really late here and I can barely open my eyes so I'm going to try it tomorrow and let you know. Cheers.
      $endgroup$
      – user3613025
      Mar 31 at 23:45










    • $begingroup$
      your method seems to be doing fine until I tried to print mask where it'd just keep giving me an empty array, and subsequently all valid_rows, valid_cols and params become empty arrays too. Even if the first $sigma$ value had already given me over 95% of > 5, it your param should still be returning the first $sigma$ value right? Happy to send you my data code in private if you'd like
      $endgroup$
      – user3613025
      Apr 2 at 13:51










    • $begingroup$
      From your comment it sounds like I interpreted the logic incorrectly. Each value must be over (0.95 * 5) then you should just change the condition line to match your needs. My line checks that more than 95% of the experiment's simulation are over 5. That sound different to your description.
      $endgroup$
      – n1k31t4
      Apr 2 at 16:08










    • $begingroup$
      Sorry let me explain it more clearly. Looking at the example a output that I gave above, each column represent the set of simulations that are being ran with a specific value of $sigma$ in increasing order(across the rows). I would like to find the first sigma value for which the total number of elements in that set of simulations that have values > 5 is bigger than 95% of the number of the simulations. So for 100 simulations(each matrix element is each simulation), if 96 of them turns out to be >5 (bigger than 95% of total simulation), I want that particular $sigma$ value.
      $endgroup$
      – user3613025
      Apr 2 at 16:42











    • $begingroup$
      So each row in b is identical? And each column in a contains the results of num_rows experiments for the sigma value of that column? Could the solution then be as simple as to alter condition to perform np.cumsum(..., axis=0)... ?
      $endgroup$
      – n1k31t4
      Apr 2 at 23:56















    1












    $begingroup$

    I think I have understood your problem (mostly from the comments added in your function).



    I'll show step by step what the logic is, building upon each previous step to get the final solution.



    First we want to find all position where the matrix is larger than 5:



    a > 5 # returns a boolean array with true/false in each position


    Now we want to check each row to count if the proportion of of matches (> 5) has reached a certain threshold; $N * 0.95$. We can divide by the number of simulations (number of columns) to essentially normalise by the number of simulations:



    (a > 5) / SIMULATION # returns the value of one match


    These values are required to sum to your threshold for an experiment to be valid.



    Now we cumulatively sum across each row. As the True/False array is ones and zeros, we now have a running total of the numbers of matches for each experiment (each row).



    np.cumsum((a > 5) / SIMULATION, axis=1) # still same shape as b


    Now we just need to find out where (in each row) the sum of matches reaches your threshold. We can use np.where:



    ## EDIT: we only need to check the cumsum is greater than 0.95 and not (0.95 * SUMLATION)
    ## because we already "normalised" the values within the cumsum.
    condition = np.cumsum((a > 5) / SIMULATION, axis=0) > 0.95
    mask = np.where(condition)


    I broke it down now as the expressions are getting long.



    That gave us the i and j coordinates of places where the condition was True. We just want to find the place where we first breached the threshold, so we want to find the indices for the first time in each row:



    valid_rows = np.unique(mask[0], return_index=True)[1] # [1] gets the indices themselves


    Now we can simply use these indices to get the first index in each valid row, where the threshold was breached:



    valid_cols = mask[1][valid_rows]


    So now you can get the corresponding values from the parameter matrix using these valid rows/columns:



    params = b[valid_rows, valid_cols]



    If this is correct, it should be significantly faster than your solution because it avoids looping over the 2d array and instead utilises NumPy's vectorised method and ufuncs.






    share|improve this answer











    $endgroup$












    • $begingroup$
      Hi I'd really love to try your method but it's really late here and I can barely open my eyes so I'm going to try it tomorrow and let you know. Cheers.
      $endgroup$
      – user3613025
      Mar 31 at 23:45










    • $begingroup$
      your method seems to be doing fine until I tried to print mask where it'd just keep giving me an empty array, and subsequently all valid_rows, valid_cols and params become empty arrays too. Even if the first $sigma$ value had already given me over 95% of > 5, it your param should still be returning the first $sigma$ value right? Happy to send you my data code in private if you'd like
      $endgroup$
      – user3613025
      Apr 2 at 13:51










    • $begingroup$
      From your comment it sounds like I interpreted the logic incorrectly. Each value must be over (0.95 * 5) then you should just change the condition line to match your needs. My line checks that more than 95% of the experiment's simulation are over 5. That sound different to your description.
      $endgroup$
      – n1k31t4
      Apr 2 at 16:08










    • $begingroup$
      Sorry let me explain it more clearly. Looking at the example a output that I gave above, each column represent the set of simulations that are being ran with a specific value of $sigma$ in increasing order(across the rows). I would like to find the first sigma value for which the total number of elements in that set of simulations that have values > 5 is bigger than 95% of the number of the simulations. So for 100 simulations(each matrix element is each simulation), if 96 of them turns out to be >5 (bigger than 95% of total simulation), I want that particular $sigma$ value.
      $endgroup$
      – user3613025
      Apr 2 at 16:42











    • $begingroup$
      So each row in b is identical? And each column in a contains the results of num_rows experiments for the sigma value of that column? Could the solution then be as simple as to alter condition to perform np.cumsum(..., axis=0)... ?
      $endgroup$
      – n1k31t4
      Apr 2 at 23:56













    1












    1








    1





    $begingroup$

    I think I have understood your problem (mostly from the comments added in your function).



    I'll show step by step what the logic is, building upon each previous step to get the final solution.



    First we want to find all position where the matrix is larger than 5:



    a > 5 # returns a boolean array with true/false in each position


    Now we want to check each row to count if the proportion of of matches (> 5) has reached a certain threshold; $N * 0.95$. We can divide by the number of simulations (number of columns) to essentially normalise by the number of simulations:



    (a > 5) / SIMULATION # returns the value of one match


    These values are required to sum to your threshold for an experiment to be valid.



    Now we cumulatively sum across each row. As the True/False array is ones and zeros, we now have a running total of the numbers of matches for each experiment (each row).



    np.cumsum((a > 5) / SIMULATION, axis=1) # still same shape as b


    Now we just need to find out where (in each row) the sum of matches reaches your threshold. We can use np.where:



    ## EDIT: we only need to check the cumsum is greater than 0.95 and not (0.95 * SUMLATION)
    ## because we already "normalised" the values within the cumsum.
    condition = np.cumsum((a > 5) / SIMULATION, axis=0) > 0.95
    mask = np.where(condition)


    I broke it down now as the expressions are getting long.



    That gave us the i and j coordinates of places where the condition was True. We just want to find the place where we first breached the threshold, so we want to find the indices for the first time in each row:



    valid_rows = np.unique(mask[0], return_index=True)[1] # [1] gets the indices themselves


    Now we can simply use these indices to get the first index in each valid row, where the threshold was breached:



    valid_cols = mask[1][valid_rows]


    So now you can get the corresponding values from the parameter matrix using these valid rows/columns:



    params = b[valid_rows, valid_cols]



    If this is correct, it should be significantly faster than your solution because it avoids looping over the 2d array and instead utilises NumPy's vectorised method and ufuncs.






    share|improve this answer











    $endgroup$



    I think I have understood your problem (mostly from the comments added in your function).



    I'll show step by step what the logic is, building upon each previous step to get the final solution.



    First we want to find all position where the matrix is larger than 5:



    a > 5 # returns a boolean array with true/false in each position


    Now we want to check each row to count if the proportion of of matches (> 5) has reached a certain threshold; $N * 0.95$. We can divide by the number of simulations (number of columns) to essentially normalise by the number of simulations:



    (a > 5) / SIMULATION # returns the value of one match


    These values are required to sum to your threshold for an experiment to be valid.



    Now we cumulatively sum across each row. As the True/False array is ones and zeros, we now have a running total of the numbers of matches for each experiment (each row).



    np.cumsum((a > 5) / SIMULATION, axis=1) # still same shape as b


    Now we just need to find out where (in each row) the sum of matches reaches your threshold. We can use np.where:



    ## EDIT: we only need to check the cumsum is greater than 0.95 and not (0.95 * SUMLATION)
    ## because we already "normalised" the values within the cumsum.
    condition = np.cumsum((a > 5) / SIMULATION, axis=0) > 0.95
    mask = np.where(condition)


    I broke it down now as the expressions are getting long.



    That gave us the i and j coordinates of places where the condition was True. We just want to find the place where we first breached the threshold, so we want to find the indices for the first time in each row:



    valid_rows = np.unique(mask[0], return_index=True)[1] # [1] gets the indices themselves


    Now we can simply use these indices to get the first index in each valid row, where the threshold was breached:



    valid_cols = mask[1][valid_rows]


    So now you can get the corresponding values from the parameter matrix using these valid rows/columns:



    params = b[valid_rows, valid_cols]



    If this is correct, it should be significantly faster than your solution because it avoids looping over the 2d array and instead utilises NumPy's vectorised method and ufuncs.







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Apr 3 at 12:59

























    answered Mar 31 at 23:43









    n1k31t4n1k31t4

    6,6562421




    6,6562421











    • $begingroup$
      Hi I'd really love to try your method but it's really late here and I can barely open my eyes so I'm going to try it tomorrow and let you know. Cheers.
      $endgroup$
      – user3613025
      Mar 31 at 23:45










    • $begingroup$
      your method seems to be doing fine until I tried to print mask where it'd just keep giving me an empty array, and subsequently all valid_rows, valid_cols and params become empty arrays too. Even if the first $sigma$ value had already given me over 95% of > 5, it your param should still be returning the first $sigma$ value right? Happy to send you my data code in private if you'd like
      $endgroup$
      – user3613025
      Apr 2 at 13:51










    • $begingroup$
      From your comment it sounds like I interpreted the logic incorrectly. Each value must be over (0.95 * 5) then you should just change the condition line to match your needs. My line checks that more than 95% of the experiment's simulation are over 5. That sound different to your description.
      $endgroup$
      – n1k31t4
      Apr 2 at 16:08










    • $begingroup$
      Sorry let me explain it more clearly. Looking at the example a output that I gave above, each column represent the set of simulations that are being ran with a specific value of $sigma$ in increasing order(across the rows). I would like to find the first sigma value for which the total number of elements in that set of simulations that have values > 5 is bigger than 95% of the number of the simulations. So for 100 simulations(each matrix element is each simulation), if 96 of them turns out to be >5 (bigger than 95% of total simulation), I want that particular $sigma$ value.
      $endgroup$
      – user3613025
      Apr 2 at 16:42











    • $begingroup$
      So each row in b is identical? And each column in a contains the results of num_rows experiments for the sigma value of that column? Could the solution then be as simple as to alter condition to perform np.cumsum(..., axis=0)... ?
      $endgroup$
      – n1k31t4
      Apr 2 at 23:56
















    • $begingroup$
      Hi I'd really love to try your method but it's really late here and I can barely open my eyes so I'm going to try it tomorrow and let you know. Cheers.
      $endgroup$
      – user3613025
      Mar 31 at 23:45










    • $begingroup$
      your method seems to be doing fine until I tried to print mask where it'd just keep giving me an empty array, and subsequently all valid_rows, valid_cols and params become empty arrays too. Even if the first $sigma$ value had already given me over 95% of > 5, it your param should still be returning the first $sigma$ value right? Happy to send you my data code in private if you'd like
      $endgroup$
      – user3613025
      Apr 2 at 13:51










    • $begingroup$
      From your comment it sounds like I interpreted the logic incorrectly. Each value must be over (0.95 * 5) then you should just change the condition line to match your needs. My line checks that more than 95% of the experiment's simulation are over 5. That sound different to your description.
      $endgroup$
      – n1k31t4
      Apr 2 at 16:08










    • $begingroup$
      Sorry let me explain it more clearly. Looking at the example a output that I gave above, each column represent the set of simulations that are being ran with a specific value of $sigma$ in increasing order(across the rows). I would like to find the first sigma value for which the total number of elements in that set of simulations that have values > 5 is bigger than 95% of the number of the simulations. So for 100 simulations(each matrix element is each simulation), if 96 of them turns out to be >5 (bigger than 95% of total simulation), I want that particular $sigma$ value.
      $endgroup$
      – user3613025
      Apr 2 at 16:42











    • $begingroup$
      So each row in b is identical? And each column in a contains the results of num_rows experiments for the sigma value of that column? Could the solution then be as simple as to alter condition to perform np.cumsum(..., axis=0)... ?
      $endgroup$
      – n1k31t4
      Apr 2 at 23:56















    $begingroup$
    Hi I'd really love to try your method but it's really late here and I can barely open my eyes so I'm going to try it tomorrow and let you know. Cheers.
    $endgroup$
    – user3613025
    Mar 31 at 23:45




    $begingroup$
    Hi I'd really love to try your method but it's really late here and I can barely open my eyes so I'm going to try it tomorrow and let you know. Cheers.
    $endgroup$
    – user3613025
    Mar 31 at 23:45












    $begingroup$
    your method seems to be doing fine until I tried to print mask where it'd just keep giving me an empty array, and subsequently all valid_rows, valid_cols and params become empty arrays too. Even if the first $sigma$ value had already given me over 95% of > 5, it your param should still be returning the first $sigma$ value right? Happy to send you my data code in private if you'd like
    $endgroup$
    – user3613025
    Apr 2 at 13:51




    $begingroup$
    your method seems to be doing fine until I tried to print mask where it'd just keep giving me an empty array, and subsequently all valid_rows, valid_cols and params become empty arrays too. Even if the first $sigma$ value had already given me over 95% of > 5, it your param should still be returning the first $sigma$ value right? Happy to send you my data code in private if you'd like
    $endgroup$
    – user3613025
    Apr 2 at 13:51












    $begingroup$
    From your comment it sounds like I interpreted the logic incorrectly. Each value must be over (0.95 * 5) then you should just change the condition line to match your needs. My line checks that more than 95% of the experiment's simulation are over 5. That sound different to your description.
    $endgroup$
    – n1k31t4
    Apr 2 at 16:08




    $begingroup$
    From your comment it sounds like I interpreted the logic incorrectly. Each value must be over (0.95 * 5) then you should just change the condition line to match your needs. My line checks that more than 95% of the experiment's simulation are over 5. That sound different to your description.
    $endgroup$
    – n1k31t4
    Apr 2 at 16:08












    $begingroup$
    Sorry let me explain it more clearly. Looking at the example a output that I gave above, each column represent the set of simulations that are being ran with a specific value of $sigma$ in increasing order(across the rows). I would like to find the first sigma value for which the total number of elements in that set of simulations that have values > 5 is bigger than 95% of the number of the simulations. So for 100 simulations(each matrix element is each simulation), if 96 of them turns out to be >5 (bigger than 95% of total simulation), I want that particular $sigma$ value.
    $endgroup$
    – user3613025
    Apr 2 at 16:42

    $begingroup$
    So each row in b is identical? And each column in a contains the results of num_rows experiments for the sigma value of that column? Could the solution then be as simple as altering condition to use np.cumsum(..., axis=0)?
    $endgroup$
    – n1k31t4
    Apr 2 at 23:56

    $begingroup$

    Is this helpful?

    import numpy as np
    import numpy.random as npr

    N_sims = 15  # simulations per sigma value
    N_vals = 15  # number of sigma values

    # Parameters: one column per sigma value, repeated down the rows
    SIGMA = np.ones((N_sims, N_vals)) * np.linspace(0.01, 1, N_vals)

    # Generate "results" (i.e., the matrix a); randint's upper bound is exclusive
    RESULTS = npr.randint(low=1, high=11, size=SIGMA.shape)
    for i in range(N_vals):
        RESULTS[:, i] += npr.randint(low=0, high=2, size=N_sims) + i // 3
    print("SIGMA\n", SIGMA)
    print("RESULTS\n", RESULTS)

    # Mark the positions > 5
    more_than_five = RESULTS > 5
    print("more_than_five\n", more_than_five)

    # Count how many are greater than five, per column (i.e., per sigma)
    counts = more_than_five.sum(axis=0)
    print("COUNTS\n", counts)

    # Compute the proportions (so, 1 if all simulations were > 5)
    proportions = counts.astype(float) / N_sims
    print("Proportions\n", proportions)

    # Find the first column whose proportion exceeds 0.95
    first_index = np.argmax(proportions > 0.95)
    print("---\nFIRST INDEX\n", first_index)
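
    A possible follow-up sketch, assuming the SIGMA, proportions and first_index arrays defined above: np.argmax returns 0 when no column satisfies the condition, so it can be worth checking that at least one column actually qualifies before reading off the sigma value.

    # Guard against the case where no sigma reaches the 95% threshold
    if (proportions > 0.95).any():
        first_sigma = SIGMA[0, first_index]  # every row of SIGMA is identical
        print("First sigma with more than 95% of results above 5:", first_sigma)
    else:
        print("No sigma value reached the 95% threshold")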
    $endgroup$
        answered Mar 31 at 23:34









        user3658307

        1956