A/B testing: How to calculate p-value on post test segments?


My question on A/B testing is about post-test segmentation analysis.

For example:

I run an A/B test on my website to track bounce rate. In the treatment group, I show a video that explains my company. In the control group, I show just plain text. I pick a segment of first-time users from the USA and split them 50/50 into the two groups.

Metric I am tracking: average bounce rate (assume 20%)
Power: 0.8
Effect size I expect to see: 10% relative, so the bounce rate should fall to 20% - 0.10 * 20% = 18%
Calculated sample size required: say 1000 for each group

Say I run the test for the correct amount of time. At the end of the test, I get a p-value of 0.06, so I do not reject the null hypothesis.

However, when I do post-test segmentation analysis, I see, for example, that among users who signed up for a free trial, 44% played the video.

In this case, how do I calculate whether the 44% is significant (while taking the multiple comparison problem into account)?
In the Airbnb experiment, for instance, they did post-test segmentation analysis by browser type and were able to calculate a p-value.

My approach

Does this mean that for every segment I want to analyze, I need at least 1000 samples? Also, how would I recalculate the p-value, given that the p-value of this A/B test was already computed above as 0.06?
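The design numbers in the question (20% baseline, 18% target, power 0.8) can be turned into a per-group sample size with the usual normal-approximation formula for two proportions. The sketch below is only an illustration: the two-sided significance level of 0.05 is an assumption (the question does not state one), and the function name `sample_size_two_proportions` is made up for this example. Under these assumptions the result comes out well above the illustrative "say 1000" used in the question.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_proportions(p_control, p_treatment, alpha=0.05, power=0.8):
    """Per-group sample size for a two-sided two-proportion z-test.

    Uses the normal approximation with unpooled variances.
    alpha=0.05 is an assumed significance level, not taken from the question.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the target power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    delta = p_control - p_treatment                # absolute difference to detect
    return ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)


# Bounce rate expected to fall from 20% to 18% (10% relative reduction).
n_per_group = sample_size_two_proportions(0.20, 0.18)
print(n_per_group)
```

The same function can be called with segment-level baseline rates to get a rough idea of how many users a single segment would need on its own.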











  • You probably need to start by studying how hypothesis testing works: en.wikipedia.org/wiki/Statistical_hypothesis_testing. For instance, what is your null hypothesis? Your alternative hypothesis? Your test statistic? And I don't know where "does this mean ... i need 1000 samples" is coming from; you might need to explain your reasoning. Finally, please ask only one question per post.
    – D.W. Jun 12 '18 at 23:23
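For reference, one plausible way to write down the hypotheses and test statistic the comment asks about is the pooled two-proportion z-test on bounce rates. This is an assumed formulation, not something stated in the question; $p_T$ and $p_C$ denote the true bounce rates in treatment and control, $x_T, x_C$ the observed bounces, and $n_T, n_C$ the group sizes.

$$H_0: p_T = p_C \qquad H_1: p_T \neq p_C$$

$$z = \frac{\hat p_T - \hat p_C}{\sqrt{\hat p\,(1-\hat p)\left(\frac{1}{n_T} + \frac{1}{n_C}\right)}}, \qquad \hat p_T = \frac{x_T}{n_T},\quad \hat p_C = \frac{x_C}{n_C},\quad \hat p = \frac{x_T + x_C}{n_T + n_C}$$

The two-sided p-value is then $2\,(1 - \Phi(|z|))$, where $\Phi$ is the standard normal CDF.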











  • Cross-posted: datascience.stackexchange.com/q/24702/8560, stats.stackexchange.com/q/313582/2921. Please do not post the same question on multiple sites. Each community should have an honest shot at answering without anybody's time being wasted.
    – D.W. Jun 12 '18 at 23:26















statistics ab-test experiments






asked Nov 14 '17 at 2:14 by jxn











1 Answer
If you only want to answer whether a single segment reaches the same level, and you ignore how all the other segments behave, then this should be the required number of samples (given that the initial performance of the segments was the same).

As a warning: when you test too many segments, this can happen: https://xkcd.com/882/
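To make the multiple-comparison warning concrete, here is a minimal sketch of how per-segment p-values could be computed and then adjusted. It assumes a pooled two-proportion z-test per segment and a Holm step-down correction; both are choices made for this illustration, not something the answer prescribes. The segment names and counts are entirely hypothetical, and `two_proportion_p_value` and `holm_adjust` are made-up helper names.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(x_a, n_a, x_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (x_a / n_a - x_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest p-value is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


# Hypothetical counts per segment:
# (bounces_control, n_control, bounces_treatment, n_treatment)
segments = {
    "chrome": (60, 300, 45, 300),
    "firefox": (40, 200, 38, 200),
    "safari": (50, 250, 30, 250),
    "ie": (50, 250, 52, 250),
}

raw = {name: two_proportion_p_value(*counts) for name, counts in segments.items()}
adjusted = dict(zip(raw, holm_adjust(list(raw.values()))))
for name in segments:
    print(f"{name}: raw p = {raw[name]:.4f}, Holm-adjusted p = {adjusted[name]:.4f}")
```

Each segment is tested on its own counts, so the overall p-value of 0.06 is not reused. The adjustment only accounts for the segments actually listed, so the more segments are inspected, the larger the adjusted p-values become.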













        answered Nov 14 '17 at 11:00 by El Burro


























