What is the range of values of the expected percentile ranking?

$begingroup$


I'm currently reading Hu, Koren, Volinsky: Collaborative Filtering for Implicit Feedback Datasets.




One thing that confuses me is the "expected percentile ranking", a function the authors define to evaluate the quality of their recommendations. They define it in the Evaluation methodology section on page 6 as:



$$\overline{\text{rank}} = \frac{\sum_{u,i} r^t_{ui} \, \text{rank}_{ui}}{\sum_{u,i} r^t_{ui}}$$



where $u$ is a user, $i$ is an item (e.g. a TV show), and $r_{ui} \in [0, \infty)$ is how much user $u$ watched show $i$. $\text{rank}_{ui} \in [0, 1]$ is the percentile rank of item $i$ for user $u$. For example, it is 0 if item $i$ has the highest $r$ value for user $u$ and 1 if it has the lowest $r$ value.
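To make the definition concrete, here is a minimal Python sketch of the formula. It is an illustration under assumptions, not the authors' code: the function names, the dictionary layout, and the choice to spread positions evenly over $[0, 1]$ (position $k$ of $n$ items gets $k/(n-1)$, a single item gets 0) are all hypothetical.

```python
from typing import Dict, Hashable, List


def percentile_ranks(ordered_items: List[Hashable]) -> Dict[Hashable, float]:
    """Map each item of a best-first list to a percentile rank in [0, 1].

    Assumption: ranks are evenly spaced, first item 0.0, last item 1.0.
    """
    n = len(ordered_items)
    if n == 1:
        # A single item is trivially the top of the list.
        return {ordered_items[0]: 0.0}
    return {item: pos / (n - 1) for pos, item in enumerate(ordered_items)}


def expected_percentile_rank(
    test_r: Dict[Hashable, Dict[Hashable, float]],
    rankings: Dict[Hashable, List[Hashable]],
) -> float:
    """Weighted mean of rank_ui with weights r^t_ui.

    test_r[user][item] holds r^t_ui; rankings[user] is a best-first item list.
    """
    numerator = 0.0
    denominator = 0.0
    for user, observed in test_r.items():
        ranks = percentile_ranks(rankings[user])
        for item, r in observed.items():
            numerator += r * ranks[item]
            denominator += r
    if denominator == 0:
        raise ValueError("all r^t_ui are zero, the metric is undefined")
    return numerator / denominator


# Hypothetical data: one user, two shows, the more-watched show ranked first.
test_r = {"u1": {"show_a": 1.0, "show_b": 2.0}}
rankings = {"u1": ["show_b", "show_a"]}
print(expected_percentile_rank(test_r, rankings))  # one third for this ordering
```

Because the result is a weighted average of values in $[0, 1]$ with non-negative weights, it always stays inside $[0, 1]$ under these assumptions.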



I'm not super sure if I understood it correctly.



The authors write that lower values of $\overline{\text{rank}}$ are more desirable and that random predictions would lead to an expected value of $\overline{\text{rank}}$ of 0.5.
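One way to sanity-check the 0.5 figure is a small simulation. The sketch below is only an illustration: the weights in `r_values` are made up, there is a single user, and ranks are assumed to be evenly spaced over $[0, 1]$ and assigned by a random shuffle.

```python
import random


def random_weighted_rank(r_values, rng):
    """Weighted mean percentile rank when the item order is a random shuffle."""
    n = len(r_values)  # needs at least two items
    positions = list(range(n))
    rng.shuffle(positions)
    # Assumption: position k of n items maps to percentile rank k / (n - 1).
    ranks = [p / (n - 1) for p in positions]
    return sum(r * k for r, k in zip(r_values, ranks)) / sum(r_values)


rng = random.Random(0)  # fixed seed so the run is repeatable
r_values = [5.0, 1.0, 0.5, 3.0, 2.0]  # hypothetical watch amounts
trials = 20000
mean_rank = sum(random_weighted_rank(r_values, rng) for _ in range(trials)) / trials
print(round(mean_rank, 2))
```

Each item is equally likely to land at any position, so its expected rank is 0.5, and any weighted average of those ranks has expectation 0.5 as well. A single random ordering can still score above or below that.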



Examples



  • Assume there is only one item. In this case $\text{rank} = 0$. Makes sense, as there cannot be any predictions.

  • Assume there is only one user and two items with $r_{1,1} = 1$ and $r_{1,2} = 2$. Then:

$$\overline{\text{rank}} = \frac{1 \cdot \text{rank}_{1,1} + 2 \cdot \text{rank}_{1,2}}{1+2}$$

This means $\overline{\text{rank}} \in \{2/3, 1/3\}$.

  • If there is only a single user and all $|I|$ values of $r_{ui}$ are the same, then $\overline{\text{rank}} = \sum_{ui} \text{rank}_{ui} = \frac{|I|}{2}$
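The two-item example can be checked by enumerating both possible orderings. This is a rough sketch with hypothetical item names, assuming the item listed first gets rank 0 and the item listed second gets rank 1:

```python
from fractions import Fraction
from itertools import permutations

# One user, two items, with the r values from the example above.
r = {"item1": Fraction(1), "item2": Fraction(2)}


def weighted_rank(order):
    """Weighted mean percentile rank for one best-first ordering."""
    n = len(order)
    ranks = {item: Fraction(pos, n - 1) for pos, item in enumerate(order)}
    return sum(r[item] * ranks[item] for item in r) / sum(r.values())


values = {weighted_rank(order) for order in permutations(r)}
print(sorted(values))  # [Fraction(1, 3), Fraction(2, 3)]
```

The lower value, 1/3, is the ordering that puts the more-watched item first.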

Questions



  1. Is my understanding of the metric correct? In particular, my last example and the authors' statement that $\overline{\text{rank}} \geq 50\%$ indicates an algorithm no better than random seem off.

  2. What is $t$?









$endgroup$
















      recommender-system math






asked Apr 2 at 7:13 by Martin Thoma




















1 Answer

          $begingroup$


What is $t$?

It denotes the observed $r_{ui}$ in the one-week test set (page 6, left column).

Is my understanding of the metric correct?

The first two examples are correct. For the third, assume the user-item relation $r^t_{ui}$ is a constant $a$ for all items in the test set and the predicted ranks are spread uniformly across $[0, 1]$. Then:

$$\overline{\text{rank}} = \frac{\sum_{u,i} r^t_{ui} \, \text{rank}_{ui}}{\sum_{u,i} r^t_{ui}} = \frac{\sum_{u,i} a \, \text{rank}_{ui}}{\sum_{u,i} a} = \frac{1}{|I|} \sum_{u,i} \text{rank}_{ui} = \frac{1}{|I|} \cdot \frac{|I|}{2} = \frac{1}{2}$$

This makes sense. The items are indistinguishable to the user, so no model can do better than random guessing: there is no observed preference to help the model favor one item over another. Of course, this also assumes that the training set (4 weeks) and the test set (the following week) come from the same distribution.
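The sum of the percentile ranks used in the derivation above can be spelled out. As a sketch, assume a single user with $|I| \geq 2$ items whose ranks are evenly spaced, so that the item at position $k$ (counting from 0) gets rank $k/(|I|-1)$; this spacing is an assumption made here for illustration:

$$\sum_{i} \text{rank}_{ui} = \sum_{k=0}^{|I|-1} \frac{k}{|I|-1} = \frac{1}{|I|-1} \cdot \frac{(|I|-1)\,|I|}{2} = \frac{|I|}{2}$$

Dividing by $|I|$ gives $\frac{1}{2}$, whatever order the items are placed in.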






          $endgroup$













answered Apr 2 at 8:23 by Esmailian


























