A weird result from a recommender system


$begingroup$


Say there are 100 products on sale, the 10 most popular of which are listed below, and about 100k users who purchase items on a daily basis.



A = has been purchased by 100k users. 
B = has been purchased by 30k users.
C = has been purchased by 20k users.
D = has been purchased by 18k users.
E = has been purchased by 10k users.
F = has been purchased by 8k users.
G = has been purchased by 7k users.
H = has been purchased by 4k users.
I = has been purchased by 3k users.
J = has been purchased by 1k users.

X = never bought by anyone.
Y = never bought by anyone.
Z = never bought by anyone.


Based on this, the training data will have more than 50M rows like this:



User Id | User Name | Item Id | Item Name | label | Purchase Date
1       | Thomas    | 1       | A         | true  | 12, Mar 2019
1       | Thomas    | 1       | A         | true  | 13, Mar 2019
1       | Thomas    | 1       | A         | true  | 14, Mar 2019
1       | Thomas    | 1       | A         | true  | 15, Mar 2019
1       | Thomas    | 2       | B         | true  | 11, Mar 2019
1       | Thomas    | 3       | C         | true  | 09, Mar 2019
1       | Thomas    | 4       | D         | true  | 07, Mar 2019
2       | Angelica  | 1       | E         | true  | 12, Mar 2019
...


Users' preferences look like this. There may be countless patterns, but let's take one example:



Thomas bought A, B, C, D
Angelica bought A, B, C, D
Gloria bought A, B, C, D
Jennifer bought A, B, C, D and I


With user-based collaborative filtering, it seems obvious that Thomas, Angelica, and Gloria are likely to get item I as a recommendation, because Jennifer bought I and otherwise has exactly the same purchase pattern as the others.
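A minimal sketch of that expectation, assuming Jaccard similarity between purchase sets and a similarity-weighted vote over the other users. Both choices, and the names `purchases`, `Jaccard`, and `Recommend`, are illustrative assumptions rather than anything the post specifies; the code uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy purchase sets copied from the example (implicit feedback, no ratings).
var purchases = new Dictionary<string, HashSet<string>>
{
    ["Thomas"]   = new HashSet<string> { "A", "B", "C", "D" },
    ["Angelica"] = new HashSet<string> { "A", "B", "C", "D" },
    ["Gloria"]   = new HashSet<string> { "A", "B", "C", "D" },
    ["Jennifer"] = new HashSet<string> { "A", "B", "C", "D", "I" },
};

// Jaccard similarity of two purchase sets: |intersection| / |union|.
// (Hypothetical helper; any set similarity could be swapped in.)
double Jaccard(HashSet<string> a, HashSet<string> b)
{
    int shared = a.Intersect(b).Count();
    int union = a.Count + b.Count - shared;
    return union == 0 ? 0.0 : (double)shared / union;
}

// For every item the target has not bought, add up the similarity of each
// other user who did buy it. Higher totals mean stronger candidates.
Dictionary<string, double> Recommend(string target)
{
    var scores = new Dictionary<string, double>();
    foreach (var (user, items) in purchases)
    {
        if (user == target) continue;
        double sim = Jaccard(purchases[target], items);
        foreach (var candidate in items.Except(purchases[target]))
        {
            scores.TryGetValue(candidate, out double current);
            scores[candidate] = current + sim;
        }
    }
    return scores;
}

var forThomas = Recommend("Thomas");
foreach (var (item, score) in forThomas.OrderByDescending(kv => kv.Value))
{
    Console.WriteLine($"{item}: {score}");
}
```

For Thomas the only candidate is I, with a score of 0.8, because he shares 4 of the 5 distinct items with Jennifer. This is the neighborhood-style behavior the reasoning above relies on; a model that learns latent factors is not guaranteed to reproduce it.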



With this in mind, I figured that if I added two more users who bought the unsold items X, Y, and Z, the predictions for those users would recommend the unsold items.



So, before training the model, I manually added dummy data like this:



User Id | User Name | Item Id | Item Name | label | Purchase Date
1       | Thomas    | 1       | A         | true  | 12, Mar 2019
1       | Thomas    | 1       | A         | true  | 13, Mar 2019
1       | Thomas    | 1       | A         | true  | 14, Mar 2019
1       | Thomas    | 1       | A         | true  | 15, Mar 2019
1       | Thomas    | 2       | B         | true  | 11, Mar 2019
1       | Thomas    | 3       | C         | true  | 09, Mar 2019
1       | Thomas    | 4       | D         | true  | 07, Mar 2019
2       | Angelica  | 1       | E         | true  | 12, Mar 2019
...
100001  | Andrew    | 24      | X         | true  | 19, Mar 2019
100001  | Andrew    | 25      | Y         | true  | 19, Mar 2019
100002  | Andy      | 24      | X         | true  | 19, Mar 2019
100002  | Andy      | 25      | Y         | true  | 19, Mar 2019
100002  | Andy      | 26      | Z         | true  | 19, Mar 2019


As mentioned above, I expected Andrew to get Z as a recommendation, because Andrew shares item preferences with Andy, who also bought Z, even though the purchase data for X, Y, and Z makes up an extremely small portion of the training data (only 5 records among the 10M records).



But the result was totally unexpected.



Every user has X, Y, and Z in their recommendation list, although the prediction scores are very low compared to the other items. What's more puzzling is that Andrew and Andy have no outstanding scores for the unpopular items, even though they actually bought them!



I don't know why this happens. Am I misunderstanding the concept of user-based collaborative filtering?










$endgroup$







  • How did you implement the recommendation system? Can you share some code or something? – yoav_aaa, Mar 18 at 8:50

  • Implemented it through ML.NET using the Field-Aware Factorization Machine. Basically the same as this code in the official GitHub repository. – hina10531, Mar 18 at 11:14
















machine-learning recommender-system






edited Mar 18 at 7:30 by hina10531
asked Mar 18 at 7:25 by hina10531







1 Answer
$begingroup$

Increasing the latent dimension value was the key here.



My recommendation system was implemented with ML.NET, whose default latent dimension is 20, which seems quite small given the volume of my training data.



Increasing the number of hidden features made my system perform better: it now correctly predicts X, Y, and Z as false candidates for all existing users except Andy and Andrew. Below is how to set the value, based on the example code in ML.NET:



// Featurize the user and item IDs, concatenate them, then train the factorization machine.
var pipeline = mlContext.Transforms.Text.FeaturizeText(outputColumnName: "userIdFeaturized", inputColumnName: nameof(MovieRating.userId))
    .Append(mlContext.Transforms.Text.FeaturizeText(outputColumnName: "movieIdFeaturized", inputColumnName: nameof(MovieRating.movieId)))
    .Append(mlContext.Transforms.Concatenate(DefaultColumnNames.Features, "userIdFeaturized", "movieIdFeaturized"))
    .Append(mlContext.BinaryClassification.Trainers.FieldAwareFactorizationMachine(
        new string[] { DefaultColumnNames.Features },
        // Set the custom latent dimension here. The parameter and field names
        // follow the pre-1.0 ML.NET API as I recall it; verify against your version.
        advancedSettings: e => e.LatentDim = 200));


To the best of my knowledge:



When decomposing a matrix, SVD extracts hidden (latent) features from the training data. These latent features sit between the users and the items, linking each user to a set of items, which is what is referred to as dimensionality reduction here. My guess is that a latent dimension that is too small over-generalizes, blurring the differences between the items recommended to different users. That, I reckon, is why increasing the value solved my problem.
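To make the role of the latent dimension concrete, here is a sketch of the prediction function, assuming the textbook field-aware factorization machine formulation. The symbols are introduced here for illustration and do not appear in the post, and ML.NET's exact parameterization may differ:

$$
\hat{y}(x) = w_0 + \sum_{i} w_i x_i + \sum_{i<j} \langle v_{i,f_j},\, v_{j,f_i} \rangle \, x_i x_j, \qquad v_{i,f} \in \mathbb{R}^{k}
$$

Here $x_i$ are the input features (the featurized user and item IDs), $f_i$ is the field that feature $i$ belongs to, and $k$ is the latent dimension (20 by default, 200 in the code above). Every pairwise interaction is squeezed through a $k$-dimensional inner product, so one plausible reading of the result is that a small $k$ leaves too little room to keep a rare pattern, such as Andrew's and Andy's purchases of X, Y, and Z, distinct from the dominant ones.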



Any corrections or comments would be appreciated. I definitely don't want to spread a false belief.






$endgroup$












answered 2 days ago by hina10531


























