A weird result from a recommender system


Say there are 100 products on sale, and about 100k users regularly purchase items on a daily basis. The top 10 most popular items are:

A = has been purchased by 100k users. 
B = has been purchased by 30k users.
C = has been purchased by 20k users.
D = has been purchased by 18k users.
E = has been purchased by 10k users.
F = has been purchased by 8k users.
G = has been purchased by 7k users.
H = has been purchased by 4k users.
I = has been purchased by 3k users.
J = has been purchased by 1k users.

X = never bought by anyone.
Y = never bought by anyone.
Z = never bought by anyone.

Based on this, the training data is going to have more than 50M rows like this:

User Id | User Name | Item Id | Item Name | label | Purchase Date
1 | Thomas | 1 | A | true | 12 Mar 2019
1 | Thomas | 1 | A | true | 13 Mar 2019
1 | Thomas | 1 | A | true | 14 Mar 2019
1 | Thomas | 1 | A | true | 15 Mar 2019
1 | Thomas | 2 | B | true | 11 Mar 2019
1 | Thomas | 3 | C | true | 09 Mar 2019
1 | Thomas | 4 | D | true | 07 Mar 2019
2 | Angelica | 1 | E | true | 12 Mar 2019
...

Users' purchase patterns look like this. There may be countless patterns, but let's take one example:

Thomas bought A, B, C, D
Angelica bought A, B, C, D
Gloria bought A, B, C, D
Jennifer bought A, B, C, D and I

With user-based collaborative filtering, it is quite obvious that Thomas, Angelica and Gloria are likely to get item I as a recommendation, because Jennifer bought I and otherwise has exactly the same purchase pattern as the others.
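A minimal sketch of that reasoning, assuming binary purchase data, cosine similarity between users, and a score that sums the similarities of the neighbours who bought an item. The function names are hypothetical, and this is plain user-based collaborative filtering, not the ML.NET model mentioned in the comments.

```python
from math import sqrt

# Toy purchase history from the example above (one set of items per user).
purchases = {
    "Thomas": {"A", "B", "C", "D"},
    "Angelica": {"A", "B", "C", "D"},
    "Gloria": {"A", "B", "C", "D"},
    "Jennifer": {"A", "B", "C", "D", "I"},
}


def cosine_similarity(items_a, items_b):
    """Cosine similarity between two binary purchase vectors, given as sets."""
    if not items_a or not items_b:
        return 0.0
    return len(items_a & items_b) / sqrt(len(items_a) * len(items_b))


def recommend(user, purchases):
    """Score items the user has not bought by summing neighbour similarities."""
    own_items = purchases[user]
    scores = {}
    for other, items in purchases.items():
        if other == user:
            continue
        sim = cosine_similarity(own_items, items)
        if sim == 0:
            # Users with nothing in common contribute no candidates.
            continue
        for item in items - own_items:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


for user in purchases:
    print(user, recommend(user, purchases))
```

Under these assumptions, Thomas, Angelica and Gloria each get I as their only candidate, and Jennifer gets none, since she has already bought everything her neighbours bought.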

With this in mind, I started to think that if I added two more users who bought the unsold items X, Y, Z, the model would recommend those items to them.

So I manually added dummy data before training the model, like this:

User Id | User Name | Item Id | Item Name | label | Purchase Date
1 | Thomas | 1 | A | true | 12 Mar 2019
1 | Thomas | 1 | A | true | 13 Mar 2019
1 | Thomas | 1 | A | true | 14 Mar 2019
1 | Thomas | 1 | A | true | 15 Mar 2019
1 | Thomas | 2 | B | true | 11 Mar 2019
1 | Thomas | 3 | C | true | 09 Mar 2019
1 | Thomas | 4 | D | true | 07 Mar 2019
2 | Angelica | 1 | E | true | 12 Mar 2019
...
100001 | Andrew | 24 | X | true | 19 Mar 2019
100001 | Andrew | 25 | Y | true | 19 Mar 2019
100002 | Andy | 24 | X | true | 19 Mar 2019
100002 | Andy | 25 | Y | true | 19 Mar 2019
100002 | Andy | 26 | Z | true | 19 Mar 2019

As mentioned above, I expected Andrew to get Z as a recommended item, because Andrew shares his item preferences with Andy, who also bought Z, even though the purchase data for X, Y and Z makes up an extremely small portion of the training data (only 5 records among the 10M records).

But the result was totally unexpected.

Every user has X, Y and Z in their recommended list, although the prediction scores are very low compared to the other items. What's more puzzling is that Andrew and Andy have no outstanding scores on the unpopular items, even though they actually bought them!

I don't know why this happens. Am I misunderstanding the concept of user-based collaborative filtering?

edited Mar 18 at 7:30

asked Mar 18 at 7:25

hina10531


How did you implement the recommendation system? Can you share some code or something?
– yoav_aaa, Mar 18 at 8:50

Implemented it through ML.NET using Fieldaware Factorization Machine. Basically same as this code in the official github
– hina10531, Mar 18 at 11:14

machine-learning recommender-system


1 Answer

Increasing the latent dimension value was the key here.

My recommendation system was implemented with ML.NET, and the framework's default latent dimension is 20, which seems pretty small considering the volume of my training data.

Increasing the number of latent features made my system perform better: it now correctly predicts X, Y and Z as false candidates for all existing users except Andy and Andrew. Below is how to set the value, based on the example code in ML.NET:

var pipeline = mlContext.Transforms.Text.FeaturizeText(outputColumnName: "userIdFeaturized", inputColumnName: nameof(MovieRating.userId))
    .Append(mlContext.Transforms.Text.FeaturizeText(outputColumnName: "movieIdFeaturized", inputColumnName: nameof(MovieRating.movieId)))
    .Append(mlContext.Transforms.Concatenate(DefaultColumnNames.Features, "userIdFeaturized", "movieIdFeaturized"))
    .Append(mlContext.BinaryClassification.Trainers.FieldAwareFactorizationMachine(
        new string[] { DefaultColumnNames.Features },
        // Set a custom latent dimension here (the default is 20).
        // The parameter and field names follow the pre-1.0 ML.NET API and may differ in other versions.
        advancedSettings: e => e.LatentDim = 200));

To the best of my knowledge:

When decomposing a matrix, a method such as SVD extracts hidden (latent) features from the training data. Each user and each item is then described by a vector in this latent space, which sits between the two kinds of entities and is what is referred to as dimensionality reduction here. My guess is that too small a latent dimension over-generalizes and flattens the variety of recommended items. That is why, I reckon, increasing the value solved my problem.
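For reference, the role of the latent dimension can be seen in the field-aware factorization machine model as it is usually written in the literature. This is only a sketch of that formulation: whether ML.NET's trainer matches it term for term is an assumption, and the symbols are introduced here for illustration.

```latex
\phi(\mathbf{w}, \mathbf{x})
  = \sum_{j_1 = 1}^{n} \sum_{j_2 = j_1 + 1}^{n}
    \left( \mathbf{w}_{j_1, f_2} \cdot \mathbf{w}_{j_2, f_1} \right) x_{j_1} x_{j_2},
  \qquad \mathbf{w}_{j, f} \in \mathbb{R}^{k}
```

Here $n$ is the number of features, $f_1$ and $f_2$ are the fields that features $j_1$ and $j_2$ belong to, and $k$ is the latent dimension (20 by default, 200 after the change above). Every feature gets one $k$-dimensional vector per field, so a larger $k$ gives the model more room to represent rare user-item interactions, at the cost of more parameters to fit.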

Any correction or comment would be appreciated. I definitely don't want to spread a false belief.

answered 2 days ago

hina10531
