How exactly do I extract the important features from strings for machine learning?
Forgive me for my ignorance. Linked below is an image of my dataset with 1000 tuples.
https://i.stack.imgur.com/WHIlx.png
I have the following questions:
(1) How exactly do I go about extracting information from the Ad topic line?
(2) How should I approach the categorical variables Country and City? (I've heard about encoding, but won't the large number of countries/cities be a problem?)
Thank you so much for your time. All help is greatly appreciated.
machine-learning feature-selection feature-extraction feature-engineering
asked Mar 19 at 20:29 by Apollo (edited Mar 20 at 6:17)
1 Answer
(1) How exactly do I go about extracting information from the Ad topic line?
The best way to deal with the text string is to convert it into a term-document matrix and calculate the TF-IDF score for every word in the string.
(2) How should I approach the categorical variables Country and City? (I've heard about encoding, but won't a large number of countries/cities be a problem?)
Yes, you are right: a large number of cities will be a problem. You can still go with encoding if you want to. Alternatively, find the distribution of all your countries and cities and convert them into integers.
Another way is to give each city a weight based on how often it is repeated. Let me give you an example:
Suppose you have 10 observations and 5 cities: A (5 times), B (2 times), C (once), D (once), E (once).
Then A gets a weight of 5/10 = 0.5, B gets a weight of 2/10 = 0.2, and C, D, and E each get a weight of 1/10 = 0.1.
Hope this helps!
answered Mar 20 at 11:05 by Kartik Patnaik
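The frequency weighting described in the answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list `cities` reproduces the ten observations from the answer's example, and the helper name `frequency_encode` is made up for this sketch, not taken from any library.

```python
from collections import Counter

# The ten observations from the answer's example: A x5, B x2, C, D, E once each.
cities = ["A"] * 5 + ["B"] * 2 + ["C", "D", "E"]

def frequency_encode(values):
    """Replace each category with its relative frequency in `values`.

    Hypothetical helper for illustration only.
    """
    counts = Counter(values)
    total = len(values)
    weights = {value: count / total for value, count in counts.items()}
    return [weights[v] for v in values], weights

encoded, weights = frequency_encode(cities)
print(weights)  # {'A': 0.5, 'B': 0.2, 'C': 0.1, 'D': 0.1, 'E': 0.1}
```

Note that categories with the same count (C, D and E here) end up with the same weight, so the model can no longer tell them apart. It is also safer to compute the weights on the training split only and reuse that mapping for the test split.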
Thank you for your response! Could you please walk me through the matrix conversion and score calculation in Python?
– Apollo, 22 hours ago
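As a rough walkthrough of the matrix conversion and score calculation asked about here, the sketch below builds a small matrix with one row per ad topic line and one column per word, then fills it with TF-IDF scores using only the standard library. The three strings in `docs` are invented placeholders (the real lines are in the dataset image), the function names are made up for this sketch, and the weighting is the plain textbook form tf × log(N / df), which is an assumption about which variant is wanted.

```python
import math
from collections import Counter

# Hypothetical ad topic lines; the real ones live in the asker's dataset.
docs = [
    "cloned cohesive neural network",
    "organic neural hub",
    "cohesive organic network",
]

def tokenize(text):
    # Lowercase and split on whitespace; real text usually needs more cleaning.
    return text.lower().split()

def tfidf_matrix(documents):
    """Return (vocabulary, rows) with one row of TF-IDF scores per document."""
    tokenized = [tokenize(d) for d in documents]
    vocab = sorted({term for doc in tokenized for term in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in tokenized for term in set(doc))
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty strings
        rows.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, rows

vocab, matrix = tfidf_matrix(docs)
print(vocab)
for line, row in zip(docs, matrix):
    print(line, [round(score, 3) for score in row])
```

In practice scikit-learn's `TfidfVectorizer` does the same job in one `fit_transform` call. Its default settings use a smoothed IDF and L2-normalize each row, so its numbers will not match the ones printed by this sketch.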