NLP - How to detect the presence of a phrase and its derivatives



I have a dataset with a free-form text field as one of the variables. Essentially, I want to determine whether a record contains the phrase "The cat is not present". However, this phrase could be written as "cat is not present", "cat- not present", "There is no cat", "cat: not present", "no cat here to report", "report: no cat", or many other derivatives. I also want to exclude situations like "I was outside playing with my friend Bob. It was sunny. It was warm. Cat was not present. Overall, it was a good day", because this has "useful" context.

The end goal is to calculate the number of records that have "no cat is present" (minus instances where this phrase or its derivatives have context) vs. "cat is present".
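As a starting point, the variants listed in the question can be caught with a small rule-based baseline. The sketch below is only an illustration under stated assumptions: the two regular expressions, the `MAX_WORDS` cut-off used as a crude proxy for "useful" context, and the label names are all hypothetical choices, not something given in the question.

```python
import re
from collections import Counter

# Hypothetical patterns for the variants listed in the question:
# a negation cue shortly before "cat", or "cat" shortly before a negation cue.
NEG_THEN_CAT = re.compile(r"\b(?:no|not|without)\b[^.!?]{0,20}\bcats?\b", re.IGNORECASE)
CAT_THEN_NEG = re.compile(r"\bcats?\b[^.!?]{0,20}\b(?:no|not|absent)\b", re.IGNORECASE)

# Assumed cut-off: longer records are treated as carrying "useful" context.
MAX_WORDS = 8


def label_record(text: str) -> str:
    """Label a record 'no_cat', or 'other' if it is long or matches no pattern."""
    if len(text.split()) > MAX_WORDS:
        return "other"
    if NEG_THEN_CAT.search(text) or CAT_THEN_NEG.search(text):
        return "no_cat"
    return "other"


records = [
    "cat- not present",
    "There is no cat",
    "report: no cat",
    "cat is present",
    "I was outside playing with my friend Bob. It was sunny. It was warm. "
    "Cat was not present. Overall, it was a good day",
]
counts = Counter(label_record(r) for r in records)
print(counts)
```

Rules like these will misfire on phrasings they were not written for (for example, "not sure, cat is present" would be labeled `no_cat`), so the counts are best treated as a rough first estimate to check against a hand-labeled sample.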











  • This is surely an NLP problem. Your last requirement, however, makes it harder, because the solution has to understand context. A few options come to mind: 1. Do binary classification with a lot of labeled data of this kind. 2. Do multi-class classification. Code for both is available on the internet, and both are simple. 3. Use an attention-based or semantic-similarity approach.
    – Sandeep B, Mar 21 at 13:02










  • Thank you for the reply. I am still a beginner in applying these types of solutions, so it might not be so simple! That being said, I would rather not copy, paste, and tweak. I would like to learn it and write it myself. Please excuse my ignorance, but are there more formal names for these types of NLP techniques? I would like to use the method that provides the most confidence (which is likely your third option). I don't think I will need to create a test/training set, because we are using the whole population for a specific period of time.
    – DataNoob7, Mar 21 at 20:13











  • I have looked up solutions 1 and 2, but I don't think they answer the question, given the variety of ways in which these instances can appear. Unless I am misunderstanding.
    – DataNoob7, 2 days ago
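To make options 1 and 2 from the first comment concrete, here is a minimal sketch of text classification using a hand-written multinomial Naive Bayes. Everything in it is an assumption made for illustration: the `NaiveBayes` class, the `tokenize` helper, the label names, and the eight made-up training phrases. In practice a library implementation and a much larger hand-labeled sample would likely be used, and records with "useful" context could be given a third label, which turns this into the multi-class variant.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; punctuation such as '-' and ':' is dropped."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.doc_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        # Words never seen in training carry no evidence, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {}
        for label, counts in self.word_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            scores[label] = score
        return max(scores, key=scores.get)


# Made-up labeled examples; real training data would be hand-labeled records.
train = [
    ("cat is not present", "no_cat"),
    ("there is no cat", "no_cat"),
    ("report: no cat", "no_cat"),
    ("cat: not present", "no_cat"),
    ("cat is present", "cat"),
    ("the cat is here", "cat"),
    ("cat present in the room", "cat"),
    ("saw the cat today", "cat"),
]
model = NaiveBayes().fit(*zip(*train))

print(model.predict("no cat here to report"))
print(model.predict("the cat is present"))
```

The first prediction uses a phrasing that does not appear in the training list, which is the point of learning from word evidence instead of matching exact strings: variety in phrasing is handled to the extent that the labeled sample covers the relevant vocabulary.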















machine-learning python nlp






asked Mar 20 at 22:22 by DataNoob7 (edited Mar 22 at 20:25)




















