Implementation of NLP to categorize text into two categories

I can't discuss my actual dataset, so please bear with me.

Let's say I have a dataset of 20,000 examination records written by a school principal, who is required to record their examinations of student misconduct incidents. I want to implement an NLP approach that assesses the quality of every record in the full population, very broadly, as one of two categories: "good examination" or "bad examination".

Examples of "bad examinations" are: "examination results - negative" or "exam results: negative". Or "check student's bags, checked the person. Nothing suspicious found. Or examination results negative". Or "Examination results positive". Or "ABC examined, results negative", where ABC could be an abbreviation of the person's name.

A good examination is one with a lot of context: "Checked the student's bag and found textbooks, pencils, erasers, binders. No hidden compartments found. Interviewed the student and asked "x", "y", "z" questions. Her story corroborated other reports. Student presented herself in a calm manner. Examination results negative". Other times it could be paragraphs and paragraphs, ending with "examination negative" or "examination positive".

There are also instances where all that can be listed is "wrong person because of different birth date. Examination results negative", and this is perfectly fine. Would this be a third category?

How would I go about implementing a reliable NLP solution? My first instinct is to take a random sample, classify it manually, and then use that labeled sample to classify the rest of the 20,000 records. Is that the right approach?
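The workflow described in the question (label a sample by hand, then apply what was learned to the remaining records) can be sketched roughly as follows. This is only a minimal illustration, not a recommended final design: it uses a small hand-written multinomial Naive Bayes classifier so that it runs with the Python standard library alone, whereas a library classifier would normally take its place. The names `tokenize`, `NaiveBayes`, `labeled`, and `unlabeled` are made up for this example, the training texts are the examples quoted in the question, and treating the "wrong person" record as "good" is an assumption made purely for the sketch.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts, with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # number of records per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n_records in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_records / total)  # log prior of the class
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hand-labeled sample. The texts come from the question; labeling the
# "wrong person" record as "good" is an assumption of this sketch.
labeled = [
    ("examination results - negative", "bad"),
    ("exam results: negative", "bad"),
    ("Examination results positive", "bad"),
    ("ABC examined, results negative", "bad"),
    ("Checked the student's bag and found textbooks, pencils, erasers, "
     "binders. No hidden compartments found. Interviewed the student and "
     "asked questions. Her story corroborated other reports. Student "
     "presented herself in a calm manner. Examination results negative",
     "good"),
    ("wrong person because of different birth date. "
     "Examination results negative", "good"),
]

texts, labels = zip(*labeled)
model = NaiveBayes().fit(texts, labels)

# Records that were not labeled by hand (stand-ins for the remaining data).
unlabeled = [
    "exam results negative",
    "Interviewed the student and checked the bag. No hidden compartments "
    "found. Her story corroborated other reports. Examination results negative",
]
predictions = [model.predict(text) for text in unlabeled]
print(list(zip(predictions, unlabeled)))
```

With only six labeled records the model mostly reacts to how much descriptive vocabulary a record contains, so a real run would need a much larger labeled sample and a held-out portion of it to check the predictions against.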











  • Is anyone able to point me in the right direction? How many of the 20,000 records will I have to classify manually? From my research, I understand NLP becomes challenging with short sentences, and I suspect a lot of the records are short.
    – DataNoob7
    Apr 2 at 1:13
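One partial way to put a number on "how many records to label" is the standard sample-size formula for estimating a proportion (for example, the share of "bad" records, or a classifier's accuracy on a hand-checked sample) with a finite population correction. The sketch below assumes a 95% confidence level (`z=1.96`), the most conservative proportion (`p=0.5`), and a 5-point margin of error; those values, the function name `sample_size`, and the seed are illustrative choices, not something stated in the question. It does not say how many labels a classifier needs in order to train well, which is usually checked empirically by watching held-out accuracy as more labels are added.

```python
import math
import random


def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Cochran's sample size for a proportion, with finite population correction."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2           # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / population))  # shrink for finite N


N_RECORDS = 20_000  # population size given in the question
n_to_label = sample_size(N_RECORDS)

# Draw the record indices to label by hand; a fixed seed keeps it repeatable.
rng = random.Random(42)
sample_ids = rng.sample(range(N_RECORDS), n_to_label)

print(n_to_label, "records to label; first few indices:", sorted(sample_ids)[:5])
```

Tightening the margin raises the count quickly: under the same assumptions, `sample_size(20_000, margin=0.03)` asks for roughly a thousand records.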










  • Is anyone able to provide insight?
    – DataNoob7
    yesterday















machine-learning python nlp natural-language-process














asked Mar 30 at 18:25 by DataNoob7, edited Apr 2 at 1:14




















