Implementation of NLP to categorize text into two categories
I can't discuss my actual dataset, so please bear with me.
Let's say I have a dataset of 20,000 examinations recorded by a school principal, who documents each examination carried out for a student misconduct incident. I want to use NLP to assess the quality of every record in this population and sort each one, very broadly, into two categories: "good examination" or "bad examination".
Examples of "bad examinations" are: "examination results - negative", "exam results: negative", "check student's bags, checked the person. Nothing suspicious found. Or examination results negative", "Examination results positive", or "ABC examined, results negative", where ABC could be an abbreviation of the person's name.
A good examination is one with a lot of context: "Checked the student's bag and found textbooks, pencils, erasers, binders. No hidden compartments found. Interviewed the student and asked "x", "y", "z" questions. Her story corroborated other reports. Student presented herself in a calm manner. Examination results negative". Other times a good examination runs to several paragraphs and ends with "examination negative" or "examination positive".
There are also instances where all that is recorded is "wrong person because of different birth date. Examination results negative", and this is perfectly fine. Would this be a third category?
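One way to make the good/bad distinction concrete, before training anything, is a rule-based baseline: strip the verdict phrase and count the words that remain. The sketch below assumes that verdicts follow the patterns in the examples above and that 15 remaining words is a sensible cut-off; both are guesses that would need tuning against hand-labeled records. The `short_justified` label and its cue words (`wrong person`, `because`) are hypothetical, illustrating one way the third-category case could be kept apart from genuinely thin notes.

```python
import re

# Verdict phrases modeled on the question's examples; this pattern is an assumption.
VERDICT_RE = re.compile(
    r"\b(?:exam(?:ination)?\s*(?:results?)?|results?)\s*[-:]?\s*(?:negative|positive)\b",
    re.IGNORECASE,
)
# Hypothetical cues that a short note still states a reason for the outcome.
REASON_RE = re.compile(r"\b(?:wrong person|because)\b", re.IGNORECASE)


def content_words(text):
    """Return the words left in a note after the verdict phrase is removed."""
    stripped = VERDICT_RE.sub(" ", text)
    return re.findall(r"[A-Za-z']+", stripped)


def heuristic_label(text, min_words=15):
    """Rough three-way label: 'good', 'short_justified', or 'bad'.

    min_words is an arbitrary threshold chosen for illustration only.
    """
    if len(content_words(text)) >= min_words:
        return "good"
    if REASON_RE.search(text):
        return "short_justified"
    return "bad"


print(heuristic_label("ABC examined, results negative"))  # → bad
print(heuristic_label("wrong person because of different birth date. Examination results negative"))  # → short_justified
```

A baseline like this is also useful later as a point of comparison: if a trained model cannot beat a word count, the extra complexity is probably not paying off.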
How would I go about implementing a reliable NLP solution? My first instinct is to take a random sample, classify it manually, and then apply what I learn from it (for example, a model trained on that sample) to the rest of the 20,000 records. Is that a reasonable approach?
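The sample-then-train workflow can be sketched with only the Python standard library: draw a simple random sample to label by hand, fit a multinomial Naive Bayes classifier (bag of words, add-one smoothing) on those labels, and then score the remaining records. This is a minimal sketch, not a recommended production setup: `draw_labeling_sample` and `NaiveBayes` are hypothetical helpers, the labeled examples are paraphrased from the question and far too few for real use, and in practice a library such as scikit-learn would be the usual choice.

```python
import math
import random
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def draw_labeling_sample(records, k, seed=0):
    """Draw a simple random sample of k records, without replacement, for manual labeling."""
    return random.Random(seed).sample(records, k)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # words never seen in training are ignored
                score += math.log((self.word_counts[label][word] + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand labels for a few sampled records (paraphrased from the question).
LABELED = [
    ("examination results - negative", "bad"),
    ("exam results: negative", "bad"),
    ("ABC examined, results negative", "bad"),
    ("Examination results positive", "bad"),
    ("Checked the student's bag and found textbooks, pencils, erasers, binders. "
     "No hidden compartments found. Examination results negative", "good"),
    ("Interviewed the student and asked several questions. Her story corroborated "
     "other reports. Examination results negative", "good"),
]

model = NaiveBayes().fit([t for t, _ in LABELED], [y for _, y in LABELED])
print(model.predict("exam results negative"))  # → bad
```

Bag-of-words Naive Bayes does not see note length directly, so very short verdict-only notes are classified mostly by the few words they contain; pairing it with a simple length feature may help for those.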
machine-learning python nlp natural-language-process
edited Apr 2 at 1:14
DataNoob7
asked Mar 30 at 18:25
DataNoob7
243
Can anyone point me in the right direction? How many records will I have to classify manually, given that there are 20,000 records? From my research, I understand that NLP becomes challenging with short sentences, and I suspect many of my records are short.
– DataNoob7
Apr 2 at 1:13
Is anyone able to provide insight?
– DataNoob7
yesterday
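If the immediate goal is only to estimate what fraction of the 20,000 records are bad examinations, labeling a simple random sample by hand may be enough without any model. The sketch below uses Cochran's sample-size formula for a proportion with the finite population correction, assuming a simple random sample, a margin of ±5 percentage points at roughly 95% confidence (z = 1.96), and the conservative p = 0.5. It does not say how many labels a classifier needs to train well; that is usually found empirically, for example by tracking validation accuracy as the labeled set grows.

```python
import math


def required_sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Records to label by hand to estimate the share of bad examinations.

    Cochran's formula n0 = z^2 * p * (1 - p) / margin^2, followed by the
    finite population correction n = n0 / (1 + (n0 - 1) / N).
    p = 0.5 gives the largest (most conservative) sample size.
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


print(required_sample_size(20_000))               # → 377
print(required_sample_size(20_000, margin=0.03))  # a tighter margin needs more labels
```

With N = 20,000 the correction barely matters (about 385 without it versus 377 with it), so the margin of error is the main lever on labeling effort.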