Implementation of NLP to categorize text into two categories

I can't discuss my actual dataset, so please bear with me.

Let's say I have a dataset of 20,000 examination records written by a school principal, who records each examination carried out in response to a student misconduct incident. I want to use NLP to assess the quality of each record and, very broadly, sort the full population into two categories: "good examination" or "bad examination".

Examples of "bad examinations" are: "examination results - negative", "exam results: negative", "check student's bags, checked the person. Nothing suspicious found. Examination results negative", "Examination results positive", or "ABC examined, results negative", where ABC could be an abbreviation of the person's name.

A good examination is one with a lot of context, for example: "Checked the student's bag and found textbooks, pencils, erasers, binders. No hidden compartments found. Interviewed the student and asked 'x', 'y', 'z' questions. Her story corroborated other reports. Student presented herself in a calm manner. Examination results negative." Other records run to several paragraphs and end with "examination negative" or "examination positive".
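Because the verdict phrase ("examination results negative/positive") appears in good and bad records alike, one cheap baseline is to strip it and count how much text remains. This is a minimal sketch, not a trained model: the `VERDICT` regex, the names `context_words` and `looks_thin`, and the 15-word threshold are all hypothetical and would need tuning against hand-labeled records.

```python
import re

# Hypothetical pattern for the closing verdict, e.g. "examination results - negative",
# "exam results: negative", "Examination results positive".
VERDICT = re.compile(
    r"\bexam(?:ination)?\s*(?:results?)?\s*[:\-]?\s*(?:negative|positive)\b",
    re.IGNORECASE,
)
WORD = re.compile(r"[a-z']+", re.IGNORECASE)


def context_words(text):
    """Count the words left after removing the verdict phrase."""
    stripped = VERDICT.sub(" ", text)
    return len(WORD.findall(stripped))


def looks_thin(text, min_words=15):
    """Flag a record as a likely 'bad examination' when it has little context.

    min_words is an assumed threshold, not a recommended value.
    """
    return context_words(text) < min_words


good = (
    "Checked the student's bag and found textbooks, pencils, erasers, binders. "
    "No hidden compartments found. Interviewed the student and asked questions. "
    "Her story corroborated other reports. Examination results negative."
)
print(looks_thin("examination results - negative"))  # True
print(looks_thin(good))                              # False
```

A rule like this is mainly useful for pre-sorting records before manual labeling; the records it gets wrong show where a trained classifier has to do better.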

There are also records where all that is listed is "wrong person because of different birth date. Examination results negative", and this is perfectly acceptable. Would this be a third category?
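Whether short but justified records form a third category is a modeling choice. One option is to annotate with three labels while hand-labeling, then merge the justified ones into "good" if only two classes are needed downstream. A sketch of that idea, where the `Label` names, the `JUSTIFICATION` phrases, and the 15-word threshold are hypothetical placeholders:

```python
import re
from enum import Enum

WORD = re.compile(r"[a-z']+", re.IGNORECASE)

# Hypothetical phrases that make a short record acceptable; the real list
# should come from the records you label by hand.
JUSTIFICATION = re.compile(
    r"\b(?:wrong person|different birth ?date|mistaken identity)\b",
    re.IGNORECASE,
)


class Label(Enum):
    GOOD = "good"                        # detailed examination
    BAD = "bad"                          # verdict with little or no context
    SHORT_JUSTIFIED = "short_justified"  # brief, but with a valid reason


def pre_label(text, min_words=15):
    """Suggest a label for a human annotator to confirm (assumed threshold)."""
    if len(WORD.findall(text)) >= min_words:
        return Label.GOOD
    if JUSTIFICATION.search(text):
        return Label.SHORT_JUSTIFIED
    return Label.BAD


def to_binary(label):
    """Collapse the three annotation labels into good/bad."""
    return "bad" if label is Label.BAD else "good"


print(pre_label("wrong person because of different birth date. Examination results negative"))
```

Keeping the finer label during annotation costs little and preserves the option to train either a two-class or a three-class model later.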

How would I go about implementing a reliable NLP solution? My first instinct is to take a random sample, classify it manually, and then use it to classify the rest of the 20,000 records. Is that the right approach?
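The sample, label, train, apply workflow described above is standard supervised text classification. Here is a dependency-free sketch of it, using multinomial Naive Bayes with add-one smoothing as a stand-in classifier. The training texts are hypothetical toy stand-ins, and `split_for_labeling` and `NaiveBayes` are illustrative names, not a library API.

```python
import math
import random
import re
from collections import Counter, defaultdict

TOKEN = re.compile(r"[a-z']+")


def tokenize(text):
    return TOKEN.findall(text.lower())


def split_for_labeling(records, sample_size, seed=0):
    """Draw a reproducible random sample to label by hand; return (sample, rest)."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(records)), sample_size))
    sample = [records[i] for i in sorted(chosen)]
    rest = [records[i] for i in range(len(records)) if i not in chosen]
    return sample, rest


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for w in tokenize(text):
                if w in self.vocab:  # skip words never seen in training
                    score += math.log(
                        (self.word_counts[label][w] + 1) / (self.totals[label] + v)
                    )
            if score > best_score:
                best, best_score = label, score
        return best


# Hypothetical hand-labeled sample (toy stand-ins for real records).
train_texts = [
    "examination results negative",
    "exam results: negative",
    "ABC examined, results negative",
    "checked bag found textbooks pencils binders interviewed student "
    "story corroborated examination results negative",
    "interviewed student asked questions calm manner story corroborated "
    "reports examination negative",
]
train_labels = ["bad", "bad", "bad", "good", "good"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("exam results negative"))  # bad
print(model.predict("interviewed the student, story corroborated, examination results negative"))  # good
```

On real data, keep part of the hand-labeled sample aside to measure accuracy before predicting the remaining records. In practice, scikit-learn's `TfidfVectorizer` followed by `LogisticRegression` in a `Pipeline` is a common drop-in replacement for the hand-rolled classifier.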

asked Mar 30 at 18:25 by DataNoob7 (243), edited Apr 2 at 1:14

Is anyone able to point me in the right direction? How many records will I have to classify manually, given that there are 20,000 in total? From my research, I understand NLP becomes challenging with short texts, and I suspect many of these records are short.
– DataNoob7, Apr 2 at 1:13

Is anyone able to provide insight?
– DataNoob7, yesterday
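There is no fixed number of records needed for *training*. It depends on how separable the classes are, and a learning curve (retraining on growing labeled subsets and watching held-out accuracy level off) is a common empirical check. What can be computed up front is how many randomly sampled records must be labeled to estimate a proportion, such as the share of bad examinations or a classifier's accuracy on a held-out set, within a chosen margin. The sketch below uses the normal-approximation formula n0 = z²·p(1−p)/E² with an optional finite population correction. It assumes 95% confidence (z = 1.96) and the worst case p = 0.5, and `sample_size` is a hypothetical helper.

```python
import math


def sample_size(margin, z=1.96, p=0.5, population=None):
    """Records to label so a proportion is estimated within +/- margin.

    n0 = z^2 * p * (1 - p) / margin^2, where p = 0.5 gives the largest n0.
    If population is given, apply the finite population correction
    n = n0 / (1 + (n0 - 1) / N).
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


print(sample_size(0.05))                     # 385
print(sample_size(0.05, population=20_000))  # 377
```

Under these assumptions, labeling roughly 380 random records gives a ±5 percentage-point estimate for the whole population of 20,000. Halving the margin roughly quadruples the count.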

machine-learning python nlp natural-language-process
