Implementing back translation as a data augmentation for text classification
Since back translation (English → other language → English) seems like quite a useful data augmentation technique, I wanted to experiment with it. For example, it occurred to me that pivot languages from very different language families that are nonetheless very well supported for economic reasons (such as Chinese, Russian, Spanish, Korean, Arabic...) could produce a diverse set of paraphrasing effects in the back-translated text.
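The round trip described above can be sketched as follows. This is a minimal sketch, not a reference implementation: `back_translate`, `augment_dataset` and the `Translator` callable type are hypothetical names, and the translator itself (a commercial API or a pretrained model) is left pluggable. Each back-translated sentence inherits the label of the sentence it came from, and round trips that return an already-seen text are dropped, since they add nothing to the training set.

```python
"""Minimal back-translation augmentation sketch (hypothetical helper names)."""
from typing import Callable, List, Sequence, Tuple

# Hypothetical translator interface: takes a batch of texts plus source and
# target language codes and returns the translations in the same order.
Translator = Callable[[Sequence[str], str, str], List[str]]


def back_translate(texts: Sequence[str], translate: Translator,
                   pivot: str, src: str = "en") -> List[str]:
    """Round-trip texts src -> pivot -> src."""
    forward = translate(texts, src, pivot)
    return translate(forward, pivot, src)


def augment_dataset(examples: Sequence[Tuple[str, int]], translate: Translator,
                    pivots: Sequence[str], src: str = "en") -> List[Tuple[str, int]]:
    """Return the original (text, label) pairs plus at most one paraphrase per pivot.

    Each paraphrase keeps the label of its source sentence; paraphrases that
    come back identical to an already-seen text are dropped.
    """
    texts = [text for text, _ in examples]
    labels = [label for _, label in examples]
    augmented = list(examples)
    seen = set(texts)
    for pivot in pivots:
        round_trip = back_translate(texts, translate, pivot, src)
        for paraphrase, label in zip(round_trip, labels):
            if paraphrase not in seen:
                seen.add(paraphrase)
                augmented.append((paraphrase, label))
    return augmented
```

Using several pivots from different language families, as suggested above, just means passing more language codes in `pivots`; each pivot contributes at most one paraphrase per training example, so the augmented set grows by at most `len(pivots)` times.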
Commercial translation APIs would be the straightforward way to do this, but without a free API key or a budget from my organization (which would not qualify as academic), that quickly becomes quite expensive for a private project.
Pretrained translation models seem like an obvious alternative (I have a GPU for inference, but that is clearly not enough to train all the models from scratch), but I could not find any pretrained models for any OpenNMT variant, for example. Are there any recommendations from others who have used this approach?
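One possible option, sketched under assumptions: the Hugging Face `transformers` library (with PyTorch and `sentencepiece`) and the Helsinki-NLP OPUS-MT checkpoints on the Hugging Face Hub, which are pretrained Marian models named `Helsinki-NLP/opus-mt-{src}-{tgt}`. The helper names `opus_mt_name` and `marian_translate` are hypothetical. Checkpoints are downloaded on first use and cached per language pair, and inference runs in batches on the GPU.

```python
"""Sketch: pretrained OPUS-MT (MarianMT) models for back translation.

Assumes `transformers`, `torch` and `sentencepiece` are installed; the
helper names here are hypothetical, not part of any library.
"""
from functools import lru_cache
from typing import List, Sequence


def opus_mt_name(src: str, tgt: str) -> str:
    """Hub name of the Helsinki-NLP OPUS-MT checkpoint for a language pair."""
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"


@lru_cache(maxsize=None)
def _load(name: str, device: str):
    # Imported lazily so this module can be imported without transformers.
    from transformers import MarianMTModel, MarianTokenizer
    tokenizer = MarianTokenizer.from_pretrained(name)
    model = MarianMTModel.from_pretrained(name).to(device).eval()
    return tokenizer, model


def marian_translate(texts: Sequence[str], src: str, tgt: str,
                     device: str = "cuda", batch_size: int = 32,
                     num_beams: int = 4) -> List[str]:
    """Translate texts src -> tgt with the matching OPUS-MT checkpoint."""
    import torch
    tokenizer, model = _load(opus_mt_name(src, tgt), device)
    outputs: List[str] = []
    for start in range(0, len(texts), batch_size):
        chunk = list(texts[start:start + batch_size])
        batch = tokenizer(chunk, return_tensors="pt", padding=True,
                          truncation=True).to(device)
        with torch.no_grad():
            generated = model.generate(**batch, num_beams=num_beams)
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs
```

A round trip through Russian would then be `marian_translate(marian_translate(texts, "en", "ru"), "ru", "en")`. Not every pair is necessarily published in both directions, and some multi-target checkpoints (English→Chinese, for instance) expect a sentence-initial target-language token of the form `>>id<<`, so check each model card before relying on a pair.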
deep-learning nlp text data-augmentation machine-translation
asked Mar 29 at 7:25
Björn