Implementing back translation as a data augmentation for text classification

























Back translation (English -> other language -> English) seems like a useful data augmentation technique, so I wanted to experiment with it. It occurred to me that languages from very different language families (but well supported for economic reasons, such as Chinese, Russian, Spanish, Korean, Arabic...) could produce a diverse set of effects in the back translation.

Commercial translation APIs would be a straightforward way of doing this, but without a free API key or a budget from my organization (I would not qualify as academic), that quickly gets expensive for a private project.

Pretrained translation models seem like an obvious alternative (I have a GPU for inference, but that is clearly not enough to train all the models from scratch). However, I could not find pretrained models for any OpenNMT variant, for example. Does anyone who has used this approach have recommendations?
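To make the intended pipeline concrete, here is a minimal sketch of the round trip I have in mind. It is independent of where the translations come from: each pivot language is just a pair of callables (English -> pivot, pivot -> English). The names `back_translate_augment` and `make_toy_translator` are hypothetical, and the word-substitution "translators" are toy stand-ins so the sketch runs on its own; in practice they would be replaced by a commercial API client or a pretrained model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A translator is any callable that maps a list of sentences to their translations.
Translator = Callable[[List[str]], List[str]]


def back_translate_augment(
    texts: Sequence[str],
    labels: Sequence[int],
    pivots: Dict[str, Tuple[Translator, Translator]],
) -> Tuple[List[str], List[int]]:
    """Return the original examples plus their back translations.

    Each round trip keeps the label of its source sentence. Round trips
    that reproduce an already seen sentence exactly are dropped, since
    they add no new training signal.
    """
    seen = set(texts)
    out_texts, out_labels = list(texts), list(labels)
    for _lang, (to_pivot, to_english) in pivots.items():
        round_trip = to_english(to_pivot(list(texts)))
        for new_text, label in zip(round_trip, labels):
            if new_text not in seen:
                seen.add(new_text)
                out_texts.append(new_text)
                out_labels.append(label)
    return out_texts, out_labels


def make_toy_translator(lexicon: Dict[str, str]) -> Translator:
    """Toy stand-in for a real translation system: word-by-word substitution."""
    def translate(sentences: List[str]) -> List[str]:
        return [" ".join(lexicon.get(w, w) for w in s.split()) for s in sentences]
    return translate


# Toy "Spanish" pivot: the way back picks a different English word,
# which imitates the paraphrasing effect of a real round trip.
en_to_es = make_toy_translator({"movie": "película", "great": "genial"})
es_to_en = make_toy_translator({"película": "film", "genial": "great"})

texts = ["a great movie", "boring plot"]
labels = [1, 0]
aug_texts, aug_labels = back_translate_augment(
    texts, labels, {"es": (en_to_es, es_to_en)}
)
```

With several pivot languages, the dictionary simply gets one entry per language, and the deduplication step keeps identical round trips from different pivots from being counted twice.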























      deep-learning nlp text data-augmentation machine-translation
















      asked Mar 29 at 7:25 by Björn



































