Amazon Cloud instance most suited for R data mining

I'm new to machine learning. I have always used my laptop for regular statistical analysis with no performance problems, but lately I started working with caret and I find myself stuck for hours optimizing models and resampling datasets. I looked at EC2 instances, but I can't understand the difference between the instance classes. I know that, in general, the ones with the most CPUs and RAM perform best, but which type of instance is best for R programming (for example, a P instance vs. a C instance)? I'll choose the right amount of memory for my applications afterwards, but I'm wondering whether one class is better suited than the others.

Tags: machine-learning, r, cloud-computing

asked Apr 3 at 11:12 by GGA

          1 Answer
It very much depends on the calculations you are doing, the tools you use to implement them, and the size of the data you are working with.

Some rules of thumb:

• R processes (generally) tend to be RAM-bound, so you may want a memory-optimised instance.
• The RStudio IDE has a profiling tool you can use to check how your code executes and where the time is spent.
• It may be easier and cheaper to optimise your code with tools like data.table and Rcpp before scaling up.
  • This is unlikely to affect the ML models themselves; it will mostly help your data preparation. Check the profiler.
• If you are REALLY keen on getting the model to run fast, look into the GPU instances. However, I don't know whether R with caret will make use of that hardware. Do your research first, as these are among the most expensive flavours of EC2. TPUs, which as far as I know are a Google Cloud offering rather than an EC2 one, are (from my limited research) specifically optimised for TensorFlow ML.

A plan B would be to start in AWS SageMaker notebooks. That abstracts away all the faff of managing EC2, so you can just focus on building the ML.

For extra credit on your EC2 instance, use one of the R community AMIs put together by this lovely person: http://www.louisaslett.com/RStudio_AMI/
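
To make the "measure before you scale" advice a bit more concrete, here is a minimal base-R sketch. It is only an illustration under assumptions, not part of the workflow described in the question: the simulated data frame, the `lm` model, the bootstrap loop, and all variable names are hypothetical stand-ins for a real caret pipeline. It checks how much memory the data takes, how long one fit takes, where the time goes (using `Rprof` from the base `utils` package rather than the RStudio profiler), and how many cores the machine reports.

```r
# Hypothetical stand-in data; replace with a sample of your own dataset.
set.seed(42)
n <- 50000
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), y = rnorm(n))

# In-memory size of the data. R often copies objects while modelling,
# so peak usage can be several times this figure.
data_mb <- as.numeric(object.size(df)) / 1024^2

# Wall-clock time for a single model fit on the sample.
fit_time <- system.time(fit <- lm(y ~ x1 + x2, data = df))[["elapsed"]]

# Sampling profiler from 'utils': records where time is spent
# while a small bootstrap-style resampling loop runs.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)
for (i in 1:30) {
  idx <- sample.int(n, n, replace = TRUE)  # bootstrap resample of row indices
  lm(y ~ x1 + x2, data = df[idx, ])
}
Rprof(NULL)

# summaryRprof() errors if no samples were collected, so guard against that.
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)

# Cores reported by the machine (may be NA on some platforms).
n_cores <- parallel::detectCores()

cat(sprintf("data: %.2f MB, one fit: %.3f s, cores: %s\n",
            data_mb, fit_time, n_cores))
if (!is.null(prof)) print(head(prof$by.self))
```

If the data size is a small fraction of your laptop's RAM and the profile is dominated by model fitting rather than data preparation, more cores are likely to help more than more memory. If memory is the limit, a memory-optimised instance is the more plausible choice. Treat both readings as rough guidance and re-run the measurement on your real data.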













answered Apr 10 at 8:42 by DaveRGP