Amazon Cloud Image instance most suited for R data mining
I'm new to the field of machine learning. Until now I have used my laptop for regular statistical analysis with no performance problems, but lately I started programming with caret and I find myself spending hours tuning models and resampling datasets. I looked at EC2 instances, but I can't understand the difference between the instance classes. I know that, in general, the ones with the most CPUs and RAM perform best, but which type of instance is best for R programming (for example, a P instance vs. a C instance)? I'll choose the right amount of memory for my applications afterwards; I'm just wondering whether one family is better suited than the others.
machine-learning r cloud-computing
asked Apr 3 at 11:12
GGA
1 Answer
It very much depends on the calculations you are doing, the tools you are using to implement them, and the size of the data you are working with.
Some small rules of thumb:
- R processes (generally) tend to be RAM bound, so you may want to get a RAM-optimised instance.
- The RStudio IDE has a profiling tool you can use to check how your code executes and where the time is spent.
- It may be easier and cheaper to optimise your code with tools like data.table and Rcpp before scaling up.
- The ML models themselves are unlikely to be affected by this; the gains are more likely in your data prep. Check the profiler.
- If you are REALLY keen on getting the model running fast, look into the GPU instances. However, in the case of R using caret, I don't know whether it will utilise these assets. Do your research first, as these are the most expensive flavour of EC2. TPUs are (from my limited research) specifically optimised for TensorFlow ML, and they are a Google Cloud offering rather than an EC2 one.
A plan B/alternative would be to start in AWS SageMaker notebooks. Then you can abstract away all the faff of managing the EC2 instance and just focus on building the ML.
For extra credit on your EC2 instance, use one of the R community AMIs put together by this lovely person: http://www.louisaslett.com/RStudio_AMI/
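To make the "check the profiler" tip above concrete, here is a minimal sketch using base R's sampling profiler, `Rprof()`, which should work outside RStudio as well. The functions `prep_data()` and `fit_model()` are hypothetical stand-ins for your own data prep and model fitting, and the data sizes are arbitrary assumptions; the point is only to see which step dominates the run time before paying for a bigger instance.

```r
# Hypothetical stand-ins for a data-prep step and a model-fitting step.
prep_data <- function(n) {
  df <- data.frame(x = rnorm(n), g = sample(letters, n, replace = TRUE))
  aggregate(x ~ g, data = df, FUN = mean)
}
fit_model <- function(n) {
  x <- rnorm(n)
  y <- 2 * x + rnorm(n)
  lm(y ~ x)
}

profile_file <- tempfile(fileext = ".out")
Rprof(profile_file, interval = 0.01)  # start sampling the call stack
prepared <- prep_data(5e5)
model <- fit_model(5e5)
Rprof(NULL)                           # stop profiling

# Time attributed to each function, including the functions it calls.
prof <- summaryRprof(profile_file)
print(head(prof$by.total, 10))
```

If most of the time sits under the data-prep calls, that suggests optimising the code first; if it sits under model fitting, a larger or different instance class is more likely to help.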
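For the RAM-bound rule of thumb above, one rough way to size an instance is to measure how much memory your data occupies in R and leave headroom for copies. This is only a sketch: the simulated data frame stands in for your real dataset, and `copies_assumed` is an illustrative guess of mine, not a figure from the answer or from any documentation.

```r
# Simulated stand-in for a real dataset: one million rows, three numeric columns.
n_rows <- 1e6
dat <- data.frame(x = rnorm(n_rows), y = rnorm(n_rows), z = rnorm(n_rows))

# Memory currently occupied by the object.
size_bytes <- as.numeric(object.size(dat))
print(format(object.size(dat), units = "Mb"))

# R may copy data during transformation and resampling, so budget for
# several copies. The factor below is an assumption, not a measured value.
copies_assumed <- 4
needed_gib <- size_bytes * copies_assumed / 1024^3
cat(sprintf("Rough RAM budget: %.2f GiB\n", needed_gib))
```

Comparing this estimate with the memory of the instance sizes you are considering gives a first filter; the profiler and a trial run on real data remain the better guide.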
answered Apr 10 at 8:42
DaveRGP