


How to deal with insufficient memory when reading data with pandas in Python




I am using pandas.read_csv to read a huge file for machine learning, but I get a memory error.



Someone recommended setting the chunksize argument, but I need to sort, access rows randomly, etc., so I either have to load the whole dataset into memory or find another way.
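For illustration, a minimal sketch of what chunked reading does, using a small hypothetical CSV built in memory with io.StringIO (the column names id, feature and label are made up). Each iteration yields an ordinary DataFrame, so per-chunk aggregates can be combined, while operations that need every row at once (a global sort, random row access) cannot be done chunk by chunk directly:

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for the real (huge) file.
csv_text = "id,feature,label\n" + "\n".join(f"{i},{i * 0.5},{i % 2}" for i in range(10))

# With chunksize set, read_csv returns an iterator of DataFrames,
# so only one chunk (here at most 4 rows) is parsed and held at a time.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=4)

row_count = 0
feature_sum = 0.0
chunk_sizes = []
for chunk in reader:
    chunk_sizes.append(len(chunk))
    row_count += len(chunk)
    feature_sum += chunk["feature"].sum()

# Aggregates such as a mean combine cleanly across chunks...
feature_mean = feature_sum / row_count
# ...but a global sort or random row access needs rows from every chunk at once,
# which chunked iteration alone does not provide.
print(chunk_sizes, feature_mean)  # [4, 4, 2] 2.25
```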



One option I can think of is Hadoop. Another is incremental training, but that is essentially the same as reading with chunksize in read_csv.
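A minimal sketch of incremental training over chunks, assuming a plain linear regression model (a swapped-in illustration, not something from the question). The per-chunk sums XᵀX and Xᵀy are accumulated, so the loop never needs more than one chunk of parsed rows at a time, yet the result is the same least-squares fit as on the full data. The data and column names (x1, x2, y) are hypothetical. For other models, scikit-learn offers partial_fit on some estimators (e.g. SGDClassifier) for the same chunk-by-chunk pattern.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical training data: y = 1 + 2*x1 - 3*x2 plus small noise,
# written to an in-memory CSV that stands in for the real file.
rng = np.random.default_rng(0)
n_rows = 1000
features = rng.normal(size=(n_rows, 2))
target = 1 + 2 * features[:, 0] - 3 * features[:, 1] + rng.normal(scale=0.1, size=n_rows)
csv_text = pd.DataFrame(
    {"x1": features[:, 0], "x2": features[:, 1], "y": target}
).to_csv(index=False)

# Accumulate X^T X and X^T y chunk by chunk. Their size depends only on the
# number of features (3x3 and 3 here), not on the number of rows.
xtx = np.zeros((3, 3))
xty = np.zeros(3)
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=100):
    # Design matrix for this chunk: a column of ones (intercept) plus the features.
    X = np.column_stack([np.ones(len(chunk)), chunk[["x1", "x2"]].to_numpy()])
    y = chunk["y"].to_numpy()
    xtx += X.T @ X
    xty += X.T @ y

# Solving the normal equations gives [intercept, coef_x1, coef_x2].
coef = np.linalg.solve(xtx, xty)
```

This also shows why incremental training feels like reading with chunksize: the loop over read_csv is identical, and only what is kept between chunks (here two small arrays) changes.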



Are there other software packages, libraries, or approaches I can use?







































machine-learning pandas
















asked yesterday by code_worker







Comment by Kiritee Gak (yesterday): Check this out






















1 Answer



















I would suggest using Dask. I have used it successfully to read large datasets on a machine with only 4 GB of RAM. You can get more details here.



To read a CSV, you can do the following:



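import dask.dataframe as dd

csv_file = 'data.csv'  # path to the large CSV file
# Builds a lazy, partitioned DataFrame; the data is read partition by
# partition only when a result is requested (e.g. with .compute()).
df = dd.read_csv(csv_file)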













edited yesterday by Glorfindel
answered yesterday by InAFlash


























