Advantages of pandas dataframe to regular relational database

In Data Science, many seem to be using pandas dataframes as the datastore. What are the features of pandas that make it a superior datastore compared to regular relational databases like MySQL, which are used to store data in many other fields of programming?

While pandas does provide some useful functions for data exploration, you can't use SQL and you lose features like query optimization or access restriction.

asked Jul 2 '17 at 20:02 by Simon Böhm; edited Jul 3 '17 at 6:05 by Stephen Rauch

pandas is not a datastore. Turn off your computer and your dataframe will not be there. pandas is for munging data in memory, which means that if the data does not fit in memory, it will not work. But it has a big brother called Spark, so that is not a big deal. The big brother does in fact support SQL and query optimization. See also pandas.pydata.org/pandas-docs/stable/comparison_with_sql.html
– Emre, Jul 2 '17 at 20:29

Tags: pandas, databases

4 Answers

I think the premise of your question has a problem. Pandas is not a "datastore" in the way an RDBMS is. Pandas is a Python library for manipulating data that fits in memory. Its disadvantages as a datastore:

Pandas does not persist data. It does have a (slow) method, DataFrame.to_sql, that writes a data frame to an RDBMS table.

Pandas only handles data that fits in memory, and memory is easy to fill. You can either use dask to work around that, or do the work inside the RDBMS, which uses all sorts of tricks (such as temp space) to operate on data that exceeds RAM.
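As a rough illustration of both points, here is a minimal sketch that writes a data frame to a database with DataFrame.to_sql and then lets the database do the aggregation, or reads the rows back in chunks so the whole table never has to sit in memory. It assumes an in-memory SQLite database as a stand-in for a real RDBMS; the table name `sales` and its columns are made up for the example.

```python
import sqlite3

import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({"city": ["Oslo", "Lima", "Oslo"], "sales": [10, 20, 30]})

# An in-memory SQLite database stands in for a real RDBMS here.
conn = sqlite3.connect(":memory:")

# Persist the data frame as a table (replacing it if it already exists).
df.to_sql("sales", conn, index=False, if_exists="replace")

# Option 1: push the aggregation into the database and fetch only the result.
totals = pd.read_sql_query(
    "SELECT city, SUM(sales) AS total FROM sales GROUP BY city ORDER BY city",
    conn,
)

# Option 2: stream the table in chunks so only one chunk is in memory at a time.
running_total = 0
for chunk in pd.read_sql_query("SELECT sales FROM sales", conn, chunksize=2):
    running_total += int(chunk["sales"].sum())

conn.close()
print(totals)
print(running_total)
```

With a server database such as MySQL the connection would normally be a SQLAlchemy engine rather than a sqlite3 connection; the pandas calls stay the same.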

answered Jul 3 '17 at 17:35 by CalZ

From the pandas main page:

Python Data Analysis Library

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

While pandas can certainly access data via SQL, or from several other data storage methods, its primary purpose is to make data analysis in Python easier.

To that end, pandas provides methods for relational-algebra operations that are comparable to SQL.
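To make the comparison with SQL concrete, the following sketch shows rough pandas counterparts of WHERE, JOIN and GROUP BY. The two tables and their column names are invented for illustration and do not come from the pandas documentation.

```python
import pandas as pd

# Hypothetical tables; names and values are illustrative only.
orders = pd.DataFrame(
    {"order_id": [1, 2, 3], "customer_id": [10, 20, 10], "amount": [5.0, 7.5, 2.5]}
)
customers = pd.DataFrame({"customer_id": [10, 20], "name": ["Ada", "Bo"]})

# SELECT * FROM orders WHERE amount > 3
big_orders = orders[orders["amount"] > 3]

# SELECT * FROM orders JOIN customers USING (customer_id)
joined = orders.merge(customers, on="customer_id", how="inner")

# SELECT name, SUM(amount) FROM ... GROUP BY name
per_customer = joined.groupby("name", as_index=False)["amount"].sum()

print(big_orders)
print(per_customer)
```

Unlike a database, pandas runs each of these steps eagerly and in the order written; there is no query planner that reorders them.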

Pandas also provides easy access to NumPy, which

is the fundamental package for scientific computing with Python. It contains among other things:

a powerful N-dimensional array object
sophisticated (broadcasting) functions
tools for integrating C/C++ and Fortran code
useful linear algebra, Fourier transform, and random number capabilities
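A small sketch of what that access looks like in practice: the data frame is converted to a NumPy array, the column means are subtracted via broadcasting, and a matrix product is computed with NumPy's linear algebra. The data frame and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data frame.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]})

# Get the underlying values as a 2-D NumPy array of shape (3, 2).
values = df.to_numpy()

# Broadcasting: the (2,) vector of column means is subtracted from every row.
centered = values - values.mean(axis=0)

# Linear algebra: the 2x2 matrix of sums of products of the centered columns.
gram = centered.T @ centered

print(gram)
```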

answered Jul 2 '17 at 22:29 by Stephen Rauch

Pandas is an in-memory data storage tool. This allows you to run calculations over large amounts of data very quickly.

A SQL database, by contrast, stores data persistently.

answered Jul 3 '17 at 20:01 by Henry

In addition to the accepted answer:

Relational databases carry many bytes of per-row overhead (example: this question), which is used for bookkeeping, telling nulls from non-nulls, and ensuring guarantees such as ACID. Every time you read or write a column, the database accesses, and possibly updates, these bookkeeping bytes in addition to the few bytes that represent the column's value.

In contrast, pandas (like R's data.table) is more like an in-memory column store. A column is just an array of values, so you can use fast NumPy vectorized operations or list comprehensions that access only the values you really need. For tables with a few primitive columns, that alone makes relational databases several times slower for many data science use cases.
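The column-store idea can be sketched as follows: each column is backed by a typed array, and a computation touches only the columns it names. The data frame below is hypothetical, and the example makes no timing claims; it only shows that the vectorized form and the element-by-element form compute the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical table with two primitive columns.
df = pd.DataFrame(
    {"price": np.arange(1, 6, dtype="float64"), "qty": np.arange(5, 0, -1)}
)

# A single column is exposed as a plain NumPy array of one dtype.
price_array = df["price"].to_numpy()

# Vectorized: the multiply and the sum run over whole arrays at once.
revenue_vectorized = (df["price"] * df["qty"]).sum()

# Element by element in Python, for comparison; same result, one value at a time.
revenue_loop = sum(p * q for p, q in zip(df["price"], df["qty"]))

print(price_array.dtype, revenue_vectorized, revenue_loop)
```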

answered Apr 2 at 8:09 by Valentas; edited Apr 2 at 8:14
