Adding a column to a dataframe in pandas using another Column
I have a column called "Plot" in a dataframe, and I want to create a new column called "Keywords" that contains only the important words from the plot.
Here is the code:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer
import re
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
df = pd.read_csv('IMDB_Top250Engmovies2_OMDB_Detailed.csv')
df = df[['Title','Genre','Director','Actors','Plot']]
df['Keywords'] = ''
for index, row in df.iterrows():
    plot = row['Plot']
    plot = re.sub('[^a-zA-Z]', " ", plot)
    plot = plot.lower()
    plot = plot.split()
    plot = [i for i in plot if not i in set(stopwords.words('english'))]
    plot = ' '.join(plot)
    row['Key_words'] = str(plot)
And here is the output :(
[screenshot of the output, not preserved]
Link to the CSV: https://query.data.world/s/uikepcpffyo2nhig52xxeevdialfl7
Thank you!
pandas
asked Apr 2 at 8:29 by Abhinav Thapper (new contributor), edited Apr 2 at 8:47
Please, please avoid including code in images. Nobody can help you if they cannot copy and paste your code to run it locally. You can edit your question and format the post with the original code.
– Tasos, Apr 2 at 8:33
Also: it's recommended that you clearly state what doesn't work. I kinda pieced together that you'd like your column to contain things. Welcome, btw.
– S van Balen, Apr 2 at 10:05
2 Answers
It could be something like this. First, create a function:
def important_words(plot):
    # your code here
    return plot
Then use the apply function to build the new column:
df["Keywords"] = df["Plot"].apply(important_words)
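For illustration, here is a minimal filled-in sketch of the skeleton above. The tiny STOPWORDS set and the two sample rows are assumptions made up for the example; the question uses NLTK's English stopword list and the IMDB CSV instead.

```python
import re

import pandas as pd

# Hypothetical, tiny stopword set for illustration only; the question uses
# nltk.corpus.stopwords.words('english') instead.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "to"}


def important_words(plot):
    # Keep letters only, lower-case, split on whitespace, drop stopwords.
    words = re.sub(r"[^a-zA-Z]", " ", plot).lower().split()
    return " ".join(w for w in words if w not in STOPWORDS)


# Made-up rows standing in for the IMDB CSV.
df = pd.DataFrame({
    "Title": ["Movie A", "Movie B"],
    "Plot": ["A hacker discovers the truth.", "Two men bond in prison!"],
})

# apply calls important_words once per value of the Plot column.
df["Keywords"] = df["Plot"].apply(important_words)
print(df[["Title", "Keywords"]])
```

Passing the function itself to apply behaves the same as wrapping it in a lambda, since apply simply calls it once per value.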
answered Apr 2 at 11:25 by bakka
iterrows yields a copy of each row, not a reference to it, so assigning to row does not change the dataframe. This should fix your problem:
df.loc[index, 'Keywords'] = str(plot)
However, I would recommend using apply; in my opinion it is more elegant, and it is a lot faster.
That would look something like this:
# Build the stopword set once instead of once per word.
english_stopwords = set(stopwords.words('english'))
def string_to_keywords(string):
    plot = re.sub('[^a-zA-Z]', " ", string)
    plot = plot.lower()
    plot = plot.split()
    return " ".join([i for i in plot if i not in english_stopwords])
df["Keywords"] = df["Plot"].apply(string_to_keywords)
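To see the difference between the two kinds of assignment, here is a small sketch on a made-up dataframe (the column values and the use of upper() are placeholders, not the question's data). The pandas documentation notes that, depending on the dtypes, iterrows returns a copy rather than a view; the mixed string and integer columns here are chosen so that it is a copy.

```python
import pandas as pd

# Made-up data; the integer column gives the frame mixed dtypes, so each
# row yielded by iterrows is a freshly built Series.
df = pd.DataFrame({"Plot": ["first plot", "second plot"], "Year": [1994, 1999]})
df["Keywords"] = ""

# Attempt 1: assign to the row object. Only the temporary Series changes.
for index, row in df.iterrows():
    row["Keywords"] = row["Plot"].upper()
lost = df["Keywords"].tolist()

# Attempt 2: assign through df.loc. This writes into the dataframe itself.
for index, row in df.iterrows():
    df.loc[index, "Keywords"] = row["Plot"].upper()
kept = df["Keywords"].tolist()

print(lost)  # the column is still empty
print(kept)  # the column now holds the new values
```

The same reasoning explains why the original loop appears to do nothing: it also writes to the temporary row, and under a different key ('Key_words') than the column that was created ('Keywords').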
answered Apr 2 at 10:02 by S van Balen, edited Apr 2 at 12:07
BTW: Extracting everything but the stopwords is a nice starting point. Once you get this to work, you might want to look into TF-IDF or attention if you want to get more sophisticated about it.
– S van Balen, Apr 2 at 10:08
Ummm, how do I use apply here?
– Abhinav Thapper, Apr 2 at 10:59
I added that to the answer, Abhinav.
– S van Balen, Apr 2 at 12:07
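As a rough idea of what the TF-IDF suggestion could look like, here is a sketch using scikit-learn's TfidfVectorizer, which the question already has available through its sklearn imports. The three plots, the top_terms helper, and the choice of three keywords per plot are all assumptions for illustration, not part of the original posts.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up plots standing in for the IMDB data.
df = pd.DataFrame({"Plot": [
    "A banker is sent to prison and befriends a fellow prisoner.",
    "A hacker learns that reality is a simulation.",
    "A prisoner plans an escape from prison.",
]})

# Lower-cases, tokenizes, and drops scikit-learn's built-in English stopwords.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(df["Plot"])  # sparse matrix, one row per plot

# vocabulary_ maps term -> column index; invert it into a list of terms.
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)


def top_terms(row, k=3):
    # Hypothetical helper: the k highest-weighted terms of one plot.
    weights = row.toarray().ravel()
    best = weights.argsort()[::-1][:k]
    return " ".join(terms[i] for i in best if weights[i] > 0)


df["Keywords"] = [top_terms(tfidf[i]) for i in range(tfidf.shape[0])]
print(df["Keywords"])
```

Compared with plain stopword removal, this ranks words by how specific they are to each plot, so words shared by many plots score lower. With only a few short plots, many weights tie, so the ranking becomes meaningful only on a larger corpus.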