Adding a column to a dataframe in pandas using another Column



I have a column called "Plot" in a dataframe and I want to create a new one called "Keywords" that contains only the important words of the plot.
Here is the code:



import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer
import re
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
df = pd.read_csv('IMDB_Top250Engmovies2_OMDB_Detailed.csv')
df = df[['Title','Genre','Director','Actors','Plot']]
df['Keywords'] = ''

for index, row in df.iterrows():
    plot = row['Plot']
    plot = re.sub('[^a-zA-Z]', " ", plot)
    plot = plot.lower()
    plot = plot.split()
    plot = [i for i in plot if not i in set(stopwords.words('english'))]
    plot = ' '.join(plot)
    row['Key_words'] = str(plot)


And here is the output :(



[image: output screenshot, not reproduced here]



Link to the csv : https://query.data.world/s/uikepcpffyo2nhig52xxeevdialfl7



Thank you !







pandas






asked Apr 2 at 8:29 by Abhinav Thapper (edited Apr 2 at 8:47)







• Please, please avoid including code as images. Nobody can help you if they cannot copy and paste your code to run it locally. You can edit your question and format the post with the original code. – Tasos, Apr 2 at 8:33

• Also: it's recommended that you clearly state what doesn't work. I kinda pieced together that you'd like your column to contain things. Welcome, btw. – S van Balen, Apr 2 at 10:05












2 Answers
It could be something like this.

Create a function:

def important_words(plot):
    # your code here
    return plot

Then make use of the apply function:

df["Keywords"] = df.Plot.apply(lambda x: important_words(x))
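As a small aside on the snippet above: the lambda wrapper is optional, because apply accepts the function itself. The sketch below uses a made-up two-row dataframe and a placeholder body for important_words (it only lowercases the text). Both are assumptions for illustration, not part of the original answer. It shows that the two spellings produce the same column.

```python
import pandas as pd

# Hypothetical toy data; the real dataframe comes from the question's CSV.
df = pd.DataFrame({"Plot": ["A Hero Rises", "The Long Night"]})

def important_words(plot):
    # Placeholder body for illustration only: just lowercase the text.
    return plot.lower()

# Spelling from the answer: wrap the function in a lambda.
with_lambda = df.Plot.apply(lambda x: important_words(x))

# Equivalent spelling: pass the function directly.
direct = df["Plot"].apply(important_words)

df["Keywords"] = direct
print(df)
```

Passing the function directly is a bit shorter and avoids one extra call per row, but either form should work.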






answered Apr 2 at 11:25 by bakka
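iterrows passes you a copy of the row, not a reference to it, so assigning to row does not change df. (Your loop also writes to 'Key_words', while the column you created is 'Keywords'.) Replacing the last line of the loop with this should fix your problem:

df.loc[index, 'Keywords'] = str(plot)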
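To make the difference concrete, here is a minimal sketch on a made-up two-row dataframe (the data and the simple lower() transformation are assumptions for illustration). It first repeats the pattern from the question, then writes through df.loc as suggested above.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's dataframe.
df = pd.DataFrame({"Plot": ["A Hero Rises", "The Long Night"]})
df["Keywords"] = ""

# Attempt 1: assign to the row object, using the question's 'Key_words' label.
for index, row in df.iterrows():
    row["Key_words"] = row["Plot"].lower()

unchanged = df["Keywords"].tolist()          # snapshot of the column after attempt 1
has_new_column = "Key_words" in df.columns   # did the dataframe gain a column?

# Attempt 2: write through df.loc using the row's index label.
for index, row in df.iterrows():
    df.loc[index, "Keywords"] = row["Plot"].lower()

print(unchanged, has_new_column)
print(df["Keywords"].tolist())
```

After the first loop the Keywords column is still empty and no Key_words column exists in df; only the second loop changes the dataframe.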


However, I would recommend using apply; in my opinion it is more elegant, and it is usually a lot faster.

That would look something like this:



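# Relies on re and stopwords as imported in the question.
# Build the stopword set once, not once per word.
stop_words = set(stopwords.words('english'))

def string_to_keywords(string):
    plot = re.sub('[^a-zA-Z]', ' ', string)  # replace non-letters with spaces
    plot = plot.lower().split()
    return ' '.join(word for word in plot if word not in stop_words)

df["Keywords"] = df["Plot"].apply(string_to_keywords)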
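If you want to try the apply approach without downloading the NLTK data first, a self-contained sketch could look like the following. The small STOP_WORDS set and the two plot sentences are assumptions made up for this example; the answer itself uses stopwords.words('english') and the Plot column from the CSV.

```python
import re
import pandas as pd

# Assumption: a tiny hand-picked stopword set so the sketch runs without
# downloading NLTK data; the answer uses stopwords.words('english') instead.
STOP_WORDS = {"a", "an", "the", "of", "in", "and", "is", "to", "his"}

def string_to_keywords(text):
    # Replace non-letters with spaces, lowercase, and split into words.
    words = re.sub("[^a-zA-Z]", " ", text).lower().split()
    # Keep only the words that are not in the stopword set.
    return " ".join(w for w in words if w not in STOP_WORDS)

# Hypothetical plots, not taken from the CSV.
df = pd.DataFrame({"Plot": [
    "A banker is sentenced to life in Shawshank.",
    "The aging patriarch transfers control to his son!",
]})
df["Keywords"] = df["Plot"].apply(string_to_keywords)
print(df["Keywords"].tolist())
```

Swapping STOP_WORDS for set(stopwords.words('english')) should give the behaviour described in the answer.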






answered Apr 2 at 10:02 by S van Balen (edited Apr 2 at 12:07)
• BTW: Extracting everything but the stopwords is a nice starting point. Once you get this to work, you might want to look into TF-IDF or attention if you want to get more sophisticated about it. – S van Balen, Apr 2 at 10:08

• Ummm, how do I use apply here? – Abhinav Thapper, Apr 2 at 10:59

• I added that to the answer, Abhinav. – S van Balen, Apr 2 at 12:07
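For the TF-IDF idea mentioned in the comments above, a rough plain-Python sketch might look like this. It uses the textbook weighting tf * log(N / df), which is an assumption on my part; library implementations such as scikit-learn's TfidfVectorizer use smoothed and normalized variants, so their numbers will differ. The three keyword strings and the helper name top_keywords are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical, already-cleaned keyword strings (one per movie).
docs = [
    "banker prison friendship hope",
    "mafia family son power",
    "prison escape hope",
]

tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

# Document frequency: in how many documents does each word appear?
doc_freq = Counter(word for tokens in tokenized for word in set(tokens))

def top_keywords(tokens, k=2):
    """Return the k words with the highest tf * log(N / df) score."""
    counts = Counter(tokens)
    scores = {
        word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
        for word, count in counts.items()
    }
    # Sort by score (descending), then alphabetically to break ties.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]

for tokens in tokenized:
    print(top_keywords(tokens))
```

Words that appear in many plots get a low score, so this tends to keep the words that distinguish one plot from the others.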















