Trjtdtk

Question

I have familiarized myself with the most important recommended concepts (Linear Algebra, Analysis, Python, NumPy, Pandas, a bit of Statistics, Linear Regression). For the last two, I don't know how deep I should go. I know what things mean and how to get them working in Python.

But the question is: what now? I guess I could argue that this is a starting point and that I could apply for a low-level data analysis or visualisation position if I learn Tableau and present myself well. But what would I do to even prove what I can do before an interview? Putting a notebook on GitHub where I imported a dataset, cleaned it a bit, and ran a .describe(), a .plot() and a linear regression is neither impressive nor interesting to anyone. So what should I do instead?

Also, this clearly isn't data science territory yet. If I look at Kaggle challenges, I either don't know what to do or think to myself "Clean data, LinRegression". So what should I take a look at next?

Note that I'm taking classes right now, but in Chemistry, not in Data Science.

score 4 · Accepted Answer · 2019-03-25 19:10:44Z

So you're still on the basics, and William's answer is pretty good. I will list here a few things to learn, and where to learn them.

1 - You need the basics, which is already much more than you expected:

Linear Algebra: knowing the best way of inverting a matrix might be useful for a computer scientist, but you're not aiming for that. You need to understand concepts, their meaning and their effects, such as:
- Matrix rank (for example, the rank of an autocorrelation matrix can tell you that you still don't have enough data for things like least squares)
- The meaning of vector spaces and basic linear transformations such as a change of basis
- The meaning of eigenvalues and eigenvectors
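As a rough illustration of the matrix rank point, here is a minimal NumPy sketch. The matrix `X` is made up for the example: its third column is the sum of the first two, so the matrix that least squares needs to invert (`X.T @ X`) loses rank and has an eigenvalue that is numerically zero.

```python
import numpy as np

# Hypothetical data matrix (4 samples, 3 features).
# The third column equals column 1 + column 2, so the features are redundant.
X = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 1.0, 3.0],
    [3.0, 5.0, 8.0],
    [4.0, 1.0, 5.0],
])

# Ordinary least squares needs to invert this 3x3 matrix.
gram = X.T @ X

rank = np.linalg.matrix_rank(gram)
# eigvalsh is meant for symmetric matrices and returns eigenvalues in ascending order.
eigenvalues = np.linalg.eigvalsh(gram)

rank_deficient = rank < gram.shape[0]
print("rank:", rank, "of", gram.shape[0])
print("smallest eigenvalue:", eigenvalues[0])
print("least squares has no unique solution:", rank_deficient)
```

A rank lower than the number of columns means the data do not pin down a unique set of coefficients, which is the kind of conclusion the concept lets you draw before fitting anything.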

Calculus: again, focus on meaning and understanding; computers can do most of the operations, even analytically.
- Derivatives and Integrals
- Optimization
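To connect derivatives and optimization, here is a small sketch in plain Python. The function `f`, the step size and the number of steps are arbitrary choices for illustration; the derivative is approximated numerically rather than derived by hand, and gradient descent follows it downhill.

```python
def f(x):
    # Toy function with its minimum at x = 3 (chosen only for illustration).
    return (x - 3.0) ** 2 + 1.0


def numerical_derivative(func, x, h=1e-6):
    # Central difference approximation of the derivative of func at x.
    return (func(x + h) - func(x - h)) / (2 * h)


def gradient_descent(func, x0, learning_rate=0.1, steps=200):
    # Repeatedly step against the slope to move toward a local minimum.
    x = x0
    for _ in range(steps):
        x -= learning_rate * numerical_derivative(func, x)
    return x


x_min = gradient_descent(f, x0=0.0)
print("approximate minimiser:", x_min)
```

The same loop, with a loss function in place of `f` and a vector of parameters in place of `x`, is the idea behind how many machine learning models are fitted.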

Signals and Systems: this might be a little biased of me (since I am a signal processing researcher), but learning how to model certain phenomena and how they behave may help you solve problems. It is basically applied Linear Algebra and Calculus (most things are). Aim for the very basics. Signal processing is one of the research areas most affected by Data Science/Machine Learning, to the point that people were running surveys about changing the name of the community in IEEE.

The above can be found in many, many books and introductions, and they won't be difficult for you to find.

Statistics: Machine learning derives from statistics, so this is essential. Actually, you can learn calculus from a statistical point of view instead of function optimization. The subject names below link to free courses on Udacity or Udemy.
- You will need to learn Descriptive Statistics, which will allow you to understand data.
- You will also need to learn Inferential Statistics.

It is important to learn what you can do with classical statistics, to avoid wasting computing power on problems that could be solved easily. It is good practice to model things in the simplest way possible and escalate to a more complex model only if needed.
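As a sketch of how far the standard library alone can take you, the example below computes descriptive statistics and a rough 95% confidence interval for a mean. The measurements are invented, and the interval uses a normal approximation, which is only a crude choice for eight samples (a t-based interval would be wider).

```python
import math
import statistics

# Made-up measurements, e.g. reaction yields in percent.
yields = [71.2, 69.8, 73.5, 70.1, 72.4, 68.9, 74.0, 71.7]

# Descriptive statistics: summarise what the data look like.
mean = statistics.mean(yields)
median = statistics.median(yields)
stdev = statistics.stdev(yields)  # sample standard deviation (n - 1 in the denominator)

# Inferential statistics: a rough 95% confidence interval for the true mean,
# using the normal approximation.
z = statistics.NormalDist().inv_cdf(0.975)
half_width = z * stdev / math.sqrt(len(yields))
ci = (mean - half_width, mean + half_width)

print(f"mean={mean:.2f} median={median:.2f} stdev={stdev:.2f}")
print(f"approximate 95% CI for the mean: ({ci[0]:.2f}, {ci[1]:.2f})")
```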

Machine Learning: Machine learning models are your everyday tools as a Data Scientist:
- You can start with the crash course on Google Developers and Intro to Machine Learning on Udacity.
- Then, you can go deeper... NO! Not Deep Learning, deeper into Machine Learning, also on Udacity.
- But you might want to take a more formal course such as Stanford's at some point, just to fill some gaps.

Machine Learning Tasklist: You stopped at linear regression? So here are the most popular models waiting for you:
- Linear and Logistic Regression: pretty basic
- Decision Tree: basic but highly interpretable
- Support Vector Machines: simple but powerful (and I love Kernel methods, shoot me for that)
- Naive Bayes
- k Nearest Neighbors
- K-Means Clustering
- Random Forests and Gradient Boosting (Ensembles): these are really powerful and might be interpretable if you don't let them grow without care
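One way to get a feel for several of these models at once is to run them on the same small dataset. The sketch below assumes scikit-learn is installed and uses its bundled iris dataset; the hyperparameters are arbitrary defaults picked for the example, not tuned recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A few of the models from the list above. Scaling is added for the models
# that are sensitive to feature scale.
models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "k nearest neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean accuracy over 5-fold cross-validation for each model.
scores = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in models.items()}

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

On a dataset this easy the scores will be close to each other; the point is the workflow of comparing simple and complex models under the same cross-validation, not the ranking.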

Also, you might need some dimensionality reduction tools such as:
- Principal Component Analysis
- Linear Discriminant Analysis
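Principal Component Analysis ties back to the linear algebra basics above. Here is a minimal sketch using only NumPy: the data are synthetic (three features driven by one hidden factor plus noise), and PCA is done through the singular value decomposition of the centered data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 3 features that mostly follow one hidden factor.
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[2.0, 1.0, 0.5]]) + 0.1 * rng.normal(size=(200, 3))

# PCA: center the data, then take the SVD. The rows of Vt are the principal axes.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Share of the total variance captured by each component.
explained_variance_ratio = S**2 / np.sum(S**2)

# Project onto the first two principal components.
X_reduced = X_centered @ Vt[:2].T

print("explained variance ratio:", np.round(explained_variance_ratio, 3))
print("reduced shape:", X_reduced.shape)
```

Because the three features are nearly copies of one factor, the first component should capture almost all of the variance, which is exactly the situation where dimensionality reduction pays off.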

Then you can go for more complex Neural Networks:

Self-organizing maps, feed-forward networks, RNNs, CNNs and so on...

Note: CNNs are common in Computer Vision applications, but the only thing they ask of the data is that it be organized in a way that allows for meaningful correlations between values that are near each other. Example: a process with multiple sensors recorded as a time series might benefit from a CNN.
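To make the "values near each other" idea concrete, here is a tiny NumPy sketch of what a single convolution filter does on sensor time series. The two sensor signals and the kernel are invented for the example; in a real CNN the kernel weights would be learned from data rather than chosen by hand.

```python
import numpy as np

# Two hypothetical sensors, 100 time steps each, with a jump at different times.
sensors = np.zeros((2, 100))
sensors[0, 50:] = 1.0
sensors[1, 70:] = 1.0

# A hand-picked kernel. With np.convolve it computes signal[t + 2] - signal[t],
# so it responds only where neighboring time steps differ.
kernel = np.array([1.0, 0.0, -1.0])

# Slide the same kernel along each sensor's time axis.
responses = np.array([np.convolve(row, kernel, mode="valid") for row in sensors])

# The strongest response marks where each signal changes (window center = index + 1).
change_points = responses.argmax(axis=1) + 1

print("response shape:", responses.shape)
print("detected change points:", change_points)
```

The filter only ever looks at three adjacent time steps, which is why ordering the data so that neighbors are related matters for this family of models.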

Data Analysis: a Data Scientist must have a personal relationship with data. For any good relationship, you will need to understand your loved one (but the relationship with data is usually toxic, hahaha). Udacity has a nice Intro to Data Analysis, also for free.

Learn to express yourself:
- Make a YouTube channel and present small tutorials and classes to the community.
- Try to answer Stack Exchange questions and help others; that will build respect in the community and goodwill for when it is your turn to ask. It is also a good way to practice expressing your ideas in writing.
- Write a blog; it is a good way to keep a notebook and also to gain attention from the community.

You can check out hands-on books such as Data Smart and Data Science from Scratch. Data Smart is about getting insight from information, and that's mostly your job as a Data Scientist.

2 - Build Respect

Try creating packages and libraries, making them available on GitHub, and sharing your relevant solutions.

Win Kaggle competitions; many companies take Kaggle seriously as .... and having a good score will get you nice positions. You don't need to finish in first place, by the way. Also, competitions are usually good examples of real-world problems, and they will give you the much-needed experience that you can't get while you don't have a Data Scientist role.

Also, some competitions pay really well.

Explore Kaggle, share algorithms, read others' solutions and try to improve on them, and search for datasets that might interest you.

Making datasets is a bit exhausting but might be a way of making money while you're not ready.

3 - Get programming skills

It is not only about learning frameworks: understanding how things work and the basics of problem-solving must be put to the test every day. Also, making everything from scratch is fun and good for learning, but when you do real work you will need mature code that has been checked hundreds of times all over the world.

You'll need some tools. Python is a great language for data science, since the community is active and it is free. Matlab has a lot of nice tools and marvelous documentation, but it is really expensive and quite a bit slow.

Some top libraries:
- NumPy is the most fundamental package; understand it well
- Pandas for data wrangling
- Seaborn, Bokeh, Plotly, and Matplotlib for plotting and for helping you make good reports
- scikit-learn: this is usually the fastest way to test a machine learning algorithm
- Theano is similar to NumPy but was built with Machine Learning in mind
- Keras is a library for building Neural Networks really fast; it uses Theano or TensorFlow as a backend
- TensorFlow, PyTorch, and other deep learning libraries

Also, you might want to get some knowledge of JavaScript and libraries for acquiring data on the web.

4 - Go Deeper

You may never need deep learning, depending on the area you're going to work in, but it is that good nuclear weapon you hope never to have to use but someday might:
- To understand it, start with Udemy
- There is a nanodegree on Udacity

Also, remember that DL is computationally intensive, and you want to avoid needing it (since that computation is expensive).

5 - Finally: Career

Learning never stops: you will be learning new concepts every single day.

Courses are long, so take them at your own pace. Try to get the basics of how to use something first, then go back and learn it for real.

Try to get a few certificates and post them on your LinkedIn. Make a few projects and write articles about them on your blog, on LinkedIn and on Kaggle.

Try choosing something you can relate to while searching for work. DS covers a wide range of subjects, and trying to get insight from things you understand is easier than trying to get insight from things that sound like random noise to you.

Build a network of collaborators, help your colleagues, and try to build a vast network ranging from medicine to linguistics; they might tell you what you're doing wrong when you look at data outside your field of expertise.

Finally (and this took a bit longer than I anticipated): don't give up. This is a long journey, but it is absurdly rewarding, both financially and personally. And try not to work alone; create a small group of people to work with and make some projects.

This is my longest answer to a Stack Exchange question.

William Scott 1063 · Accepted Answer · 2019-03-23 20:02:38Z

I can see that you are interested in Data Science, without knowing what lies ahead. Nothing wrong here! Your interest is all that matters.

It's interesting that you have taught yourself those libraries and Linear Regression.

But those are not quite enough. The libraries are just a way to handle data, and Linear Regression is very, very basic, and not such a popular model in real-world scenarios.

Basic Models
- Linear Regression
- Logistic Regression
- SVM
- Neural Networks
- Decision Trees, Random Forests

As for linear and logistic regression, I suggest you implement them from scratch; I believe this is the best way to learn the concepts in depth.

I suggest you start with Andrew Ng's videos on Machine Learning (link below)
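To give an idea of what "from scratch" can look like, here is a minimal sketch of logistic regression trained with batch gradient descent in NumPy. The two-cluster dataset, the learning rate and the number of steps are made up for the example, and the function names are my own, not taken from any course or library.

```python
import numpy as np


def sigmoid(z):
    # Squash real numbers into the (0, 1) range so they can be read as probabilities.
    return 1.0 / (1.0 + np.exp(-z))


def add_bias(X):
    # Prepend a column of ones so the first weight acts as the intercept.
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_logistic_regression(X, y, learning_rate=0.1, steps=2000):
    X_b = add_bias(X)
    weights = np.zeros(X_b.shape[1])
    for _ in range(steps):
        predictions = sigmoid(X_b @ weights)
        # Gradient of the mean log-loss with respect to the weights.
        gradient = X_b.T @ (predictions - y) / len(y)
        weights -= learning_rate * gradient
    return weights


def predict(X, weights):
    # Classify as 1 when the predicted probability is at least 0.5.
    return (sigmoid(add_bias(X) @ weights) >= 0.5).astype(int)


# Synthetic, well-separated two-class data.
rng = np.random.default_rng(0)
class0 = rng.normal(loc=-2.0, scale=1.0, size=(100, 2))
class1 = rng.normal(loc=2.0, scale=1.0, size=(100, 2))
X = np.vstack([class0, class1])
y = np.concatenate([np.zeros(100), np.ones(100)])

weights = fit_logistic_regression(X, y)
accuracy = np.mean(predict(X, weights) == y)
print("weights:", np.round(weights, 3))
print("training accuracy:", accuracy)
```

Once this works, comparing its weights and accuracy against a library implementation on the same data is a good check that you understood the concept.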

https://www.youtube.com/playlist?list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN

Try to implement those as the exercises come up. Better still, you can take his Machine Learning course on Coursera. This will clear up your basics. (link below)

https://www.coursera.org/learn/machine-learning

Additionally, I created a repository for starters like you during my learning process. Feel free to check it out (link below)

https://github.com/williamscott701/Machine-Learning

Hope this helps ;)

Keep the fire on.

Thank you very much! In the end, if I know those things, what do I do before applying for a job? Do I do Kaggle challenges and post my approaches in notebooks on GitHub? Or is there something else I should do? - Mar 23 at 20:36
Hey! There can never be an end to learning, and as I mentioned, these are just to get you started. But if you are mainly focusing on the jobs side, then Kaggle challenges are a good way to apply what you have learned. Also find some interview questions, which are fairly standard (you can get these by googling). Vote me up if I was able to help ;) - Mar 23 at 20:48

