How to check if Stochastic Gradient Descent produces the optimum MSE for linear regression
I am implementing stochastic gradient descent (SGD) for linear regression.
Varying the learning rate and the sample size produces different weight vectors, and the resulting MSEs are far apart. Is it possible for the code below to produce an MSE close to the one produced by scikit-learn's SGDRegressor? (A comparison sketch follows the code.)
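For reference, the update the code below is intended to implement is the standard mini-batch gradient step for the squared-error loss. With a batch of $n$ points, predictions $\hat y_j = w^\top x_j + b$, and residuals $e_j = y_j - \hat y_j$, each step is
$$w_k \leftarrow w_k - \frac{\eta}{n}\sum_{j=1}^{n}\bigl(-2\,x_{jk}\,e_j\bigr), \qquad b \leftarrow b - \frac{\eta}{n}\sum_{j=1}^{n}\bigl(-2\,e_j\bigr),$$
where $\eta$ is the learning rate.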
import numpy as np
from sklearn import preprocessing

# Methods of the regression class (imports shown for completeness).

def SGDProcess(self, niter, npts):
    self.num_iter = niter
    self.no_of_pts = npts
    self.w_prev = self.w_0
    self.intcpt_prev = self.intcept
    for i in range(self.num_iter):
        # Draw a fresh mini-batch for this iteration.
        self.generaterandomsample()
        num_feat = self.w_0.shape[0]
        num_rows = self.x_sgdt.shape[0]
        self.w_next = np.zeros(num_feat)
        self.partial_w = np.zeros((num_rows, num_feat))
        yerror = np.zeros(num_rows)
        pred = np.zeros(num_rows)
        self.intcpt_next = 0.0
        # Per-row prediction, residual, and gradient of the squared error.
        for j in range(num_rows):
            for k in range(num_feat):
                pred[j] += self.w_prev[k] * self.x_sgdt[j, k]
            pred[j] += self.intcpt_prev
            yerror[j] = self.y_sgdt[j] - pred[j]
            for k in range(num_feat):
                self.partial_w[j, k] = -2 * self.x_sgdt[j, k] * yerror[j]
            self.intcpt_next += -2 * yerror[j]
        # Average the per-row gradients over the batch and take one step.
        for col in range(num_feat):
            for row in range(num_rows):
                self.w_next[col] += (self.learning_rate / num_rows) * self.partial_w[row, col]
        self.w_next = self.w_prev - self.w_next
        self.intcpt_next = self.intcpt_prev - (self.learning_rate / num_rows) * self.intcpt_next
        w_diff = self.w_prev - self.w_next
        if self.checkallval(w_diff):
            print('SOLUTION CONVERGED')
            self.w_opt = self.w_next
            self.intcpt_opt = self.intcpt_next
            break
        else:
            self.w_prev = self.w_next
            self.intcpt_prev = self.intcpt_next
            # Halving every iteration makes the learning rate decay very quickly.
            self.learning_rate = self.learning_rate / 2
    self.w_opt = self.w_next
    self.intcpt_opt = self.intcpt_next
    return [self.w_next, self.intcpt_next, self.learning_rate]

# Get k random points from the dataset for SGD.
# Sample one set of row indices so x and y stay aligned, then scale x.
def generaterandomsample(self):
    idx = self.x_sgdt_df.sample(self.no_of_pts).index
    self.x_sgdt = self.x_sgdt_df.loc[idx].values
    self.y_sgdt = self.y_sgdt_df.loc[idx].values
    scaler = preprocessing.StandardScaler().fit(self.x_sgdt)
    self.x_sgdt = scaler.transform(self.x_sgdt)

# Converged when every weight has moved by at most 1e-7 in absolute value.
def checkallval(self, wdiff):
    return np.all(np.abs(wdiff) <= 1e-7)
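One way to check how close this gets to the optimum MSE is to compare it against the closed-form least-squares solution (the global minimum of the MSE) and against scikit-learn's SGDRegressor on the same standardized data. Below is a minimal sketch of that comparison; it assumes X and y are NumPy arrays holding the full dataset and that reg is an instance of the class above after SGDProcess has run (so reg.w_opt and reg.intcpt_opt are set). The names X, y, and reg are placeholders, not part of the original code.
import numpy as np
from sklearn import preprocessing
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

def mse_for_weights(X, y, w, b):
    # MSE of the linear model y_hat = X @ w + b.
    return mean_squared_error(y, X @ w + b)

# Standardize once, on the full dataset, so all three models see the same inputs.
X_std = preprocessing.StandardScaler().fit_transform(X)

# Closed-form ordinary least squares: the global optimum of the MSE.
X_aug = np.hstack([X_std, np.ones((X_std.shape[0], 1))])  # extra column for the intercept
theta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
w_ols, b_ols = theta[:-1], theta[-1]

# scikit-learn's SGD baseline on the same data.
sgd = SGDRegressor(max_iter=10000, tol=1e-6).fit(X_std, y)

print("optimum (OLS) MSE :", mse_for_weights(X_std, y, w_ols, b_ols))
print("SGDRegressor MSE  :", mse_for_weights(X_std, y, sgd.coef_, sgd.intercept_[0]))
print("custom SGD MSE    :", mse_for_weights(X_std, y, reg.w_opt, reg.intcpt_opt))
Note that generaterandomsample rescales each mini-batch separately, so the weights it learns live on a slightly different scale than X_std; that by itself can account for part of the gap you see against SGDRegressor.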
Regards,
jana
gradient-descent
asked Mar 27 at 14:23 by megjosh