Generalization bound (single hypothesis) in “Foundations of Machine Learning” The 2019 Stack Overflow Developer Survey Results Are InNLTK: Tuning LinearSVC classifier accuracy? - Looking for better approaches/advicesWhich Machine Learning book to choose (APM, MLAP or ISL)?Machine Learning - Range of Hypothesis space and choiceof Hypothesis function typeMachine Learning - Choice of features for determining hypothesisGeneralization Error DefinitionWhat is PAC learning?How can we decompose generalization gap as done in the paper “Generalization in Deep Learning”?Is it possible to use NEAT networks for solving video games?Backpropagation - softmax derivativeTraining NLP with multiple text input features

What is the accessibility of a package's `Private` context variables?

Geography at the pixel level

How to display lines in a file like ls displays files in a directory?

Old scifi movie from the 50s or 60s with men in solid red uniforms who interrogate a spy from the past

Balance problems for leveling up mid-fight?

If I score a critical hit on an 18 or higher, what are my chances of getting a critical hit if I roll 3d20?

Is an up-to-date browser secure on an out-of-date OS?

How to support a colleague who finds meetings extremely tiring?

How to manage monthly salary

Multiply Two Integer Polynomials

Can there be female White Walkers?

What is the motivation for a law requiring 2 parties to consent for recording a conversation

With regards to an effect that triggers when a creature attacks, how does it entering the battlefield tapped and attacking apply?

Pokemon Turn Based battle (Python)

How to translate "being like"?

What does Linus Torvalds mean when he says that Git "never ever" tracks a file?

The difference between dialogue marks

Right tool to dig six foot holes?

Is bread bad for ducks?

How to deal with speedster characters?

How to check whether the reindex working or not in Magento?

Is Sun brighter than what we actually see?

Should I use my personal e-mail address, or my workplace one, when registering to external websites for work purposes?

For what reasons would an animal species NOT cross a *horizontal* land bridge?



Generalization bound (single hypothesis) in “Foundations of Machine Learning”



The 2019 Stack Overflow Developer Survey Results Are InNLTK: Tuning LinearSVC classifier accuracy? - Looking for better approaches/advicesWhich Machine Learning book to choose (APM, MLAP or ISL)?Machine Learning - Range of Hypothesis space and choiceof Hypothesis function typeMachine Learning - Choice of features for determining hypothesisGeneralization Error DefinitionWhat is PAC learning?How can we decompose generalization gap as done in the paper “Generalization in Deep Learning”?Is it possible to use NEAT networks for solving video games?Backpropagation - softmax derivativeTraining NLP with multiple text input features










2












$begingroup$


I have a question about Corollary $2.2$: Generalization bound--single hypothesis in the book "Foundations of Machine Learning" Mohri et al. $2012$.



Equation $2.17$ seems to only hold when $hatR_S(h)<R(h)$ in equation $2.16$ because of the absolute operator. Why is this not written in the corollary? Am I missing something important?



Thank you very much for reading this question.










share|improve this question











$endgroup$
















    2












    $begingroup$


    I have a question about Corollary $2.2$: Generalization bound--single hypothesis in the book "Foundations of Machine Learning" Mohri et al. $2012$.



    Equation $2.17$ seems to only hold when $hatR_S(h)<R(h)$ in equation $2.16$ because of the absolute operator. Why is this not written in the corollary? Am I missing something important?



    Thank you very much for reading this question.










    share|improve this question











    $endgroup$














      2












      2








      2





      $begingroup$


      I have a question about Corollary $2.2$: Generalization bound--single hypothesis in the book "Foundations of Machine Learning" Mohri et al. $2012$.



      Equation $2.17$ seems to only hold when $hatR_S(h)<R(h)$ in equation $2.16$ because of the absolute operator. Why is this not written in the corollary? Am I missing something important?



      Thank you very much for reading this question.










      share|improve this question











      $endgroup$




      I have a question about Corollary $2.2$: Generalization bound--single hypothesis in the book "Foundations of Machine Learning" Mohri et al. $2012$.



      Equation $2.17$ seems to only hold when $hatR_S(h)<R(h)$ in equation $2.16$ because of the absolute operator. Why is this not written in the corollary? Am I missing something important?



      Thank you very much for reading this question.







      machine-learning pac-learning






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Mar 30 at 15:06









      Esmailian

      2,991320




      2,991320










      asked Mar 30 at 11:59









      ML studentML student

      554




      554




















          1 Answer
          1






          active

          oldest

          votes


















          1












          $begingroup$

          You are right. The relaxed inequality
          $$R(h) le hatR_S(h)+ epsilon.$$
          can be replaced with the complete inequality
          $$left |hatR_S(h) - R(h) right| le epsilon.$$
          Actually, authors use this complete inequality for the follow up examples in the book. Again in Theorem $2.13$, they write the relaxed inequality, but prove for the complete inequality.



          We could say that the relaxed inequality is written for the sake of readability and/or convention.



          On the relation of inequalities



          Let us denote:
          $$A:=hatR_S(h) - R(h) le epsilon$$
          $$B:=hatR_S(h) - R(h) ge -epsilon$$
          thus,
          $$left| hatR_S(h) - R(h) right| le epsilon = A text and B$$
          Equation $(2.16)$ states:
          $$beginalign*
          & Bbb P(left| hatR_S(h) - R(h) right| ge epsilon) le delta \
          & Rightarrow Bbb P(left| hatR_S(h) - R(h) right| le epsilon) ge 1 - delta \
          & Rightarrow Bbb P(A text and B) ge 1 - delta \
          endalign*$$

          knowing that $Bbb P(B) ge Bbb P(A text and B)$,
          $$beginalign*
          & Bbb P(B) ge Bbb P(A text and B) ge 1 - delta \
          & Rightarrow Bbb P(hatR_S(h) - R(h) ge -epsilon) ge 1 - delta
          endalign*$$

          which is equivalent to
          $$R(h) le hatR_S(h) + epsilon$$
          with probability at least $1-delta$, i.e. equation $(2.17)$.






          share|improve this answer











          $endgroup$








          • 1




            $begingroup$
            Thank you very much!! The derivation for the relationships are very helpful, I understand this more now.
            $endgroup$
            – ML student
            Mar 31 at 23:18











          Your Answer





          StackExchange.ifUsing("editor", function ()
          return StackExchange.using("mathjaxEditing", function ()
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
          );
          );
          , "mathjax-editing");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "557"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader:
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          ,
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );













          draft saved

          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f48257%2fgeneralization-bound-single-hypothesis-in-foundations-of-machine-learning%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          1












          $begingroup$

          You are right. The relaxed inequality
          $$R(h) le hatR_S(h)+ epsilon.$$
          can be replaced with the complete inequality
          $$left |hatR_S(h) - R(h) right| le epsilon.$$
          Actually, authors use this complete inequality for the follow up examples in the book. Again in Theorem $2.13$, they write the relaxed inequality, but prove for the complete inequality.



          We could say that the relaxed inequality is written for the sake of readability and/or convention.



          On the relation of inequalities



          Let us denote:
          $$A:=hatR_S(h) - R(h) le epsilon$$
          $$B:=hatR_S(h) - R(h) ge -epsilon$$
          thus,
          $$left| hatR_S(h) - R(h) right| le epsilon = A text and B$$
          Equation $(2.16)$ states:
          $$beginalign*
          & Bbb P(left| hatR_S(h) - R(h) right| ge epsilon) le delta \
          & Rightarrow Bbb P(left| hatR_S(h) - R(h) right| le epsilon) ge 1 - delta \
          & Rightarrow Bbb P(A text and B) ge 1 - delta \
          endalign*$$

          knowing that $Bbb P(B) ge Bbb P(A text and B)$,
          $$beginalign*
          & Bbb P(B) ge Bbb P(A text and B) ge 1 - delta \
          & Rightarrow Bbb P(hatR_S(h) - R(h) ge -epsilon) ge 1 - delta
          endalign*$$

          which is equivalent to
          $$R(h) le hatR_S(h) + epsilon$$
          with probability at least $1-delta$, i.e. equation $(2.17)$.






          share|improve this answer











          $endgroup$








          • 1




            $begingroup$
            Thank you very much!! The derivation for the relationships are very helpful, I understand this more now.
            $endgroup$
            – ML student
            Mar 31 at 23:18















          1












          $begingroup$

          You are right. The relaxed inequality
          $$R(h) le hatR_S(h)+ epsilon.$$
          can be replaced with the complete inequality
          $$left |hatR_S(h) - R(h) right| le epsilon.$$
          Actually, authors use this complete inequality for the follow up examples in the book. Again in Theorem $2.13$, they write the relaxed inequality, but prove for the complete inequality.



          We could say that the relaxed inequality is written for the sake of readability and/or convention.



          On the relation of inequalities



          Let us denote:
          $$A:=hatR_S(h) - R(h) le epsilon$$
          $$B:=hatR_S(h) - R(h) ge -epsilon$$
          thus,
          $$left| hatR_S(h) - R(h) right| le epsilon = A text and B$$
          Equation $(2.16)$ states:
          $$beginalign*
          & Bbb P(left| hatR_S(h) - R(h) right| ge epsilon) le delta \
          & Rightarrow Bbb P(left| hatR_S(h) - R(h) right| le epsilon) ge 1 - delta \
          & Rightarrow Bbb P(A text and B) ge 1 - delta \
          endalign*$$

          knowing that $Bbb P(B) ge Bbb P(A text and B)$,
          $$beginalign*
          & Bbb P(B) ge Bbb P(A text and B) ge 1 - delta \
          & Rightarrow Bbb P(hatR_S(h) - R(h) ge -epsilon) ge 1 - delta
          endalign*$$

          which is equivalent to
          $$R(h) le hatR_S(h) + epsilon$$
          with probability at least $1-delta$, i.e. equation $(2.17)$.






          share|improve this answer











          $endgroup$








          • 1




            $begingroup$
            Thank you very much!! The derivation for the relationships are very helpful, I understand this more now.
            $endgroup$
            – ML student
            Mar 31 at 23:18













          1












          1








          1





          $begingroup$

          You are right. The relaxed inequality
          $$R(h) le hatR_S(h)+ epsilon.$$
          can be replaced with the complete inequality
          $$left |hatR_S(h) - R(h) right| le epsilon.$$
          Actually, authors use this complete inequality for the follow up examples in the book. Again in Theorem $2.13$, they write the relaxed inequality, but prove for the complete inequality.



          We could say that the relaxed inequality is written for the sake of readability and/or convention.



          On the relation of inequalities



          Let us denote:
          $$A:=hatR_S(h) - R(h) le epsilon$$
          $$B:=hatR_S(h) - R(h) ge -epsilon$$
          thus,
          $$left| hatR_S(h) - R(h) right| le epsilon = A text and B$$
          Equation $(2.16)$ states:
          $$beginalign*
          & Bbb P(left| hatR_S(h) - R(h) right| ge epsilon) le delta \
          & Rightarrow Bbb P(left| hatR_S(h) - R(h) right| le epsilon) ge 1 - delta \
          & Rightarrow Bbb P(A text and B) ge 1 - delta \
          endalign*$$

          knowing that $Bbb P(B) ge Bbb P(A text and B)$,
          $$beginalign*
          & Bbb P(B) ge Bbb P(A text and B) ge 1 - delta \
          & Rightarrow Bbb P(hatR_S(h) - R(h) ge -epsilon) ge 1 - delta
          endalign*$$

          which is equivalent to
          $$R(h) le hatR_S(h) + epsilon$$
          with probability at least $1-delta$, i.e. equation $(2.17)$.






          share|improve this answer











          $endgroup$



          You are right. The relaxed inequality
          $$R(h) le hatR_S(h)+ epsilon.$$
          can be replaced with the complete inequality
          $$left |hatR_S(h) - R(h) right| le epsilon.$$
          Actually, authors use this complete inequality for the follow up examples in the book. Again in Theorem $2.13$, they write the relaxed inequality, but prove for the complete inequality.



          We could say that the relaxed inequality is written for the sake of readability and/or convention.



          On the relation of inequalities



          Let us denote:
          $$A:=hatR_S(h) - R(h) le epsilon$$
          $$B:=hatR_S(h) - R(h) ge -epsilon$$
          thus,
          $$left| hatR_S(h) - R(h) right| le epsilon = A text and B$$
          Equation $(2.16)$ states:
          $$beginalign*
          & Bbb P(left| hatR_S(h) - R(h) right| ge epsilon) le delta \
          & Rightarrow Bbb P(left| hatR_S(h) - R(h) right| le epsilon) ge 1 - delta \
          & Rightarrow Bbb P(A text and B) ge 1 - delta \
          endalign*$$

          knowing that $Bbb P(B) ge Bbb P(A text and B)$,
          $$beginalign*
          & Bbb P(B) ge Bbb P(A text and B) ge 1 - delta \
          & Rightarrow Bbb P(hatR_S(h) - R(h) ge -epsilon) ge 1 - delta
          endalign*$$

          which is equivalent to
          $$R(h) le hatR_S(h) + epsilon$$
          with probability at least $1-delta$, i.e. equation $(2.17)$.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Mar 31 at 9:49

























          answered Mar 30 at 14:52









          EsmailianEsmailian

          2,991320




          2,991320







          • 1




            $begingroup$
            Thank you very much!! The derivation for the relationships are very helpful, I understand this more now.
            $endgroup$
            – ML student
            Mar 31 at 23:18












          • 1




            $begingroup$
            Thank you very much!! The derivation for the relationships are very helpful, I understand this more now.
            $endgroup$
            – ML student
            Mar 31 at 23:18







          1




          1




          $begingroup$
          Thank you very much!! The derivation for the relationships are very helpful, I understand this more now.
          $endgroup$
          – ML student
          Mar 31 at 23:18




          $begingroup$
          Thank you very much!! The derivation for the relationships are very helpful, I understand this more now.
          $endgroup$
          – ML student
          Mar 31 at 23:18

















          draft saved

          draft discarded
















































          Thanks for contributing an answer to Data Science Stack Exchange!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid


          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.

          Use MathJax to format equations. MathJax reference.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f48257%2fgeneralization-bound-single-hypothesis-in-foundations-of-machine-learning%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Adding axes to figuresAdding axes labels to LaTeX figuresLaTeX equivalent of ConTeXt buffersRotate a node but not its content: the case of the ellipse decorationHow to define the default vertical distance between nodes?TikZ scaling graphic and adjust node position and keep font sizeNumerical conditional within tikz keys?adding axes to shapesAlign axes across subfiguresAdding figures with a certain orderLine up nested tikz enviroments or how to get rid of themAdding axes labels to LaTeX figures

          Tähtien Talli Jäsenet | Lähteet | NavigointivalikkoSuomen Hippos – Tähtien Talli

          Do these cracks on my tires look bad? The Next CEO of Stack OverflowDry rot tire should I replace?Having to replace tiresFishtailed so easily? Bad tires? ABS?Filling the tires with something other than air, to avoid puncture hassles?Used Michelin tires safe to install?Do these tyre cracks necessitate replacement?Rumbling noise: tires or mechanicalIs it possible to fix noisy feathered tires?Are bad winter tires still better than summer tires in winter?Torque converter failure - Related to replacing only 2 tires?Why use snow tires on all 4 wheels on 2-wheel-drive cars?