
Generalization bound (single hypothesis) in “Foundations of Machine Learning”



I have a question about Corollary 2.2 (Generalization bound: single hypothesis) in the book "Foundations of Machine Learning" (Mohri et al., 2012).

Equation 2.17 seems to hold only when $\hat{R}_S(h) < R(h)$ in equation 2.16, because of the absolute value operator. Why is this not stated in the corollary? Am I missing something important?

Thank you very much for reading this question.

machine-learning pac-learning

asked Mar 30 at 11:59 by ML student; edited Mar 30 at 15:06 by Esmailian
















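For reference, the two statements being compared have the following shape in the notation used below (a paraphrase of the corollary's setting rather than a verbatim quote; for the zero-one loss and a sample of size $m$, Hoeffding's inequality supplies $\delta = 2e^{-2m\epsilon^2}$):

$$\text{(2.16)} \qquad \Bbb{P}\left(\left| \hat{R}_S(h) - R(h) \right| \ge \epsilon\right) \le \delta,$$

$$\text{(2.17)} \qquad R(h) \le \hat{R}_S(h) + \sqrt{\frac{\log(2/\delta)}{2m}} \quad \text{with probability at least } 1 - \delta.$$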




















          1 Answer
You are right. The relaxed inequality
$$R(h) \le \hat{R}_S(h) + \epsilon$$
can be replaced with the complete (two-sided) inequality
$$\left| \hat{R}_S(h) - R(h) \right| \le \epsilon.$$
In fact, the authors use this complete inequality for the follow-up examples in the book. Similarly, in Theorem 2.13 they state the relaxed inequality but prove the complete one.

We could say that the relaxed inequality is stated for the sake of readability and/or convention.

On the relation between the inequalities

Let us denote
$$A := \hat{R}_S(h) - R(h) \le \epsilon,$$
$$B := \hat{R}_S(h) - R(h) \ge -\epsilon;$$
thus
$$\left| \hat{R}_S(h) - R(h) \right| \le \epsilon \iff A \text{ and } B.$$
Equation (2.16) states:
$$\begin{align*}
& \Bbb{P}\left(\left| \hat{R}_S(h) - R(h) \right| \ge \epsilon\right) \le \delta \\
\Rightarrow\ & \Bbb{P}\left(\left| \hat{R}_S(h) - R(h) \right| \le \epsilon\right) \ge 1 - \delta \\
\Rightarrow\ & \Bbb{P}(A \text{ and } B) \ge 1 - \delta.
\end{align*}$$
Knowing that $\Bbb{P}(B) \ge \Bbb{P}(A \text{ and } B)$,
$$\begin{align*}
& \Bbb{P}(B) \ge \Bbb{P}(A \text{ and } B) \ge 1 - \delta \\
\Rightarrow\ & \Bbb{P}\left(\hat{R}_S(h) - R(h) \ge -\epsilon\right) \ge 1 - \delta,
\end{align*}$$
which is equivalent to
$$R(h) \le \hat{R}_S(h) + \epsilon$$
with probability at least $1 - \delta$, i.e. equation (2.17).

answered Mar 30 at 14:52 by Esmailian; edited Mar 31 at 9:49

Comment (ML student, Mar 31 at 23:18): Thank you very much! The derivation of the relation between the inequalities is very helpful; I understand this better now.
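As a numerical sanity check of the argument above, here is a minimal Monte Carlo sketch (an assumed setup for illustration, not from the book): a fixed hypothesis whose per-example zero-one losses are i.i.d. Bernoulli with true risk $R$, resampled many times with sample size $m$. The one-sided failure event $R(h) > \hat{R}_S(h) + \epsilon$ is contained in the two-sided event $\left| \hat{R}_S(h) - R(h) \right| \ge \epsilon$, so its frequency can only be smaller, and both stay below $\delta$ for the Hoeffding choice of $\epsilon$.

```python
import numpy as np

# Assumed setup (illustration only, not the book's example):
# a fixed hypothesis h whose zero-one loss on each example is
# Bernoulli(R), so the empirical risk R_hat over a sample of
# size m is Binomial(m, R) / m.
rng = np.random.default_rng(0)

m, R, delta = 200, 0.3, 0.05
eps = np.sqrt(np.log(2 / delta) / (2 * m))  # Hoeffding choice of epsilon
n_runs = 100_000

# Empirical risks over n_runs independent samples of size m.
R_hat = rng.binomial(m, R, size=n_runs) / m

# Two-sided event of eq. (2.16) and one-sided failure event of eq. (2.17).
two_sided = np.mean(np.abs(R_hat - R) >= eps)
one_sided = np.mean(R - R_hat > eps)  # i.e. R(h) > R_hat + eps

print(f"eps = {eps:.4f}, delta = {delta}")
print(f"P(|R_hat - R| >= eps) ~ {two_sided:.4f}  (at most delta)")
print(f"P(R > R_hat + eps)    ~ {one_sided:.4f}  (at most the line above)")
```

Both estimated frequencies come out well below $\delta = 0.05$ here (Hoeffding's bound is not tight), and the one-sided frequency is the smaller of the two: the gap is essentially the probability of the discarded opposite tail $\hat{R}_S(h) - R(h) \ge \epsilon$, which is why dropping the absolute value loses information but not correctness.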











          Your Answer





          StackExchange.ifUsing("editor", function ()
          return StackExchange.using("mathjaxEditing", function ()
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
          );
          );
          , "mathjax-editing");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "557"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader:
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          ,
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );













          draft saved

          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f48257%2fgeneralization-bound-single-hypothesis-in-foundations-of-machine-learning%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          1












          $begingroup$

          You are right. The relaxed inequality
          $$R(h) le hatR_S(h)+ epsilon.$$
          can be replaced with the complete inequality
          $$left |hatR_S(h) - R(h) right| le epsilon.$$
          Actually, authors use this complete inequality for the follow up examples in the book. Again in Theorem $2.13$, they write the relaxed inequality, but prove for the complete inequality.



          We could say that the relaxed inequality is written for the sake of readability and/or convention.



          On the relation of inequalities



          Let us denote:
          $$A:=hatR_S(h) - R(h) le epsilon$$
          $$B:=hatR_S(h) - R(h) ge -epsilon$$
          thus,
          $$left| hatR_S(h) - R(h) right| le epsilon = A text and B$$
          Equation $(2.16)$ states:
          $$beginalign*
          & Bbb P(left| hatR_S(h) - R(h) right| ge epsilon) le delta \
          & Rightarrow Bbb P(left| hatR_S(h) - R(h) right| le epsilon) ge 1 - delta \
          & Rightarrow Bbb P(A text and B) ge 1 - delta \
          endalign*$$

          knowing that $Bbb P(B) ge Bbb P(A text and B)$,
          $$beginalign*
          & Bbb P(B) ge Bbb P(A text and B) ge 1 - delta \
          & Rightarrow Bbb P(hatR_S(h) - R(h) ge -epsilon) ge 1 - delta
          endalign*$$

          which is equivalent to
          $$R(h) le hatR_S(h) + epsilon$$
          with probability at least $1-delta$, i.e. equation $(2.17)$.






          share|improve this answer











          $endgroup$








          • 1




            $begingroup$
            Thank you very much!! The derivation for the relationships are very helpful, I understand this more now.
            $endgroup$
            – ML student
            Mar 31 at 23:18















          1












          $begingroup$

          You are right. The relaxed inequality
          $$R(h) le hatR_S(h)+ epsilon.$$
          can be replaced with the complete inequality
          $$left |hatR_S(h) - R(h) right| le epsilon.$$
          Actually, authors use this complete inequality for the follow up examples in the book. Again in Theorem $2.13$, they write the relaxed inequality, but prove for the complete inequality.



          We could say that the relaxed inequality is written for the sake of readability and/or convention.



          On the relation of inequalities



          Let us denote:
          $$A:=hatR_S(h) - R(h) le epsilon$$
          $$B:=hatR_S(h) - R(h) ge -epsilon$$
          thus,
          $$left| hatR_S(h) - R(h) right| le epsilon = A text and B$$
          Equation $(2.16)$ states:
          $$beginalign*
          & Bbb P(left| hatR_S(h) - R(h) right| ge epsilon) le delta \
          & Rightarrow Bbb P(left| hatR_S(h) - R(h) right| le epsilon) ge 1 - delta \
          & Rightarrow Bbb P(A text and B) ge 1 - delta \
          endalign*$$

          knowing that $Bbb P(B) ge Bbb P(A text and B)$,
          $$beginalign*
          & Bbb P(B) ge Bbb P(A text and B) ge 1 - delta \
          & Rightarrow Bbb P(hatR_S(h) - R(h) ge -epsilon) ge 1 - delta
          endalign*$$

          which is equivalent to
          $$R(h) le hatR_S(h) + epsilon$$
          with probability at least $1-delta$, i.e. equation $(2.17)$.






          share|improve this answer











          $endgroup$








          • 1




            $begingroup$
            Thank you very much!! The derivation for the relationships are very helpful, I understand this more now.
            $endgroup$
            – ML student
            Mar 31 at 23:18













          1












          1








          1





          $begingroup$

          You are right. The relaxed inequality
          $$R(h) le hatR_S(h)+ epsilon.$$
          can be replaced with the complete inequality
          $$left |hatR_S(h) - R(h) right| le epsilon.$$
          Actually, authors use this complete inequality for the follow up examples in the book. Again in Theorem $2.13$, they write the relaxed inequality, but prove for the complete inequality.



          We could say that the relaxed inequality is written for the sake of readability and/or convention.



          On the relation of inequalities



          Let us denote:
          $$A:=hatR_S(h) - R(h) le epsilon$$
          $$B:=hatR_S(h) - R(h) ge -epsilon$$
          thus,
          $$left| hatR_S(h) - R(h) right| le epsilon = A text and B$$
          Equation $(2.16)$ states:
          $$beginalign*
          & Bbb P(left| hatR_S(h) - R(h) right| ge epsilon) le delta \
          & Rightarrow Bbb P(left| hatR_S(h) - R(h) right| le epsilon) ge 1 - delta \
          & Rightarrow Bbb P(A text and B) ge 1 - delta \
          endalign*$$

          knowing that $Bbb P(B) ge Bbb P(A text and B)$,
          $$beginalign*
          & Bbb P(B) ge Bbb P(A text and B) ge 1 - delta \
          & Rightarrow Bbb P(hatR_S(h) - R(h) ge -epsilon) ge 1 - delta
          endalign*$$

          which is equivalent to
          $$R(h) le hatR_S(h) + epsilon$$
          with probability at least $1-delta$, i.e. equation $(2.17)$.






          share|improve this answer











          $endgroup$



          You are right. The relaxed inequality
          $$R(h) le hatR_S(h)+ epsilon.$$
          can be replaced with the complete inequality
          $$left |hatR_S(h) - R(h) right| le epsilon.$$
          Actually, authors use this complete inequality for the follow up examples in the book. Again in Theorem $2.13$, they write the relaxed inequality, but prove for the complete inequality.



          We could say that the relaxed inequality is written for the sake of readability and/or convention.



          On the relation of inequalities



          Let us denote:
          $$A:=hatR_S(h) - R(h) le epsilon$$
          $$B:=hatR_S(h) - R(h) ge -epsilon$$
          thus,
          $$left| hatR_S(h) - R(h) right| le epsilon = A text and B$$
          Equation $(2.16)$ states:
          $$beginalign*
          & Bbb P(left| hatR_S(h) - R(h) right| ge epsilon) le delta \
          & Rightarrow Bbb P(left| hatR_S(h) - R(h) right| le epsilon) ge 1 - delta \
          & Rightarrow Bbb P(A text and B) ge 1 - delta \
          endalign*$$

          knowing that $Bbb P(B) ge Bbb P(A text and B)$,
          $$beginalign*
          & Bbb P(B) ge Bbb P(A text and B) ge 1 - delta \
          & Rightarrow Bbb P(hatR_S(h) - R(h) ge -epsilon) ge 1 - delta
          endalign*$$

          which is equivalent to
          $$R(h) le hatR_S(h) + epsilon$$
          with probability at least $1-delta$, i.e. equation $(2.17)$.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Mar 31 at 9:49

























          answered Mar 30 at 14:52









          EsmailianEsmailian

          2,991320




          2,991320







          • 1




            $begingroup$
            Thank you very much!! The derivation for the relationships are very helpful, I understand this more now.
            $endgroup$
            – ML student
            Mar 31 at 23:18












          • 1




            $begingroup$
            Thank you very much!! The derivation for the relationships are very helpful, I understand this more now.
            $endgroup$
            – ML student
            Mar 31 at 23:18







          1




          1




          $begingroup$
          Thank you very much!! The derivation for the relationships are very helpful, I understand this more now.
          $endgroup$
          – ML student
          Mar 31 at 23:18




          $begingroup$
          Thank you very much!! The derivation for the relationships are very helpful, I understand this more now.
          $endgroup$
          – ML student
          Mar 31 at 23:18

















          draft saved

          draft discarded
















































          Thanks for contributing an answer to Data Science Stack Exchange!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid


          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.

          Use MathJax to format equations. MathJax reference.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f48257%2fgeneralization-bound-single-hypothesis-in-foundations-of-machine-learning%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Marja Vauras Lähteet | Aiheesta muualla | NavigointivalikkoMarja Vauras Turun yliopiston tutkimusportaalissaInfobox OKSuomalaisen Tiedeakatemian varsinaiset jäsenetKasvatustieteiden tiedekunnan dekaanit ja muu johtoMarja VaurasKoulutusvienti on kestävyys- ja ketteryyslaji (2.5.2017)laajentamallaWorldCat Identities0000 0001 0855 9405n86069603utb201588738523620927

          Which is better: GPT or RelGAN for text generation?2019 Community Moderator ElectionWhat is the difference between TextGAN and LM for text generation?GANs (generative adversarial networks) possible for text as well?Generator loss not decreasing- text to image synthesisChoosing a right algorithm for template-based text generationHow should I format input and output for text generation with LSTMsGumbel Softmax vs Vanilla Softmax for GAN trainingWhich neural network to choose for classification from text/speech?NLP text autoencoder that generates text in poetic meterWhat is the interpretation of the expectation notation in the GAN formulation?What is the difference between TextGAN and LM for text generation?How to prepare the data for text generation task

          Is this part of the description of the Archfey warlock's Misty Escape feature redundant?When is entropic ward considered “used”?How does the reaction timing work for Wrath of the Storm? Can it potentially prevent the damage from the triggering attack?Does the Dark Arts Archlich warlock patrons's Arcane Invisibility activate every time you cast a level 1+ spell?When attacking while invisible, when exactly does invisibility break?Can I cast Hellish Rebuke on my turn?Do I have to “pre-cast” a reaction spell in order for it to be triggered?What happens if a Player Misty Escapes into an Invisible CreatureCan a reaction interrupt multiattack?Does the Fiend-patron warlock's Hurl Through Hell feature dispel effects that require the target to be on the same plane as the caster?What are you allowed to do while using the Warlock's Eldritch Master feature?