
Deep Reinforcement Learning for dynamic pricing


I am trying to implement a Deep Q-Network (DQN) model for dynamic pricing in logistics. I can define:

  1. State space: origin, destination, type of shipment, customer, type of product, commodity of the shipment, available capacity, etc.

  2. Action space: the price itself, which can range from 0 to infinity; the agent has to determine this price.

  3. Reward signal: rewards can be based on similar offers made to other customers, seasonality, and remaining capacity.

I am planning to use a multi-layer perceptron that takes the state representation as input and outputs the price.

I am not sure how to define a reward function. How can I define a mathematical formula for the reward, given that the price is the action?
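Since the question asks for a concrete formula, here is one minimal, heavily hedged sketch of a per-step reward, assuming (as the comments below suggest) that the true goal is profit rather than the raw price, and that the agent can observe whether each quoted customer accepts the offer. In symbols, $r_t = \mathbb{1}[\text{accepted}_t]\,(p_t - c_t)\,w_t$, where $p_t$ is the quoted price per kg, $c_t$ the cost per kg and $w_t$ the shipment weight. All names below (price_per_kg, cost_per_kg, shipment_kg, accepted) are illustrative placeholders, not something specified in the question.

```python
def step_reward(price_per_kg, cost_per_kg, shipment_kg, accepted):
    """One possible per-step reward (an assumption, not taken from the question).

    If the customer accepts the quoted price, the reward is the profit on that
    shipment; if they decline, the reward is zero. The opportunity cost of
    over-pricing then shows up as lost future reward before the episode ends.
    """
    if accepted:
        return (price_per_kg - cost_per_kg) * shipment_kg
    return 0.0
```

Under these assumptions, the undiscounted return of an episode is the total profit earned before the capacity described in the update below is exhausted, which is usually the quantity a pricing agent should maximise.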


-- UPDATE --

The part of the state that evolves over time is the remaining capacity. Suppose the initial capacity is 10,000 kg; it decreases as shipments are accepted over time, and when the capacity is used up and no more shipments can be taken, the episode ends.

The agent has to find an optimal price based on the rewards described above.
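To make this episode structure concrete, here is a minimal sketch of the capacity dynamics, under the same assumptions as the reward sketch above (per-kg prices and costs, an observable accept/decline decision); none of the names or numbers are from the question beyond the 10,000 kg starting capacity.

```python
class CapacityEpisode:
    """Toy episode: quote prices until the remaining capacity is exhausted."""

    def __init__(self, initial_capacity_kg=10_000):
        self.remaining_kg = initial_capacity_kg

    def step(self, shipment_kg, price_per_kg, cost_per_kg, accepted):
        # Accepted shipments consume capacity and yield profit as reward.
        reward = 0.0
        if accepted and shipment_kg <= self.remaining_kg:
            self.remaining_kg -= shipment_kg
            reward = (price_per_kg - cost_per_kg) * shipment_kg
        # The episode terminates once the capacity is used up.
        done = self.remaining_kg <= 0
        return reward, done
```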








deep-learning tensorflow reinforcement-learning dqn deepmind

asked yesterday, edited yesterday, by Karthik Rajkumar (new contributor)

Comments:

  • The way to define a reward is to start with your goals and how you measure the success of the agent. Could you add those? Also, you don't seem to have a state space that needs reinforcement learning; it looks more like a contextual bandit problem. Could you please identify any state variables that evolve over time, and what the time steps are? If each time step brings a new, unrelated customer, then this is not really RL, although repeat business from the same customer might be handled as RL. – Neil Slater, yesterday

  • Hi, I have updated the question. Kindly take a look at it. – Karthik Rajkumar, yesterday

  • Thanks, that explains well how this maps to RL. However, I am still not sure what the goals are. Will it simply be the total price sold at, or the profit? Profit seems more likely to be the true goal; presumably you need to account for the current mix of destinations and the route plan if this is a single container that must tour all the destinations in its itinerary? – Neil Slater, yesterday

  • For example, if the price offered for the same origin, destination, and type of shipment is $2.50 per kilo, then based on that similar offer we can increase or decrease our price so that the customer will accept the offer we provide. Take seasonality: around festival times we can increase the price because there will be more demand, or if capacity decreases and only a few kilos are left to fill, we can increase the price. – Karthik Rajkumar, yesterday

  • As well as filling the capacity ending the episode, is this time limited? If you have an infinite number of customers lined up, then you can just set a very high price and wait to make a huge profit. But reality is not like that, and once you accept your first customer, you will have limited opportunities to fill the rest of the capacity or be in breach of contract, etc. – Neil Slater, yesterday
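The question plans an MLP that maps the state directly to a price, but standard DQN works over a discrete action set, so one common way to reconcile the two is to quote prices from a fixed grid and have the network output one Q-value per candidate price. The sketch below illustrates that idea only; the price grid, the state dimensionality and the layer sizes are all assumptions, not part of the question.

```python
import numpy as np
import tensorflow as tf

# Hypothetical discretisation of the continuous price action for DQN:
# a fixed grid of candidate prices per kg (illustrative values only).
PRICE_GRID = np.linspace(0.5, 5.0, num=19)   # candidate $/kg quotes
STATE_DIM = 12                               # placeholder size of the encoded state

# MLP that maps a state vector to one Q-value per candidate price.
q_network = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(STATE_DIM,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(len(PRICE_GRID)),  # Q(s, a) for each discrete price
])

def choose_price(state, epsilon=0.1):
    """Epsilon-greedy price selection over the discrete grid."""
    if np.random.rand() < epsilon:
        return float(np.random.choice(PRICE_GRID))
    q_values = q_network(state[None, :], training=False).numpy()[0]
    return float(PRICE_GRID[int(np.argmax(q_values))])
```

If a genuinely continuous price is required, an actor-critic method such as DDPG would be a more natural fit than DQN, but that is a separate design decision from the reward definition asked about here.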