Deep Reinforcement Learning for dynamic pricing
I am trying to implement a Deep Q Network (DQN) model for dynamic pricing in logistics. I can define:
State space: origin, destination, type of shipment, customer, type of product, commodity of the shipment, available capacity, etc.
Action space: the price itself, which can range from 0 to infinity; the agent must determine this price.
Reward signal: rewards can be based on similar offers made to other customers, seasonality, and remaining capacity.
I am planning to use a multi-layer perceptron that takes the state features as input and outputs the price.
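For reference, this is roughly the network I have in mind, assuming I discretise the price into a fixed grid of levels so that DQN has a finite action set; the state dimension and the price grid here are made-up placeholders:

```python
import tensorflow as tf

# Hypothetical price grid and state size, for illustration only.
PRICE_LEVELS = [0.5 * k for k in range(1, 21)]   # $0.50 ... $10.00 per kg
STATE_DIM = 16                                   # size of the encoded state vector

# MLP that maps an encoded state to one Q-value per candidate price level.
q_network = tf.keras.Sequential([
    tf.keras.Input(shape=(STATE_DIM,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(len(PRICE_LEVELS)),    # linear output: Q(s, a) per price
])
q_network.compile(optimizer="adam", loss="mse")
```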
What I am not sure about is how to define the reward function. How can I write a mathematical formula for the reward as a function of the price (the action)?
-- UPDATE --
The part of the state that evolves over time is the remaining capacity.
Say the initial capacity is 10,000 kg; as shipments are accepted the remaining capacity decreases, and once the capacity is used up and no more shipments can be taken, the episode ends.
The agent has to find an optimal price based on the rewards described above.
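For concreteness, here is a minimal sketch of the episode structure I have in mind: remaining capacity is the evolving state, the offered price is the action, and the reward is the revenue booked when a customer accepts. The acceptance model and the reference price are placeholder assumptions, not part of my actual problem; the reward line is exactly the part I need help with.

```python
import numpy as np

class PricingEpisode:
    """Toy episode: quote a price per kg to each arriving customer until the
    10,000 kg capacity is used up. Purely illustrative."""

    def __init__(self, total_capacity_kg=10_000, reference_price_per_kg=2.5):
        self.capacity = total_capacity_kg          # evolving part of the state
        self.reference_price = reference_price_per_kg

    def step(self, price_per_kg, shipment_kg):
        """Offer price_per_kg for a shipment_kg request; return (reward, done)."""
        # Placeholder acceptance model: acceptance probability falls as the
        # quote rises above the reference price (logistic curve, made up).
        ratio = price_per_kg / self.reference_price
        p_accept = 1.0 / (1.0 + np.exp(4.0 * (ratio - 1.0)))
        accepted = np.random.rand() < p_accept

        reward = 0.0
        if accepted and shipment_kg <= self.capacity:
            reward = price_per_kg * shipment_kg    # revenue booked this step
            self.capacity -= shipment_kg

        done = self.capacity <= 0                  # episode ends when capacity is gone
        return reward, done
```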
Tags: deep-learning, tensorflow, reinforcement-learning, dqn, deepmind
The way to define a reward is to start with your goals and how you measure success of the agent. Could you add those? Also, you don't seem to have a state space that needs reinforcement learning. It looks more like a contextual bandit problem. Could you please identify any state variables that evolve over time, and what the time steps are? If each time step is a new, unrelated customer etc, then this is not really RL, although repeats of same customer might be handled as RL.
– Neil Slater, yesterday
Hi, I have updated the question. Kindly take a look at it.
– Karthik Rajkumar, yesterday
Thanks, that explains well how this maps to RL. However, I am still not sure what the goals are. Will it simply be the total price sold at, or profit? Profit seems more likely to be the true goal; presumably you need to account for the current mix of destinations and the route plan if this is a single container that must tour all the destinations in its itinerary?
– Neil Slater, yesterday
For example, the comparable price offered for the same origin, destination, and type of shipment is $2.50 per kilo. Based on that comparable offer, we can raise or lower our price so that the customer will accept the offer we provide. Take seasonality: during festival periods we can increase the price, as there will be more demand. Or, if capacity decreases and only a few kilos are left, we can increase the price.
– Karthik Rajkumar, yesterday
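(One way to turn those three signals into a number, sketched purely for illustration: treat booked revenue as the base reward and shape it against a "fair" price built from the comparable offer, a seasonality factor, and the remaining capacity. Every constant and functional form below is an assumption, not something from the question.)

```python
def shaped_reward(price_per_kg, shipment_kg, accepted,
                  reference_price_per_kg=2.5,  # comparable offer for the same lane
                  season_factor=1.0,           # > 1 in festival / peak-demand periods
                  remaining_fraction=1.0):     # remaining_capacity / total_capacity
    """Illustrative reward: booked revenue plus a small bonus (or penalty) for
    pricing above (or below) a fair price that rises with demand and scarcity."""
    if not accepted:
        return 0.0
    # Fair price goes up in high season and as the remaining capacity shrinks.
    fair_price = reference_price_per_kg * season_factor * (2.0 - remaining_fraction)
    revenue = price_per_kg * shipment_kg
    shaping = 0.1 * (price_per_kg - fair_price) * shipment_kg
    return revenue + shaping
```

Whether such a shaping term helps depends on how well fair_price tracks real willingness to pay; plain revenue (or profit) is the safer baseline reward.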
As well as filling the capacity ending the episode, is this time-limited? If you have an infinite number of customers lined up, then you can just set a very high price and wait to make a huge profit. But reality is not like that, and once you accept your first customer, you will have limited opportunities to fill the rest of the capacity before being in breach of contract, etc.
– Neil Slater, yesterday