
Deep Reinforcement Learning for dynamic pricing


I am trying to implement a Deep Q-Network (DQN) model for dynamic pricing in logistics. I can define:

  1. State space: origin, destination, type of shipment, customer, type of product, commodity of the shipment, available capacity, etc.

  2. Action space: the price itself, which can range from 0 to infinity; the agent has to determine this price.

  3. Reward signal: rewards can be based on similar offers made to other customers, seasonality, and remaining capacity.

I am planning to use a multi-layer perceptron that takes the state representation as input and outputs the price.

I am not sure how to define a reward function. How can I define a mathematical formula for the reward, given that the price is the action?
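Since the question asks for a concrete formula, here is one minimal, heavily hedged sketch of a per-step reward, assuming (as the comments below suggest) that the true goal is profit rather than the raw price, and that the agent can observe whether each quoted customer accepts the offer. In symbols, $r_t = \mathbb{1}[\text{accepted}_t]\,(p_t - c_t)\,w_t$, where $p_t$ is the quoted price per kg, $c_t$ the cost per kg and $w_t$ the shipment weight. All names below (price_per_kg, cost_per_kg, shipment_kg, accepted) are illustrative placeholders, not something specified in the question.

```python
def step_reward(price_per_kg, cost_per_kg, shipment_kg, accepted):
    """One possible per-step reward (an assumption, not taken from the question).

    If the customer accepts the quoted price, the reward is the profit on that
    shipment; if they decline, the reward is zero. The opportunity cost of
    over-pricing then shows up as lost future reward before the episode ends.
    """
    if accepted:
        return (price_per_kg - cost_per_kg) * shipment_kg
    return 0.0
```

Under these assumptions, the undiscounted return of an episode is the total profit earned before the capacity described in the update below is exhausted, which is usually the quantity a pricing agent should maximise.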


-- UPDATE --

The part of the state that evolves over time is the remaining capacity. Suppose the initial capacity is 10,000 kg; it decreases as shipments are accepted over time, and when the capacity is used up and no more shipments can be taken, the episode ends.

The agent has to find an optimal price based on the rewards described above.
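To make this episode structure concrete, here is a minimal sketch of the capacity dynamics, under the same assumptions as the reward sketch above (per-kg prices and costs, an observable accept/decline decision); none of the names or numbers are from the question beyond the 10,000 kg starting capacity.

```python
class CapacityEpisode:
    """Toy episode: quote prices until the remaining capacity is exhausted."""

    def __init__(self, initial_capacity_kg=10_000):
        self.remaining_kg = initial_capacity_kg

    def step(self, shipment_kg, price_per_kg, cost_per_kg, accepted):
        # Accepted shipments consume capacity and yield profit as reward.
        reward = 0.0
        if accepted and shipment_kg <= self.remaining_kg:
            self.remaining_kg -= shipment_kg
            reward = (price_per_kg - cost_per_kg) * shipment_kg
        # The episode terminates once the capacity is used up.
        done = self.remaining_kg <= 0
        return reward, done
```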








deep-learning tensorflow reinforcement-learning dqn deepmind

asked yesterday, edited yesterday, by Karthik Rajkumar (new contributor)

Comments:

  • The way to define a reward is to start with your goals and how you measure the success of the agent. Could you add those? Also, you don't seem to have a state space that needs reinforcement learning; it looks more like a contextual bandit problem. Could you please identify any state variables that evolve over time, and what the time steps are? If each time step brings a new, unrelated customer, then this is not really RL, although repeat business from the same customer might be handled as RL. – Neil Slater, yesterday

  • Hi, I have updated the question. Kindly take a look at it. – Karthik Rajkumar, yesterday

  • Thanks, that explains well how this maps to RL. However, I am still not sure what the goals are. Will it simply be the total price sold at, or the profit? Profit seems more likely to be the true goal; presumably you need to account for the current mix of destinations and the route plan if this is a single container that must tour all the destinations in its itinerary? – Neil Slater, yesterday

  • For example, if the price offered for the same origin, destination, and type of shipment is $2.50 per kilo, then based on that similar offer we can increase or decrease our price so that the customer will accept the offer we provide. Take seasonality: around festival times we can increase the price because there will be more demand, or if capacity decreases and only a few kilos are left to fill, we can increase the price. – Karthik Rajkumar, yesterday

  • As well as filling the capacity ending the episode, is this time limited? If you have an infinite number of customers lined up, then you can just set a very high price and wait to make a huge profit. But reality is not like that, and once you accept your first customer, you will have limited opportunities to fill the rest of the capacity or be in breach of contract, etc. – Neil Slater, yesterday
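The question plans an MLP that maps the state directly to a price, but standard DQN works over a discrete action set, so one common way to reconcile the two is to quote prices from a fixed grid and have the network output one Q-value per candidate price. The sketch below illustrates that idea only; the price grid, the state dimensionality and the layer sizes are all assumptions, not part of the question.

```python
import numpy as np
import tensorflow as tf

# Hypothetical discretisation of the continuous price action for DQN:
# a fixed grid of candidate prices per kg (illustrative values only).
PRICE_GRID = np.linspace(0.5, 5.0, num=19)   # candidate $/kg quotes
STATE_DIM = 12                               # placeholder size of the encoded state

# MLP that maps a state vector to one Q-value per candidate price.
q_network = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(STATE_DIM,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(len(PRICE_GRID)),  # Q(s, a) for each discrete price
])

def choose_price(state, epsilon=0.1):
    """Epsilon-greedy price selection over the discrete grid."""
    if np.random.rand() < epsilon:
        return float(np.random.choice(PRICE_GRID))
    q_values = q_network(state[None, :], training=False).numpy()[0]
    return float(PRICE_GRID[int(np.argmax(q_values))])
```

If a genuinely continuous price is required, an actor-critic method such as DDPG would be a more natural fit than DQN, but that is a separate design decision from the reward definition asked about here.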