Airline Fares - What analysis should be used to detect competitive price-setting behavior and price correlations? The 2019 Stack Overflow Developer Survey Results Are In Announcing the arrival of Valued Associate #679: Cesar Manara Planned maintenance scheduled April 17/18, 2019 at 00:00UTC (8:00pm US/Eastern) 2019 Moderator Election Q&A - Questionnaire 2019 Community Moderator Election ResultsWhat initial steps should I use to make sense of large data sets, and what tools should I use?Obtain a model where each feature vector is past few samples and labels are future few samples?What is the difference between Slow Feature Analysis (SFA) and a Moving Average?Predicting future airfare using past dataPredict task durationWhat data treatment/transformation should be applied if there are a lot of outliers and features lack normal distribution?What are some data sources with strong positive and negative correlations in feature and sample space?What Unsupervised Machine Learning algorithms can be used to detect Fraud in insurance?Applying predictive maintenance to predict labor, what kind of test data should be used?
Would an alien lifeform be able to achieve space travel if lacking in vision?
Did the new image of black hole confirm the general theory of relativity?
Do warforged have souls?
Keeping a retro style to sci-fi spaceships?
Is this wall load bearing? Blueprints and photos attached
How to pronounce 1ターン?
How to copy the contents of all files with a certain name into a new file?
What aspect of planet Earth must be changed to prevent the industrial revolution?
Simulating Exploding Dice
Why does the Event Horizon Telescope (EHT) not include telescopes from Africa, Asia or Australia?
How did the audience guess the pentatonic scale in Bobby McFerrin's presentation?
When did F become S in typeography, and why?
Difference between "generating set" and free product?
Did God make two great lights or did He make the great light two?
Arduino Pro Micro - switch off LEDs
Do working physicists consider Newtonian mechanics to be "falsified"?
how can a perfect fourth interval be considered either consonant or dissonant?
Would it be possible to rearrange a dragon's flight muscle to somewhat circumvent the square-cube law?
Relations between two reciprocal partial derivatives?
Why can't devices on different VLANs, but on the same subnet, communicate?
What was the last x86 CPU that did not have the x87 floating-point unit built in?
Road tyres vs "Street" tyres for charity ride on MTB Tandem
Grover's algorithm - DES circuit as oracle?
Sort list of array linked objects by keys and values
Airline Fares - What analysis should be used to detect competitive price-setting behavior and price correlations?
The 2019 Stack Overflow Developer Survey Results Are In
Announcing the arrival of Valued Associate #679: Cesar Manara
Planned maintenance scheduled April 17/18, 2019 at 00:00UTC (8:00pm US/Eastern)
2019 Moderator Election Q&A - Questionnaire
2019 Community Moderator Election ResultsWhat initial steps should I use to make sense of large data sets, and what tools should I use?Obtain a model where each feature vector is past few samples and labels are future few samples?What is the difference between Slow Feature Analysis (SFA) and a Moving Average?Predicting future airfare using past dataPredict task durationWhat data treatment/transformation should be applied if there are a lot of outliers and features lack normal distribution?What are some data sources with strong positive and negative correlations in feature and sample space?What Unsupervised Machine Learning algorithms can be used to detect Fraud in insurance?Applying predictive maintenance to predict labor, what kind of test data should be used?
$begingroup$
I want to investigate price-setting behavior of airlines -- specifically how airlines react to competitors pricing.
As I would say my knowledge about more complex analysis is quite limited I've done mostly all basic methods to gather a overall view of the data. This includes simple graphs which already help to identify similar patterns. I am also using SAS Enterprise 9.4.
However I am looking for a more number based approach.
Data Set
The (self) collected data set I am using contain around ~54.000 fares.
All fares were collected within a 60 day time window, on a daily basis (every night at 00:00).
Hence, every fare within that time window occurs $n$ times subject to the availability of the fare as well as the departure date of the flight, when it is passed by the collection date of the fare.
(You can't collect a fare for a flight when the departure date of the flight is in the past)
The unformatted that looks basically like this: (fake data)
+--------------------+-----------+--------------------+--------------------------+---------------+
| requestDate | price| tripStartDeparture | tripDestinationDeparture | flightCarrier |
+--------------------+-----------+--------------------+--------------------------+---------------+
| 14APR2015:00:00:00 | 725.32 | 16APR2015:10:50:02 | 23APR2015:21:55:04 | XA |
+--------------------+-----------+--------------------+--------------------------+---------------+
| 14APR2015:00:00:00 | 966.32 | 16APR2015:13:20:02 | 23APR2015:19:00:04 | XY |
+--------------------+-----------+--------------------+--------------------------+---------------+
| 14APR2015:00:00:00 | 915.32 | 16APR2015:13:20:02 | 23APR2015:21:55:04 | XH |
+--------------------+-----------+--------------------+--------------------------+---------------+
"DaysBeforeDeparture" is calculated via $I=s-c$ where
- I & interval (days before departure)
- s & date of the fare (flight departure)
- c & date of which the fare was collected
Here is a example of grouped data set by I (DaysBeforeDep.) (fake data!):
+-----------------+------------------+------------------+------------------+------------------+
| DaysBefDeparture | AVG_of_sale | MIN_of_sale | MAX_of_sale | operatingCarrier |
+-----------------+------------------+------------------+------------------+------------------+
| 0 | 880.68 | 477.99 | 2,245.23 | DL |
+-----------------+------------------+------------------+------------------+------------------+
| 0 | 904.89 | 477.99 | 2,534.55 | DL |
+-----------------+------------------+------------------+------------------+------------------+
| 0 | 1,044.39 | 920.99 | 2,119.09 | LH |
+-----------------+------------------+------------------+------------------+------------------+
What I came up with so far
Looking at the line graphs I can already estimate that several lines will have a high correlation factor. Hence, I tried to use correlation analysis first on the grouped data. But is that the correct way? Basically I try now to make correlations on the averages rather then on the individual prices?
Is there an other way?
I am unsure which regression model fits here, as the prices do not move in any linear form and appear non-linear. Would I need to fit a model to each of price developments of an airline
PS: This is a long text-wall. If I need to clarify anything let me know. I am new to this sub.
Anyone a clue? :-)
data-mining dataset regression correlation visualization
$endgroup$
add a comment |
$begingroup$
I want to investigate price-setting behavior of airlines -- specifically how airlines react to competitors pricing.
As I would say my knowledge about more complex analysis is quite limited I've done mostly all basic methods to gather a overall view of the data. This includes simple graphs which already help to identify similar patterns. I am also using SAS Enterprise 9.4.
However I am looking for a more number based approach.
Data Set
The (self) collected data set I am using contain around ~54.000 fares.
All fares were collected within a 60 day time window, on a daily basis (every night at 00:00).
Hence, every fare within that time window occurs $n$ times subject to the availability of the fare as well as the departure date of the flight, when it is passed by the collection date of the fare.
(You can't collect a fare for a flight when the departure date of the flight is in the past)
The unformatted that looks basically like this: (fake data)
+--------------------+-----------+--------------------+--------------------------+---------------+
| requestDate | price| tripStartDeparture | tripDestinationDeparture | flightCarrier |
+--------------------+-----------+--------------------+--------------------------+---------------+
| 14APR2015:00:00:00 | 725.32 | 16APR2015:10:50:02 | 23APR2015:21:55:04 | XA |
+--------------------+-----------+--------------------+--------------------------+---------------+
| 14APR2015:00:00:00 | 966.32 | 16APR2015:13:20:02 | 23APR2015:19:00:04 | XY |
+--------------------+-----------+--------------------+--------------------------+---------------+
| 14APR2015:00:00:00 | 915.32 | 16APR2015:13:20:02 | 23APR2015:21:55:04 | XH |
+--------------------+-----------+--------------------+--------------------------+---------------+
"DaysBeforeDeparture" is calculated via $I=s-c$ where
- I & interval (days before departure)
- s & date of the fare (flight departure)
- c & date of which the fare was collected
Here is a example of grouped data set by I (DaysBeforeDep.) (fake data!):
+-----------------+------------------+------------------+------------------+------------------+
| DaysBefDeparture | AVG_of_sale | MIN_of_sale | MAX_of_sale | operatingCarrier |
+-----------------+------------------+------------------+------------------+------------------+
| 0 | 880.68 | 477.99 | 2,245.23 | DL |
+-----------------+------------------+------------------+------------------+------------------+
| 0 | 904.89 | 477.99 | 2,534.55 | DL |
+-----------------+------------------+------------------+------------------+------------------+
| 0 | 1,044.39 | 920.99 | 2,119.09 | LH |
+-----------------+------------------+------------------+------------------+------------------+
What I came up with so far
Looking at the line graphs I can already estimate that several lines will have a high correlation factor. Hence, I tried to use correlation analysis first on the grouped data. But is that the correct way? Basically I try now to make correlations on the averages rather then on the individual prices?
Is there an other way?
I am unsure which regression model fits here, as the prices do not move in any linear form and appear non-linear. Would I need to fit a model to each of price developments of an airline
PS: This is a long text-wall. If I need to clarify anything let me know. I am new to this sub.
Anyone a clue? :-)
data-mining dataset regression correlation visualization
$endgroup$
add a comment |
$begingroup$
I want to investigate price-setting behavior of airlines -- specifically how airlines react to competitors pricing.
As I would say my knowledge about more complex analysis is quite limited I've done mostly all basic methods to gather a overall view of the data. This includes simple graphs which already help to identify similar patterns. I am also using SAS Enterprise 9.4.
However I am looking for a more number based approach.
Data Set
The (self) collected data set I am using contain around ~54.000 fares.
All fares were collected within a 60 day time window, on a daily basis (every night at 00:00).
Hence, every fare within that time window occurs $n$ times subject to the availability of the fare as well as the departure date of the flight, when it is passed by the collection date of the fare.
(You can't collect a fare for a flight when the departure date of the flight is in the past)
The unformatted that looks basically like this: (fake data)
+--------------------+-----------+--------------------+--------------------------+---------------+
| requestDate | price| tripStartDeparture | tripDestinationDeparture | flightCarrier |
+--------------------+-----------+--------------------+--------------------------+---------------+
| 14APR2015:00:00:00 | 725.32 | 16APR2015:10:50:02 | 23APR2015:21:55:04 | XA |
+--------------------+-----------+--------------------+--------------------------+---------------+
| 14APR2015:00:00:00 | 966.32 | 16APR2015:13:20:02 | 23APR2015:19:00:04 | XY |
+--------------------+-----------+--------------------+--------------------------+---------------+
| 14APR2015:00:00:00 | 915.32 | 16APR2015:13:20:02 | 23APR2015:21:55:04 | XH |
+--------------------+-----------+--------------------+--------------------------+---------------+
"DaysBeforeDeparture" is calculated via $I=s-c$ where
- I & interval (days before departure)
- s & date of the fare (flight departure)
- c & date of which the fare was collected
Here is a example of grouped data set by I (DaysBeforeDep.) (fake data!):
+-----------------+------------------+------------------+------------------+------------------+
| DaysBefDeparture | AVG_of_sale | MIN_of_sale | MAX_of_sale | operatingCarrier |
+-----------------+------------------+------------------+------------------+------------------+
| 0 | 880.68 | 477.99 | 2,245.23 | DL |
+-----------------+------------------+------------------+------------------+------------------+
| 0 | 904.89 | 477.99 | 2,534.55 | DL |
+-----------------+------------------+------------------+------------------+------------------+
| 0 | 1,044.39 | 920.99 | 2,119.09 | LH |
+-----------------+------------------+------------------+------------------+------------------+
What I came up with so far
Looking at the line graphs I can already estimate that several lines will have a high correlation factor. Hence, I tried to use correlation analysis first on the grouped data. But is that the correct way? Basically I try now to make correlations on the averages rather then on the individual prices?
Is there an other way?
I am unsure which regression model fits here, as the prices do not move in any linear form and appear non-linear. Would I need to fit a model to each of price developments of an airline
PS: This is a long text-wall. If I need to clarify anything let me know. I am new to this sub.
Anyone a clue? :-)
data-mining dataset regression correlation visualization
$endgroup$
I want to investigate price-setting behavior of airlines -- specifically how airlines react to competitors pricing.
As I would say my knowledge about more complex analysis is quite limited I've done mostly all basic methods to gather a overall view of the data. This includes simple graphs which already help to identify similar patterns. I am also using SAS Enterprise 9.4.
However I am looking for a more number based approach.
Data Set
The (self) collected data set I am using contain around ~54.000 fares.
All fares were collected within a 60 day time window, on a daily basis (every night at 00:00).
Hence, every fare within that time window occurs $n$ times subject to the availability of the fare as well as the departure date of the flight, when it is passed by the collection date of the fare.
(You can't collect a fare for a flight when the departure date of the flight is in the past)
The unformatted that looks basically like this: (fake data)
+--------------------+-----------+--------------------+--------------------------+---------------+
| requestDate | price| tripStartDeparture | tripDestinationDeparture | flightCarrier |
+--------------------+-----------+--------------------+--------------------------+---------------+
| 14APR2015:00:00:00 | 725.32 | 16APR2015:10:50:02 | 23APR2015:21:55:04 | XA |
+--------------------+-----------+--------------------+--------------------------+---------------+
| 14APR2015:00:00:00 | 966.32 | 16APR2015:13:20:02 | 23APR2015:19:00:04 | XY |
+--------------------+-----------+--------------------+--------------------------+---------------+
| 14APR2015:00:00:00 | 915.32 | 16APR2015:13:20:02 | 23APR2015:21:55:04 | XH |
+--------------------+-----------+--------------------+--------------------------+---------------+
"DaysBeforeDeparture" is calculated via $I=s-c$ where
- I & interval (days before departure)
- s & date of the fare (flight departure)
- c & date of which the fare was collected
Here is a example of grouped data set by I (DaysBeforeDep.) (fake data!):
+-----------------+------------------+------------------+------------------+------------------+
| DaysBefDeparture | AVG_of_sale | MIN_of_sale | MAX_of_sale | operatingCarrier |
+-----------------+------------------+------------------+------------------+------------------+
| 0 | 880.68 | 477.99 | 2,245.23 | DL |
+-----------------+------------------+------------------+------------------+------------------+
| 0 | 904.89 | 477.99 | 2,534.55 | DL |
+-----------------+------------------+------------------+------------------+------------------+
| 0 | 1,044.39 | 920.99 | 2,119.09 | LH |
+-----------------+------------------+------------------+------------------+------------------+
What I came up with so far
Looking at the line graphs I can already estimate that several lines will have a high correlation factor. Hence, I tried to use correlation analysis first on the grouped data. But is that the correct way? Basically I try now to make correlations on the averages rather then on the individual prices?
Is there an other way?
I am unsure which regression model fits here, as the prices do not move in any linear form and appear non-linear. Would I need to fit a model to each of price developments of an airline
PS: This is a long text-wall. If I need to clarify anything let me know. I am new to this sub.
Anyone a clue? :-)
data-mining dataset regression correlation visualization
data-mining dataset regression correlation visualization
edited Mar 31 at 19:49
Glorfindel
1511210
1511210
asked May 17 '15 at 20:12
s1xs1x
1618
1618
add a comment |
add a comment |
2 Answers
2
active
oldest
votes
$begingroup$
Word of warning from a former airline Revenue Management analyst: you might be barking up the wrong tree with this approach. Apologies for the wall of text that follows, but this data is a lot more complex and noisy than might appear at first glance, so wanted to provide a short description of how it's generated; forewarned is forearmed.
Airline fares have two components to them: all the actual fares (complete with fare rules and what have you) that an airline has available for a certain route, most of which are published the Airline Tariff Publishing Company (a few special-use ones are not, but those are the exception rather than the rule) and the actual inventory management performed by the airline on a day-to-day basis.
Fares can be submitted to ATPCO four times a day, at set intervals, and when airlines do so, it will usually consist of a mixture of additions, deletions, and modifications of existing fares. When an airline initiates a pricing action (assuming their competitors aren't trying to make their own moves here), they usually have to wait until the next update to see if their competitors follow/respond. The converse goes when a competitor initiates a pricing action, as the airline has to wait until the next update before they can respond.
Now, this is all well and good with respect to fares, but the problem is that, because this is all getting published in ATPCO, fares are the next best thing to public information... all your competitors get to see what you've got in your arsenal, so attempts to obfuscate are not unheard of, such as publishing fares that will never actually be assigned any inventory, listing all the fares as day-of-departure, etc.
In many ways, the secret sauce comes down to the actual inventory allocation, i.e. how many seats on each flight will you be willing to sell for a given fare, and this information is not publicly available. You can get some glimpses by scraping web info, but the potential combinations of departure time/date and fare rules are quite numerous and may quickly escalate beyond your ability to easily keep track of.
Typically an airline will only be willing to sell a handful of seats for a very low fare and the people who snag those have to book quite far in advance lest the fare rules lock them out, or other travelers simply beat them to the punch. The airline will be willing to sell a few more seats for a higher fare, and so on and so forth. They will be quite happy to sell all of the seats for the highest fare they've got published, but this is not usually feasible.
What you're seeing with fares getting higher the closer you get to the day of departure is simply the natural process of having the cheap seats get booked farther out, while the remaining inventory gradually gets more expensive. Of course, there are some caveats here. The RM process is actively managed and human intervention is quite common as the RM team generally strives to meet its revenue goals and maximize revenue on each flight. As such, flights that fill up quickly may be "tightened up" by closing out low fares. Flights that are booking slowly may be "loosened up" by allocating more seats to lower fares.
There is a constant interplay and competition between airlines in this area, but you are not very likely to capture the actual dynamics just from scraping fares. Don't get me wrong, we had such tools at our disposal, and, despite their limitations, they were quite valuable, but they were just one data source that fed into the decision-making process. You'd need access to the hundreds, if not thousands of operational decisions made by RM teams on a daily basis, as well as state-of-the-world information as they see it at the time. If you cannot find an airline partner to work with in order to get this data, you might need to consider alternate data sources.
I'd recommend looking into getting access to O&D fare data from the Official Airline Guide (or one of their competitors) and try to use that for your analysis. It's sample-based (about 10% of all tickets sold) and aggregated at a higher level than would be ideal so careful route selection is imperative (I'd recommend something with plenty of airlines, flying non-stop multiple times a day, with large aircraft), but you may be able to get a better picture of what was actually sold (average fare) and how much of it was sold (load factor), vs. merely what is available for sale at a given point in time. Using that information you might be in better position to at least explore the outcomes of the airlines' pricing strategy, and make your inferences from there.
$endgroup$
$begingroup$
Thanks for your thorough explanation. I agree with you that such analysis based on prices only are quite limited. This also includes notably fare rules (Refundable tickets, minimum stay etc.) Some of those limitation can be overcome by collecting always same fares to make the comparable. However, a important information - as you mentioned, is missing the amount of seats available (can be != seats in a plane) and the the actually amount of sold tickets.
$endgroup$
– s1x
May 21 '15 at 13:16
$begingroup$
Access to such data is very limited and if - outdated (eg. Databank 1B from US DOT). Some research such as Clark R. and Vincent N. (2012) Capacity-contingent pricing [...] link includes such data and offer much better insights. I'am aware of the limitations (hopefully ;-) ) and as you mentioned as there are much more information influencing prices. Still when observing a specific market you can get a feeling of what happens. You can see if there is any compeitive behaviour and different pricing strategy approachs. However, you would never be able to find the cause.
$endgroup$
– s1x
May 21 '15 at 13:19
1
$begingroup$
@s1x - I agree and I wish I had a solid alternative to offer, but, as you've learned yourself, detailed revenue data is the most jealously guarded secret at any airline. Just wanted to make sure you're aware of that and what goes into the data generation process. Beyond, that, I like what you're trying to do and I think the other answer is a step in the right direction, technique-wise. If I might suggest, you could also take a look at using cross-correlation between your various TS during your data exploration, as it is often valuable for discerning patterns between linked TS.
$endgroup$
– habu
May 21 '15 at 13:24
add a comment |
$begingroup$
In addition to exploratory data analysis (EDA), both descriptive and visual, I would try to use time series analysis as a more comprehensive and sophisticated analysis. Specifically, I would perform time series regression analysis. Time series analysis is a huge research and practice domain, so, if you're not familiar with the fundamentals, I suggest starting with the above-linked Wikipedia article, gradually searching for more specific topics and reading corresponding articles, papers and books.
Since time series analysis is a very popular approach, it is supported by most open source and closed source commercial data science and statistical environments (software), such as R, Python, SAS, SPSS and many others. If you want to use R for this, check my answers on general time series analysis and on time series classification and clustering. I hope that this is helpful.
$endgroup$
$begingroup$
Thank you for your answer @Aleksandr Blekh - really appreciated. Ill digg right into that. Maybe a stupid question, but please correct me here if I'am wrong here:a correlation analysis, while using one airline as the variable to correlate with. The results were compelling so far, as some airlines espc. those who had codeshare agreements had similar prices. Would such high correlations e.g.:ColumnUA(LH) 0.90435 <.0001 ColumnSQ 0.32544 <.0001 ColumnAF(DL) 0.55336 <.0001
I assume such results indicate similar price patterns. With a regression analysis, what would I find out?
$endgroup$
– s1x
May 18 '15 at 2:55
$begingroup$
@s1x: You're very welcome (feel free to upvote/accept, if you value the answer and when you'll get enough reputation to do so, of course). Now, on to your question. As I said, TS analysis is more sophisticated and comprehensive. In particular TS regression, accounts for so-called autoregression and other TS complexities. Hence, my suggestion to use TS regression analysis instead of simpler traditional one. Also, you should always start with EDA, no matter what data analysis you plan to perform (actually, EDA will often change your plans).
$endgroup$
– Aleksandr Blekh
May 18 '15 at 3:21
add a comment |
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "557"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f5817%2fairline-fares-what-analysis-should-be-used-to-detect-competitive-price-setting%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
Word of warning from a former airline Revenue Management analyst: you might be barking up the wrong tree with this approach. Apologies for the wall of text that follows, but this data is a lot more complex and noisy than might appear at first glance, so wanted to provide a short description of how it's generated; forewarned is forearmed.
Airline fares have two components to them: all the actual fares (complete with fare rules and what have you) that an airline has available for a certain route, most of which are published the Airline Tariff Publishing Company (a few special-use ones are not, but those are the exception rather than the rule) and the actual inventory management performed by the airline on a day-to-day basis.
Fares can be submitted to ATPCO four times a day, at set intervals, and when airlines do so, it will usually consist of a mixture of additions, deletions, and modifications of existing fares. When an airline initiates a pricing action (assuming their competitors aren't trying to make their own moves here), they usually have to wait until the next update to see if their competitors follow/respond. The converse goes when a competitor initiates a pricing action, as the airline has to wait until the next update before they can respond.
Now, this is all well and good with respect to fares, but the problem is that, because this is all getting published in ATPCO, fares are the next best thing to public information... all your competitors get to see what you've got in your arsenal, so attempts to obfuscate are not unheard of, such as publishing fares that will never actually be assigned any inventory, listing all the fares as day-of-departure, etc.
In many ways, the secret sauce comes down to the actual inventory allocation, i.e. how many seats on each flight will you be willing to sell for a given fare, and this information is not publicly available. You can get some glimpses by scraping web info, but the potential combinations of departure time/date and fare rules are quite numerous and may quickly escalate beyond your ability to easily keep track of.
Typically an airline will only be willing to sell a handful of seats for a very low fare and the people who snag those have to book quite far in advance lest the fare rules lock them out, or other travelers simply beat them to the punch. The airline will be willing to sell a few more seats for a higher fare, and so on and so forth. They will be quite happy to sell all of the seats for the highest fare they've got published, but this is not usually feasible.
What you're seeing with fares getting higher the closer you get to the day of departure is simply the natural process of having the cheap seats get booked farther out, while the remaining inventory gradually gets more expensive. Of course, there are some caveats here. The RM process is actively managed and human intervention is quite common as the RM team generally strives to meet its revenue goals and maximize revenue on each flight. As such, flights that fill up quickly may be "tightened up" by closing out low fares. Flights that are booking slowly may be "loosened up" by allocating more seats to lower fares.
There is a constant interplay and competition between airlines in this area, but you are not very likely to capture the actual dynamics just from scraping fares. Don't get me wrong, we had such tools at our disposal, and, despite their limitations, they were quite valuable, but they were just one data source that fed into the decision-making process. You'd need access to the hundreds, if not thousands of operational decisions made by RM teams on a daily basis, as well as state-of-the-world information as they see it at the time. If you cannot find an airline partner to work with in order to get this data, you might need to consider alternate data sources.
I'd recommend looking into getting access to O&D fare data from the Official Airline Guide (or one of their competitors) and try to use that for your analysis. It's sample-based (about 10% of all tickets sold) and aggregated at a higher level than would be ideal so careful route selection is imperative (I'd recommend something with plenty of airlines, flying non-stop multiple times a day, with large aircraft), but you may be able to get a better picture of what was actually sold (average fare) and how much of it was sold (load factor), vs. merely what is available for sale at a given point in time. Using that information you might be in better position to at least explore the outcomes of the airlines' pricing strategy, and make your inferences from there.
$endgroup$
$begingroup$
Thanks for your thorough explanation. I agree with you that such analysis based on prices only are quite limited. This also includes notably fare rules (Refundable tickets, minimum stay etc.) Some of those limitation can be overcome by collecting always same fares to make the comparable. However, a important information - as you mentioned, is missing the amount of seats available (can be != seats in a plane) and the the actually amount of sold tickets.
$endgroup$
– s1x
May 21 '15 at 13:16
$begingroup$
Access to such data is very limited and if - outdated (eg. Databank 1B from US DOT). Some research such as Clark R. and Vincent N. (2012) Capacity-contingent pricing [...] link includes such data and offer much better insights. I'am aware of the limitations (hopefully ;-) ) and as you mentioned as there are much more information influencing prices. Still when observing a specific market you can get a feeling of what happens. You can see if there is any compeitive behaviour and different pricing strategy approachs. However, you would never be able to find the cause.
$endgroup$
– s1x
May 21 '15 at 13:19
1
$begingroup$
@s1x - I agree and I wish I had a solid alternative to offer, but, as you've learned yourself, detailed revenue data is the most jealously guarded secret at any airline. Just wanted to make sure you're aware of that and what goes into the data generation process. Beyond, that, I like what you're trying to do and I think the other answer is a step in the right direction, technique-wise. If I might suggest, you could also take a look at using cross-correlation between your various TS during your data exploration, as it is often valuable for discerning patterns between linked TS.
$endgroup$
– habu
May 21 '15 at 13:24
add a comment |
$begingroup$
Word of warning from a former airline Revenue Management analyst: you might be barking up the wrong tree with this approach. Apologies for the wall of text that follows, but this data is a lot more complex and noisy than might appear at first glance, so wanted to provide a short description of how it's generated; forewarned is forearmed.
Airline fares have two components to them: all the actual fares (complete with fare rules and what have you) that an airline has available for a certain route, most of which are published the Airline Tariff Publishing Company (a few special-use ones are not, but those are the exception rather than the rule) and the actual inventory management performed by the airline on a day-to-day basis.
Fares can be submitted to ATPCO four times a day, at set intervals, and when airlines do so, it will usually consist of a mixture of additions, deletions, and modifications of existing fares. When an airline initiates a pricing action (assuming their competitors aren't trying to make their own moves here), they usually have to wait until the next update to see if their competitors follow/respond. The converse goes when a competitor initiates a pricing action, as the airline has to wait until the next update before they can respond.
Now, this is all well and good with respect to fares, but the problem is that, because this is all getting published in ATPCO, fares are the next best thing to public information... all your competitors get to see what you've got in your arsenal, so attempts to obfuscate are not unheard of, such as publishing fares that will never actually be assigned any inventory, listing all the fares as day-of-departure, etc.
In many ways, the secret sauce comes down to the actual inventory allocation, i.e. how many seats on each flight will you be willing to sell for a given fare, and this information is not publicly available. You can get some glimpses by scraping web info, but the potential combinations of departure time/date and fare rules are quite numerous and may quickly escalate beyond your ability to easily keep track of.
Typically an airline will only be willing to sell a handful of seats for a very low fare and the people who snag those have to book quite far in advance lest the fare rules lock them out, or other travelers simply beat them to the punch. The airline will be willing to sell a few more seats for a higher fare, and so on and so forth. They will be quite happy to sell all of the seats for the highest fare they've got published, but this is not usually feasible.
What you're seeing with fares getting higher the closer you get to the day of departure is simply the natural process of having the cheap seats get booked farther out, while the remaining inventory gradually gets more expensive. Of course, there are some caveats here. The RM process is actively managed and human intervention is quite common as the RM team generally strives to meet its revenue goals and maximize revenue on each flight. As such, flights that fill up quickly may be "tightened up" by closing out low fares. Flights that are booking slowly may be "loosened up" by allocating more seats to lower fares.
There is a constant interplay and competition between airlines in this area, but you are not very likely to capture the actual dynamics just from scraping fares. Don't get me wrong, we had such tools at our disposal, and, despite their limitations, they were quite valuable, but they were just one data source that fed into the decision-making process. You'd need access to the hundreds, if not thousands of operational decisions made by RM teams on a daily basis, as well as state-of-the-world information as they see it at the time. If you cannot find an airline partner to work with in order to get this data, you might need to consider alternate data sources.
I'd recommend looking into getting access to O&D fare data from the Official Airline Guide (or one of their competitors) and try to use that for your analysis. It's sample-based (about 10% of all tickets sold) and aggregated at a higher level than would be ideal so careful route selection is imperative (I'd recommend something with plenty of airlines, flying non-stop multiple times a day, with large aircraft), but you may be able to get a better picture of what was actually sold (average fare) and how much of it was sold (load factor), vs. merely what is available for sale at a given point in time. Using that information you might be in better position to at least explore the outcomes of the airlines' pricing strategy, and make your inferences from there.
$endgroup$
$begingroup$
Thanks for your thorough explanation. I agree with you that such analysis based on prices only are quite limited. This also includes notably fare rules (Refundable tickets, minimum stay etc.) Some of those limitation can be overcome by collecting always same fares to make the comparable. However, a important information - as you mentioned, is missing the amount of seats available (can be != seats in a plane) and the the actually amount of sold tickets.
$endgroup$
– s1x
May 21 '15 at 13:16
$begingroup$
Access to such data is very limited and if - outdated (eg. Databank 1B from US DOT). Some research such as Clark R. and Vincent N. (2012) Capacity-contingent pricing [...] link includes such data and offer much better insights. I'am aware of the limitations (hopefully ;-) ) and as you mentioned as there are much more information influencing prices. Still when observing a specific market you can get a feeling of what happens. You can see if there is any compeitive behaviour and different pricing strategy approachs. However, you would never be able to find the cause.
$endgroup$
– s1x
May 21 '15 at 13:19
1
$begingroup$
@s1x - I agree and I wish I had a solid alternative to offer, but, as you've learned yourself, detailed revenue data is the most jealously guarded secret at any airline. Just wanted to make sure you're aware of that and what goes into the data generation process. Beyond, that, I like what you're trying to do and I think the other answer is a step in the right direction, technique-wise. If I might suggest, you could also take a look at using cross-correlation between your various TS during your data exploration, as it is often valuable for discerning patterns between linked TS.
$endgroup$
– habu
May 21 '15 at 13:24
add a comment |
$begingroup$
Word of warning from a former airline Revenue Management analyst: you might be barking up the wrong tree with this approach. Apologies for the wall of text that follows, but this data is a lot more complex and noisy than might appear at first glance, so wanted to provide a short description of how it's generated; forewarned is forearmed.
Airline fares have two components to them: all the actual fares (complete with fare rules and what have you) that an airline has available for a certain route, most of which are published the Airline Tariff Publishing Company (a few special-use ones are not, but those are the exception rather than the rule) and the actual inventory management performed by the airline on a day-to-day basis.
Fares can be submitted to ATPCO four times a day, at set intervals, and when airlines do so, it will usually consist of a mixture of additions, deletions, and modifications of existing fares. When an airline initiates a pricing action (assuming their competitors aren't trying to make their own moves here), they usually have to wait until the next update to see if their competitors follow/respond. The converse goes when a competitor initiates a pricing action, as the airline has to wait until the next update before they can respond.
Now, this is all well and good with respect to fares, but the problem is that, because this is all getting published in ATPCO, fares are the next best thing to public information... all your competitors get to see what you've got in your arsenal, so attempts to obfuscate are not unheard of, such as publishing fares that will never actually be assigned any inventory, listing all the fares as day-of-departure, etc.
In many ways, the secret sauce comes down to the actual inventory allocation, i.e. how many seats on each flight will you be willing to sell for a given fare, and this information is not publicly available. You can get some glimpses by scraping web info, but the potential combinations of departure time/date and fare rules are quite numerous and may quickly escalate beyond your ability to easily keep track of.
Typically an airline will only be willing to sell a handful of seats for a very low fare and the people who snag those have to book quite far in advance lest the fare rules lock them out, or other travelers simply beat them to the punch. The airline will be willing to sell a few more seats for a higher fare, and so on and so forth. They will be quite happy to sell all of the seats for the highest fare they've got published, but this is not usually feasible.
What you're seeing with fares getting higher the closer you get to the day of departure is simply the natural process of having the cheap seats get booked farther out, while the remaining inventory gradually gets more expensive. Of course, there are some caveats here. The RM process is actively managed and human intervention is quite common as the RM team generally strives to meet its revenue goals and maximize revenue on each flight. As such, flights that fill up quickly may be "tightened up" by closing out low fares. Flights that are booking slowly may be "loosened up" by allocating more seats to lower fares.
There is a constant interplay and competition between airlines in this area, but you are not very likely to capture the actual dynamics just from scraping fares. Don't get me wrong, we had such tools at our disposal, and, despite their limitations, they were quite valuable, but they were just one data source that fed into the decision-making process. You'd need access to the hundreds, if not thousands of operational decisions made by RM teams on a daily basis, as well as state-of-the-world information as they see it at the time. If you cannot find an airline partner to work with in order to get this data, you might need to consider alternate data sources.
I'd recommend looking into getting access to O&D fare data from the Official Airline Guide (or one of their competitors) and try to use that for your analysis. It's sample-based (about 10% of all tickets sold) and aggregated at a higher level than would be ideal so careful route selection is imperative (I'd recommend something with plenty of airlines, flying non-stop multiple times a day, with large aircraft), but you may be able to get a better picture of what was actually sold (average fare) and how much of it was sold (load factor), vs. merely what is available for sale at a given point in time. Using that information you might be in better position to at least explore the outcomes of the airlines' pricing strategy, and make your inferences from there.
$endgroup$
Word of warning from a former airline Revenue Management analyst: you might be barking up the wrong tree with this approach. Apologies for the wall of text that follows, but this data is a lot more complex and noisy than might appear at first glance, so wanted to provide a short description of how it's generated; forewarned is forearmed.
Airline fares have two components to them: all the actual fares (complete with fare rules and what have you) that an airline has available for a certain route, most of which are published the Airline Tariff Publishing Company (a few special-use ones are not, but those are the exception rather than the rule) and the actual inventory management performed by the airline on a day-to-day basis.
Fares can be submitted to ATPCO four times a day, at set intervals, and when airlines do so, it will usually consist of a mixture of additions, deletions, and modifications of existing fares. When an airline initiates a pricing action (assuming their competitors aren't trying to make their own moves here), they usually have to wait until the next update to see if their competitors follow/respond. The converse goes when a competitor initiates a pricing action, as the airline has to wait until the next update before they can respond.
Now, this is all well and good with respect to fares, but the problem is that, because this is all getting published in ATPCO, fares are the next best thing to public information... all your competitors get to see what you've got in your arsenal, so attempts to obfuscate are not unheard of, such as publishing fares that will never actually be assigned any inventory, listing all the fares as day-of-departure, etc.
In many ways, the secret sauce comes down to the actual inventory allocation, i.e. how many seats on each flight will you be willing to sell for a given fare, and this information is not publicly available. You can get some glimpses by scraping web info, but the potential combinations of departure time/date and fare rules are quite numerous and may quickly escalate beyond your ability to easily keep track of.
Typically an airline will only be willing to sell a handful of seats for a very low fare and the people who snag those have to book quite far in advance lest the fare rules lock them out, or other travelers simply beat them to the punch. The airline will be willing to sell a few more seats for a higher fare, and so on and so forth. They will be quite happy to sell all of the seats for the highest fare they've got published, but this is not usually feasible.
What you're seeing with fares getting higher the closer you get to the day of departure is simply the natural process of having the cheap seats get booked farther out, while the remaining inventory gradually gets more expensive. Of course, there are some caveats here. The RM process is actively managed and human intervention is quite common as the RM team generally strives to meet its revenue goals and maximize revenue on each flight. As such, flights that fill up quickly may be "tightened up" by closing out low fares. Flights that are booking slowly may be "loosened up" by allocating more seats to lower fares.
There is a constant interplay and competition between airlines in this area, but you are not very likely to capture the actual dynamics just from scraping fares. Don't get me wrong, we had such tools at our disposal, and, despite their limitations, they were quite valuable, but they were just one data source that fed into the decision-making process. You'd need access to the hundreds, if not thousands of operational decisions made by RM teams on a daily basis, as well as state-of-the-world information as they see it at the time. If you cannot find an airline partner to work with in order to get this data, you might need to consider alternate data sources.
I'd recommend looking into getting access to O&D fare data from the Official Airline Guide (or one of their competitors) and try to use that for your analysis. It's sample-based (about 10% of all tickets sold) and aggregated at a higher level than would be ideal so careful route selection is imperative (I'd recommend something with plenty of airlines, flying non-stop multiple times a day, with large aircraft), but you may be able to get a better picture of what was actually sold (average fare) and how much of it was sold (load factor), vs. merely what is available for sale at a given point in time. Using that information you might be in better position to at least explore the outcomes of the airlines' pricing strategy, and make your inferences from there.
answered May 21 '15 at 12:51
habuhabu
1911
1911
$begingroup$
Thanks for your thorough explanation. I agree with you that such analysis based on prices only are quite limited. This also includes notably fare rules (Refundable tickets, minimum stay etc.) Some of those limitation can be overcome by collecting always same fares to make the comparable. However, a important information - as you mentioned, is missing the amount of seats available (can be != seats in a plane) and the the actually amount of sold tickets.
$endgroup$
– s1x
May 21 '15 at 13:16
$begingroup$
Access to such data is very limited and if - outdated (eg. Databank 1B from US DOT). Some research such as Clark R. and Vincent N. (2012) Capacity-contingent pricing [...] link includes such data and offer much better insights. I'am aware of the limitations (hopefully ;-) ) and as you mentioned as there are much more information influencing prices. Still when observing a specific market you can get a feeling of what happens. You can see if there is any compeitive behaviour and different pricing strategy approachs. However, you would never be able to find the cause.
$endgroup$
– s1x
May 21 '15 at 13:19
1
$begingroup$
@s1x - I agree and I wish I had a solid alternative to offer, but, as you've learned yourself, detailed revenue data is the most jealously guarded secret at any airline. Just wanted to make sure you're aware of that and what goes into the data generation process. Beyond, that, I like what you're trying to do and I think the other answer is a step in the right direction, technique-wise. If I might suggest, you could also take a look at using cross-correlation between your various TS during your data exploration, as it is often valuable for discerning patterns between linked TS.
$endgroup$
– habu
May 21 '15 at 13:24
add a comment |
$begingroup$
Thanks for your thorough explanation. I agree with you that such analysis based on prices only are quite limited. This also includes notably fare rules (Refundable tickets, minimum stay etc.) Some of those limitation can be overcome by collecting always same fares to make the comparable. However, a important information - as you mentioned, is missing the amount of seats available (can be != seats in a plane) and the the actually amount of sold tickets.
$endgroup$
– s1x
May 21 '15 at 13:16
$begingroup$
Access to such data is very limited and if - outdated (eg. Databank 1B from US DOT). Some research such as Clark R. and Vincent N. (2012) Capacity-contingent pricing [...] link includes such data and offer much better insights. I'am aware of the limitations (hopefully ;-) ) and as you mentioned as there are much more information influencing prices. Still when observing a specific market you can get a feeling of what happens. You can see if there is any compeitive behaviour and different pricing strategy approachs. However, you would never be able to find the cause.
$endgroup$
– s1x
May 21 '15 at 13:19
1
$begingroup$
@s1x - I agree and I wish I had a solid alternative to offer, but, as you've learned yourself, detailed revenue data is the most jealously guarded secret at any airline. Just wanted to make sure you're aware of that and what goes into the data generation process. Beyond, that, I like what you're trying to do and I think the other answer is a step in the right direction, technique-wise. If I might suggest, you could also take a look at using cross-correlation between your various TS during your data exploration, as it is often valuable for discerning patterns between linked TS.
$endgroup$
– habu
May 21 '15 at 13:24
$begingroup$
Thanks for your thorough explanation. I agree with you that such analysis based on prices only are quite limited. This also includes notably fare rules (Refundable tickets, minimum stay etc.) Some of those limitation can be overcome by collecting always same fares to make the comparable. However, a important information - as you mentioned, is missing the amount of seats available (can be != seats in a plane) and the the actually amount of sold tickets.
$endgroup$
– s1x
May 21 '15 at 13:16
$begingroup$
Thanks for your thorough explanation. I agree with you that such analysis based on prices only are quite limited. This also includes notably fare rules (Refundable tickets, minimum stay etc.) Some of those limitation can be overcome by collecting always same fares to make the comparable. However, a important information - as you mentioned, is missing the amount of seats available (can be != seats in a plane) and the the actually amount of sold tickets.
$endgroup$
– s1x
May 21 '15 at 13:16
$begingroup$
Access to such data is very limited and if - outdated (eg. Databank 1B from US DOT). Some research such as Clark R. and Vincent N. (2012) Capacity-contingent pricing [...] link includes such data and offer much better insights. I'am aware of the limitations (hopefully ;-) ) and as you mentioned as there are much more information influencing prices. Still when observing a specific market you can get a feeling of what happens. You can see if there is any compeitive behaviour and different pricing strategy approachs. However, you would never be able to find the cause.
$endgroup$
– s1x
May 21 '15 at 13:19
$begingroup$
Access to such data is very limited and if - outdated (eg. Databank 1B from US DOT). Some research such as Clark R. and Vincent N. (2012) Capacity-contingent pricing [...] link includes such data and offer much better insights. I'am aware of the limitations (hopefully ;-) ) and as you mentioned as there are much more information influencing prices. Still when observing a specific market you can get a feeling of what happens. You can see if there is any compeitive behaviour and different pricing strategy approachs. However, you would never be able to find the cause.
$endgroup$
– s1x
May 21 '15 at 13:19
1
1
$begingroup$
@s1x - I agree and I wish I had a solid alternative to offer, but, as you've learned yourself, detailed revenue data is the most jealously guarded secret at any airline. Just wanted to make sure you're aware of that and what goes into the data generation process. Beyond, that, I like what you're trying to do and I think the other answer is a step in the right direction, technique-wise. If I might suggest, you could also take a look at using cross-correlation between your various TS during your data exploration, as it is often valuable for discerning patterns between linked TS.
$endgroup$
– habu
May 21 '15 at 13:24
$begingroup$
@s1x - I agree and I wish I had a solid alternative to offer, but, as you've learned yourself, detailed revenue data is the most jealously guarded secret at any airline. Just wanted to make sure you're aware of that and what goes into the data generation process. Beyond, that, I like what you're trying to do and I think the other answer is a step in the right direction, technique-wise. If I might suggest, you could also take a look at using cross-correlation between your various TS during your data exploration, as it is often valuable for discerning patterns between linked TS.
$endgroup$
– habu
May 21 '15 at 13:24
add a comment |
$begingroup$
In addition to exploratory data analysis (EDA), both descriptive and visual, I would try to use time series analysis as a more comprehensive and sophisticated analysis. Specifically, I would perform time series regression analysis. Time series analysis is a huge research and practice domain, so, if you're not familiar with the fundamentals, I suggest starting with the above-linked Wikipedia article, gradually searching for more specific topics and reading corresponding articles, papers and books.
Since time series analysis is a very popular approach, it is supported by most open source and closed source commercial data science and statistical environments (software), such as R, Python, SAS, SPSS and many others. If you want to use R for this, check my answers on general time series analysis and on time series classification and clustering. I hope that this is helpful.
$endgroup$
$begingroup$
Thank you for your answer @Aleksandr Blekh - really appreciated. Ill digg right into that. Maybe a stupid question, but please correct me here if I'am wrong here:a correlation analysis, while using one airline as the variable to correlate with. The results were compelling so far, as some airlines espc. those who had codeshare agreements had similar prices. Would such high correlations e.g.:ColumnUA(LH) 0.90435 <.0001 ColumnSQ 0.32544 <.0001 ColumnAF(DL) 0.55336 <.0001
I assume such results indicate similar price patterns. With a regression analysis, what would I find out?
$endgroup$
– s1x
May 18 '15 at 2:55
$begingroup$
@s1x: You're very welcome (feel free to upvote/accept, if you value the answer and when you'll get enough reputation to do so, of course). Now, on to your question. As I said, TS analysis is more sophisticated and comprehensive. In particular TS regression, accounts for so-called autoregression and other TS complexities. Hence, my suggestion to use TS regression analysis instead of simpler traditional one. Also, you should always start with EDA, no matter what data analysis you plan to perform (actually, EDA will often change your plans).
$endgroup$
– Aleksandr Blekh
May 18 '15 at 3:21
add a comment |
$begingroup$
In addition to exploratory data analysis (EDA), both descriptive and visual, I would try to use time series analysis as a more comprehensive and sophisticated analysis. Specifically, I would perform time series regression analysis. Time series analysis is a huge research and practice domain, so, if you're not familiar with the fundamentals, I suggest starting with the above-linked Wikipedia article, gradually searching for more specific topics and reading corresponding articles, papers and books.
Since time series analysis is a very popular approach, it is supported by most open source and closed source commercial data science and statistical environments (software), such as R, Python, SAS, SPSS and many others. If you want to use R for this, check my answers on general time series analysis and on time series classification and clustering. I hope that this is helpful.
$endgroup$
$begingroup$
Thank you for your answer @Aleksandr Blekh - really appreciated. Ill digg right into that. Maybe a stupid question, but please correct me here if I'am wrong here:a correlation analysis, while using one airline as the variable to correlate with. The results were compelling so far, as some airlines espc. those who had codeshare agreements had similar prices. Would such high correlations e.g.:ColumnUA(LH) 0.90435 <.0001 ColumnSQ 0.32544 <.0001 ColumnAF(DL) 0.55336 <.0001
I assume such results indicate similar price patterns. With a regression analysis, what would I find out?
$endgroup$
– s1x
May 18 '15 at 2:55
$begingroup$
@s1x: You're very welcome (feel free to upvote/accept, if you value the answer and when you'll get enough reputation to do so, of course). Now, on to your question. As I said, TS analysis is more sophisticated and comprehensive. In particular TS regression, accounts for so-called autoregression and other TS complexities. Hence, my suggestion to use TS regression analysis instead of simpler traditional one. Also, you should always start with EDA, no matter what data analysis you plan to perform (actually, EDA will often change your plans).
$endgroup$
– Aleksandr Blekh
May 18 '15 at 3:21
add a comment |
$begingroup$
In addition to exploratory data analysis (EDA), both descriptive and visual, I would try to use time series analysis as a more comprehensive and sophisticated analysis. Specifically, I would perform time series regression analysis. Time series analysis is a huge research and practice domain, so, if you're not familiar with the fundamentals, I suggest starting with the above-linked Wikipedia article, gradually searching for more specific topics and reading corresponding articles, papers and books.
Since time series analysis is a very popular approach, it is supported by most open source and closed source commercial data science and statistical environments (software), such as R, Python, SAS, SPSS and many others. If you want to use R for this, check my answers on general time series analysis and on time series classification and clustering. I hope that this is helpful.
$endgroup$
In addition to exploratory data analysis (EDA), both descriptive and visual, I would try to use time series analysis as a more comprehensive and sophisticated analysis. Specifically, I would perform time series regression analysis. Time series analysis is a huge research and practice domain, so, if you're not familiar with the fundamentals, I suggest starting with the above-linked Wikipedia article, gradually searching for more specific topics and reading corresponding articles, papers and books.
Since time series analysis is a very popular approach, it is supported by most open source and closed source commercial data science and statistical environments (software), such as R, Python, SAS, SPSS and many others. If you want to use R for this, check my answers on general time series analysis and on time series classification and clustering. I hope that this is helpful.
edited Apr 13 '17 at 12:44
Community♦
1
1
answered May 18 '15 at 2:32
Aleksandr BlekhAleksandr Blekh
5,95811747
5,95811747
$begingroup$
Thank you for your answer @Aleksandr Blekh - really appreciated. Ill digg right into that. Maybe a stupid question, but please correct me here if I'am wrong here:a correlation analysis, while using one airline as the variable to correlate with. The results were compelling so far, as some airlines espc. those who had codeshare agreements had similar prices. Would such high correlations e.g.:ColumnUA(LH) 0.90435 <.0001 ColumnSQ 0.32544 <.0001 ColumnAF(DL) 0.55336 <.0001
I assume such results indicate similar price patterns. With a regression analysis, what would I find out?
$endgroup$
– s1x
May 18 '15 at 2:55
$begingroup$
@s1x: You're very welcome (feel free to upvote/accept, if you value the answer and when you'll get enough reputation to do so, of course). Now, on to your question. As I said, TS analysis is more sophisticated and comprehensive. In particular TS regression, accounts for so-called autoregression and other TS complexities. Hence, my suggestion to use TS regression analysis instead of simpler traditional one. Also, you should always start with EDA, no matter what data analysis you plan to perform (actually, EDA will often change your plans).
$endgroup$
– Aleksandr Blekh
May 18 '15 at 3:21
add a comment |
$begingroup$
Thank you for your answer @Aleksandr Blekh - really appreciated. Ill digg right into that. Maybe a stupid question, but please correct me here if I'am wrong here:a correlation analysis, while using one airline as the variable to correlate with. The results were compelling so far, as some airlines espc. those who had codeshare agreements had similar prices. Would such high correlations e.g.:ColumnUA(LH) 0.90435 <.0001 ColumnSQ 0.32544 <.0001 ColumnAF(DL) 0.55336 <.0001
I assume such results indicate similar price patterns. With a regression analysis, what would I find out?
$endgroup$
– s1x
May 18 '15 at 2:55
$begingroup$
@s1x: You're very welcome (feel free to upvote/accept, if you value the answer and when you'll get enough reputation to do so, of course). Now, on to your question. As I said, TS analysis is more sophisticated and comprehensive. In particular TS regression, accounts for so-called autoregression and other TS complexities. Hence, my suggestion to use TS regression analysis instead of simpler traditional one. Also, you should always start with EDA, no matter what data analysis you plan to perform (actually, EDA will often change your plans).
$endgroup$
– Aleksandr Blekh
May 18 '15 at 3:21
$begingroup$
Thank you for your answer @Aleksandr Blekh - really appreciated. Ill digg right into that. Maybe a stupid question, but please correct me here if I'am wrong here:a correlation analysis, while using one airline as the variable to correlate with. The results were compelling so far, as some airlines espc. those who had codeshare agreements had similar prices. Would such high correlations e.g.:
ColumnUA(LH) 0.90435 <.0001 ColumnSQ 0.32544 <.0001 ColumnAF(DL) 0.55336 <.0001
I assume such results indicate similar price patterns. With a regression analysis, what would I find out?$endgroup$
– s1x
May 18 '15 at 2:55
$begingroup$
Thank you for your answer @Aleksandr Blekh - really appreciated. Ill digg right into that. Maybe a stupid question, but please correct me here if I'am wrong here:a correlation analysis, while using one airline as the variable to correlate with. The results were compelling so far, as some airlines espc. those who had codeshare agreements had similar prices. Would such high correlations e.g.:
ColumnUA(LH) 0.90435 <.0001 ColumnSQ 0.32544 <.0001 ColumnAF(DL) 0.55336 <.0001
I assume such results indicate similar price patterns. With a regression analysis, what would I find out?$endgroup$
– s1x
May 18 '15 at 2:55
$begingroup$
@s1x: You're very welcome (feel free to upvote/accept, if you value the answer and when you'll get enough reputation to do so, of course). Now, on to your question. As I said, TS analysis is more sophisticated and comprehensive. In particular TS regression, accounts for so-called autoregression and other TS complexities. Hence, my suggestion to use TS regression analysis instead of simpler traditional one. Also, you should always start with EDA, no matter what data analysis you plan to perform (actually, EDA will often change your plans).
$endgroup$
– Aleksandr Blekh
May 18 '15 at 3:21
$begingroup$
@s1x: You're very welcome (feel free to upvote/accept, if you value the answer and when you'll get enough reputation to do so, of course). Now, on to your question. As I said, TS analysis is more sophisticated and comprehensive. In particular TS regression, accounts for so-called autoregression and other TS complexities. Hence, my suggestion to use TS regression analysis instead of simpler traditional one. Also, you should always start with EDA, no matter what data analysis you plan to perform (actually, EDA will often change your plans).
$endgroup$
– Aleksandr Blekh
May 18 '15 at 3:21
add a comment |
Thanks for contributing an answer to Data Science Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f5817%2fairline-fares-what-analysis-should-be-used-to-detect-competitive-price-setting%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown