Airline Fares - What analysis should be used to detect competitive price-setting behavior and price correlations?











12












$begingroup$


I want to investigate the price-setting behavior of airlines, specifically how airlines react to competitors' pricing.



My knowledge of more complex analysis is quite limited; so far I have mostly used basic methods to get an overall view of the data. This includes simple graphs, which already help to identify similar patterns. I am using SAS Enterprise 9.4.



However, I am looking for a more number-based approach.



Data Set



The (self-)collected data set I am using contains around 54,000 fares.
All fares were collected within a 60-day time window, on a daily basis (every night at 00:00).

Collection Method



Hence, every fare within that time window occurs $n$ times, subject to the availability of the fare and to the departure date of the flight: once the collection date passes the departure date, the fare can no longer be collected.
(You can't collect a fare for a flight whose departure date is in the past.)



The unformatted data looks basically like this (fake data):



+--------------------+-----------+--------------------+--------------------------+---------------+
| requestDate | price| tripStartDeparture | tripDestinationDeparture | flightCarrier |
+--------------------+-----------+--------------------+--------------------------+---------------+
| 14APR2015:00:00:00 | 725.32 | 16APR2015:10:50:02 | 23APR2015:21:55:04 | XA |
+--------------------+-----------+--------------------+--------------------------+---------------+
| 14APR2015:00:00:00 | 966.32 | 16APR2015:13:20:02 | 23APR2015:19:00:04 | XY |
+--------------------+-----------+--------------------+--------------------------+---------------+
| 14APR2015:00:00:00 | 915.32 | 16APR2015:13:20:02 | 23APR2015:21:55:04 | XH |
+--------------------+-----------+--------------------+--------------------------+---------------+


"DaysBeforeDeparture" is calculated via $I = s - c$, where



  • $I$ is the interval (days before departure)

  • $s$ is the date of the fare (flight departure)

  • $c$ is the date on which the fare was collected
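
As a rough illustration of the formula above, here is a minimal Python sketch (not SAS). The helper name `days_before_departure` is made up for this example, and it assumes the timestamps are SAS-style strings like those in the sample rows and that the interval should be counted in whole calendar days.

```python
from datetime import datetime

# SAS DATETIME-style stamps as in the sample rows, e.g. "14APR2015:00:00:00".
SAS_FORMAT = "%d%b%Y:%H:%M:%S"


def days_before_departure(request_date: str, departure: str) -> int:
    """Return I = s - c in whole calendar days (hypothetical helper)."""
    collected = datetime.strptime(request_date, SAS_FORMAT).date()  # c
    departs = datetime.strptime(departure, SAS_FORMAT).date()       # s
    return (departs - collected).days


# First sample row: collected on 14 April, departing on 16 April.
interval = days_before_departure("14APR2015:00:00:00", "16APR2015:10:50:02")
print(interval)
```

Dropping the time of day before subtracting is a design choice here, so that a fare collected at 00:00 for a flight later the same day counts as 0 days before departure.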

Here is an example of the data set grouped by I (DaysBeforeDeparture) (fake data!):



+-----------------+------------------+------------------+------------------+------------------+
| DaysBefDeparture | AVG_of_sale | MIN_of_sale | MAX_of_sale | operatingCarrier |
+-----------------+------------------+------------------+------------------+------------------+
| 0 | 880.68 | 477.99 | 2,245.23 | DL |
+-----------------+------------------+------------------+------------------+------------------+
| 0 | 904.89 | 477.99 | 2,534.55 | DL |
+-----------------+------------------+------------------+------------------+------------------+
| 0 | 1,044.39 | 920.99 | 2,119.09 | LH |
+-----------------+------------------+------------------+------------------+------------------+


What I came up with so far



Looking at the line graphs, I can already estimate that several lines will have a high correlation coefficient. Hence, I first tried correlation analysis on the grouped data. But is that the correct way? Basically, I am now computing correlations on the averages rather than on the individual prices.
Is there another way?
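
To make the "correlation on the averages" idea concrete, here is a small sketch in plain Python (not SAS). The rows, the carrier codes and the helper names `average_by_day` and `pearson` are invented for illustration. Keep in mind that fares from every carrier tend to rise as departure approaches, so a high coefficient on averaged curves may reflect that shared trend as much as any reaction between carriers.

```python
from collections import defaultdict
from math import sqrt

# (days_before_departure, carrier, price): made-up rows, like the fake data above.
fares = [
    (3, "XA", 700.0), (3, "XA", 720.0), (3, "XY", 900.0),
    (2, "XA", 750.0), (2, "XY", 940.0), (2, "XY", 960.0),
    (1, "XA", 820.0), (1, "XY", 1010.0),
    (0, "XA", 880.0), (0, "XY", 1100.0),
]


def average_by_day(rows, carrier):
    """Mean fare per days-before-departure value for one carrier."""
    buckets = defaultdict(list)
    for days, code, price in rows:
        if code == carrier:
            buckets[days].append(price)
    return {days: sum(p) / len(p) for days, p in buckets.items()}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


avg_xa = average_by_day(fares, "XA")
avg_xy = average_by_day(fares, "XY")
shared_days = sorted(set(avg_xa) & set(avg_xy))  # only days both carriers cover
r = pearson([avg_xa[d] for d in shared_days], [avg_xy[d] for d in shared_days])
print(round(r, 3))
```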



I am unsure which regression model fits here, as the prices do not develop linearly. Would I need to fit a model to each airline's price development?



PS: This is a long text-wall. If I need to clarify anything let me know. I am new to this sub.



Does anyone have a clue? :-)





















$endgroup$























      data-mining dataset regression correlation visualization














      edited Mar 31 at 19:49 by Glorfindel

      asked May 17 '15 at 20:12 by s1x




















          2 Answers


















          9












          $begingroup$

          Word of warning from a former airline Revenue Management analyst: you might be barking up the wrong tree with this approach. Apologies for the wall of text that follows, but this data is a lot more complex and noisy than might appear at first glance, so I wanted to provide a short description of how it's generated; forewarned is forearmed.



          Airline fares have two components to them: the actual fares (complete with fare rules and what have you) that an airline has available for a certain route, most of which are published through the Airline Tariff Publishing Company (ATPCO) (a few special-use ones are not, but those are the exception rather than the rule), and the actual inventory management performed by the airline on a day-to-day basis.



          Fares can be submitted to ATPCO four times a day, at set intervals, and when airlines do so, it will usually consist of a mixture of additions, deletions, and modifications of existing fares. When an airline initiates a pricing action (assuming their competitors aren't trying to make their own moves here), they usually have to wait until the next update to see if their competitors follow/respond. The converse goes when a competitor initiates a pricing action, as the airline has to wait until the next update before they can respond.



          Now, this is all well and good with respect to fares, but the problem is that, because this is all getting published in ATPCO, fares are the next best thing to public information... all your competitors get to see what you've got in your arsenal, so attempts to obfuscate are not unheard of, such as publishing fares that will never actually be assigned any inventory, listing all the fares as day-of-departure, etc.



          In many ways, the secret sauce comes down to the actual inventory allocation, i.e. how many seats on each flight will you be willing to sell for a given fare, and this information is not publicly available. You can get some glimpses by scraping web info, but the potential combinations of departure time/date and fare rules are quite numerous and may quickly escalate beyond your ability to easily keep track of.



          Typically an airline will only be willing to sell a handful of seats for a very low fare and the people who snag those have to book quite far in advance lest the fare rules lock them out, or other travelers simply beat them to the punch. The airline will be willing to sell a few more seats for a higher fare, and so on and so forth. They will be quite happy to sell all of the seats for the highest fare they've got published, but this is not usually feasible.



          What you're seeing with fares getting higher the closer you get to the day of departure is simply the natural process of having the cheap seats get booked farther out, while the remaining inventory gradually gets more expensive. Of course, there are some caveats here. The RM process is actively managed and human intervention is quite common as the RM team generally strives to meet its revenue goals and maximize revenue on each flight. As such, flights that fill up quickly may be "tightened up" by closing out low fares. Flights that are booking slowly may be "loosened up" by allocating more seats to lower fares.
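
To illustrate the mechanism described in the last few paragraphs, here is a toy Python sketch. The fare ladder, the seat counts and the helper names `lowest_available_fare` and `loosen` are all invented; real allocations are not public, and the sketch assumes bookings always take the cheapest fare still open.

```python
# Hypothetical fare ladder: (fare, seats the airline will sell at that fare).
# The numbers are invented for illustration only.
FARE_BUCKETS = [(199.0, 5), (299.0, 10), (449.0, 20), (799.0, 65)]


def lowest_available_fare(seats_sold, buckets=FARE_BUCKETS):
    """Cheapest fare still on sale after `seats_sold` bookings, or None if sold out."""
    remaining = seats_sold
    for fare, allocation in buckets:
        if remaining < allocation:
            return fare
        remaining -= allocation  # this bucket is exhausted; move up the ladder
    return None


def loosen(buckets, extra_seats):
    """Return a copy with more seats in the cheapest bucket (a slow-booking flight)."""
    (fare, seats), *rest = buckets
    return [(fare, seats + extra_seats), *rest]


# The scraped "price" climbs as bookings accumulate, with no fare being changed.
observed = [lowest_available_fare(sold) for sold in (0, 5, 15, 35, 100)]
print(observed)

# After loosening, the lowest fare is still open at five bookings.
print(lowest_available_fare(5, loosen(FARE_BUCKETS, 5)))
```

In this toy model the scraped price series changes purely because inventory moves, which is why a price-only data set cannot distinguish a competitive reaction from ordinary booking progress.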



          There is a constant interplay and competition between airlines in this area, but you are not very likely to capture the actual dynamics just from scraping fares. Don't get me wrong, we had such tools at our disposal, and, despite their limitations, they were quite valuable, but they were just one data source that fed into the decision-making process. You'd need access to the hundreds, if not thousands of operational decisions made by RM teams on a daily basis, as well as state-of-the-world information as they see it at the time. If you cannot find an airline partner to work with in order to get this data, you might need to consider alternate data sources.



          I'd recommend looking into getting access to O&D fare data from the Official Airline Guide (or one of their competitors) and trying to use that for your analysis. It's sample-based (about 10% of all tickets sold) and aggregated at a higher level than would be ideal, so careful route selection is imperative (I'd recommend something with plenty of airlines, flying non-stop multiple times a day, with large aircraft), but you may be able to get a better picture of what was actually sold (average fare) and how much of it was sold (load factor), vs. merely what is available for sale at a given point in time. Using that information you might be in a better position to at least explore the outcomes of the airlines' pricing strategy, and make your inferences from there.















          $endgroup$












          • $begingroup$
            Thanks for your thorough explanation. I agree with you that an analysis based on prices alone is quite limited. This notably includes fare rules (refundable tickets, minimum stay, etc.). Some of those limitations can be overcome by always collecting the same fares to make them comparable. However, important information is missing, as you mentioned: the number of seats available (which can be != seats in the plane) and the actual number of tickets sold.
            $endgroup$
            – s1x
            May 21 '15 at 13:16










          • $begingroup$
            Access to such data is very limited and, where available, outdated (e.g. Databank 1B from the US DOT). Some research, such as Clark R. and Vincent N. (2012), Capacity-contingent pricing [...] link, includes such data and offers much better insights. I am aware of the limitations (hopefully ;-) ) and, as you mentioned, much more information influences prices. Still, when observing a specific market you can get a feeling for what happens. You can see whether there is any competitive behaviour and whether pricing strategies differ. However, you would never be able to find the cause.
            $endgroup$
            – s1x
            May 21 '15 at 13:19






          • 1




            $begingroup$
            @s1x - I agree and I wish I had a solid alternative to offer, but, as you've learned yourself, detailed revenue data is the most jealously guarded secret at any airline. Just wanted to make sure you're aware of that and what goes into the data generation process. Beyond that, I like what you're trying to do and I think the other answer is a step in the right direction, technique-wise. If I might suggest, you could also take a look at using cross-correlation between your various time series (TS) during your data exploration, as it is often valuable for discerning patterns between linked TS.
            $endgroup$
            – habu
            May 21 '15 at 13:24
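
The cross-correlation suggested in the comment above can be sketched as follows, in plain Python. The two series are invented, the helper names are hypothetical, and working on daily price changes rather than price levels is an assumption of this sketch (it avoids the shared upward drift toward departure dominating the result).

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def cross_correlation(leader, follower, max_lag):
    """Correlation of leader[t] with follower[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        xs = leader[: len(leader) - lag]  # drop the last `lag` points
        ys = follower[lag:]               # drop the first `lag` points
        result[lag] = pearson(xs, ys)
    return result


# Invented daily price changes: carrier B repeats carrier A's move one day later.
changes_a = [0, 20, 0, -15, 0, 30, 0, 0, -10, 0]
changes_b = [0, 0, 20, 0, -15, 0, 30, 0, 0, -10]

by_lag = cross_correlation(changes_a, changes_b, max_lag=3)
best_lag = max(by_lag, key=by_lag.get)
print(best_lag)
```

A peak at a non-zero lag is at most a hint that one carrier tends to follow the other; as the answer explains, the cause cannot be established from scraped fares alone.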


















          4












          $begingroup$

          In addition to exploratory data analysis (EDA), both descriptive and visual, I would try to use time series analysis as a more comprehensive and sophisticated analysis. Specifically, I would perform time series regression analysis. Time series analysis is a huge research and practice domain, so, if you're not familiar with the fundamentals, I suggest starting with the above-linked Wikipedia article, gradually searching for more specific topics and reading corresponding articles, papers and books.



          Since time series analysis is a very popular approach, it is supported by most open source and closed source commercial data science and statistical environments (software), such as R, Python, SAS, SPSS and many others. If you want to use R for this, check my answers on general time series analysis and on time series classification and clustering. I hope that this is helpful.
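
As a very small illustration of what a time series regression can look like, here is a sketch in plain Python that fits own[t] = a + b * own[t-1] + c * rival[t-1] by ordinary least squares. This is only one simple lagged specification chosen for the example, not necessarily what the answer has in mind; the data are synthetic (generated from known coefficients so the fit can be checked), and the helper names `solve` and `fit_lagged_regression` are made up. A statistics package would also report standard errors and diagnostics, which this sketch omits.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_lagged_regression(own, rival):
    """OLS fit of own[t] = a + b * own[t-1] + c * rival[t-1]; returns [a, b, c]."""
    rows = [[1.0, own[t - 1], rival[t - 1]] for t in range(1, len(own))]
    targets = own[1:]
    k = 3
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic series: the "own" fare is built from its own past and the rival's past.
rival = [500.0, 520.0, 480.0, 530.0, 510.0, 560.0, 490.0, 540.0]
own = [450.0]
for t in range(1, len(rival)):
    own.append(10.0 + 0.5 * own[t - 1] + 0.4 * rival[t - 1])

a, b, c = fit_lagged_regression(own, rival)
print(round(b, 3), round(c, 3))
```

Here `c` measures how strongly yesterday's rival fare is associated with today's own fare after accounting for the own fare's persistence (`b`), which is the kind of question plain correlation cannot separate.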

















          $endgroup$












          • $begingroup$
            Thank you for your answer @Aleksandr Blekh - really appreciated. I'll dig right into that. Maybe a stupid question, but please correct me if I'm wrong: I ran a correlation analysis, using one airline as the variable to correlate the others with. The results were compelling so far, as some airlines, especially those with codeshare agreements, had similar prices. For example: ColumnUA(LH) 0.90435 <.0001, ColumnSQ 0.32544 <.0001, ColumnAF(DL) 0.55336 <.0001. I assume such results indicate similar price patterns. With a regression analysis, what would I find out?
            $endgroup$
            – s1x
            May 18 '15 at 2:55











          • $begingroup$
            @s1x: You're very welcome (feel free to upvote/accept if you value the answer, once you have enough reputation to do so, of course). Now, on to your question. As I said, TS analysis is more sophisticated and comprehensive. In particular, TS regression accounts for so-called autoregression and other TS complexities. Hence my suggestion to use TS regression analysis instead of a simpler traditional one. Also, you should always start with EDA, no matter what data analysis you plan to perform (actually, EDA will often change your plans).
            $endgroup$
            – Aleksandr Blekh
            May 18 '15 at 3:21











          $begingroup$

          Word of warning from a former airline Revenue Management analyst: you might be barking up the wrong tree with this approach. Apologies for the wall of text that follows, but this data is a lot more complex and noisy than might appear at first glance, so wanted to provide a short description of how it's generated; forewarned is forearmed.



          Airline fares have two components to them: all the actual fares (complete with fare rules and what have you) that an airline has available for a certain route, most of which are published the Airline Tariff Publishing Company (a few special-use ones are not, but those are the exception rather than the rule) and the actual inventory management performed by the airline on a day-to-day basis.



          Fares can be submitted to ATPCO four times a day, at set intervals, and when airlines do so, it will usually consist of a mixture of additions, deletions, and modifications of existing fares. When an airline initiates a pricing action (assuming their competitors aren't trying to make their own moves here), they usually have to wait until the next update to see if their competitors follow/respond. The converse goes when a competitor initiates a pricing action, as the airline has to wait until the next update before they can respond.



          Now, this is all well and good with respect to fares, but the problem is that, because this is all getting published in ATPCO, fares are the next best thing to public information... all your competitors get to see what you've got in your arsenal, so attempts to obfuscate are not unheard of, such as publishing fares that will never actually be assigned any inventory, listing all the fares as day-of-departure, etc.



          In many ways, the secret sauce comes down to the actual inventory allocation, i.e. how many seats on each flight you are willing to sell for a given fare, and this information is not publicly available. You can get some glimpses by scraping web info, but the potential combinations of departure time/date and fare rules are quite numerous and may quickly escalate beyond your ability to keep track of them.



          Typically an airline will only be willing to sell a handful of seats for a very low fare and the people who snag those have to book quite far in advance lest the fare rules lock them out, or other travelers simply beat them to the punch. The airline will be willing to sell a few more seats for a higher fare, and so on and so forth. They will be quite happy to sell all of the seats for the highest fare they've got published, but this is not usually feasible.



          What you're seeing with fares getting higher the closer you get to the day of departure is simply the natural process of having the cheap seats get booked farther out, while the remaining inventory gradually gets more expensive. Of course, there are some caveats here. The Revenue Management (RM) process is actively managed, and human intervention is quite common as the RM team generally strives to meet its revenue goals and maximize revenue on each flight. As such, flights that fill up quickly may be "tightened up" by closing out low fares. Flights that are booking slowly may be "loosened up" by allocating more seats to lower fares.
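To make the mechanism above concrete, here is a minimal Python sketch of fare buckets being consumed cheapest first. The fare ladder, the seat counts, and the function name `lowest_available_fare` are all made up for illustration; real inventory systems are far more involved, and this ignores fare rules and any manual tightening or loosening.

```python
# Hypothetical fare ladder: (fare in dollars, seats allocated to that bucket).
# The numbers are illustrative assumptions, not real airline data.
FARE_BUCKETS = [(99, 5), (149, 10), (229, 20), (349, 40)]


def lowest_available_fare(seats_sold, buckets=FARE_BUCKETS):
    """Return the cheapest fare still open after `seats_sold` bookings.

    Buckets are consumed cheapest first; None means the flight is sold out.
    """
    remaining = seats_sold
    for fare, seats in buckets:
        if remaining < seats:
            return fare
        remaining -= seats
    return None


# The quoted fare rises as bookings accumulate, with no repricing at all.
for sold in (0, 5, 15, 35, 75):
    print(sold, lowest_available_fare(sold))
```

The point of the sketch is that a scraper would see the quoted fare jump from one level to the next even though no published fare changed; only the amount of inventory left in each bucket did.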



          There is a constant interplay and competition between airlines in this area, but you are not very likely to capture the actual dynamics just from scraping fares. Don't get me wrong, we had such tools at our disposal, and, despite their limitations, they were quite valuable, but they were just one data source that fed into the decision-making process. You'd need access to the hundreds, if not thousands of operational decisions made by RM teams on a daily basis, as well as state-of-the-world information as they see it at the time. If you cannot find an airline partner to work with in order to get this data, you might need to consider alternate data sources.



          I'd recommend looking into getting access to O&D fare data from the Official Airline Guide (or one of their competitors) and trying to use that for your analysis. It's sample-based (about 10% of all tickets sold) and aggregated at a higher level than would be ideal, so careful route selection is imperative (I'd recommend something with plenty of airlines, flying non-stop multiple times a day, with large aircraft), but you may be able to get a better picture of what was actually sold (average fare) and how much of it was sold (load factor), vs. merely what is available for sale at a given point in time. Using that information you might be in a better position to at least explore the outcomes of the airlines' pricing strategy, and make your inferences from there.















          $endgroup$












          • $begingroup$
            Thanks for your thorough explanation. I agree with you that analyses based on prices only are quite limited. This notably includes fare rules (refundable tickets, minimum stay, etc.). Some of those limitations can be overcome by always collecting the same fares to make them comparable. However, important information is missing, as you mentioned: the number of seats available (which can be != seats in a plane) and the actual number of tickets sold.
            $endgroup$
            – s1x
            May 21 '15 at 13:16










          • $begingroup$
            Access to such data is very limited and, where available, outdated (e.g. Databank 1B from the US DOT). Some research, such as Clark R. and Vincent N. (2012), Capacity-contingent pricing [...], includes such data and offers much better insights. I'm aware of the limitations (hopefully ;-) ) and, as you mentioned, much more information influences prices. Still, when observing a specific market you can get a feeling for what happens. You can see whether there is any competitive behaviour and different pricing strategy approaches. However, you would never be able to find the cause.
            $endgroup$
            – s1x
            May 21 '15 at 13:19






          • 1




            $begingroup$
            @s1x - I agree and I wish I had a solid alternative to offer, but, as you've learned yourself, detailed revenue data is the most jealously guarded secret at any airline. Just wanted to make sure you're aware of that and of what goes into the data generation process. Beyond that, I like what you're trying to do and I think the other answer is a step in the right direction, technique-wise. If I might suggest, you could also take a look at using cross-correlation between your various time series (TS) during your data exploration, as it is often valuable for discerning patterns between linked TS.
            $endgroup$
            – habu
            May 21 '15 at 13:24
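As a rough illustration of the cross-correlation idea in the comment above, the following Python sketch correlates one fare series against lagged copies of another. The two series and the names `pearson` and `cross_correlation` are hypothetical; the follower series is constructed by hand to trail the leader by one period, so this only shows the mechanics, not a result from real fares.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def cross_correlation(leader, follower, max_lag):
    """Correlate leader[t] with follower[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        x = leader[: len(leader) - lag]  # drop the last `lag` observations
        y = follower[lag:]               # drop the first `lag` observations
        result[lag] = pearson(x, y)
    return result


# Made-up fares: airline B copies airline A one period later, 5 dollars lower.
airline_a = [200, 200, 220, 220, 210, 240, 240, 230, 250, 250]
airline_b = [195] + [fare - 5 for fare in airline_a[:-1]]

cc = cross_correlation(airline_a, airline_b, max_lag=3)
best_lag = max(cc, key=cc.get)
print(best_lag, cc)
```

One caution when trying this on scraped fares: series that both drift upward toward departure will correlate strongly for that reason alone, so it is usually worth repeating the exercise on period-to-period changes rather than on raw price levels.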















          answered May 21 '15 at 12:51
          habu











          4












          $begingroup$

          In addition to exploratory data analysis (EDA), both descriptive and visual, I would try to use time series analysis as a more comprehensive and sophisticated analysis. Specifically, I would perform time series regression analysis. Time series analysis is a huge research and practice domain, so, if you're not familiar with the fundamentals, I suggest starting with the above-linked Wikipedia article, gradually searching for more specific topics and reading corresponding articles, papers and books.



          Since time series analysis is a very popular approach, it is supported by most open source and closed source commercial data science and statistical environments (software), such as R, Python, SAS, SPSS and many others. If you want to use R for this, check my answers on general time series analysis and on time series classification and clustering. I hope that this is helpful.
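For readers who want to see what a time series regression looks like mechanically, here is a small self-contained Python sketch. It is not the method from this answer or from the linked R answers; it is one simple variant, assumed here for illustration, in which an airline's fare is regressed on its own previous value and on a rival's previous value, fitted by ordinary least squares. The function names, the synthetic data, and the coefficients used to generate it are all hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ w = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (aug[i][n] - tail) / aug[i][i]
    return w


def fit_lagged_regression(own, rival):
    """OLS fit of own[t] = c + phi * own[t-1] + beta * rival[t-1]."""
    rows = [[1.0, own[t - 1], rival[t - 1]] for t in range(1, len(own))]
    targets = own[1:]
    k = 3
    # Normal equations: (X'X) w = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free data generated with c=10, phi=0.5, beta=0.4 (assumed).
rival = [200, 210, 190, 230, 220, 205, 240, 215, 225, 235, 200, 245]
own = [190.0]
for t in range(1, len(rival)):
    own.append(10 + 0.5 * own[-1] + 0.4 * rival[t - 1])

c, phi, beta = fit_lagged_regression(own, rival)
print(round(c, 3), round(phi, 3), round(beta, 3))
```

Because the data is generated without noise, the fit should recover the generating coefficients up to rounding. With real fares you would want an established time series package in R or Python, which also handles diagnostics such as residual autocorrelation that this sketch ignores.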

















          $endgroup$












          • $begingroup$
            Thank you for your answer @Aleksandr Blekh - really appreciated. I'll dig right into that. Maybe a stupid question, but please correct me if I'm wrong: I ran a correlation analysis, using one airline as the variable to correlate with. The results were compelling so far, as some airlines, especially those with codeshare agreements, had similar prices. Take correlations such as: ColumnUA(LH) 0.90435 <.0001; ColumnSQ 0.32544 <.0001; ColumnAF(DL) 0.55336 <.0001. I assume such results indicate similar price patterns. With a regression analysis, what would I find out?
            $endgroup$
            – s1x
            May 18 '15 at 2:55











          • $begingroup$
            @s1x: You're very welcome (feel free to upvote/accept, if you value the answer and once you have enough reputation to do so, of course). Now, on to your question. As I said, TS analysis is more sophisticated and comprehensive. In particular, TS regression accounts for so-called autoregression and other TS complexities. Hence my suggestion to use TS regression analysis instead of a simpler traditional one. Also, you should always start with EDA, no matter what data analysis you plan to perform (actually, EDA will often change your plans).
            $endgroup$
            – Aleksandr Blekh
            May 18 '15 at 3:21
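
          One way to see why a plain correlation of price levels can mislead is the sketch below. It assumes two made-up fare series that share an upward trend but have unrelated short-run movements. The series, the names and the `pearson` helper are hypothetical and exist only for illustration. The level correlation comes out high, while the correlation of day-to-day changes is weak.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def diff(series):
    """First differences: series[t] - series[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


# Hypothetical fares: both rise over time, but their wiggles are unrelated.
fares_a = [102, 103, 112, 113, 122, 123, 132, 133]
fares_b = [201, 204, 205, 208, 213, 216, 217, 220]

level_r = pearson(fares_a, fares_b)              # dominated by the shared trend
diff_r = pearson(diff(fares_a), diff(fares_b))   # short-run co-movement only

print(f"levels: {level_r:.2f}, changes: {diff_r:.2f}")
```

          Here the shared trend alone produces the high level correlation, which is one reason to check differenced or otherwise detrended series before reading anything into similar price patterns.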















          4












          edited Apr 13 '17 at 12:44
          Community

          answered May 18 '15 at 2:32
          Aleksandr Blekh



























