


What does Linus Torvalds mean when he says that Git “never ever” tracks a file?
















262















Quoting Linus Torvalds when asked how many files Git can handle during his Tech Talk at Google in 2007 (43:09):




…Git tracks your content. It never ever tracks a single file. You cannot track a file in Git. What you can do is you can track a project that has a single file, but if your project has a single file, sure do that and you can do it, but if you track 10,000 files, Git never ever sees those as individual files. Git thinks everything as the full content. All history in Git is based on the history of the whole project…




(Transcripts here.)



Yet, when you dive into the Git book, the first thing you are told is that a file in Git can be either tracked or untracked. Furthermore, it seems to me that the whole Git experience is geared towards file versioning. When using git diff or git status, output is presented on a per-file basis. When using git add, you also get to choose on a per-file basis. You can even review history on a per-file basis, and it is lightning fast.



How should this statement be interpreted? In terms of file tracking, how is Git different from other source control systems, such as CVS?





























  • 20





    reddit.com/r/git/comments/5xmrkv/what_is_a_snapshot_in_git - "For where you are at the moment, I suspect what's more important to realize is that there's a difference between how Git presents files to users and how it deals with them internally. As presented to the user, a snapshot contains complete files, not merely diffs. But internally, yes, Git uses diffs to generate packfiles that efficiently store revisions." (This is in sharp contrast to, e.g., Subversion.)

    – user2864740
    Apr 9 at 23:47







  • 5





    Git doesn't track files, it tracks changesets. Most version control systems track files. As an example of how / why this can matter, try to check in an empty directory to git (spoiler: you can't, because that's an "empty" changeset).

    – Elliott Frisch
    Apr 9 at 23:52







  • 12





    @ElliottFrisch That doesn't sound right. Your description is closer to what e.g. darcs does. Git stores snapshots, not changesets.

    – melpomene
    Apr 10 at 0:02






  • 4





    I think he means Git does not track a file directly. A file includes its name and content. Git tracks contents as blobs. Given a blob only, you can't tell what its corresponding file name is. It could be the content of multiple files with different names under different paths. The bindings between a path name and a blob are described in a tree object.

    – ElpieKay
    Apr 10 at 0:20







  • 3





    Related: Randal Schwartz' followup to Linus' talk (also a Google Tech talk) - "... What Git is really all about ... Linus said what Git is NOT".

    – Peter Mortensen
    Apr 10 at 2:07


















git version-control














edited Apr 12 at 2:06







Simón Ramírez Amaya

















asked Apr 9 at 23:40









Simón Ramírez Amaya




















6 Answers


















299














In CVS, history was tracked on a per-file basis. A branch might consist of various files with their own various revisions, each with its own version number. CVS was based on RCS (Revision Control System), which tracked individual files in a similar way.



On the other hand, Git takes snapshots of the state of the whole project. Files are not tracked and versioned independently; a revision in the repository refers to a state of the whole project, not one file.



When Git refers to tracking a file, it means simply that it is to be included in the history of the project. Linus's talk was not referring to tracking files in the Git context, but was contrasting the CVS and RCS model with the snapshot-based model used in Git.


























  • 4





    You could add that this is why in CVS and Subversion, you can use tags like $Id$ in a file. The same does not work in git, because the design is different.

    – gerrit
    Apr 10 at 8:01






  • 54





    And content is not bound to a file as you would expect. Try moving 80% of the code of one file to another. Git automatically detects a file move + 20% change, even when you just moved code around in existing files.

    – allo
    Apr 10 at 12:55






  • 13





    @allo As a side-effect of that, git can do one thing the others can't: when two files are merged and you use "git blame -C", git can look down both histories. In file-based tracking, you have to pick which of the original files is the real original, and the other lines all appear brand-new.

    – Izkata
    Apr 10 at 21:05






  • 1





    @allo, Izkata - And it's the querying entity that works all this out by analysing the repo contents at query time (commit histories and differences between referenced trees and blobs), rather than requiring the committing entity and its human user to correctly specify or synthesise this information at commit time - nor the repo tool developer to design & implement this capability and the corresponding metadata schema before the tool is deployed. Torvalds argued that such analysis will only get better over time, and all history of every git repo since day one will benefit.

    – Jeremy
    Apr 11 at 12:47












  • @allo Yep, and to hammer home the fact that git doesn't work on a file level, you don't even have to commit all the changes in a file at once; you can commit arbitrary ranges of lines while leaving other changes in the file outside of the commit. Of course the UI for that is not nearly as simple so most don't do it, but it does rarely have its uses.

    – Alvin Thompson
    Apr 17 at 2:37


















101














I agree with brian m. carlson's answer: Linus is indeed distinguishing, at least in part, between file-oriented and commit-oriented version control systems. But I think there is more to it than that.



In my book, which is stalled and might never get finished, I tried to come up with a taxonomy for version control systems. In my taxonomy, the term for what we're interested in here is the atomicity of the version control system. See what is currently page 22. When a VCS has file-level atomicity, there is in fact a history for each file. The VCS must remember the name of the file and what happened to it at each point.



Git doesn't do that. Git has only a history of commits: the commit is its unit of atomicity, and the history is the set of commits in the repository. What a commit remembers is the data (a whole tree-full of file names and the contents that go with each of those files) plus some metadata: for instance, who made the commit, when, and why, and the internal Git hash ID of the commit's parent commit. (It is this parent, and the directed acyclic graph formed by reading all commits and their parents, that is the history in a repository.)



Note that a VCS can be commit-oriented, yet still store data file-by-file. That's an implementation detail, though sometimes an important one, and Git does not do that either. Instead, each commit records a tree, with the tree object encoding file names, modes (i.e., is this file executable or not?), and a pointer to the actual file content. The content itself is stored independently, in a blob object. Like a commit object, a blob gets a hash ID that is unique to its content—but unlike a commit, which can only appear once, the blob can appear in many commits. So the underlying file content in Git is stored directly as a blob, and then indirectly in a tree object whose hash ID is recorded (directly or indirectly) in the commit object.
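To make the blob/tree split concrete, here is a minimal Python sketch of how object IDs can be derived in a SHA-1 repository: the ID is the SHA-1 of a small header plus the object body. The function names (git_object_id, blob_id, tree_id) are made up for illustration, and tree_id is simplified on purpose: it assumes a flat tree of ASCII-named regular files, so it ignores subdirectories and Git's special sort rule for them.

```python
import hashlib


def git_object_id(kind, body):
    """Return the SHA-1 ID of an object: hash of '<kind> <size>' + NUL + body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content):
    # A blob is only the file's bytes: no name, no mode, no path.
    return git_object_id("blob", content)


def tree_id(entries):
    """entries maps a file name to (mode, blob ID); flat, regular files only."""
    body = b""
    for name in sorted(entries):
        mode, hex_id = entries[name]
        # Each entry is '<mode> <name>' + NUL + the 20 raw bytes of the ID.
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_id)
    return git_object_id("tree", body)


same = blob_id(b"hello\n")
# Two names, one blob: the content is stored once and referenced twice.
tree_a = tree_id({"README": ("100644", same), "COPY": ("100644", same)})
# Renaming COPY to COPY2 changes the tree ID but leaves the blob ID alone.
tree_b = tree_id({"README": ("100644", same), "COPY2": ("100644", same)})
print(same)
print(tree_a != tree_b)
```

The point of the sketch is that the file name lives only in the tree entry. The blob ID depends on nothing but the bytes, which is why one blob can appear under many names and in many commits.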



When you ask Git to show you a file's history using:



git log [--follow] [starting-point] [--] path/to/file


what Git is really doing is walking the commit history, which is the only history Git has, but not showing you any of these commits unless:



  • the commit is a non-merge commit, and

  • the parent of that commit also has the file, but the content in the parent differs, or the parent of the commit doesn't have the file at all

(but some of these conditions can be modified via additional git log options, and there's a very difficult to describe side effect called History Simplification that makes Git omit some commits from the history walk entirely). The file history you see here does not exactly exist in the repository, in some sense: instead, it's just a synthetic subset of the real history. You'll get a different "file history" if you use different git log options!
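The filtering described above can be modelled in a few lines of Python. This is a toy sketch, not Git's implementation: the Commit class, the string "blob IDs", and file_history are all hypothetical, merges are simply never shown, and History Simplification is ignored entirely.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    name: str            # hypothetical label standing in for a hash ID
    tree: dict           # path -> blob ID (toy strings here)
    parents: tuple = ()  # zero, one, or several parent commits


def file_history(tip, path):
    """Walk the commit graph from tip and keep commits that changed path."""
    shown, seen, stack = [], set(), [tip]
    while stack:
        commit = stack.pop()
        if commit.name in seen:
            continue
        seen.add(commit.name)
        stack.extend(commit.parents)
        if len(commit.parents) > 1:
            continue  # merge commits are walked through but not shown
        # A root commit is compared against "no file at all".
        parent_blob = commit.parents[0].tree.get(path) if commit.parents else None
        if commit.tree.get(path) != parent_blob:
            shown.append(commit.name)
    return shown


c1 = Commit("c1", {"a.txt": "blob1"})
c2 = Commit("c2", {"a.txt": "blob1", "b.txt": "blob2"}, (c1,))
c3 = Commit("c3", {"a.txt": "blob3", "b.txt": "blob2"}, (c2,))

print(file_history(c3, "a.txt"))
print(file_history(c3, "b.txt"))
```

Nothing in the model stores a per-file history. Both results are computed from the same three commits by comparing each commit's tree with its parent's tree.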































  • Another thing to add is this allows Git to do things like shallow clones. It just needs to retrieve the head commit and all the blobs it refers to. It doesn't need to recreate files by applying change sets.

    – Wes Toleman
    Apr 10 at 3:42











  • @WesToleman: it definitely makes that easier. Mercurial stores deltas, with occasional resets, and while the Mercurial folks intend to add shallow clones there (which is possible due to the "reset" idea), they haven't actually done it yet (because it's more of a technical challenge).

    – torek
    Apr 10 at 5:32











  • @torek I have a doubt regarding your description about Git answering a file history request but I think it deserves its own proper question: stackoverflow.com/questions/55616349/…

    – Simón Ramírez Amaya
    Apr 10 at 15:34


















14














The confusing bit is here:




Git never ever sees those as individual files. Git thinks everything as the full content.




Git often uses 160-bit hashes in place of objects in its own repo. A tree of files is basically a list of names and hashes associated with the content of each (plus some metadata).



But the 160-bit hash uniquely identifies the content (within the universe of the git database). So a tree with hashes as content includes the content in its state.



If you change the state of the content of a file, its hash changes. But if its hash changes, the hash associated with the file name's content also changes. Which in turn changes the hash of the "directory tree".



When a git database stores a directory tree, that directory tree implies and includes all of the content of all of the subdirectories and all of the files in it.



It is organized in a tree structure with (immutable, reusable) pointers to blobs or other trees, but logically it is a single snapshot of the entire content of the entire tree. The representation in the git database isn't the flat data contents, but logically it is all of its data and nothing else.



If you serialized the tree to a filesystem, deleted all .git folders, and told git to add the tree back into its database, you'd end up with adding nothing to the database -- the element would already be there.
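A small Python sketch of a content-addressed store may help here. It shows that adding the same tree twice adds nothing, and that changing one file creates new objects only along the path from that file up to the root. The store, the put function, and the "blob:"/"tree:" encoding are invented for this illustration and are not Git's real object format.

```python
import hashlib
import json

store = {}  # object ID -> encoded object; a stand-in for the object database


def put(obj):
    """Store a blob (str) or a tree (dict of name -> child) and return its ID."""
    if isinstance(obj, dict):
        # A tree records only names and the IDs of its children.
        children = {name: put(child) for name, child in obj.items()}
        data = "tree:" + json.dumps(children, sort_keys=True)
    else:
        data = "blob:" + obj
    key = hashlib.sha1(data.encode()).hexdigest()
    store[key] = data  # writing an existing key changes nothing
    return key


v1 = {"README": "hi",
      "src": {"a.c": "int a;", "b.c": "int b;"},
      "docs": {"guide": "text"}}
id1 = put(v1)
count_after_first = len(store)    # 4 blobs + 3 trees

id1_again = put(v1)
count_after_repeat = len(store)   # unchanged: everything was already there

v2 = {"README": "hi",
      "src": {"a.c": "int a = 1;", "b.c": "int b;"},
      "docs": {"guide": "text"}}
id2 = put(v2)
count_after_edit = len(store)     # one new blob, one new src tree, one new root

print(count_after_first, count_after_repeat, count_after_edit)
```

The untouched docs subtree and the unchanged blobs are shared between both snapshots, which is the "reference to immutable data" behaviour the answer describes.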



It may help to think of git's hashes as a reference counted pointer to immutable data.



If you built an application around that, a document is a bunch of pages, which have layers, which have groups, which have objects.



When you want to change an object, you have to create a completely new group for it. If you want to change a group, you have to create a new layer, which needs a new page, which needs a new document.



Every time you change a single object, it spawns a new document. The old document continues to exist. The new and old document share most of their content -- they have the same pages (except 1). That one page has the same layers (except 1). That layer has the same groups (except 1). That group has the same objects (except 1).



And by same, I mean logically a copy, but implementation-wise it is just another reference counted pointer to the same immutable object.



A git repo is a lot like that.



This means that a given git changeset contains its commit message (as a hash code), it contains its work tree, and it contains its parent changes.



Those parent changes contain their parent changes, all the way back.



The part of the git repo that contains history is that chain of changes. That chain of changes is at a level above the "directory" tree -- from a "directory" tree, you cannot uniquely get to a change set and the chain of changes.



To find out what happens to a file, you start with that file in a changeset. That changeset has a history. Often in that history, the same named file exists, sometimes with the same content. If the content is the same, there was no change to the file. If it is different, there is a change, and work needs to be done to work out exactly what.



Sometimes the file is gone; but, the "directory" tree might have another file with the same content (same hash code), so we can track it that way (note; this is why you want a commit-to-move a file separate from a commit-to-edit). Or the same file name, and after checking the file is similar enough.
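The "same hash code under another name" case can be sketched as a comparison of two snapshots. This is a deliberately narrow illustration: exact_renames is a hypothetical helper, the blob IDs are toy strings, and it covers only byte-identical content, not the similarity check mentioned for files that were also edited.

```python
def exact_renames(old_tree, new_tree):
    """Pair paths that vanished with new paths carrying the same blob ID.

    Both arguments map path -> blob ID for one snapshot.
    """
    deleted = {p: b for p, b in old_tree.items() if p not in new_tree}
    added = {p: b for p, b in new_tree.items() if p not in old_tree}

    # Index the newly appeared paths by the blob they point at.
    by_blob = {}
    for path, blob in sorted(added.items()):
        by_blob.setdefault(blob, []).append(path)

    renames = []
    for path, blob in sorted(deleted.items()):
        candidates = by_blob.get(blob)
        if candidates:
            renames.append((path, candidates.pop(0)))
    return renames


before = {"util.py": "b1", "main.py": "b2"}
moved_only = {"lib/util.py": "b1", "main.py": "b2"}
moved_and_edited = {"lib/util.py": "b9", "main.py": "b2"}

print(exact_renames(before, moved_only))
print(exact_renames(before, moved_and_edited))
```

The second call finds nothing because the moved file no longer has the same blob ID. That is the reason for keeping a commit that moves a file separate from a commit that edits it.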



So git can patchwork together a "file history".



But this file history comes from efficient parsing of the "entire changeset", not from a link from one version of the file to another.




































    12














    "git does not track files" basically means that git's commits consist of a file tree snapshot connecting a path in the tree to a "blob" and a commit graph tracking the history of commits. Everything else is reconstructed on-the-fly by commands like "git log" and "git blame". This reconstruction can be told via various options how hard it should look for file-based changes. The default heuristics can determine when a blob changes place in the file tree without change, or when a file is associated with a different blob than before. The compression mechanisms Git uses don't care a whole lot about blob/file boundaries. If the content is somewhere already, this will keep the repository growth small without associating the various blobs.



    Now that is the repository. Git also has a working tree, and in this working tree there are tracked and untracked files. Only the tracked files are recorded in the index (staging area? cache?) and only what is tracked there makes it into the repository.



    The index is file-oriented and there are some file-oriented commands for manipulating it. But what ends up in the repository is just commits in the form of file tree snapshots and the associated blob data and the commit's ancestors.
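As a rough mental model of the split between the file-oriented index and the snapshot-oriented repository, consider the Python sketch below. ToyRepo and its methods are invented for illustration. Real Git stores blob IDs and modes in the index rather than content, and a commit also records parents and metadata.

```python
class ToyRepo:
    """A toy model: an index of tracked paths plus a list of whole snapshots."""

    def __init__(self):
        self.index = {}    # path -> content staged for the next commit
        self.commits = []  # each commit is a full copy of the index

    def add(self, worktree, path):
        # "Tracking" a file here just means its path is present in the index.
        self.index[path] = worktree[path]

    def commit(self):
        # The commit captures the entire index, not only what changed.
        self.commits.append(dict(self.index))

    def untracked(self, worktree):
        return sorted(set(worktree) - set(self.index))


worktree = {"main.c": "int main;", "notes.txt": "todo", "build.log": "..."}
repo = ToyRepo()
repo.add(worktree, "main.c")
repo.add(worktree, "notes.txt")
repo.commit()

worktree["main.c"] = "int main(void);"
repo.add(worktree, "main.c")
repo.commit()

print(repo.untracked(worktree))
print(repo.commits[1])
```

The add step works one path at a time, yet the second commit still contains notes.txt, which was not touched. Every commit is the state of the whole project.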



    Since Git does not track file histories and renames and its efficiency does not depend on them, sometimes you have to try a few times with different options until Git produces the history/diffs/blames you are interested in for non-trivial histories.



    That's different with systems like Subversion which record rather than reconstruct histories. If it's not on record, you don't get to hear about it.



    I actually built a differential installer at one time that just compared release trees by checking them into Git and then producing a script duplicating their effect. Since sometimes whole trees were moved, this produced much smaller differential installers than overwriting/deleting everything would have produced.




































      7














      Git doesn't track a file directly, but tracks snapshots of the repository, and these snapshots happen to consist of files.



      Here's a way to look at it.



      In other version control systems (SVN, Rational ClearCase), you can right click on a file and get its change history.



      In Git, there is no direct command that does this. See this question. You'll be surprised at how many different answers there are. There is no one simple answer because Git doesn't simply track a file, not in the way that SVN or ClearCase does it.
























      • 5





        I think I get what you're trying to say, but "In Git, there is no direct command that does this" is directly contradicted by the answers to the question you've linked to. While it's true that versioning happens at the level of the whole repository, there are typically loads of ways to achieve anything in Git, so having multiple commands to show a file's history isn't evidence of much.

        – Joe Lee-Moyet
        Apr 10 at 9:55












      • I skimmed the first few answers of the question you linked and all of them use git log or some program built on top of that (or some alias that does the same thing). But even if there were lots of different ways, as Joe says that's also true for showing branch history. (also git log -p <file> is built in and does exactly that)

        – Voo
        Apr 10 at 11:21












      • Are you sure that SVN internally stores changes per file? I haven't used it in some time already, but I vaguely remember having files named like version ids, rather than reflection of project file structure.

        – Artur Biesiadowski
        Apr 11 at 9:36


















      3














      Tracking "content", incidentally, is what led Git to not track empty directories.

      That is why, if you git rm the last file of a folder, the folder itself gets deleted.
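One way to see why empty directories drop out is to model what gets recorded as a flat list of file paths, with directories existing only as prefixes of those paths. The Python sketch below does that. The flatten and directories helpers are hypothetical, and the nested dict is just a stand-in for a working tree.

```python
def flatten(worktree, prefix=""):
    """Turn nested dicts (directories) of strings (files) into path -> content."""
    entries = {}
    for name, node in worktree.items():
        path = prefix + name
        if isinstance(node, dict):
            entries.update(flatten(node, path + "/"))
        else:
            entries[path] = node
    return entries


def directories(entries):
    """Directories implied by the recorded file paths."""
    dirs = set()
    for path in entries:
        parts = path.split("/")[:-1]
        for i in range(1, len(parts) + 1):
            dirs.add("/".join(parts[:i]))
    return dirs


worktree = {"src": {"main.c": "int main;"}, "logs": {}}
entries = flatten(worktree)
print(entries)               # only src/main.c is recorded
print(directories(entries))  # "logs" has no content, so it is not implied

# Removing the last file of a directory removes the directory as well.
del entries["src/main.c"]
print(directories(entries))
```

In this model a directory is never an entry of its own, so there is nothing to record for one that holds no files.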



      That wasn't always the case, and only Git 1.4 (May 2006) enforced that "tracking content" policy with commit 443f833:




      git status: skip empty directories, and add -u to show all untracked files



      By default, we use --others --directory to show uninteresting directories (to get user's attention) without their contents (to unclutter output).

      Showing empty directories do not make sense, so pass --no-empty-directory when we do so.



      Giving -u (or --untracked) disables this uncluttering to let the
      user get all untracked files.




      That was echoed years later in Jan. 2011 with commit 8fe533, Git v1.7.4:




      This is in keeping with the general UI philosophy: git tracks content, not empty directories.




      In the meantime, with Git 1.4.3 (Sept. 2006), Git starts limiting untracked content to non-empty folders, with commit 2074cb0:




      it should not list the contents of completely untracked directories, but only the name of that directory (plus a trailing '/').
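The empty-directory behaviour described above can be pictured with a toy model. This is only a sketch under a loud assumption: a snapshot is treated as a flat mapping from file path to content, which is not how Git actually stores trees. The `directories` helper and the sample paths are hypothetical names made up for the illustration.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def directories(snapshot: dict[str, bytes]) -> set[str]:
    """Return every directory implied by the file paths in the snapshot.

    Assumption for this sketch: directories are never stored on their own,
    they only exist because some tracked file lives inside them.
    """
    dirs: set[str] = set()
    for path in snapshot:
        for parent in PurePosixPath(path).parents:
            if str(parent) != ".":
                dirs.add(str(parent))
    return dirs


snapshot = {"README": b"hi\n", "docs/guide.txt": b"guide\n"}
before = directories(snapshot)

# Removing the last file under docs/ leaves nothing that implies the directory.
del snapshot["docs/guide.txt"]
after = directories(snapshot)
```

In this model there is simply no way to record `docs/` once it holds no content, which is the intuition behind "git tracks content, not empty directories".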




      Tracking content is what allowed git blame to, very early on (Git 1.4.4, Oct. 2006, commit cee7f24) be more performant:




      More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit.




      That (tracking content) is also what put git add in the Git API, with Git 1.5.0 (Dec. 2006, commit 366bfcb)




      make 'git add' a first class user friendly interface to the index



      This brings the power of the index up front using a proper mental model without talking about the index at all.

      See for example how all the technical discussion has been evacuated from the git-add man page.




      Any content to be committed must be added together.

      Whether that content comes from new files or modified files doesn't matter.

      You just need to "add" it, either with git-add, or by providing git-commit with -a (for already known files only of course).





      That is what made git add --interactive possible, with the same Git 1.5.0 (commit 5cde71d)




      After making the selection, answer with an empty line to stage the contents of working tree files for selected paths in the index.




      That is also why, to recursively remove all contents from a directory, you need to pass -r option, not just the directory name as the <path> (still Git 1.5.0, commit 9f95069).



      Seeing file content instead of the file itself is what allows merge scenarios like the one described in commit 1de70db (Git v2.18.0-rc0, Apr. 2018):




      Consider the following merge with a rename/add conflict:



      • side A: modify foo, add unrelated bar

      • side B: rename foo->bar (but don't modify the mode or contents)

      In this case, the three-way merge of original foo, A's foo, and B's bar will result in a desired pathname of bar with the same mode/contents that A had for foo.

      Thus, A had the right mode and contents for the file, and it had the right pathname present (namely, bar).




      Commit 37b65ce, Git v2.21.0-rc0, Dec. 2018, recently improved colliding conflict resolutions.

      And commit bbafc9c further illustrates the importance of considering file content, by improving the handling for rename/rename(2to1) conflicts:




      • Instead of storing files at collide_path~HEAD and collide_path~MERGE, the files are two-way merged and recorded at collide_path.

      • Instead of recording the version of the renamed file that existed on the renamed side in the index (thus ignoring any changes that were made to the file on the side of history without the rename), we do a three-way content merge on the renamed path, then store that at either stage 2 or stage 3.

      • Note that since the content merge for each rename may have conflicts, and then we have to merge the two renamed files, we can end up with nested conflict markers.






































        299














        In CVS, history was tracked on a per-file basis. A branch might consist of various files with their own various revisions, each with its own version number. CVS was based on RCS (Revision Control System), which tracked individual files in a similar way.



        On the other hand, Git takes snapshots of the state of the whole project. Files are not tracked and versioned independently; a revision in the repository refers to a state of the whole project, not one file.



        When Git refers to tracking a file, it means simply that it is to be included in the history of the project. Linus's talk was not referring to tracking files in the Git context, but was contrasting the CVS and RCS model with the snapshot-based model used in Git.
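A rough way to picture the contrast, as a sketch only: the dictionaries below are hypothetical toy structures, not the on-disk formats of CVS, RCS, or Git, and the commit names `c0` and `c1` are invented for the example.

```python
# CVS/RCS style (toy model): each file carries its own numbered revisions.
per_file = {
    "main.c": {"1.1": "int main(){}", "1.2": "int main(){return 0;}"},
    "util.c": {"1.1": "void f(){}"},
}

# Git style (toy model): one identifier names the state of every file at once.
snapshots = {
    "c0": {"main.c": "int main(){}", "util.c": "void f(){}"},
    "c1": {"main.c": "int main(){return 0;}", "util.c": "void f(){}"},
}


def file_at(commit_id: str, path: str) -> str:
    """Look up a file's content by first choosing a whole-project snapshot."""
    return snapshots[commit_id][path]


# util.c never changed, yet it is still part of both snapshots.
unchanged = file_at("c0", "util.c") == file_at("c1", "util.c")
```

In the first structure a version number only makes sense together with a file name; in the second, a revision always answers the question "what did the whole project look like?".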


























        • 4





          You could add that this is why in CVS and Subversion, you can use tags like $Id$ in a file. The same does not work in git, because the design is different.

          – gerrit
          Apr 10 at 8:01






        • 54





          And content is not bound to a file as you would expect. Try moving 80% of the code of one file to another. Git automatically detects a file move + 20% change, even when you just moved code around in existing files.

          – allo
          Apr 10 at 12:55






        • 13





          @allo As a side-effect of that, git can do one thing the others can't: when two files are merged and you use "git blame -C", git can look down both histories. In file-based tracking, you have to pick which of the original files is the real original, and the other lines all appear brand-new.

          – Izkata
          Apr 10 at 21:05






        • 1





          @allo, Izkata - And it's the querying entity that works all this out by analysing the repo contents at query time (commit histories and differences between referenced trees and blobs), rather than requiring the committing entity and its human user to correctly specify or synthesise this information at commit time - nor the repo tool developer to design & implement this capability and the corresponding metadata schema before the tool is deployed. Torvalds argued that such analysis will only get better over time, and all history of every git repo since day one will benefit.

          – Jeremy
          Apr 11 at 12:47












        • @allo Yep, and to hammer home the fact that git doesn't work on a file level, you don't even have to commit all the changes in a file at once; you can commit arbitrary ranges of lines while leaving other changes in the file outside of the commit. Of course the UI for that is not nearly as simple so most don't do it, but it does rarely have its uses.

          – Alvin Thompson
          Apr 17 at 2:37















        edited Apr 10 at 19:20 by terdon
        answered Apr 9 at 23:52 by bk2204




















        101














        I agree with brian m. carlson's answer: Linus is indeed distinguishing, at least in part, between file-oriented and commit-oriented version control systems. But I think there is more to it than that.



        In my book, which is stalled and might never get finished, I tried to come up with a taxonomy for version control systems. In my taxonomy, the term for what we're interested in here is the atomicity of the version control system. See what is currently page 22. When a VCS has file-level atomicity, there is in fact a history for each file. The VCS must remember the name of the file and what occurred to it at each point.



        Git doesn't do that. Git has only a history of commits: the commit is its unit of atomicity, and the history is the set of commits in the repository. What a commit remembers is the data (a whole tree-full of file names and the contents that go with each of those files) plus some metadata: for instance, who made the commit, when, and why, and the internal Git hash ID of the commit's parent commit. (It is this parent, and the directed acyclic graph formed by reading all commits and their parents, that is the history in a repository.)



        Note that a VCS can be commit-oriented, yet still store data file-by-file. That's an implementation detail, though sometimes an important one, and Git does not do that either. Instead, each commit records a tree, with the tree object encoding file names, modes (i.e., is this file executable or not?), and a pointer to the actual file content. The content itself is stored independently, in a blob object. Like a commit object, a blob gets a hash ID that is unique to its content—but unlike a commit, which can only appear once, the blob can appear in many commits. So the underlying file content in Git is stored directly as a blob, and then indirectly in a tree object whose hash ID is recorded (directly or indirectly) in the commit object.
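To make the "hash ID that is unique to its content" point concrete, here is a minimal sketch of how a blob ID can be computed outside Git. It assumes a repository using the default SHA-1 object format. The `blob_id` helper and the dict-based "trees" are illustrative names only, and the trees are a simplification (real tree objects also record modes and can nest other trees).

```python
import hashlib


def blob_id(content: bytes) -> str:
    """SHA-1 object ID of a blob: hash of a 'blob <size>\\0' header plus the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Simplified "trees": file name -> blob ID. The file name is not part of the blob,
# so identical content under different names (or in different commits) shares one ID.
tree_a = {"foo.txt": blob_id(b"same bytes\n")}
tree_b = {"foo.txt": blob_id(b"same bytes\n"), "copy.txt": blob_id(b"same bytes\n")}

empty = blob_id(b"")
```

Because the name lives in the tree and not in the blob, the same blob can be referenced from many trees and therefore from many commits.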



        When you ask Git to show you a file's history using:



        git log [--follow] [starting-point] [--] path/to/file


        what Git is really doing is walking the commit history, which is the only history Git has, but not showing you any of these commits unless:



        • the commit is a non-merge commit, and

        • the parent of that commit also has the file, but the content in the parent differs, or the parent of the commit doesn't have the file at all

        (but some of these conditions can be modified via additional git log options, and there's a very difficult to describe side effect called History Simplification that makes Git omit some commits from the history walk entirely). The file history you see here does not exactly exist in the repository, in some sense: instead, it's just a synthetic subset of the real history. You'll get a different "file history" if you use different git log options!
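The filtering described above can be sketched as a walk over commits that keeps only those where the file differs from the parent. This is a toy model, not Git's implementation: the `Commit` type, the flattened path-to-blob-ID dict, and `file_history` are hypothetical, each commit has at most one parent, and merges, deletions, `--follow`, and History Simplification are all left out.

```python
from typing import NamedTuple, Optional


class Commit(NamedTuple):
    id: str
    parent: Optional[str]  # single parent only; merges are outside this sketch
    tree: dict             # path -> blob ID (flattened, hypothetical representation)


def file_history(commits: dict, start: str, path: str) -> list:
    """Walk from `start` back to the root, keeping commits where `path` has
    content that differs from the parent, or that the parent lacks entirely."""
    shown = []
    current = commits.get(start)
    while current is not None:
        parent = commits.get(current.parent) if current.parent else None
        mine = current.tree.get(path)
        theirs = parent.tree.get(path) if parent else None
        if mine is not None and mine != theirs:
            shown.append(current.id)
        current = parent
    return shown


commits = {
    "c1": Commit("c1", None, {"a.txt": "b1"}),
    "c2": Commit("c2", "c1", {"a.txt": "b1", "b.txt": "b9"}),
    "c3": Commit("c3", "c2", {"a.txt": "b2", "b.txt": "b9"}),
}

history_a = file_history(commits, "c3", "a.txt")
history_b = file_history(commits, "c3", "b.txt")
```

Nothing in `commits` stores a per-file history; both results are computed at query time from the one commit history, which is why different filtering rules would produce a different "file history".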































        • Another thing to add is this allows Git to do things like shallow clones. It just needs to retrieve the head commit and all the blobs it refers to. It doesn't need to recreate files by applying change sets.

          – Wes Toleman
          Apr 10 at 3:42











        • @WesToleman: it definitely makes that easier. Mercurial stores deltas, with occasional resets, and while the Mercurial folks intend to add shallow clones there (which is possible due to the "reset" idea), they haven't actually done it yet (because it's more of a technical challenge).

          – torek
          Apr 10 at 5:32











        • @torek I have a doubt regarding your description about Git answering a file history request but I think it deserves its own proper question: stackoverflow.com/questions/55616349/…

          – Simón Ramírez Amaya
          Apr 10 at 15:34















        101














        I agree with brian m. carlson's answer: Linus is indeed distinguishing, at least in part, between file-oriented and commit-oriented version control systems. But I think there is more to it than that.



        In my book, which is stalled and might never get finished, I tried to come up with a taxonomy for version control systems. In my taxonomy the term for what we're interested here is the atomicity of the version control system. See what is currently page 22. When a VCS has file-level atomicity, there is in fact a history for each file. The VCS must remember the name of the file and what occurred to it at each point.



        Git doesn't do that. Git has only a history of commits—the commit is its unit of atomicity, and the history is the set of commits in the repository. What a commit remembers is the data—a whole tree-full of file names and the contents that go with each of those files—plus some metadata: for instance, who made the commit, when, and why, and the internal Git hash ID of the commit's parent commit. (It is this parent, and the directed acycling graph formed by reading all commits and their parents, that is the history in a repository.)



        Note that a VCS can be commit-oriented, yet still store data file-by-file. That's an implementation detail, though sometimes an important one, and Git does not do that either. Instead, each commit records a tree, with the tree object encoding file names, modes (i.e., is this file executable or not?), and a pointer to the actual file content. The content itself is stored independently, in a blob object. Like a commit object, a blob gets a hash ID that is unique to its content—but unlike a commit, which can only appear once, the blob can appear in many commits. So the underlying file content in Git is stored directly as a blob, and then indirectly in a tree object whose hash ID is recorded (directly or indirectly) in the commit object.



        When you ask Git to show you a file's history using:



        git log [--follow] [starting-point] [--] path/to/file


        what Git is really doing is walking the commit history, which is the only history Git has, but not showing you any of these commits unless:



        • the commit is a non-merge commit, and

        • the parent of that commit also has the file, but the content in the parent differs, or the parent of the commit doesn't have the file at all

        (but some of these conditions can be modified via additional git log options, and there's a very difficult to describe side effect called History Simplification that makes Git omit some commits from the history walk entirely). The file history you see here does not exactly exist in the repository, in some sense: instead, it's just a synthetic subset of the real history. You'll get a different "file history" if you use different git log options!






        share|improve this answer

























        • Another thing to add is this allows Git to do things like shallow clones. It just needs to retrieve the head commit and all the blobs it refers to. It doesn't need to recreate files by applying change sets.

          – Wes Toleman
          Apr 10 at 3:42











        • @WesToleman: it definitely makes that easier. Mercurial stores deltas, with occasional resets, and while the Mercurial folks intend to add shallow clones there (which is possible due to the "reset" idea), they haven't actually done it yet (because it's more of a technical challenge).

          – torek
          Apr 10 at 5:32











        • @torek I have a doubt regarding your description about Git answering a file history request but I think it deserves its own proper question: stackoverflow.com/questions/55616349/…

          – Simón Ramírez Amaya
          Apr 10 at 15:34













        101












        101








        101







        I agree with brian m. carlson's answer: Linus is indeed distinguishing, at least in part, between file-oriented and commit-oriented version control systems. But I think there is more to it than that.



        In my book, which is stalled and might never get finished, I tried to come up with a taxonomy for version control systems. In my taxonomy the term for what we're interested here is the atomicity of the version control system. See what is currently page 22. When a VCS has file-level atomicity, there is in fact a history for each file. The VCS must remember the name of the file and what occurred to it at each point.



        Git doesn't do that. Git has only a history of commits—the commit is its unit of atomicity, and the history is the set of commits in the repository. What a commit remembers is the data—a whole tree-full of file names and the contents that go with each of those files—plus some metadata: for instance, who made the commit, when, and why, and the internal Git hash ID of the commit's parent commit. (It is this parent, and the directed acycling graph formed by reading all commits and their parents, that is the history in a repository.)



        Note that a VCS can be commit-oriented, yet still store data file-by-file. That's an implementation detail, though sometimes an important one, and Git does not do that either. Instead, each commit records a tree, with the tree object encoding file names, modes (i.e., is this file executable or not?), and a pointer to the actual file content. The content itself is stored independently, in a blob object. Like a commit object, a blob gets a hash ID that is unique to its content—but unlike a commit, which can only appear once, the blob can appear in many commits. So the underlying file content in Git is stored directly as a blob, and then indirectly in a tree object whose hash ID is recorded (directly or indirectly) in the commit object.



        When you ask Git to show you a file's history using:



        git log [--follow] [starting-point] [--] path/to/file


        what Git is really doing is walking the commit history, which is the only history Git has, but not showing you any of these commits unless:



        • the commit is a non-merge commit, and

        • the parent of that commit also has the file, but the content in the parent differs, or the parent of the commit doesn't have the file at all

        (Some of these conditions can be modified via additional git log options, and there's a very difficult-to-describe side effect called History Simplification that makes Git omit some commits from the history walk entirely.) The file history you see here does not exactly exist in the repository, in some sense: instead, it's just a synthetic subset of the real history. You'll get a different "file history" if you use different git log options!
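The filtering rule above can be sketched as a toy model. This is not how git log is implemented: the `Commit` class, the `file_history` function, and the sample commits are hypothetical, the walk follows only the first parent, and History Simplification and file deletions are ignored. It only illustrates that the "file history" is computed by filtering the commit history.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Commit:
    name: str
    files: Dict[str, str]      # path -> content (stand-in for a blob hash ID)
    parents: List["Commit"]


def file_history(tip: Commit, path: str) -> List[str]:
    """Walk commits from tip and keep only those that 'touch' path."""
    shown: List[str] = []
    commit: Optional[Commit] = tip
    while commit is not None:
        parent = commit.parents[0] if commit.parents else None
        if len(commit.parents) <= 1:                  # non-merge commits only
            here = commit.files.get(path)
            there = parent.files.get(path) if parent else None
            # Shown if the parent lacks the file or has different content.
            if here is not None and here != there:
                shown.append(commit.name)
        commit = parent                               # simplification: first parent only
    return shown


a = Commit("A", {"README": "v1"}, [])
b = Commit("B", {"README": "v1", "main.c": "x"}, [a])
c = Commit("C", {"README": "v2", "main.c": "x"}, [b])
d = Commit("D", {"README": "v2", "main.c": "y"}, [c])

print(file_history(d, "README"))   # ['C', 'A']
print(file_history(d, "main.c"))   # ['D', 'B']
```

Nothing in the commits themselves links one version of README to the next; the per-file view only appears when the walk compares each commit with its parent.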





























        edited Apr 10 at 15:29
        Tim Castelijns

        answered Apr 10 at 0:37
        torek












        • Another thing to add is this allows Git to do things like shallow clones. It just needs to retrieve the head commit and all the blobs it refers to. It doesn't need to recreate files by applying change sets.

          – Wes Toleman
          Apr 10 at 3:42











        • @WesToleman: it definitely makes that easier. Mercurial stores deltas, with occasional resets, and while the Mercurial folks intend to add shallow clones there (which is possible due to the "reset" idea), they haven't actually done it yet (because it's more of a technical challenge).

          – torek
          Apr 10 at 5:32











        • @torek I have a doubt regarding your description about Git answering a file history request but I think it deserves its own proper question: stackoverflow.com/questions/55616349/…

          – Simón Ramírez Amaya
          Apr 10 at 15:34




























        14














        The confusing bit is here:




        Git never ever sees those as individual files. Git thinks everything as the full content.




        Git often uses 160-bit hashes (SHA-1 by default) in place of objects in its own repo. A tree of files is basically a list of names and hashes associated with the content of each (plus some metadata).



        But the 160-bit hash uniquely identifies the content (within the universe of the git database). So a tree with hashes as content includes the content in its state.



        If you change the content of a file, its hash changes. If its hash changes, the entry that pairs the file name with that hash changes too, which in turn changes the hash of the "directory tree".



        When a git database stores a directory tree, that directory tree implies and includes all of the content of all of the subdirectories and all of the files in it.



        It is organized in a tree structure with (immutable, reusable) pointers to blobs or other trees, but logically it is a single snapshot of the entire content of the entire tree. The representation in the git database isn't the flat data contents, but logically it is all of its data and nothing else.



        If you serialized the tree to a filesystem, deleted all .git folders, and told git to add the tree back into its database, you'd end up with adding nothing to the database -- the element would already be there.
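A toy content-addressed store shows both effects: a change to one file ripples up to the root hash, and adding an already-stored tree adds nothing. This is a sketch under stated assumptions: `ObjectStore`, its simplified text encoding of trees, and the sample snapshot are hypothetical and do not match Git's real on-disk object format.

```python
import hashlib
from typing import Dict


class ObjectStore:
    """Toy store: every object is filed under the hash of its own bytes."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def _put(self, kind: str, payload: bytes) -> str:
        data = kind.encode("ascii") + b"\0" + payload
        oid = hashlib.sha1(data).hexdigest()
        self.objects.setdefault(oid, data)   # storing the same bytes twice is a no-op
        return oid

    def add(self, node) -> str:
        """Store a blob (bytes) or a tree (dict of name -> blob or tree)."""
        if isinstance(node, bytes):
            return self._put("blob", node)
        entries = sorted((name, self.add(child)) for name, child in node.items())
        listing = "\n".join(f"{oid} {name}" for name, oid in entries)
        return self._put("tree", listing.encode("utf-8"))


snapshot = {
    "README": b"hi\n",
    "src": {"main.c": b"int main(void) { return 0; }\n", "util.c": b"/* util */\n"},
    "docs": {"guide.txt": b"guide\n"},
}

store = ObjectStore()
root_1 = store.add(snapshot)
count_1 = len(store.objects)            # 4 blobs + 2 subtrees + 1 root tree

root_again = store.add(snapshot)        # "adding the tree back" stores nothing new
count_again = len(store.objects)

edited = {**snapshot, "src": {**snapshot["src"], "main.c": b"int main(void) { return 1; }\n"}}
root_2 = store.add(edited)
count_2 = len(store.objects)            # only a new blob, a new src tree, a new root

print(root_1 == root_again, count_1, count_again)
print(root_1 == root_2, count_2)
```

The untouched `docs` subtree and the unchanged blobs keep their hashes, so the second snapshot shares them with the first instead of copying them.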



        It may help to think of git's hashes as a reference counted pointer to immutable data.



        If you built an application around that, a document is a bunch of pages, which have layers, which have groups, which have objects.



        When you want to change an object, you have to create a completely new group for it. If you want to change a group, you have to create a new layer, which needs a new page, which needs a new document.



        Every time you change a single object, it spawns a new document. The old document continues to exist. The new and old document share most of their content -- they have the same pages (except 1). That one page has the same layers (except 1). That layer has the same groups (except 1). That group has the same objects (except 1).



        And by same, I mean logically a copy, but implementation-wise it is just another reference counted pointer to the same immutable object.



        A git repo is a lot like that.



        This means that a given git changeset (a commit) contains its commit message, it contains its tree snapshot (as a hash code), and it contains its parent changes (also as hash codes).



        Those parent changes contain their parent changes, all the way back.



        The part of the git repo that contains history is that chain of changes. That chain of changes is at a level above the "directory" tree: from a "directory" tree, you cannot uniquely get to a change set and the chain of changes.



        To find out what happens to a file, you start with that file in a changeset. That changeset has a history. Often in that history, the same named file exists, sometimes with the same content. If the content is the same, there was no change to the file. If it is different, there is a change, and work needs to be done to work out exactly what.



        Sometimes the file is gone, but the "directory" tree might have another file with the same content (same hash code), so we can track it that way (note: this is why you want a commit that moves a file to be separate from a commit that edits it). Failing that, git can check whether some file in the tree is similar enough to count as the same file.
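The "same hash code" case can be sketched as a comparison of two tree listings. The function `exact_renames` and the sample trees are hypothetical, and the hash strings are placeholders. Real Git also applies a similarity heuristic for files that were moved and edited, which this sketch leaves out.

```python
from typing import Dict, List, Tuple


def exact_renames(parent: Dict[str, str], child: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair paths that vanished from parent with new paths in child that carry the same hash."""
    removed = {path: oid for path, oid in parent.items() if path not in child}
    added = {path: oid for path, oid in child.items() if path not in parent}

    # Index removed paths by content hash; the first path in sorted order wins a tie.
    old_path_by_hash: Dict[str, str] = {}
    for path, oid in sorted(removed.items()):
        old_path_by_hash.setdefault(oid, path)

    renames: List[Tuple[str, str]] = []
    for new_path, oid in sorted(added.items()):
        old_path = old_path_by_hash.pop(oid, None)
        if old_path is not None:
            renames.append((old_path, new_path))
    return renames


before = {"README": "aaa", "lib/util.c": "bbb"}
moved_only = {"README": "aaa", "src/util.c": "bbb", "NEWS": "ccc"}
moved_and_edited = {"README": "aaa", "src/util.c": "bbd"}

print(exact_renames(before, moved_only))        # [('lib/util.c', 'src/util.c')]
print(exact_renames(before, moved_and_edited))  # []
```

The second call finds nothing because the move and the edit happened in one step, which is the situation the advice about separate move and edit commits is meant to avoid.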



        So git can patchwork together a "file history".



        But this file history comes from efficient parsing of the "entire changeset", not from a link from one version of the file to another.
















            answered Apr 10 at 19:34
            Yakk - Adam Nevraumont





















                12














                "git does not track files" basically means that git's commits consist of a file tree snapshot connecting a path in the tree to a "blob" and a commit graph tracking the history of commits. Everything else is reconstructed on-the-fly by commands like "git log" and "git blame". This reconstruction can be told via various options how hard it should look for file-based changes. The default heuristics can determine when a blob changes place in the file tree without change, or when a file is associated with a different blob than before. The compression mechanisms Git uses don't care a whole lot about blob/file boundaries. If the content is somewhere already, this will keep the repository growth small without associating the various blobs.



                Now that is the repository. Git also has a working tree, and in this working tree there are tracked and untracked files. Only the tracked files are recorded in the index (also called the staging area or the cache) and only what is tracked there makes it into the repository.



                The index is file-oriented and there are some file-oriented commands for manipulating it. But what ends up in the repository is just commits in the form of file tree snapshots and the associated blob data and the commit's ancestors.



                Since Git does not track file histories and renames and its efficiency does not depend on them, sometimes you have to try a few times with different options until Git produces the history/diffs/blames you are interested in for non-trivial histories.



                That's different with systems like Subversion which record rather than reconstruct histories. If it's not on record, you don't get to hear about it.



                I actually built a differential installer at one time that just compared release trees by checking them into Git and then producing a script duplicating their effect. Since sometimes whole trees were moved, this produced much smaller differential installers than overwriting/deleting everything would have produced.
















                    answered Apr 10 at 16:09







                    user11341543




























                        7














                        Git doesn't track a file directly, but tracks snapshots of the repository, and these snapshots happen to consist of files.



                        Here's a way to look at it.



                        In other version control systems (SVN, Rational ClearCase), you can right click on a file and get its change history.



                        In Git, there is no direct command that does this. See this question. You'll be surprised at how many different answers there are. There is no one simple answer because Git doesn't simply track a file, not in the way that SVN or ClearCase does it.
























                        • 5





                          I think I get what you're trying to say, but "In Git, there is no direct command that does this" is directly contradicted by the answers to the question you've linked to. While it's true that versioning happens at the level of the whole repository, there are typically loads of ways to achieve anything in Git, so having multiple commands to show a file's history isn't evidence of much.

                          – Joe Lee-Moyet
                          Apr 10 at 9:55












                        • I skimmed the first few answers of the question you linked and all of them use git log or some program built on top of that (or some alias that does the same thing). But even if there were lots of different ways, as Joe says that's also true for showing branch history. (also git log -p <file> is built in and does exactly that)

                          – Voo
                          Apr 10 at 11:21












                        • Are you sure that SVN internally stores changes per file? I haven't used it in some time already, but I vaguely remember having files named like version ids, rather than reflection of project file structure.

                          – Artur Biesiadowski
                          Apr 11 at 9:36

























                        answered Apr 10 at 8:19









                        Double Vision Stout Fat Heavy


















                        3














                        Tracking "content", incidentally, is what led Git to not track empty directories.

                        That is why, if you git rm the last file of a folder, the folder itself gets deleted.
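One way to picture this: if only file paths are recorded, directories exist only as prefixes of those paths. The following Python sketch assumes a flat dict standing in for the index (path to a made-up blob id) and a hypothetical helper that derives the implied directories. It models what is recorded, not what git rm does to the working tree.

```python
def directories(index: dict[str, str]) -> set[str]:
    """Derive the set of directories implied by the indexed file paths."""
    dirs = set()
    for path in index:
        parts = path.split("/")[:-1]  # drop the file name itself
        for depth in range(1, len(parts) + 1):
            dirs.add("/".join(parts[:depth]))
    return dirs


index = {"docs/guide/intro.md": "blob1", "src/main.c": "blob2"}
print(sorted(directories(index)))

del index["docs/guide/intro.md"]  # remove the last file under docs/
print(sorted(directories(index)))
```

Once the last file under docs/ is gone, no entry mentions docs/ any more, so there is nothing left that could represent the directory.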



                        That wasn't always the case, and only Git 1.4 (May 2006) enforced that "tracking content" policy with commit 443f833:




                        git status: skip empty directories, and add -u to show all untracked files



                        By default, we use --others --directory to show uninteresting directories (to get user's attention) without their contents (to unclutter output).

                        Showing empty directories do not make sense, so pass --no-empty-directory when we do so.



                        Giving -u (or --untracked) disables this uncluttering to let the
                        user get all untracked files.




                        That was echoed years later in Jan. 2011 with commit 8fe533, Git v1.7.4:




                        This is in keeping with the general UI philosophy: git tracks content, not empty directories.




                        In the meantime, with Git 1.4.3 (Sept. 2006), Git starts limiting untracked content to non-empty folders, with commit 2074cb0:




                        it should not list the contents of completely untracked directories, but only the name of that directory (plus a trailing '/').




                        Tracking content is what allowed git blame to, very early on (Git 1.4.4, Oct. 2006, commit cee7f24) be more performant:




                        More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit.




                        That (tracking content) is also what put git add in the Git API, with Git 1.5.0 (Dec. 2006, commit 366bfcb)




                        make 'git add' a first class user friendly interface to the index



                        This brings the power of the index up front using a proper mental model without talking about the index at all.

                        See for example how all the technical discussion has been evacuated from the git-add man page.




                        Any content to be committed must be added together.

                        Whether that content comes from new files or modified files doesn't matter.

                        You just need to "add" it, either with git-add, or by providing git-commit with -a (for already known files only of course).





                        That is what made git add --interactive possible, with the same Git 1.5.0 (commit 5cde71d)




                        After making the selection, answer with an empty line to stage the contents of working tree files for selected paths in the index.




                        That is also why, to recursively remove all contents from a directory, you need to pass -r option, not just the directory name as the <path> (still Git 1.5.0, commit 9f95069).



                        Seeing file content instead of the file itself is what allows merge scenarios like the one described in commit 1de70db (Git v2.18.0-rc0, Apr. 2018)




                        Consider the following merge with a rename/add conflict:



                        • side A: modify foo, add unrelated bar

                        • side B: rename foo->bar (but don't modify the mode or contents)

                        In this case, the three-way merge of original foo, A's foo, and B's bar will result in a desired pathname of bar with the same mode/contents that A had for foo.

                        Thus, A had the right mode and contents for the file, and it had the right pathname present (namely, bar).
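The reasoning in that commit message can be mimicked with a toy three-way merge applied separately to the path and to the content. This is only a sketch under strong assumptions: a file is a (path, content) pair, content is merged as one opaque value (not line by line), the function names are hypothetical, and A's separately added bar, which is what turns the real case into a rename/add conflict, is left out.

```python
def three_way(base, ours, theirs):
    """Whole-value three-way merge: keep whichever side changed."""
    if ours == theirs:
        return ours
    if ours == base:
        return theirs
    if theirs == base:
        return ours
    raise ValueError("both sides changed differently: conflict")


def merge_file(base, side_a, side_b):
    """Merge path and content independently; each argument is (path, content)."""
    path = three_way(base[0], side_a[0], side_b[0])
    content = three_way(base[1], side_a[1], side_b[1])
    return path, content


base = ("foo", "original contents")
side_a = ("foo", "modified contents")  # A modifies foo
side_b = ("bar", "original contents")  # B renames foo -> bar, content untouched

print(merge_file(base, side_a, side_b))
```

The result takes the pathname from the side that renamed and the contents from the side that edited, matching the "desired pathname of bar with the same mode/contents that A had for foo" in the quoted text.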




                        Commit 37b65ce, Git v2.21.0-rc0, Dec. 2018, recently improved colliding conflict resolutions.

                        And commit bbafc9c further illustrates the importance of considering file content, by improving the handling for rename/rename(2to1) conflicts:




                        • Instead of storing files at collide_path~HEAD and collide_path~MERGE, the files are two-way merged and recorded at collide_path.

                        • Instead of recording the version of the renamed file that existed on the renamed side in the index (thus ignoring any changes that were made to the file on the side of history without the rename), we do a three-way content merge on the renamed path, then store that at either stage 2 or stage 3.

                        • Note that since the content merge for each rename may have conflicts, and then we have to merge the two renamed files, we can end up with nested conflict markers.
















                            answered Apr 16 at 16:48









                            VonC

































