• [gentoo-user] sync-type: rsync vs git

    From Grant Edwards@21:1/5 to All on Wed Apr 27 16:30:02 2022
    A while back I switched the sync-type for the gentoo repo on one of my
    machines from rsync to git, using
    https://anongit.gentoo.org/git/repo/sync/gentoo.git, because that machine
    is behind a firewall that stopped allowing rsync connections.

    Is there any advantage (either to me or the Gentoo community) to
    continue to use rsync and the rsync pool instead of switching the
    rest of my machines to git?

    I've been very impressed with the reliability and speed of sync
    operations using git: they never take more than a few seconds. When
    using rsync, it seems like I regularly used to have to spend time
    trying different mirrors and hard-wiring one in my config file because
    the one I (or the pool) had chosen had fallen back to using a Bell-212
    modem for its internet connection. Sync operations often used to take
    many minutes and would sometimes just hang.

    --
    Grant

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Freeman@21:1/5 to grant.b.edwards@gmail.com on Wed Apr 27 17:20:01 2022
    On Wed, Apr 27, 2022 at 10:22 AM Grant Edwards
    <grant.b.edwards@gmail.com> wrote:

    Is there any advantage (either to me or the Gentoo community) to
    continue to use rsync and the rsync pool instead of switching the
    rest of my machines to git?

    I've been very impressed with the reliability and speed of sync
    operations using git: they never take more than a few seconds.

    With git you might need to occasionally wipe your repository to delete
    history if you don't want it to accumulate (I don't think there is a
    way to do that automatically but if you can tell git to drop history
    let me know).

    Of course that history can come in handy if you need to revert something/etc.

    If you sync infrequently - say once a month or less frequently, then
    I'd expect rsync to be faster. This is because git has to fetch every
    single set of changes since the last sync, while rsync just compares
    everything at a file level. Over a long period of time that means
    that if a package was revised 4 times and old versions were pruned 4
    times, then you end up fetching and ignoring 2-3 versions of the
    package that would just never be fetched at all with rsync. That can
    add up if it has been a long time.

    On the other hand, if you sync frequently (especially daily or more
    often), then git is FAR less expensive in both IO and CPU on both your
    side and on the server side. Your git client and the server just
    communicate what revision they're at, the server can see all the
    versions you're missing, and send the history in-between. Then your
    client can see what objects it is missing that it wants and fetch
    them. Since it is all de-duped by its design anything that hasn't
    changed or which the repo has already seen will not need to be
    transferred. With rsync you need to scan the entire filesystem
    metadata at least on both ends to figure out what has changed, and if
    your metadata isn't trustworthy you need to hash all the file contents
    (which isn't done by default). Since git is content-hashed you
    basically get more data integrity than the default level for rsync and
    the only thing that needs to be read is the git metadata, which is
    packed efficiently.

    Bottom line is that I think git just makes more sense these days for
    the typical gentoo user, who is far more likely to be interested in
    things like changelogs and commit histories than users of other
    distros. I'm not saying it is always the best choice for everybody,
    but you should consider it and improve your git-fu if you need to.

    Oh, and if you want the equivalent of an old changelog, just go into a
    directory and run "git whatchanged ."
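
A minimal sketch of the per-directory changelog idea, assuming a git-synced repo with some history available. `git whatchanged` is a legacy command that behaves much like `git log` with raw diff output, so this sketch uses `git log --name-status` instead. The function name `dir_changelog` and its arguments are made up for illustration.

```shell
#!/bin/sh
# Print recent commits that touched one directory of a git-synced repo.
# Usage: dir_changelog REPO_DIR SUBDIR [MAX_COMMITS]
# (dir_changelog is a hypothetical helper name, not part of git or Portage.)
dir_changelog() {
    repo=$1
    subdir=$2
    max=${3:-10}
    # --name-status lists the files each commit added (A), modified (M)
    # or deleted (D); the pathspec after "--" limits output to SUBDIR.
    git -C "$repo" log --max-count="$max" --date=short \
        --format='%h %ad %s' --name-status -- "$subdir"
}
```

Something like `dir_changelog /var/db/repos/gentoo app-editors/vim` would be a typical call, where the repo location is an assumption about the local setup. On a shallow clone the output only covers the commits that were actually fetched.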

    --
    Rich

  • From Grant Edwards@21:1/5 to Rich Freeman on Wed Apr 27 18:30:01 2022
    On 2022-04-27, Rich Freeman <rich0@gentoo.org> wrote:
    On Wed, Apr 27, 2022 at 10:22 AM Grant Edwards
    <grant.b.edwards@gmail.com> wrote:

    Is there any advantage (either to me or the Gentoo community) to
    continue to use rsync and the rsync pool instead of switching the
    rest of my machines to git?

    I've been very impressed with the reliability and speed of sync
    operations using git: they never take more than a few seconds.

    With git you might need to occasionally wipe your repository to
    delete history if you don't want it to accumulate (I don't think
    there is a way to do that automatically but if you can tell git to
    drop history let me know).

    I don't think I have any history. I use sync-depth=1 and clone-depth=1.

    Both git log and git whatchanged only show one commit.
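
For reference, a sketch of the kind of repos.conf entry being described. The `sync-uri` is the one given at the top of the thread; the `location` value is an assumption about where the repo lives on a given machine. As far as portage(5) describes them, a depth of 0 means unlimited history.

```ini
# /etc/portage/repos.conf/gentoo.conf
[gentoo]
# Assumed location; older installs may still use /usr/portage.
location = /var/db/repos/gentoo
sync-type = git
sync-uri = https://anongit.gentoo.org/git/repo/sync/gentoo.git
# Keep only the newest commit on the initial clone and on later syncs.
clone-depth = 1
sync-depth = 1
```

Raising both depth values would be one way to keep a few levels of upstream history, at the cost of fetching the intermediate commits.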


    Of course that history can come in handy if you need to revert
    something/etc.

    Perhaps I should keep a few levels of history...

    If you sync infrequently - say once a month or less frequently, then
    I'd expect rsync to be faster.

    I generally sync several times a week, and git is often very much
    faster than rsync. Git is always done in a few seconds. The time
    required for rsync varies widely from a handful of seconds to tens of
    minutes.

    This is because git has to fetch every single set of changes since
    the last sync, while rsync just compares everything at a file level.
    [...]
    That can add up if it has been a long time.

    AFAICT, the emerge repo git "depth" settings of 1 prevent that: the
    intermediate versions are discarded on the server side, as is previous
    local history. The end result is similar to rsync: you fetch only the
    current version of what's changed since the last "sync", and there's
    no local history.
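
The effect of a depth of 1 can be reproduced with plain git, roughly as below. This is a sketch of the idea only and not necessarily the exact commands Portage runs; `shallow_sync` is a hypothetical helper name, and the `reset --hard` step discards any local modifications in the working tree.

```shell
#!/bin/sh
# Refresh a shallow clone so that only the newest upstream commit is kept
# reachable, then print how many commits are visible locally.
# Usage: shallow_sync REPO_DIR
# (shallow_sync is a hypothetical helper name for this sketch.)
shallow_sync() {
    repo=$1
    # Fetch only the tip of the tracked branch; the history behind it is
    # cut off at depth 1.
    git -C "$repo" fetch --quiet --depth=1 origin || return 1
    # Move the checked-out branch to the freshly fetched tip.
    # WARNING: this throws away local changes in the working tree.
    git -C "$repo" reset --quiet --hard '@{upstream}' || return 1
    # Count the commits reachable from HEAD.
    git -C "$repo" rev-list --count HEAD
}
```

Run against a clone made with `git clone --depth=1`, the count stays at one no matter how many commits landed upstream between syncs, which matches the single commit reported by git log above.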

    Bottom line is that I think git just makes more sense these days for
    the typical gentoo user, who is far more likely to be interested in
    things like changelogs and commit histories than users of other
    distros. I'm not saying it is always the best choice for everybody,
    but you should consider it and improve your git-fu if you need to.

    Oh, and if you want the equivalent of an old changelog, just go into a
    directory and run "git whatchanged ."

    Right now with a depth of 1, git log/whatchanged don't provide any
    information (they think all files were new as of the last "sync").
    What I should figure out is what settings will preserve a few levels
    of changes that have been made to my local repo, without preserving
    intermediate changes to the master repo that never got used locally.

    IOW, I want all the changes made during a single "sync" to go into my
    local repo as a single commit regardless of how many commits have been
    made to the master repo since my previous "sync". I think git can do
    that -- whether the emerge sync settings in
    /etc/portage/repos.conf/gentoo.conf allow me to tell emerge to tell git
    to do that is the question.

    --
    Grant

  • From Wol@21:1/5 to Rich Freeman on Wed Apr 27 22:10:01 2022
    On 27/04/2022 16:18, Rich Freeman wrote:
    On Wed, Apr 27, 2022 at 10:22 AM Grant Edwards
    <grant.b.edwards@gmail.com> wrote:
    Is there any advantage (either to me or the Gentoo community) to
    continue to use rsync and the rsync pool instead of switching the
    rest of my machines to git?

    I've been very impressed with the reliability and speed of sync
    operations using git: they never take more than a few seconds.

    With git you might need to occasionally wipe your repository to delete
    history if you don't want it to accumulate (I don't think there is a
    way to do that automatically but if you can tell git to drop history
    let me know).

    Look into "git gc", which packs the repository. It won't get rid of old
    versions, but I think it compresses all the old stuff. Once the
    repository has been packed, I gather it's normal for the old packed
    stuff to take up less space than the current stuff.
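
The packing described here is what `git gc` does (it calls `git repack` internally). A small sketch for seeing the effect on a given repo follows; `compact_repo` is a hypothetical helper name, and the actual savings depend on the repository.

```shell
#!/bin/sh
# Compact a repository's object store and report its size before and after.
# Usage: compact_repo REPO_DIR
# (compact_repo is a hypothetical helper name for this sketch.)
compact_repo() {
    repo=$1
    echo "before:"
    git -C "$repo" count-objects -vH
    # gc gathers loose objects into delta-compressed packfiles and prunes
    # old unreachable objects; reachable history is left intact.
    git -C "$repo" gc --quiet || return 1
    echo "after:"
    git -C "$repo" count-objects -vH
}
```

In the output, `size` is the space taken by loose objects and `size-pack` the space taken by packfiles. Git also runs an automatic gc from time to time, so a manual run may change little.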

    Cheers,
    Wol

  • From Wol@21:1/5 to Grant Edwards on Wed Apr 27 22:20:01 2022
    On 27/04/2022 17:24, Grant Edwards wrote:
    IOW, I want all the changes made during a single "sync" to go into my
    local repo as a single commit regardless of how many commits have been
    made to the master repo since my previous "sync". I think git can do
    that -- whether the emerge sync settings in /etc/portage/repos.conf/gentoo.conf
    allow me to tell emerge to tell git to do that is the question.

    I don't know as that will do you any good.

    Just use git tags, every time you do a "sync; emerge", just tag the
    repository with the date. So when you list the tags you'll see all the
    dates you did an update, and by branching to that tag, you'll be able to
    go back to that date.
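
The tagging suggestion could look roughly like this. It is a sketch only: `tag_sync` and the `sync-` tag prefix are arbitrary names chosen for illustration, and whether older tagged states stay fully usable with sync-depth = 1 is not something the thread establishes.

```shell
#!/bin/sh
# Tag the current state of a synced repo with today's date so that it can
# be listed and revisited later.
# Usage: tag_sync REPO_DIR
# (tag_sync and the "sync-" prefix are hypothetical choices for this sketch.)
tag_sync() {
    repo=$1
    tag="sync-$(date +%Y-%m-%d)"
    # -f moves the tag if a sync already happened earlier the same day.
    git -C "$repo" tag -f "$tag" HEAD >/dev/null || return 1
    # Print the tag name so callers can log or reuse it.
    echo "$tag"
}
```

After a few syncs, `git tag --list 'sync-*'` shows the dates, `git diff --stat OLD_TAG NEW_TAG` summarises what changed between two of them, and `git checkout TAG` returns the tree to that state (as a detached HEAD).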

    I just use "lvm snapshot" :-)

    Cheers,
    Wol
