The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

App::SnerpVortex - Replay a Subversion dump into Git/Filesystem/etc.

About

Snerp Vortex is an anagram of SVN Exporter. It aims to be a faster, more reliable way to create new repositories from Subversion dumps than using git-svn and/or various abandonment techniques.

Faster? On my canonical example repository (POE), Snerp Vortex converts 2824 Subversion commits to Git in under 300 seconds.

Not fast enough? The conversion happens in about 100 seconds if we remove Git's porcelain from the equation. So there's a lot of room for improvement, perhaps by switching to git-fast-import. I'm looking for someone who wants to help port it over.

More satisfying? Snerp Vortex uses path analysis to detect hints about tags and branches. It then adjusts its assumptions according to actual repository use. Tags that are modified later become branches. Branches that are never touched are demoted to tags.

Snerp Vortex gains some benefits by doing tag and branch analysis before converting the repository:

  • Tag and branch analyses can be examined by a human without converting the repository. The snassign-gui utility graphically browses the repository structure over time.

  • Analysis errors can be fixed and redone quickly without waiting for lengthy repository conversions each time.

  • Tagging and branching are performed as "git tag" and "git branch" at appropriate times. It's faster and smaller than duplicating directory trees and converting them later.

There is rudimentary support for multiple projects per repository, but it needs love.

Toolset

Snerp Vortex is a chain of multiple tools.

snanalyze

snanalyze examines a Subversion dump and produces a SQLite database that describes its structure over time. snanalyze is intended to be run first, as most other utilities require the SQLite database to work.

snassign-auto

snassing-auto attemtps to automaically assign tags and branches based on directory locations and usage patterns. It's generally run after snanalyze and before snassign-gui.

snassign-gui

snassign-gui is a Gtk2 utility to browse a repository analysis. With it, one can page back and forth through significant revisions to see how snassign-auto interpreted structural changes.

Some repositories will be too complex for snassign-auto to be successful. We hope a motivated individual will update snassign-gui to be a tag/branch assignment editor so humans can override the automatic assignment.

snassign-gui is intended to be used to verify that snassign-auto worked correctly, before a possibly lengthy snerp run.

snerp

When everything is ready, snerp is called to export the Subversion dump. It takes as input the Subversion dump, and the index database containing final tag and branch assignments. It produces a new copy of the repository in the desired format.

Getting Subversion Dumps

Snerp Vortex requires a Subversion dump file, which is generally created by running svnadmin dump on a local repository.

There's also a remote svn dump utility that may help, but we haven't tried it: http://rsvndump.sourceforge.net/

Other Included Utilities

Snerp Vortex comes with some utilities and scripts that will eventually be cleaned up and organized. Until then:

mkramdisk_osx

Create a 1 GB RAM disk with a case-sensitive filesystem. Extremely useful for Macintosh machines that use case-insensitive filesystems by default.

snub

Snub the file contents of a dump. Retains the file and directory structure, but the resulting dump and replays are much smaller. Written for Ævar Arnfjörð Bjarmason's six-gigabyte dump, which triggers a hard to reproduce bug.

diff-test

Performs a recursive diff, excluding some things that Subversion may have that another VCS may not. For example, expanded "$Id$" tags. Useful for testing the results of a replay, although it won't test intermediate revisions... only the final ones.

Development Scripts

Other development and/or test scripts are included in the distribution but are neither installed nor documented here. Browse around!

OSX Users

Get yourselves a case-sensitive filesystem. This is easier done than said. Disk Utility can create empty random-access disk images with the filesystems of your choice. They mount in /Volumes and are accessible like any other filesystem.

Even better, build a RAM disk if you have the memory to spare. See the mkramdisk_osx utility in this project.

Improvements?

I've heard that git-fast-import can potentially make Snerp Vortex a lot faster. The program should be flexible enough to support it without much fuss.

I may not get around to it, as I'm rapidly running out of Subversion repositories to convert. If you want or need this, please consider contributing.

Testing

Until there's a proper test framework, here's the plan from a recent test I ran.

Create a dummy repository, check it out and establish a test case within it.

        svnadmin create binary-svn
        svn co file:///home/troc/projects/git/binary-svn binary-co          
        cd binary-co
        cp ~/Downloads/wtf.gif .
        svn add wtf.gif
        svn commit -m 'Commit a binary file.' 

Dump the repository.

        cd ..
        svnadmin dump binary-svn > binary-svn.dump

If it's a really huge repository, then early debugging might go better if the contents of all the files is omitted.

        cat huge.dump | ./snub --file - > smaller.dump

Replay the repository into git.

        cd snerp-vortex

        time ./snerp \
                --replayer=git \
                --authors=/home/troc/projects/authors.txt \
                --into=/Volumes/snerp-vortex-workspace/binary-git \
                --dump=../binary-files.dump \
                --copies=/Volumes/snerp-vortex-workspace/binary-snerp-copies \
                --verbose

Verify that the replayed binary file works.

        open /Volumes/snerp-vortex-workspace/binary-git/wtf.gif

The distribution's t/dumps directory is the repository for test dumps.

Design Notes

There are multiple kinds of branch, some of which don't map to Git's idea of branches. For example, there's the branch that is someone's personal scratch workspace. Then there's the branch intended to be merged back later.

Tags and branches are defined by usage patterns, not by the directories in which they live. Proper branches and tags are created by copying, not by creating directories. The difference is that branches are modified after copying while tags are not. Subversion "tags" are frequently modified, and "branches" are sometimes never touched. Snerp Vortex tries to be smart about this.

Subprojects are not attempted to be spun off into separate repositories. In personal experience, spin-off projects are moved from /trunk into some new directory, possibly also in trunk. The files are then modified there. To preserve full history, I plan to fork the full Git repository and follow Michaelangelo's advice: carve away everything that isn't the project. Better plans are welcome.

Subversion can tag subdirectories within trunk. After all, tags are just directory copies. Git cannot. Subversion tags are translated to Git by tagging HEAD at the relative moment when the Subversion tree has been tagged. Is there a better way to do this?

BUGS

Snerp Vortex is early beta quality. It seems to work in limited tests, but there's no guarantee it will work for you. Fixes are greatly appreciated.

SEE ALSO

SVN::Dump - Subversion dumps are parsed by SVN::Dump.

snanalyze - Analyze a Subversion dump, and produce an index database for other tools to process.

snassign-auto - Automatically assign tags and branches to a snanalyze index.

snassign-gui - Graphical snanalyze index browser. Future plans will allow users to assign branches and tags by hand. Requires Gtk.

snauthors - Extract a basic authors.txt file from a Subversion dump.

snerp - Convert a Subversion repository to a flat filesystem or Git. Uses the snanalyze index, with help from the snassign tools, to intelligently branch and tag as it goes.

AUTHORS AND LICENSE

Snerp Vortex is Copyright 2010 by Rocco Caputo and contributors.

It is released under the same terms as Perl itself.

1 POD Error

The following errors were encountered while parsing the POD:

Around line 129:

Non-ASCII character seen before =encoding in 'Ævar'. Assuming UTF-8