Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch

Posted by hansivers on Fri 21 Apr 2006 at 11:10

There are a lot of Linux filesystem comparisons available, but most of them are anecdotal, based on artificial tasks or completed under older kernels. This benchmark essay is based on 11 real-world tasks appropriate for a file server with older-generation hardware (Pentium II/III, EIDE hard drive).

Since its initial publication, this article has generated
a lot of questions, comments and suggestions to improve it.
Consequently, I'm currently working hard on a new batch of tests
to answer as many questions as possible (within the original scope
of the article).

Results will be available in about two weeks (May 8, 2006)

Many thanks for your interest and keep in touch with
Debian-Administration.org!

Hans

Why another benchmark test?

I found two quantitative and reproducible benchmark studies using the 2.6.x kernel (see References). Benoit (2003) implemented 12 tests using large files (1+ GB) on a Pentium II 500 server with 512MB RAM. This test was quite informative, but its results are beginning to age (kernel 2.6.0) and mostly apply to settings that exclusively manipulate large files (e.g., multimedia, scientific, databases).

Piszcz (2006) implemented 21 tasks simulating a variety of file operations on a PIII-500 with 768MB RAM and a 400GB EIDE-133 hard disk. To date, this testing appears to be the most comprehensive work on the 2.6 kernel. However, since many tasks were "artificial" (e.g., copying and removing 10 000 empty directories, touching 10 000 files, splitting files recursively), it may be difficult to transfer some conclusions to real-world settings.

Thus, the objective of the present benchmark is to complement some of Piszcz's (2006) conclusions by focusing exclusively on real-world operations found in small-business file servers (see Description of selected tasks).

Test settings

    Hardware
  • Processor : Intel Celeron 533
  • RAM : 512MB RAM PC100
  • Motherboard : ASUS P2B
  • Hard drive : WD Caviar SE 160GB (EIDE 100, 7200 RPM, 8MB Cache)
  • Controller : ATA/133 PCI (Silicon Image)
    OS
  • Debian Etch (kernel 2.6.15), distribution upgraded on April 18, 2006
  • All optional daemons killed (cron, ssh, samba, etc.)
    Filesystems
  • Ext3 (e2fsprogs 1.38)
  • ReiserFS (reiserfsprogs 1.3.6.19)
  • JFS (jfsutils 1.1.8)
  • XFS (xfsprogs 2.7.14)

Description of selected tasks

    Operations on a large file (ISO image, 700MB)
  • Copy ISO from a second disk to the test disk
  • Recopy ISO in another location on the test disk
  • Remove both copies of ISO
    Operations on a file tree (7500 files, 900 directories, 1.9GB)
  • Copy file tree from a second disk to the test disk
  • Recopy file tree in another location on the test disk
  • Remove both copies of file tree
    Operations within the file tree
  • List recursively all contents of the file tree and save the listing on the test disk
  • Find files matching a specific wildcard in the file tree
    Operations on the file system
  • Creation of the filesystem (mkfs) (all FS were created with default values)
  • Mount filesystem
  • Umount filesystem

The sequence of 11 tasks (from creation of FS to umounting FS) was run as a Bash script which was completed three times (the average is reported). Each sequence takes about 7 min. Time to complete task (in secs), percentage of CPU dedicated to task and number of major/minor page faults during task were computed by the GNU time utility (version 1.7).
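
As a rough illustration of the measurements described above, here is a minimal Python sketch that runs one command and reports the same kind of figures GNU time provides (elapsed seconds, CPU share, major/minor page faults). It is not the Bash script used for this article: `run_timed`, `average` and `DEMO_CMD` are hypothetical names, and the demo command is only a stand-in for a real task such as a copy.

```python
import resource
import subprocess
import sys
import time

# Hypothetical stand-in for a real benchmark task (e.g. copying the ISO).
DEMO_CMD = [sys.executable, "-c", "pass"]


def run_timed(cmd):
    """Run cmd once and report elapsed time, CPU share and page faults."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    elapsed = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)

    # CPU share = (user + system CPU time) / wall-clock time.
    cpu_secs = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return {
        "elapsed_secs": elapsed,
        "cpu_percent": 100.0 * cpu_secs / elapsed if elapsed > 0 else 0.0,
        "major_faults": after.ru_majflt - before.ru_majflt,
        "minor_faults": after.ru_minflt - before.ru_minflt,
    }


def average(runs, key):
    """Average one metric over several runs of the same task."""
    return sum(run[key] for run in runs) / len(runs)
```

Calling `run_timed` three times on the same task and passing the results to `average` mirrors the "three runs, average reported" procedure.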

RESULTS

Partition capacity

Initial (after filesystem creation) and residual (after removal of all files) partition capacity was computed as the ratio of the number of available blocks to the total number of blocks on the partition. Ext3 has the worst initial capacity (92.77%), while the other FS preserve almost full partition capacity (ReiserFS = 99.83%, JFS = 99.82%, XFS = 99.95%). Interestingly, the residual capacity of Ext3 and ReiserFS was identical to the initial capacity, while JFS and XFS lost about 0.02% of their partition capacity, suggesting that these FS can grow dynamically but do not completely return to their initial state (and size) after file removal.
Conclusion : To use the maximum of your partition capacity, choose ReiserFS, JFS or XFS.
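
For readers who want to reproduce the capacity ratio, the following is a minimal Python sketch. It assumes that "available blocks" means the blocks available to non-root users (`f_bavail`, the figure `df` shows as available); the article does not say exactly which counter was used, and `capacity_percent` is a hypothetical helper name.

```python
import os


def capacity_percent(mount_point):
    """Available blocks as a percentage of all blocks on the filesystem."""
    st = os.statvfs(mount_point)
    if st.f_blocks == 0:
        raise ValueError("filesystem reports zero blocks: " + mount_point)
    # Assumption: f_bavail (blocks usable by non-root users) is the
    # "available blocks" figure; f_bfree would also count reserved blocks.
    return 100.0 * st.f_bavail / st.f_blocks
```

Running it right after mkfs and mount, and again after removing all files, would give the initial and residual values discussed above.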

File system creation, mounting and unmounting

The creation of FS on the 20GB test partition took 14.7 secs for Ext3, compared to about 2 secs or less for the other FS (ReiserFS = 2.2, JFS = 1.3, XFS = 0.7). However, ReiserFS took 5 to 15 times longer to mount the FS (2.3 secs) when compared to the other FS (Ext3 = 0.2, JFS = 0.2, XFS = 0.5), and also 2 times longer to umount the FS (0.4 sec). All FS took comparable amounts of CPU to create FS (between 59% - ReiserFS and 74% - JFS) and to mount FS (between 6 and 9%). However, Ext3 and XFS took about 2 times more CPU to umount (37% and 45%), compared to ReiserFS and JFS (14% and 27%).
Conclusion : For quick FS creation and mounting/unmounting, choose JFS or XFS.

Operations on a large file (ISO image, 700MB)

The initial copy of the large file took longer on Ext3 (38.2 secs) and ReiserFS (41.8) when compared to JFS and XFS (35.1 and 34.8). The recopy on the same disk favored XFS (33.1 secs) when compared to the other FS (Ext3 = 37.3, JFS = 39.4, ReiserFS = 43.9). The ISO removal was about 100 times faster on JFS and XFS (0.02 sec for both), compared to 1.5 sec for ReiserFS and 2.5 sec for Ext3! All FS took comparable amounts of CPU to copy (between 46 and 51%) and to recopy the ISO (between 38 and 50%). ReiserFS used 49% of CPU to remove the ISO, whereas the other FS used about 10%. There was a clear trend of JFS to use less CPU than any other FS (about 5 to 10% less). The number of minor page faults was quite similar between FS (ranging from 600 - XFS to 661 - ReiserFS).
Conclusion : For quick operations on large files, choose JFS or XFS. If you need to minimize CPU usage, prefer JFS.

Operations on a file tree (7500 files, 900 directories, 1.9GB)

The initial copy of the tree was quicker for Ext3 (158.3 secs) and XFS (166.1) when compared to ReiserFS and JFS (172.1 and 180.1). Similar results were observed during the recopy on the same disk, which advantaged the Ext3 (120 secs) compared to other FS (XFS = 135.2, ReiserFS = 136.9 and JFS = 151). However, the tree removal was about 2 times longer for Ext3 (22 secs) when compared to ReiserFS (8.2 secs), XFS (10.5 secs) and JFS (12.5 secs)! All FS took comparable amounts of CPU to copy (between 27 and 36%) and to recopy the file tree (between 29% - JFS and 45% - ReiserFS). Surprisingly, the ReiserFS and the XFS used significantly more CPU to remove file tree (86% and 65%) when other FS used about 15% (Ext3 and JFS). Again, there was a clear trend of JFS to use less CPU than any other FS. The number of minor page faults was significantly higher for ReiserFS (total = 5843) when compared to other FS (1400 to 1490). This difference appears to come from a higher rate (5 to 20 times) of page faults for ReiserFS in recopy and removal of file tree.
Conclusion : For quick operations on a large file tree, choose Ext3 or XFS. Benchmarks from other authors have supported the use of ReiserFS for operations on a large number of small files. However, the present results on a tree comprising thousands of files of various sizes (10KB to 5MB) suggest that Ext3 or XFS may be more appropriate for real-world file server operations. Even if JFS minimizes CPU usage, it should be noted that this FS comes with significantly higher latency for large file tree operations.

Directory listing and file search in the previous file tree

The complete (recursive) directory listing of the tree was quicker for ReiserFS (1.4 secs) and XFS (1.8) when compared to Ext3 and JFS (2.5 and 3.1). Similar results were observed during the file search, where ReiserFS (0.8 sec) and XFS (2.8) yielded quicker results compared to Ext3 (4.6 secs) and JFS (5 secs). Ext3 and JFS took comparable amounts of CPU for directory listing (35%) and file search (6%). XFS took more CPU for directory listing (70%) but comparable amount for file search (10%). ReiserFS appears to be the most CPU-intensive FS, with 71% for directory listing and 36% for file search. Again, the number of minor page faults was 3 times higher for ReiserFS (total = 1991) when compared to other FS (704 to 712).
Conclusion : Results suggest that, for these tasks, filesystems can be grouped as (a) quick and more CPU-intensive (ReiserFS and XFS) or (b) slower but less CPU-intensive (Ext3 and JFS). XFS appears to be a good compromise, with relatively quick results, moderate usage of CPU and an acceptable rate of page faults.

OVERALL CONCLUSION

These results replicate previous observations from Piszcz (2006) about the reduced disk capacity of Ext3, the longer mount time of ReiserFS and the longer FS creation time of Ext3. Moreover, like this report, both reviews have observed that JFS is the FS with the lowest CPU usage. Finally, this report appears to be the first to show the high page-fault activity of ReiserFS on most usual file operations.

While recognizing the relative merits of each filesystem, only one filesystem can be installed on each partition/disk. Based on all the testing done for this benchmark essay, XFS appears to be the most appropriate filesystem to install on a file server for home or small-business needs :

  • It uses the maximum capacity of your server hard disk(s)
  • It is the quickest FS to create, mount and unmount
  • It is the quickest FS for operations on large files (>500MB)
  • This FS gets a good second place for operations on a large number of small to moderate-size files and directories
  • It constitutes a good CPU vs time compromise for large directory listing or file search
  • It is not the least CPU-demanding FS but its use of system resources is quite acceptable for older generation hardware

While Piszcz (2006) did not explicitly recommend XFS, he concludes: "Personally, I still choose XFS for filesystem performance and scalability". I can only support this conclusion.

References

Benoit, M. (2003). Linux File System Benchmarks.

Piszcz, J. (2006). Benchmarking Filesystems Part II. Linux Gazette, 122 (January 2006).

Posted by Anonymous (213.164.xx.xx) on Fri 21 Apr 2006 at 11:52
Nice article, but one important benchmark is missing: compatibility.

I use ext3 because most tools are written for it, and everything Linux supports it.

Posted by hansivers (64.18.xx.xx) on Fri 21 Apr 2006 at 14:09
Good point! I was thinking about this one when I selected the tasks. In the end, since it would have required selecting a "representative" sample of applications to interact with each FS, I chose to stay within the scope of previously published tests (focusing on performance and CPU usage).

Posted by Anonymous (219.88.xx.xx) on Tue 17 Apr 2007 at 07:20
How do your results compare with these from:

http://m.domaindlx.com/LinuxHelp/resources/fs-benchmarks.htm

The first column names the filesystem tested. The second column records the total time (in seconds) it took to run the filesystem benchmarking software bonnie++ (Version 1.93c). The third column records the total number of megabytes needed to store 655 megabytes of raw data.

SMALLER is better.

FILESYSTEM        TIME (secs)    DISK USAGE (MB)
REISER4 (lzo)     1,938          278
REISER4 (gzip)    2,295          213
REISER4           3,462          692
EXT2              4,092          816
JFS               4,225          806
EXT4              4,408          816
EXT3              4,421          816
XFS               4,625          799
REISER3           6,178          793
FAT32             12,342         988
NTFS-3g           >10,414        772


Each test was performed 5 times and the average value recorded. SMALLER is better.

The Reiser4 filesystem clearly had the best test results.

The FAT32 filesystem had the worst test results.

The bonnie++ tests were performed with the following parameters:

bonnie++ -n128:128k:0

Posted by wouter (87.244.xx.xx) on Thu 27 Apr 2006 at 02:40
Compatibility was something that kept me away from trying other filesystems, but I can honestly say that it hasn't been an issue in the last years or so when I've been using different filesystems than ext2/ext3 on Linux.

These days, tools and integration have been set up quite nicely by the distribution maintainers, and a fsck-interface is used by all of the ones I've tried anyway.

Of course, if you want to play with more advanced options, dump filesystems or do anything out of the ordinary, your findings may be different -- I would not know since I rarely, if ever, use these features.

On the other hand, IMO it rarely matters which filesystem you use anyway. I would challenge anybody to guess the filesystem running on a light to medium loaded desktop or server. Differences (in speed of mature common journaling filesystems) really are rather small for general use, and it's not until you have very specific tasks to be done or very i/o loaded systems to be managed that the choice of journaling filesystem becomes a real issue.

I believe that XFS has upcoming (or perhaps already has some) support in FreeBSD, though. FreeBSD has ext2 (read) support too. And IIRC, there was a (non-microsoft, obviously) driver adding ext2 (hence, ext3) support to Windows -- if you run that OS and want to allow it to touch your nice Linux system, that is. I suppose that falls under compatibility too.

Posted by Anonymous (200.229.xx.xx) on Tue 8 May 2007 at 13:11
Other things missing:

- What did you use to compare the times? What tools? Which commands?
- The article does not mention that Ext3 reserves 5% of the HD for the root user. (see http://ubuntuforums.org/showthread.php?t=215177)
- Some points in the article only say that the "other" FS did better. It is too generic to say that other FS took less or spent less (e.g.: The ReiserFS used 49% of CPU to remove ISO, when other FS used about 10%). Which other FS took about 10% of CPU?

I found the rest of the article good. My tip is to try a better server, like a Core Duo or an AMD X2, or even a Xeon or Opteron, since we're talking about business servers (but that's good enough if you don't have one to test). Maybe a good thing would be to test on SATA or SCSI drives...


Note: I use ext3 mostly because of compatibility. I like new stuff, but I don't like to play with FS's.

Posted by Anonymous (80.78.xx.xx) on Fri 21 Apr 2006 at 13:00
Very good article!

I use ReiserFS because it's the only filesystem that supports shrinking a filesystem, which is veeeery useful with LVM! With JFS, XFS, ... you have to back up, resize and restore; with 1 TB (that's not that much anymore nowadays) that's a joke. A measurement of this operation's performance is missing from the article.

Posted by Anonymous (212.254.xx.xx) on Fri 21 Apr 2006 at 13:57
FYI, ext3 supports resizing too! And even online resizing (while mounted).

Posted by hansivers (64.18.xx.xx) on Fri 21 Apr 2006 at 13:58
Excellent suggestion!
The tasks selected here were not intended to be comprehensive, since I was focusing more on adding some new hard data to Piszcz's excellent benchmark series. I will surely add your suggestion to a follow-up test. Thanks!

Posted by Anonymous (86.82.xx.xx) on Wed 26 Apr 2006 at 08:35
XFS supports this too.

Posted by Anonymous (82.232.xx.xx) on Fri 28 Apr 2006 at 16:40
growing ok, but shrinking xfs ? this must be pretty recent then :) have you any reference to it ?

Posted by isilmendil (140.78.xx.xx) on Fri 21 Apr 2006 at 13:20
Regarding the capacity of the filesystems compared, shouldn't it be said that ext3 is the only fs which reserves blocks for use by root alone?

You stated that all filesystems were created using default values. So ext3 loses approx. 5% of its capacity because of its reserved-blocks feature. For a fileserver you would create your data-partition without reserved blocks, as it is not needed there.

Or was this taken into account already?

Cheers,
Johannes

Posted by Anonymous (83.227.xx.xx) on Fri 21 Apr 2006 at 15:02
Those are used for defragmentation, so I keep them there as well. Only 1% or 0.5% though, but I'll make sure to keep those blocks around.

/Nafallo

Posted by Anonymous (82.69.xx.xx) on Sat 22 Apr 2006 at 14:45
«So ext3 loses approx. 5% of its capacity because of its reserved-blocks feature. For a fileserver you would create your data-partition without reserved blocks,»

Why would you want to make a fileserver perform badly? The 5% default reserve is in part to have some slack for use by 'root', but mostly because when a filesystem is nearly full its performance becomes very bad as fragmentation increases nonlinearly.

My experience is that 'ext3' requires at least 10% free space to perform decently over time (the 5% default is the absolute minimum that should be done) and 20-30% free space reserve is a lot better.

The problem is indeed intrinsic: as the filesystem nears 100% usage the chances of finding contiguous or nearby free blocks when writing or extending a file become a lot smaller. This applies to both extent based filesystems like JFS and XFS and to block based filesystems like 'ext3' (even if usually extent based filesystems do a bit better).

Posted by simonw (84.45.xx.xx) on Mon 24 Apr 2006 at 01:21
"The problem is indeed intrinsic"

Intrinsic perhaps, but it highlights what is missing, such as how filesystems address the issue. Part of the cost of writing the ReiserFS is all that messing with binary trees; the report doesn't attempt to understand, or address, why Reiser thinks it is worth doing all this (admittedly a lot of people said it was just too expensive to be worth trying).

Very little in the way of "aging" the filesystem.

Nothing on consistency, or limits, or features.

I think the test choice is not great: filesystem creation, mounting, and manipulating ISOs are generally not time critical tasks (well if you don't use NTFS they aren't!). I'd happily use a file system that takes 100 times longer to create than any of those tested if it conveyed other discernible benefits, and it takes my CD writer over 5 minutes to write a full ISO, so a few seconds here or there matters not at all to me when manipulating ISOs.

Be interesting to see also how representative people think 7500 files being 1.9GB is. My understanding was that mean file size, whilst on the way up, hadn't got to that sort of size yet. Certainly I have 15GB (df -h) of files on this box, and just under 0.5 million files (find / -type f), clocking in at just under 32KB average file size for my files, rather than the 253KB used in the test. Perhaps someone is hoarding ISOs?
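
The mean file size comparison above is easy to repeat on any tree. A minimal Python sketch follows; `mean_file_size` and `ARTICLE_MEAN_KB` are hypothetical names, and the constant simply redoes the arithmetic on the article's own figures (1.9GB over 7500 files, taking 1GB as 10^9 bytes, which is an assumption).

```python
import os
import stat

# 1.9 GB spread over 7500 files is about 253 KB per file.
ARTICLE_MEAN_KB = 1.9e9 / 7500 / 1000


def mean_file_size(root):
    """Return (file_count, total_bytes, mean_bytes) for regular files under root."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            info = os.lstat(os.path.join(dirpath, name))
            # Count regular files only; symlinks and special files are skipped.
            if stat.S_ISREG(info.st_mode):
                total += info.st_size
                count += 1
    mean = total / count if count else 0.0
    return count, total, mean
```

Comparing the result for a real server tree against `ARTICLE_MEAN_KB` gives a quick idea of how representative the test tree is for that server.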

It is well known there is a cost-benefit trade off in ReiserFS that means it performs relatively less well on larger files than XFS. So something like mean file size is likely to explain the difference between the results here, and of other authors, on the performance of ReiserFS.

I'd also prefer to see more edge cases examined -- what happens when 100,000 emails are delivered, and then sorted, and a selection deleted, to a single maildir? For most people it is probably performing in a sensible manner under these edge cases, that matters far more than if it takes 130 or 135s to copy 7000 files.

I'd like to see blocking I/O cases, and similar examined, email delivery being a classic, and fairly easy to test.

Hardest of all to test, I want to know that the filesystem journalling "does what it says on the tin", and have someone pull the plug in the middle of these transactions, and see that nothing is corrupted, that everything is in a consistent state, how long the recovery to that consistent state takes, and that it is automatic.

Why no "bonnie++" statistics -- I'd have thought as a test it was trivial to run, and might show up something, even if the I/O types measured are a tad artificial.

Then again I appreciate these tests take a lot of time and effort to do.

Nothing here will shake my own choice of ReiserFS for most general purpose filesystems, it has a good performance on real world benchmarks, and is the most mature of the journalling file systems presented. Although I'm looking at XFS for a project, but not because of its performance, but because of other features it brings to the table.

Posted by Anonymous (209.76.xx.xx) on Mon 24 Apr 2006 at 10:31
No file system can protect against corruption on power outages. They can only protect against (most) corruption in the event of system crashes. If the drive was completely quiescent when the plug was pulled, you're OK; otherwise, all bets are off.

There's a myth abroad that modern drives detect falling supply voltage and do stuff to protect the media. (You might read of using the motor as a generator to provide power to "park" the heads.) If it was ever true, it was only in high-end drives, and not any more. Even making sure data really is physically on the disk surface before reporting the write complete, something we like to think of as the basic promise, is no longer widely supported, though the drives will claim otherwise. In practice you need up to a few seconds of power after a crash to drain sectors from the cache to the disk surface.

The mantra is, if reliability matters, replicate and use battery ("UPS") power.

Posted by Anonymous (84.166.xx.xx) on Wed 26 Apr 2006 at 09:25
> There's a myth abroad that modern drives detect falling supply voltage and do
> stuff to protect the media.
> (You might read of using the motor as a generator to provide power to "park" the
> heads.)
> If it was ever true, it was only in high-end drives, and not any more.

This is from WD; the feature is called "auto park". AFAIK every "modern" drive has it. http://support.wdc.com/dlg/

Posted by Anonymous (24.203.xx.xx) on Sat 29 Apr 2006 at 21:47
That is only partially true though. Of course, if your hard drive does not allow you to make sure that data is really on the physical platters, you are screwed, but let's assume that it works. Just use ReiserFS with journaling. I would really like to see these tests with data=ordered/data=journal for ReiserFS to compare it.

I personally lost half of my homedir to XFS because it does only metadata journaling. Sure, all my files were there because of the journaling, but they were filled with binary zeroes. Oh and yes, the XFS FAQ says that this may happen (well at least it did when I was searching for an explanation of that behaviour back then).

I really don't want to use any "journaling" filesystem that does not journal my data, because then it's worthless. I don't need a filesystem that has a clean tree and can be mounted if I lose half of my data in it. ReiserFS was pretty good at garbling file contents ("WTF is my mp3 playlist now that videos I just downloaded?") too before they had data journaling.

Posted by Anonymous (91.77.xx.xx) on Sun 28 Oct 2007 at 02:58
> You might read of using the motor as a generator to provide power to "park" the heads
Just to let you know, Hitachi does use this technology to move the heads into their proper place if a power loss is detected. However, I have no idea whether the drive is able to flush its buffers in this scenario. Perhaps it is, since Hitachi intentionally(?) restricts the write buffer size, while the read buffer size is allowed to be the whole size of the installed RAM IC.

Posted by Anonymous (20.133.xx.xx) on Thu 27 Apr 2006 at 11:55
I use reiserFS for the simple reason that when my box loses power I can get it up and running again in a few seconds or minutes.

I'm fairly new to linux and may well have been doing something wrong, but my box regularly had the power pulled on it (my area used to be prone to power dips and I couldn't afford a UPS).

When I was using ext3 it sometimes took me a few hours to get the system to even boot, because it refused while there were errors. Once I switched to ReiserFS all those problems went away.

In a single user environment, which is realistic enough for me, I'd choose the FS that allows me to recover quickest over an FS that might take a second or two longer to do something. (Can't remember the last time I copied 7000 files, if ever.)

Posted by Anonymous (82.71.xx.xx) on Fri 27 Oct 2006 at 12:57
If you use ReiserFS without a UPS on the server then you are crazy. There are masses of warnings on the Internet about the problems with ReiserFS' tree information being in unrestricted locations on the disk, so rebuilds with disk corruption are extremely risky.

Posted by drdebian (194.208.xx.xx) on Thu 30 Aug 2007 at 15:32
> Once I switched to ReiserFS all those problems went away.
I once used ReiserFS on a fileserver and after a broken PSU took the server down, it was the data that went away, because the tree ReiserFS uses was corrupted.
The recovery utilities supplied for ReiserFS tried their best to recover the filesystem, but in the end I had to restore from the backup of the night before.
After this incident, I decided to give Ext3 and XFS a try. While XFS seems to be the more modern filesystem, it does lack the ability to shrink, which is a real problem in times of LVM2 and Software-RAID.
Ext3 hasn't let me down ever since. Its disaster recovery tools are the most mature of all the filesystems tested (in addition to being included in every live/recovery CD on the planet) and its online resizing capabilities really go together well with virtualized (as well as real) infrastructure.
On a side note, I also found Ext3 to be the most tolerant filesystem for use on "flaky" hardware. If, for example, a part of the binary tree used in ReiserFS happens to land on a defective sector of the harddisk, then it's bye-bye time for your entire FS. Ext3, on the other hand, will cope quite well and allows for full recovery using one of its redundantly stored superblocks.

Posted by Anonymous (62.253.xx.xx) on Fri 21 Apr 2006 at 13:23
File system creation time is not an important consideration as you do it only once. Likewise, mount and dismount speed aren't nearly as important as things like 'Operations on a file tree'.

Posted by hansivers (64.18.xx.xx) on Fri 21 Apr 2006 at 14:02
Agree with you!

I added these data to replicate other previous observations about filesystem creation and mounting time.

Posted by Anonymous (213.64.xx.xx) on Sat 22 Apr 2006 at 09:56
I agree that filesystem creation time is not an important metric, but I do have to say that mount speed is. I have a home server with an LVM of approx 1.2 TB using reiserfs 3.6 and it takes a (comparably) long time to mount it, perhaps closer to a minute. This can be a factor on servers which require high uptime but have to be rebooted (for whatever reason) every now and again. The purpose of this article was to test in a small business environment, so "five nines" is probably not a factor, but it should be mentioned nonetheless.

Posted by Anonymous (86.139.xx.xx) on Thu 27 Apr 2006 at 09:23
The answer is quite simple really, think about it: leave it mounted, unless you switch your server off every time you turn your back on it. But then of course you may as well run the rubbish from M$ Corp ..

Pete .

Posted by Anonymous (205.153.xx.xx) on Tue 12 Jun 2007 at 21:23
Your "simple answer" is to create toxins that are already being overproduced to the detriment of everyone on the planet.

Every time you choose to leave your computer turned on, you are choosing to disregard a finite chance that pollution will make this planet uninhabitable for future generations. That's not such a "simple answer" any more is it?

Wasting electricity despoils the commons. Turn the computer off when you aren't using it.

Posted by Anonymous (193.94.xx.xx) on Sun 23 Apr 2006 at 18:22
Especially in home usage, mount time is very important. ReiserFS sucked so badly I had to buy a spare hard disk to do a data copy & reformat (as ext3fs) for my main data partition. Mounting the one (250GB) partition took half of my machine's boot time using ReiserFS, which is unacceptable for a desktop. With ext3, I can't notice the time it takes to mount the same partition.

Posted by Anonymous (81.187.xx.xx) on Wed 26 Apr 2006 at 00:16
Hmm - why would you reboot it except after power-outages, kernel upgrades, and (unfortunately, they do happen occasionally), system crashes?
It's so much more useful to leave the machine on all the time!

Perhaps you are shutting it down to reduce noise, in which case, I commend quietpc.com to you - I can hear birdsong over mine, with the windows shut.

Posted by Anonymous (195.135.xx.xx) on Wed 26 Apr 2006 at 13:17
Because it draws power, which costs money and causes useless pollution?
Why keep it running if you don't need it or can easily wake it up if you need it?

Posted by Anonymous (70.171.xx.xx) on Wed 26 Apr 2006 at 15:58
> Why keep it running if you don't need it or can easily wake it up if you need it?

Slow the CPU (especially if it's a P4 or old Athlon) and hibernate the monitor. That itself will save a good amount of energy.

Or, better, host your home server on a passively-cooled Via system. Then you can shut-down your PC any time you want, while the server stays up, sipping watts.

Posted by Anonymous (63.116.xx.xx) on Wed 26 Apr 2006 at 21:09
> Slow the CPU (especially if it's a P4 or old Athlon) and hibernate the monitor. That itself will save a good amount of energy.

Actually, SWSUSP2 works well enough now that I hibernate all of the machines at home (except the server) when they're not going to be used, such as overnight. Hibernation on my HP laptop is almost infallible -- I've got an "uptime" of over a month, hibernating once or twice (and sometimes more) every day -- and, while it takes a good minute to go into hibernation, it comes out of it within 35 seconds... that's from hitting the power switch to having my KDE desktop back up.

I just wish S3 worked as well, and that the kernel folks would adopt SWSUSP2, which works so much better than the default hibernate mechanism.

--- SER

Posted by Anonymous (87.244.xx.xx) on Wed 10 May 2006 at 03:47
Mount/unmount speed is important for desktop systems.

File system creation time can't really be an issue to anyone, I guess.

Posted by Anonymous (88.100.xx.xx) on Fri 21 Apr 2006 at 13:55
Some graphs available???

Posted by Anonymous (212.254.xx.xx) on Fri 21 Apr 2006 at 14:00
What is really missing for real use is a concurrent file modifications benchmark. On a real server (and that's what this bench is for) you have tens of processes reading/writing *at* *the* *same* *time* on the disk! What about:

- Create 4 threads that do:
  - Operations on a file tree
  - Operations into the file tree
  - Remove the tree
- 3 times in a row.

That would be an interesting bench IMO.
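
A proposal like this one could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `worker` and `run_concurrent` are hypothetical names, each worker operates on its own private copy of the tree, and Python threads are used on the assumption that the work is dominated by file I/O.

```python
import os
import shutil
import time
from concurrent.futures import ThreadPoolExecutor


def worker(src_tree, work_dir, worker_id, rounds=3):
    """Copy, list and remove a private copy of src_tree, `rounds` times in a row."""
    dest = os.path.join(work_dir, "worker-%d" % worker_id)
    listed = 0
    for _ in range(rounds):
        shutil.copytree(src_tree, dest)                              # operations on the tree
        listed = sum(len(files) for _, _, files in os.walk(dest))    # operations into the tree
        shutil.rmtree(dest)                                          # remove the tree
    return listed


def run_concurrent(src_tree, work_dir, workers=4, rounds=3):
    """Run all workers at the same time; return (elapsed_secs, files listed per worker)."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(worker, src_tree, work_dir, i, rounds)
                   for i in range(workers)]
        counts = [future.result() for future in futures]
    return time.monotonic() - start, counts
```

Here `work_dir` must live on the filesystem under test and must not be inside `src_tree`. Varying `workers` would show how each filesystem behaves as concurrency grows.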

Posted by hansivers (64.18.xx.xx) on Fri 21 Apr 2006 at 14:32
Yes, you're right! Bryant et al. (2002) published extensive data about concurrent performance with the 2.4.17 kernel (the Filemark benchmark: 1, 8, 64, 128 threads). So it is clear that it would be a great addition to the initial benchmarks. You suggested 4 threads. Since they tested up to 128 threads, what do you feel would be a "representative" test for a file server? Something like 1, 8 and 24 threads?

Posted by Anonymous (82.69.xx.xx) on Sat 22 Apr 2006 at 14:19
That would not be much of a filesystem test as such, it would be mostly an IO subsystem test, except in ideal conditions.

The problem is that unless the IO subsystem supports mailboxing and tagged queueing, which are only available in practice on SCSI and SCSI/ATA host adapters (3ware and up), multiple concurrent accesses have awful performance.

However there are already some filesystem speed tests for suitable IO subsystems, alluded to by some other comment, for example:

http://ext2.SourceForge.net/2005-ols/ols-presentation-html/img38.html

BTW, in this graph the JFS performance comes out badly; I think that an older version of JFS was used, one that had excessive locking, like 'ext3' for most of its life.

There are more links to filesystem speed tests here:

http://WWW.sabi.co.UK/Notes/anno05-3rd.html#050911

Posted by mcphail (62.6.xx.xx) on Fri 21 Apr 2006 at 14:31
I have to say, I'm not very interested in how long it takes for a filesystem to mount/umount. Nor am I interested in "once only" filesystem creation. I'd rather know that my filesystem will be stable for as long as my harddisk keeps spinning. Any benchmarks for this?

NMP

Posted by hansivers (64.18.xx.xx) on Fri 21 Apr 2006 at 14:37
As I said before, FS creation time and mounting/umounting were reported only to replicate previous observations. About data integrity, everybody would agree with you. However, it's the kind of data I've never found before. How would you test it? One approach could be to do a bunch of operations over and over, and test the time before the first data corruption? Anybody, feel free to suggest! Thanks.

Posted by Anonymous (66.179.xx.xx) on Fri 21 Apr 2006 at 16:03
How about tests like:

* during a file tree copy, pull the plug on the machine. (could be simulated by running the test under vmware and killing the virtual machine) - then check how well the fs recovers the data - which files got corrupted (if any), how long it takes to fix (replaying journals, etc).

* call your initial tree t0. make a new tree of approximately the same size called t1. for a concurrency test:
cp -a t0 t2 & # one tree
cp -a t1 t3 & # other tree
cp -a t0 t4 & # merge the trees
cp -a t1 t4 &
Hrm, should probably test mixed concurrency (deletes too!) so:
cp -a t0 t5 && rm -rf t5 &



I'm sure there's more to be added, maybe this will give you some ideas.

Posted by Anonymous (84.92.xx.xx) on Sat 22 Apr 2006 at 17:35
Coincidentally, today I also came across some tests on the impact of write caching that relate to data integrity: http://sr5tech.com/write_back_cache_experiments.htm (Last updated October 27, 2003).

In these experiments the test variable was disk configuration rather than file system. A similar test across different file systems might produce a worthwhile indication of comparative reliability.

I suspect write caching would need to be disabled in the disk system to prevent corruptions of the kind being investigated in the link above from affecting the results. This would have an impact on absolute performance, but relative measurements could still be made.

Posted by Anonymous (70.171.xx.xx) on Wed 26 Apr 2006 at 16:03
«How would you test it? One approach could be to do a bunch of operations over and over, and test the time before the first data corruption? Anybody, feel free to suggest!»

Yank the plug while multiple processes are updating the disks. See what happens.

Repeat 8 or 10 times.

Yes, it's manual and time-consuming.

Posted by Anonymous (193.219.xx.xx) on Fri 21 Apr 2006 at 14:40
I'm personally using XFS on all my servers and desktop systems, mostly because I trust it - I've had dozens of power failures/unexpected reboots and I have yet to be disappointed by how XFS handles such unclean unmounts.

With ReiserFS 3, on the other hand, I've had two such events, and both times it managed to somehow completely destroy multiple files which were not even open at the time of the incident.
(Yes, this is anecdotal evidence, but I'm not using it anymore because of these incidents.)

Posted by Anonymous (213.224.xx.xx) on Fri 21 Apr 2006 at 20:49
Same thing here... I have had at least 4 incidents where ReiserFS stopped functioning properly, ending in data loss, reinstalling the server, etc. I don't know if I would trust XFS, since most distributions support Ext3 or ReiserFS as the default filesystem (I go for Ext3).

It just isn't fun to see the filesystem break and read a publicity message about being able to ask questions for $25... (ReiserFS)

Posted by Anonymous (24.203.xx.xx) on Sat 29 Apr 2006 at 21:56
Don't trust your XFS too much. I lost half a homedir to XFS because of this: http://oss.sgi.com/projects/xfs/faq.html#nulls

Get a recent ReiserFS and mount it with data=journal!

Posted by hansivers (64.18.xx.xx) on Fri 21 Apr 2006 at 15:14
I've yet to see a benchmark methodology that tests filesystem reliability, not only performance or CPU usage. It's a bit surprising, since reliability is surely one of the most important factors when an admin has to select a FS. But I feel that we are left most of the time with anecdotal evidence and our own (bad) experiences.

An interesting (and simple) test would be to simulate power failures during copy/delete file operations (large ISO file and file tree) and see how each FS handles each situation. But I'm aware that this is only a small part of real data integrity testing.
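To judge what a filesystem kept after such a simulated failure, one option is to checksum the source tree beforehand and compare it with whatever survives in the copy after reboot and recovery. The sketch below is a minimal illustration of that idea; the function names `build_manifest` and `compare` are made up for this example, and it says nothing about files that were still being written when the power went out.

```python
import hashlib
import os


def build_manifest(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(path, "rb") as fh:
                # Read in 1 MiB chunks so large ISO files do not fill memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(path, root)] = digest.hexdigest()
    return manifest


def compare(source_manifest, copy_manifest):
    """Return (missing, corrupted) relative paths, each as a sorted list."""
    missing = sorted(p for p in source_manifest if p not in copy_manifest)
    corrupted = sorted(
        p for p, digest in source_manifest.items()
        if p in copy_manifest and copy_manifest[p] != digest
    )
    return missing, corrupted
```

Calling `compare(build_manifest("t0"), build_manifest("t2"))` after recovery would list the files that never arrived and the files whose contents differ from the source.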

If anybody has seen hard data about FS reliability, feel free to post a link here. I would be very interested to investigate this, in order to produce more comprehensive and real-world benchmarks.

Posted by Anonymous (85.76.xx.xx) on Thu 27 Apr 2006 at 07:10
I would also like to see some reliability tests. I've used all the Linux file systems, and nowadays I use only ext2 for /boot and xfs for the rest. I also use lvm extensively on all servers under my administration.

However, I have found one annoying feature with xfs: whenever there is a power failure, all open text files are filled with "^@^@^@^@^@^@^@^@...". You can easily replicate this by opening the file /etc/fstab in emacs and then unplugging the power cord. Why /etc/fstab... well, then you know why I find this feature REALLY annoying. So this power failure test would be the first on my test list for real-world servers.
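A power failure test could look for this symptom automatically. The following is a small sketch, assuming that "filled with ^@" means the file consists entirely of NUL bytes; the function name `is_null_filled` is invented for this example.

```python
def is_null_filled(path, chunk_size=1 << 16):
    """Return True if the file is non-empty and contains only NUL bytes."""
    seen_data = False
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            seen_data = True
            # Stripping NUL bytes leaves something only if other data exists.
            if chunk.strip(b"\x00"):
                return False
    return seen_data
```

Running it over every file that was open when the plug was pulled would give a count of affected files per filesystem.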

Anyway, I enjoyed reading your article. And being a professional researcher myself, I know that there is always room for improvement. Looking forward to reading the new comparison from you.

Posted by Anonymous (130.156.xx.xx) on Fri 21 Apr 2006 at 16:11
Does GRUB support /boot on XFS yet?

Posted by Anonymous (159.53.xx.xx) on Fri 21 Apr 2006 at 22:52
Why would you want to boot off of an XFS partition anyways? It's my understanding that everything in the boot partition is read into memory anyways, so file system speed really isn't that important. Also, since most boot recovery tools only work with ext2 (and therefore ext3), you'll be much better off using one of those. Then you can use XFS on the other partitions.

Posted by Anonymous (80.219.xx.xx) on Fri 21 Apr 2006 at 23:46
Where have you been hiding for the last years?? GRUB can since very long time boot off from a XFS drive:
mail / # mount|grep -i xfs
/dev/hda3 on / type xfs (rw,noatime,logbufs=8,logbsize=32768,ihashsize=65567)
/dev/mapper/vg-usr on /usr type xfs (rw,nodev,noatime,logbufs=8,logbsize=32768,ihashsize=65567)
/dev/mapper/vg-home on /home type xfs (rw,nosuid,nodev,noatime,usrquota,grpquota,logbufs=8,logbsize=32768,ihashsize=65567)
/dev/mapper/vg-opt on /opt type xfs (rw,nodev,noatime,logbufs=8,logbsize=32768,ihashsize=65567)
/dev/mapper/vg-var on /var type xfs (rw,nodev,noatime,usrquota,grpquota,logbufs=8,logbsize=32768,ihashsize=65567)
/dev/mapper/vg-tmp on /tmp type xfs (rw,noexec,nosuid,nodev,noatime,usrquota,grpquota,logbufs=8,logbsize=32768,ihashsize=65567)
/dev/hda1 on /boot type xfs (rw,noatime,logbufs=8,logbsize=32768,ihashsize=65567)
mail / # ls -lah /boot/grub/*xfs*
-rw-r--r-- 1 root root 11K Jul 1 2005 /boot/grub/xfs_stage1_5
mail / # grub --version
grub (GNU GRUB 0.96)
mail / #

Posted by Anonymous (72.88.xx.xx) on Thu 12 Oct 2006 at 04:57
"GRUB can since very long time boot off from a XFS drive" ... For you, yes. For me, not often. Even the new Debian installer will tell you to use LILO for booting off a XFS partition.

Posted by Anonymous (24.6.xx.xx) on Sat 22 Apr 2006 at 00:59
Yes, it does.

I have xfs on all my nfs servers. OS is SuSE 9.2/9.3/10.0.

Posted by Anonymous (192.121.xx.xx) on Mon 24 Apr 2006 at 08:27
As said, GRUB has supported XFS on /boot for a long time. The only thing that's not supported is installing the bootloader to an XFS partition (as opposed to installing it to the MBR).

- Peder

Posted by rmcgowan (143.127.xx.xx) on Fri 21 Apr 2006 at 17:15
You said "While recognizing the relative merits of each filesystem, an system administrator has no choice but to install only one filesystem...". I don't understand why you believe there is or should be this sort of restriction.

Can't the administrator decide to 'partition' usage onto different volumes, using different fs types, based on their performance for the usage?

For example, I might create a volume to hold users' homes, expecting many small files while requiring maximum speed, and so choose XFS, while for a volume holding large files (video, audio, backups, still images, etc.) I might choose JFS instead.

Note that I'm not advocating or even suggesting that the above is in some way an optimal setup, it's just an 'off the top of my head' example. The question is "Why shouldn't I be able to do this sort of thing if I so choose?" Is there something I'm missing?

Posted by hansivers (64.18.xx.xx) on Fri 21 Apr 2006 at 17:37
Sorry, this sentence was a bit imprecise. The idea was that, ultimately, for every partition, a choice has to be made, since only one FS can be installed on it. This sentence was put there to underline the fact that, in some benchmark tests, the authors tend to conclude something like "every FS has its own merits", which leaves the reader with no real answer to the question: what is the best FS to install on my partition(s)? Thanks for your comment!

Posted by Anonymous (87.2.xx.xx) on Sat 22 Apr 2006 at 00:52
In Piszcz's results, JFS seems faster than XFS; in fact JFS seems to be the fastest filesystem (EXT2 and EXT3 also seem faster than XFS). Look at the Total Test Time ( http://linuxgazette.net/122/misc/piszcz/group002/image037.png ). So... who is telling the truth?

Posted by Anonymous (82.69.xx.xx) on Sat 22 Apr 2006 at 01:58
A pretty awful article... In particular the method used is not clearly explained.

For example, did you unmount the relevant partition before every single operation? You don't say, but if you did not (and not many people know that this is essential) your results are largely meaningless.

For far more sensible, documented and informative tests look at mine here:

http://WWW.sabi.co.UK/Notes/anno05-3rd.html#050908

and in a few entries around that date. Some amusing updates here:

http://WWW.sabi.co.UK/Notes/anno05-3rd.html#050913
http://WWW.sabi.co.UK/Notes/anno06-2nd.html#060416

Posted by Anonymous (70.171.xx.xx) on Sat 22 Apr 2006 at 02:02
I assume you used ext3 defaults here, which is not really a fair comparison of ext3's potential. Justin Piszcz allowed me to use his script to run his tests on my machine. I compared the ext3 "tuned" modes and found that dir_index alone, and dir_index combined with data=writeback or data=journal, improved most tests, remarkably so in some cases where directories were involved. My conclusion is that ext3 with dir_index (and depending on your usage journal or writeback) wins out for normal desktop performance, across the board. Changing the commit=n interval to longer than the default 5 seconds also improves performance.

Posted by Anonymous (82.69.xx.xx) on Sat 22 Apr 2006 at 14:29
«My conclusion is that ext3 with dir_index (and depending on your usage journal or writeback) wins out for normal desktop performance, across the board.»

I used to think much the same, but over time I discovered that I'd rather use JFS across the board (except for filesystems that need to be accessible from MS Windows, where I use 'ext2' as there is an excellent MS Windows driver for it).

The first reason is that 'ext3' performance is awesome when the filesystem has just been created and loaded, but degrades very badly over time while JFS degrades significantly but a lot less:

http://WWW.sabi.co.UK/Notes/anno06-2nd.html#060416

The second reason is that probably because of some happenstance 'dir_index' can slow down things pretty significantly:

http://WWW.sabi.co.UK/Notes/anno05-4th.html#051204

A rather less significant advantage of JFS is that since it uses extents and dynamically allocated inodes it usually uses a lot less space for metadata, often like 3-5% of the total filesystem space.

Posted by fsateler (201.214.xx.xx) on Sat 22 Apr 2006 at 02:03
Good article. I'd like to note one thing though: you mentioned the initial and residual capacity of the filesystems, but not the capacity when the drive is in use, i.e.: if I have 1000 files of 1 MB each, is the used space 1000 MB? I think that is the most important thing, since your file server isn't useful when its drives are empty, but rather when they are being used (I don't care if I have wasted 5% of my drive if it is empty, but I do care when the drive is almost full).
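One way to put a number on this is to compare the apparent size of a file tree with the space the filesystem actually allocated for it. The sketch below is only an approximation under stated assumptions: `tree_usage` is a name invented here, it relies on `st_blocks` being reported in 512-byte units (as on Linux), and it ignores the space taken by directories, the journal and other metadata.

```python
import os
import stat


def tree_usage(root):
    """Return (apparent_bytes, allocated_bytes) for regular files under root."""
    apparent = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, device nodes
            apparent += st.st_size
            # st_blocks counts 512-byte units, whatever the FS block size is.
            allocated += st.st_blocks * 512
    return apparent, allocated
```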
--------
Felipe Sateler

Posted by Anonymous (68.124.xx.xx) on Sat 22 Apr 2006 at 02:27
..except that XFS does not yet support ACLs/SELinux....

Posted by Anonymous (62.0.xx.xx) on Sat 22 Apr 2006 at 08:29
Sure it does, just take a look at the kernel configuration:

grep -i acl config-2.6.15:

CONFIG_XFS_POSIX_ACL=y

Posted by Anonymous (87.2.xx.xx) on Sat 22 Apr 2006 at 09:29
Yeah... it supports them.

Posted by Anonymous (84.30.xx.xx) on Sat 22 Apr 2006 at 14:14
IIRC XFS was actually the first FS implementing ACLs. The rest followed suit later that year.

Posted by Anonymous (212.2.xx.xx) on Tue 16 May 2006 at 13:28
It does, both. I have used XFS with ACLs and SELinux (extended attributes are important here) for some time (CentOS).

Posted by Anonymous (24.98.xx.xx) on Sat 22 Apr 2006 at 06:16
EXT3 has severe limitations once you go beyond 1 TB partitions, on the number of files/directories per directory, etc. None of that matters if you are using a relatively small computer system. We are using systems with partition sizes of up to 6.4 TB. It is impossible to use anything other than the XFS file system. Formatting will take several days, if not more, in the case of ext3. Mounting will take half a day for reiserfs. jfs just wasn't stable enough. Recovery is excellent (for XFS). The only thing that is correct: no SELinux, but for a system with that much activity and disk space I wouldn't use SELinux anyway - too much overhead - the system is busy on its own. So, my summary would be: for a small system it doesn't matter; for super-large systems you are limited to XFS (with JFS trailing, especially since I've heard that JFS support has been removed from Fedora 5, but I am not sure).

Posted by Anonymous (82.69.xx.xx) on Sat 22 Apr 2006 at 14:17
Your comments on scalability are very appropriate.

The biggest problem however is not making or formatting a filesystem, it is how long it takes to 'fsck' it, and how much memory is necessary.

Times of over two months to 'fsck' a filesystem have been reported for 'ext3', and XFS sometimes requires more than 4GB of memory to run 'fsck' (it is possible to create and use an XFS filesystem on a system with a 32 bit CPU that can only be 'fsck'ed on a 64 bit CPU, and at least one such case has actually happened).

The basic problem is that while very large filesystems using JFS or XFS (or very recent 'ext3') perform well on RAID storage, because they take advantage of the parallel nature of the underlying storage system, 'fsck' is single threaded in every Linux file system design that I have seen. Bad news.

More details here:

http://www.sabi.co.uk/Notes/anno05-4th.html#051012
http://www.sabi.co.uk/Notes/anno05-4th.html#051009

I am very surprised that your experience is that «jfs just wasn't stable enough». Perhaps you may want to report to the JFS mailing list, as the authors of JFS are very responsive to reports of instability, and usually find a fix pretty quickly.

As to FC5 support, all Red Hat systems only support 'ext3', at least officially and in the installer, but after installation you can use any of the filesystems included in the kernel. I typically install to a small temporary partition which is 'ext3' formatted, and then convert it to JFS by copying its contents over to the real ''root'' partition which is JFS formatted.

Posted by Anonymous (193.230.xx.xx) on Tue 2 May 2006 at 08:54
IIRC, for FC you can add 'reiserfs', 'xfs', 'jfs' as parameters to the prompt (right after the cd/dvd boots) and those filesystems are available to create.

Posted by Anonymous (202.1.xx.xx) on Sat 22 Apr 2006 at 07:33
It's a pity that this article doesn't consider 'shredding' times of large files.

Posted by Anonymous (130.127.xx.xx) on Sat 22 Apr 2006 at 18:10
That would be because everyone figured out years ago that "shred" doesn't work on journalled file systems.

Posted by Anonymous (208.54.xx.xx) on Sun 23 Apr 2006 at 20:10
Stupid newb question: Why can't journalled file systems be shredded? Thanks in advance.

Fig

Posted by Anonymous (69.128.xx.xx) on Mon 24 Apr 2006 at 05:32

The shred command works by writing random data, zeros, and ones over and over to the spots on the disk where the file you want to shred was located. The hope is that with enough writes, the data will actually be overwritten on the disk. (The head of the hard drive varies a small amount as it traces its path over the disk, so the data might not be completely erased.)

The problem is, journaling file systems write data to the journal before they write it to the final location on the disk. So even after shredding the file blocks on the disk, an attacker might be able to recover data from wherever on the disk the journal is located, even if the data blocks are unreadable.

The real issue is that shredding a file even on ext2 does not always work, because modern hard drives sometimes transparently remap bad sectors on the disk... so the drive might have moved the data away from what the operating system thinks is the location on the disk it originally wrote mysecret.txt to. An attacker could still read data from the "bad" sector using the right tools.

Realistically, shred should never be relied on. Using dm-crypt to encrypt a full filesystem is a much better solution, and with the power of today's CPUs, the performance cost is a small enough trade-off for secrecy.

For more information about securely destroying data, you can read the paper TKS1 on this page, which is really interesting. (Scroll down to section 3 on page 4)

Posted by Anonymous (211.27.xx.xx) on Tue 25 Apr 2006 at 15:35
Shred does actually work on ext3 with its default settings.

From man shred:

"... Note that shred relies on a very important assumption: that the file
system overwrites data in place. This is the traditional way to do things, but
many modern file system designs do not satisfy this assumption.

[snip ]...

In the case of ext3 file systems, the above disclaimer applies (and shred is thus
of limited effectiveness) only in data=journal mode, which journals file data in
addition to just metadata. In both the data=ordered (default) and data=writeback
modes, shred works as usual. "

Posted by ericbrasseur (62.235.xx.xx) on Sat 22 Apr 2006 at 10:30
I have used all these filesystems on machines that sometimes experience brutal hangs or power failures. Every filesystem is supposed to recover from such events and they most often did. But Ext3 is the sole one that always recovered correctly and never made me lose a file. What's more, with Ext3 I never have to perform the recovery manually. I agree with the conclusions of the article and for a while I thought I'd adopt XFS, but I was forced back to Ext3 just because of the reliability.

Posted by Anonymous (82.69.xx.xx) on Sat 22 Apr 2006 at 13:57
«Every filesystem is supposed to recover from such events and they most often did.»

That's rather wide of the mark: most filesystems are supposed to recover A CONSISTENT STATE of the METADATA ONLY from such events.

'ext3' additionally can make an attempt at recovering the contents of files too, if ordered or data journaling is enabled.

However the proper way to ensure data (as opposed to metadata) recoverability is to ensure the application handles that, using atomic data transactions, because that's the only way, and even if 'ext3' often succeeds blindly, that is not the right way.
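As an illustration of what handling this in the application can look like, here is a minimal sketch of the common write-to-temporary-file, fsync, then rename pattern. The function name `atomic_write` is invented for this example, it assumes a POSIX system where renaming within one filesystem is atomic, and it still depends on the drive honouring flush requests.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` (bytes) in one atomic step."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # push the new contents to disk first
        os.replace(tmp_path, path)  # then swap the name over atomically
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Sync the directory so that the rename itself is made durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

With this pattern a reader should find either the complete old file or the complete new one after a crash, rather than a half-written mixture. Note that `mkstemp` creates the file with mode 0600, so an application that cares about permissions would need to set them before the rename.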

Large scale filesystems like JFS and XFS, designed for mission critical applications, don't make any attempt at data recovery, because indeed that should be handled by the applications themselves.

Many people who don't understand this then complain that these two filesystems cause loss of data...

Posted by Anonymous (24.203.xx.xx) on Sat 29 Apr 2006 at 22:12
Well that obviously depends on your usage scenario. I think for home/small business your assumption is just not right. If I start my music player, download some large video and get my mail and I suddenly have a power outage then all those files should be intact when I switch my computer back on. With XFS my playlist might be 0000..., my download might be 0000... and my mails are 0000... and no longer on the server. With ReiserFS my playlist will be part of the video, parts of my mails might be in the video and the mails could have some other garbage in them. This all assumes that you don't use data=journal for reiser and don't use sync mode with a decent harddrive for XFS but still, some time ago there was no data=journal and maybe you just don't have a decent harddrive because your boss wants to spend less money. Now tell me, what is my music player going to do about it? :)

Posted by drdebian (194.208.xx.xx) on Thu 30 Aug 2007 at 16:38
100% ACK.

In all the cases you mentioned, your chances would indeed be best if you had your data on an Ext3 filesystem, since it's the only one not using some binary tree structure to manage where your data is stored.

The problem of today's hardware is all the caching that's going on at various levels. The application can't really tell whether a certain file has really been written to a block on the harddisk, because all of that is completely hidden away in some HAL.

I think that Sun's approach with its ZFS filesystem is suitable to tackle this challenge. It uses end-to-end checksums to detect file corruption from the harddisk right through to the application level. Too bad it isn't GPL'd, so we'll hardly see much of it in the Linux world.
