Posted by banchieri on Thu 20 Jan 2011 at 09:52
Recently, my focus of interest turned to volume management. Since, with Linux, LVM2 seems to be the only viable solution (unless you go for cluster filesystems), I started reading about LVM. Somewhere I read about setting up a loop device as a PV, and that's where I got intrigued: couldn't I just use a sparse file with a single large hole and let my ext4 fill it in when something is actually written onto the loop device?
So for proof of concept, I
(1) created the volume with dd if=/dev/zero of=FILE bs=X seek=Y-1 count=1,
(2) did the losetup /dev/loop0 FILE,
(3) ran mke2fs /dev/loop0,
(4) did mount /dev/loop0 /mnt,
and then checked actual disk usage with du -k FILE and filesystem usage with df -k /mnt.
Guess what? It worked!
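Spelled out with concrete (made-up) numbers, that's roughly the following; the 10 GiB size, FILE and /dev/loop0 are just the examples I picked:

  # 10 GiB sparse backing file: a single 1 MiB block written at the very end
  dd if=/dev/zero of=FILE bs=1M seek=10239 count=1
  losetup /dev/loop0 FILE
  mke2fs /dev/loop0
  mount /dev/loop0 /mnt
  # apparent size vs. blocks actually allocated
  ls -lh FILE
  du -k FILE
  df -k /mnt

ls -lh shows the full 10 GiB, while du -k reports only what mke2fs actually wrote, a tiny fraction of that.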
For further testing, I started copying files into my "dynamic volume", watched it grow, and ultimately unmounted it.
Since that worked out so well, I'm gonna use that trick in the Linux Containers setup I'm currently working on, because it
(a) lets the volume grow dynamically and
(b) puts an upper limit on volume size.
Ah, and: With ext3/4 as underlying filesystem, ext2 within the loop device is enough. Why journal twice?
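(And in case it isn't obvious: a plain mke2fs without -j gives you exactly that, ext2 without a journal; should you change your mind later, tune2fs can add one in place. Device name as above, of course:)

  mke2fs /dev/loop0      # ext2, no journal
  tune2fs -j /dev/loop0  # only if you later decide you want ext3 after all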
[ Parent ]
"Deleting the file would not zero its contents, so the space will stay occupied." Where have I heard similar concepts lately? Sure: talking about how file systems lack a way to inform the underlying device that some blocks are not needed anymore! Can you see the parallel?
Take a solid state device, make a file system on it, populate it with large files, then delete them. The SSD will still see most of its blocks as used, and its wear-reduction strategy will lose efficiency. Some solutions have been proposed (zeroing the blocks, issuing a TRIM command...), but, to date, none has gained broad consensus.
In the case above, the ideal solution would be to have a top-level file system which zeros or trims blocks upon file deletion, and a loop device that understands that and translates it to holes in the underlying file.
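For what it's worth, the pieces for that are slowly appearing on the loop-device side: if the underlying filesystem supports hole punching, you can deallocate ranges of the backing file by hand with fallocate, and sufficiently recent kernels can translate discards issued by the inner filesystem into exactly such holes. Very roughly, assuming your kernel and util-linux are new enough and the inner filesystem supports discard at all (the offsets below are made up):

  # punch a hole into the backing file by hand
  fallocate --punch-hole --offset $((1024*1024*1024)) --length $((256*1024*1024)) FILE

  # or mount the inner filesystem with online discard, and/or trim it in one go,
  # and hope the loop driver turns the discards into holes
  mount -o discard /dev/loop0 /mnt
  fstrim /mnt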
[ Parent ]
Well, yes, that's how filesystems behave. On the other hand, if you put another large file onto the volume after deleting the first one, the volume won't grow because blocks get reallocated by the "inner" filesystem.
So far, I haven't heard of auto-shrinking filesystems (with the notable exception of tmp/ramfs-like ones).
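The poor man's workaround I'm aware of is to zero the free space inside the "inner" filesystem and then re-sparsify the backing file offline; mind that the backing file balloons to its full size while you do it, and you need room for a second copy (the file names below are made up):

  # inside the mounted volume: fill free space with zeros, then drop the ballast
  dd if=/dev/zero of=/mnt/zeroball bs=1M; rm /mnt/zeroball
  umount /mnt
  losetup -d /dev/loop0
  # rewrite the backing file, turning the zeroed blocks back into holes
  cp --sparse=always FILE FILE.sparse && mv FILE.sparse FILE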
[ Parent ]
I've never stated that the idea is new. It just crossed my mind and I thought that it might be useful for somebody else, so I shared it with you folks.
Obviously, it was useful for you...
[ Parent ]
That's close to another idea that once crossed my mind: A garbage-collecting filesystem which could do that sort of thing "in situ".
The basic idea is as follows:
(1) Filesystems usually divide the allotted space into equally-sized cylinder groups (CGs).
(2) Consequently, they maintain per-CG allocation bitmaps.
(3) If you'd also monitor per-CG allocation density and detect that some CG's allocation density drops below a certain limit (e.g., 25%), you'd transparently start moving blocks from that CG to other CGs (until the CG is empty) in order to reduce the overall CG count.
(4) Moreover, that would be a great opportunity to auto-defragment files. You're moving those blocks anyway...
If you combine that with volume management based on CG pools, filesystems could dynamically allocate free CGs when they need them as well as release them after they're garbage-collected.
How does that sound?
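Just to make (3) a bit more tangible: for ext2/3/4 you can already read the per-CG numbers such a garbage collector would need straight out of dumpe2fs. Something like the following (the exact output format varies between e2fsprogs versions) prints the free-block count of every block group:

  dumpe2fs /dev/loop0 2>/dev/null | grep ' free blocks,'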
[ Parent ]
I think I'm able to guess what you're hinting at...
You're concerned with the fact that "inner" metadata appears as ordinary data to the "outer" filesystem and thus isn't treated specially. Haven't thought of that, actually. But maybe you're able to fix that by altering the "outer" filesystem's type of journalling...
Still, IMHO, journalling only the "inner" filesystem bears a higher risk of (meta)data corruption.
As usual, it's trading performance for data integrity: If you want to play really safe, you'd journal the "outer" as well as the "inner" filesystems at the price of degraded performance.
If I'm informed correctly, that's exactly what happens when you run an ext3/4 guest on a VMware ESX host.
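For the record, with ext3/4 as the "outer" filesystem the "journal everything" variant is, as far as I know, just a matter of mounting it with full data journalling, so that the loop file's contents, i.e. the "inner" metadata, pass through the journal as well (device and mount point below are placeholders):

  mount -o data=journal /dev/sdXY /srv/volumes
  # FILE would then live somewhere below /srv/volumes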
[ Parent ]
So I would definitely opt for journaling on the inner file system.
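I.e., instead of a plain mke2fs, something along the lines of (loop device as above):

  mke2fs -j /dev/loop0   # ext3, i.e. with a journal, inside the loop device
  # or, with a recent e2fsprogs:
  mkfs.ext4 /dev/loop0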
[ Parent ]
That's exactly what I wanted to achieve: thin (over)provisioning, but only with mechanisms provided by "your ordinary Linux distro". ;-)
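Just to spell out the "over" part: nothing stops the apparent sizes of the container backing files from adding up to more than the host filesystem can actually hold; the names and the 100 GiB below are of course made up:

  for i in 1 2 3 4; do
      dd if=/dev/zero of=container$i.img bs=1M seek=102399 count=1   # 100 GiB apparent each
  done
  ls -lh container?.img   # ~400 GiB apparent size
  du -ch container?.img   # only a few MiB actually allocated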
[ Parent ]
Dear editors,
thanks for the "sparse file".
banchieri
[ Parent ]