Contributed by marco
So what is a large disk?
Currently, disk sectors of 512 bytes are addressed using a signed 32-bit integer called a daddr. That means we can have 2^31-1 sectors, which works out to 1TB. Due to how disklabels work, this implies we can only handle disks and partitions up to 1TB. This is also true for FFS2, even though it uses 64-bit daddrs: the disklabel, the drivers, the SCSI layer and the kernel itself are not 64-bit daddr clean yet. The userland utilities that manipulate filesystems directly, like newfs(8), fsck_ffs(8) and dump(8), are also not ready for big disks.
Apart from some top(1) and ksh(1) stuff, this week I worked mostly on the following things: disklabels, FFS2 and the userland tools newfs(8), fsck_ffs(8), dump(8) and restore(8). All of these are pieces of the puzzle in moving to a 64-bit daddr clean system.
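To make the 1TB figure concrete, here is a minimal sketch of the arithmetic. The helper name `max_bytes` is made up for this illustration; only the 512-byte sector size and the signed 32-bit width come from the text above.

```python
SECTOR_SIZE = 512  # bytes per sector, as stated in the text
TIB = 2 ** 40      # one binary terabyte


def max_bytes(addr_bits, signed):
    """Largest addressable size in bytes for a sector number of the given width.

    Hypothetical helper for illustration only, not kernel code.
    """
    # A signed integer loses one bit to the sign.
    usable_bits = addr_bits - 1 if signed else addr_bits
    max_sectors = 2 ** usable_bits - 1
    return max_sectors * SECTOR_SIZE


limit = max_bytes(32, signed=True)
print(f"signed 32-bit daddr: {limit} bytes, {limit / TIB:.6f} TiB")
```

With a signed 32-bit sector number the limit is 2^40 - 512 bytes, so just one sector short of 1TB.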
First, the on-disk disklabel has been modified to allow for large partitions and disks. Because we cannot change the size of the on-disk disklabel, we had to come up with some tricks to extend the addressing. This work was partly done before the hackathon, and resulted in this commit:
So we now can define disklabels like this:
#         size   offset  fstype [fsize bsize  cpg]
  a:      4.9T     0.0T  4.2BSD   2048 16384   16 # Cyl     0*-118008*
  c: 131072.0T     0.0T  unused      0     0      # Cyl     0 -4294967295*

this is not a real disk ;-)
Don't worry, old disklabels will still work with new systems, they are converted on the fly by the kernel.
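The article does not spell out the tricks, so the following is only a sketch of one common approach: keep the existing 32-bit field and store extra high bits in a spare 16-bit field. The function names are hypothetical. The 48-bit total is an assumption, although it is consistent with the c: line above, since 2^48 sectors of 512 bytes is exactly 131072T.

```python
SECTOR_SIZE = 512
TIB = 2 ** 40


def split_sectors(nsectors):
    """Split a sector count into (low 32 bits, high 16 bits).

    Assumed layout for illustration; not taken from the actual commit.
    """
    if not 0 <= nsectors < 2 ** 48:
        raise ValueError("sector count does not fit in 48 bits")
    return nsectors & 0xFFFFFFFF, nsectors >> 32


def join_sectors(low32, high16):
    """Rebuild the full sector count from its two stored parts."""
    return (high16 << 32) | low32


def from_old_label(size32):
    """An old label has no high part, so read it with the high bits as zero."""
    return join_sectors(size32, 0)


low, high = split_sectors(2 ** 48 - 1)
print(hex(low), hex(high), join_sectors(low, high) * SECTOR_SIZE / TIB)
```

Reading a missing high part as zero is what would let an old label keep its meaning under such a scheme, which fits the on-the-fly conversion described above.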
The diff to actually create and manipulate the new disklabels using disklabel(8) is committed, but the actual support to edit and view large partitions is not yet committed. Some more testing has to be done to make sure it does not cause regressions. But you can see the example output above ;-)
Before the hackathon, pedro@ and millert@ committed the FFS2 code, and we already hunted and fixed bugs, so FFS2 was in pretty good shape at the start of the hackathon.
The FFS2 work I did was mainly testing and modifications to userland tools to be able to actually use the larger daddr FFS2 features. I had access to a 2T array, and to be able to actually use this array, some 32-bit daddr uncleanliness had to be fixed in newfs(8) and fsck_ffs(8). After a day of work, I was able to create and use an almost 2TB filesystem! I tested that things were working ok by using a little program I wrote to fill the disk with predictable data and read it back. I also ran fsck_ffs(8) on both clean and unclean filesystems with different block and fragment sizes. Bugs were found and fixed, and here's a demonstration of the result:

Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/sd0a      1.9T    5.6G    1.8T     0%    /mnt

FFS2 is much faster in creating filesystems: depending on the block and fragment sizes, creating a 2TB filesystem takes between 2 and 12 seconds (yeah, that is not a type!). Filling it is a different story: I had to run my test programs for hours during the night to fill the filesystem up to 50% usage.
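The test program itself is not shown, so this is not it, but a fill-and-verify test of that kind can be sketched roughly as follows. Everything here is an assumption made for illustration: the function names, the 16384-byte block size (borrowed from the bsize in the disklabel example) and the hash-based pattern. The idea is that each block's contents depend only on its index, so a block that ends up at the wrong address shows up as a mismatch when reading back.

```python
import hashlib
import os
import tempfile

BLOCK_SIZE = 16384  # assumed; same as the bsize in the disklabel example


def block_pattern(index, size=BLOCK_SIZE):
    """Deterministic contents for block `index`: a digest of the index, repeated."""
    seed = hashlib.sha256(index.to_bytes(8, "little")).digest()
    return (seed * (size // len(seed) + 1))[:size]


def fill(path, nblocks):
    """Write nblocks predictable blocks to path."""
    with open(path, "wb") as f:
        for i in range(nblocks):
            f.write(block_pattern(i))


def verify(path, nblocks):
    """Return the indices of blocks that differ from the expected pattern."""
    bad = []
    with open(path, "rb") as f:
        for i in range(nblocks):
            if f.read(BLOCK_SIZE) != block_pattern(i):
                bad.append(i)
    return bad


if __name__ == "__main__":
    # Small demo on a temporary file; a real run would target the large filesystem.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "fill.dat")
        fill(path, 64)
        print("bad blocks:", verify(path, 64))
```

On a real test the block count would be chosen to reach the desired fill level of the mounted filesystem, which is where the hours of runtime come from.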
The reason why 2TB works is that 2TB falls below the unsigned 32-bit boundary: the actual 32-bit numbers are interpreted as daddrs by the SCSI layer. If we want to go beyond 2TB, more work is needed, mainly in the kernel and drivers. This is work in progress as I am writing this. After that is done, we will be able to break the 2TB barrier.
Even FFS1 users will benefit from this: right now the disk size limit is 1TB, but with a 64-bit daddr, larger disks can be used with FFS1, as long as the partitions are smaller than 1TB.
The diffs for newfs(8), fsck_ffs(8) and friends are not ready to be committed yet, but the major part of the work was done at the hackathon; what remains is some cleanup and more testing (both FFS1 and FFS2). I am sure you will see requests for that soon. The work was done in cooperation with krw@, pedro@ and deraadt@, who were at the hackathon, and millert@, who did not attend the hackathon but was working with us anyway.
I'd also like to take this opportunity to thank beck@ for the very nice hike (really special for a guy who lives in a house below sea level ;-), everybody involved for the great infrastructure, which let us work with everything we needed in place, and of course all hackers for good company, fun and in general good shitzzz.
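A small sketch of why the unsigned boundary matters. It only illustrates the bit reinterpretation and is not the kernel or SCSI code; the helper names are made up. A sector number above 2^31-1 looks negative in a signed 32-bit variable, but the same 32 bits read as unsigned give back the intended sector. Past 2^32 sectors the number no longer fits at all and wraps around.

```python
import struct

SECTOR_SIZE = 512
TIB = 2 ** 40


def as_signed32(value):
    """Reinterpret the low 32 bits of value as a signed integer."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


def as_unsigned32(value):
    """Reinterpret the low 32 bits of value as an unsigned integer."""
    return value & 0xFFFFFFFF


sector = 3_000_000_000             # about 1.4 TiB into the disk, above 2**31 - 1
stored = as_signed32(sector)       # what a signed 32-bit variable would hold
recovered = as_unsigned32(stored)  # what a layer reading the bits as unsigned sees

print(stored, recovered, recovered * SECTOR_SIZE / TIB)
print(as_unsigned32(2 ** 32 + 5))  # beyond 2 TiB the sector number wraps
```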
Oh, BTW, FFS2 in current is ready for testing, so please do that if you have any chance. You won't be able to use large partitions yet, but that should not prevent you.
By Brynet (Brynet) on
As for this line in the text:
(yeah, that is not a type!).
Did you mean typo? ;-)
Comments
By Brynet (Brynet) on
> (yeah, that is not a type!).
>
> Did you mean typo? ;-)
Haha, right after correcting a typo I make a mistake myself... how unfortunate :(
By Anonymous Coward (149.169.135.33) on
(e.g. should FFS1 essentially be deprecated?)
Comments
By Otto Moerbeek (otto) on http://www.drijf.net
>
> (e.g. should FFS1 essentially be deprecated?)
By Otto Moerbeek (otto) on http://www.drijf.net
>
> (e.g. should FFS1 essentially be deprecated?)
Oopsie, I was too fast.
Nah, since FFS1 uses 32-bit daddrs it is more efficient for smaller partitions, e.g. floppies.
By Steve Shockley (68.80.137.106) steve.shockley@shockley.net on
When you say "convert", does that mean there will be a flag day (i.e. filesystems used with the new system won't work with old kernels) or is it just converted in memory and not written back to the disk?
Comments
By Otto Moerbeek (otto) on http://www.drijf.net
>
> When you say "convert", does that mean there will be a flag day (i.e. filesystems used with the new system won't work with old kernels) or is it just converted in memory and not written back to the disk?
The kernel only converts in-memory and does not change the on-disk disklabel.
disklabel(8) only writes new labels, but the format is such that old kernels will still be able to use the new ones, as long as you do not actually have large partitions. There is a corner case where an old fsck_ffs run on a new disklabel will not be able to find alternate superblocks. But that is only relevant if both your primary and your first alternate superblocks are corrupted. In that case you'll need to use the -b flag to fsck_ffs.
By Bryan (24.92.155.36) on