
Subject: Re: ffs, ext2, metadata operations, FreeBSD
From: Wolfgang Denk
Date: 3 Mar 2000 08:29:14 -0000


Hello Christian,

in message <NDBBJMNNEPKCHPDOJAEBIEJGDJAA.christian@jacken.net> you write:
>
> > Who wrote that?
> 
> www.cons.org/cracauer/freebsd.html

I always find it nice (and rarely entirely free of overt
subjectivity) when followers of one religion write about the
teachings of another :-)

> > May I first ask why you want this? Have you had any practical
> > experiences that make you believe the current settings are too
> > "unsafe"?
> 
> Not me, but a friend.

Hm, and what did he do? And what caused the crash?


> Besides: isn't the following true?
> 
> <Quote>
> These operations are much more dangerous when they're interrupted. We're not
> just talking losing data from those files you just wrote, we're talking
> losing files you didn't even touch in the last few days! The "meta-data" operations
> can cause you to lose your whole filesystem (partition) or they can delete
> or truncate files that just happened to be "neighbors" of the files you
> wrote.
> </Quote>

Of course it is true, as a worst-case scenario. But this problem is
common to all "classic" Unix filesystems, no matter whether they are
FFS, UFS, or ext2. In that respect Linux is not significantly better
than Solaris or others, but not worse either.

If what matters to you is a SAFE filesystem, you will have to choose
one that supports journalling. And for the application you described,
Reiser-FS seems to me not a bad solution.


> > I don't know whether you have also read the various comments in
> > which, e.g., Linus justifies the decision for the strategy
> > implemented in ext2?
> 
> Roughly when was that?

For example:

|| From: torvalds@cc.Helsinki.FI (Linus Torvalds)
|| Newsgroups: comp.sys.powerpc,comp.os.linux.development.system
|| Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
|| Date: 3 Aug 1995 10:24:34 +0300
|| Organization: University of Helsinki
|| Message-ID: <3vptji$mbm@kruuna.helsinki.fi>
|| References: <3tk1r3$l8o@news1.halcyon.com> <3vlva1$bdl@helena.MT.net> <3vm27k$pkj@fido.asd.sgi.com> <3vnorm$oje@senator-bedfellow.mit.edu>
|| 
|| In article <3vnorm$oje@senator-bedfellow.mit.edu>,
|| Greg Hudson <ghudson@mit.edu> wrote:
|| >Larry McVoy (lm@neteng) wrote:
|| >: 	2.  Do this.  Turn off the sync meta update in FFS.  Untar a
|| >: 	big directory _into_ the file system and power off the machine
|| >: 	in the middle.  Now do the same with Linux.  Please run fsck
|| >: 	under script and post the outputs.  That's what convinced me that 
|| >: 	Linux was better.  Go do it and report back to us.
|| >
|| >I'm willing to believe that the FFS filesystem comes out worse than
|| >the Linux filesystem, but what does that prove?  You shouldn't be
|| >turning off synchronous meta-data updates in your filesystem.  (It
|| >might be enough of a performance boost in a news spool that it will
|| >save some administrators some money, but this is explicitly a mode
|| >where reliability is NOT a design goal.)  Last I checked, under normal
|| >conditions Linux ext2 is not as careful as FFS about keeping the
|| >filesystem consistent during writes, so a spontaneous reboot is more
|| >likely to damage a Linux filesystem than a NetBSD filesystem.  This is
|| >certainly my experience in practice.
|| 
|| I've said this before, and I guess I'll say this again.
|| 
|| 	BSD "synchronous" filesystem updates are braindamaged.
|| 
|| 	BSD people touting it as a feature are WRONG. It's a bug.
|| 
|| Synchronous meta-data updates are STUPID:
|| 
||  (a) it's bad for performance
||  (b) it's bad for filesystem stability
|| 
|| (a) is obvious, and even BSD people will agree to that.  But (b) is not
|| as obvious, and BSD people mostly say "Huh?"
|| 
|| In short, updating meta-data synchronously almost guarantees that the
|| filesystem structure will be up-to-date after a crash, but it will _not_
|| guarantee that the actual file data will be up-to-date.  In fact, it
|| will often result in a filesystem that "fsck" thinks is perfectly ok,
|| _despite_ the fact that you have corruption. 
|| 
|| In fact, the way to get a stable filesystem is to do the updates exactly
|| reverse to the way BSD does it: write out the data blocks first, _then_
|| write out the meta-data.  The problem with this approach is you end up
|| with a partial ordering in which to write the data, and ordering it
|| isn't trivial. 
|| 
|| Doing synchronous meta-data updates is a kludge to make fsck not
|| complain as much about corrupted filesystems.  It doesn't fix the
|| problem, it only fixes some of the symptoms.  Touting that as a Good
|| Thing (tm) is idiocy, IMNSHO (you'll feel safe because fsck doesn't tell
|| you anything is wrong). 
|| 
|| What makes the BSD approach even more stupid is the fact that the
|| meta-data inconsistencies are the one thing fsck _can_ fix, so trying to
|| keep meta-data up-to-date is in some respect a complete waste of time. 
|| 
|| It's much better to instead concentrate on making a better fsck, as fsck
|| is run only once at bootup (and often not even then as most bootups will
|| be from a clean filesystem) than to take the performance hit at
|| run-time.  That's the approach the linux filesystems take (well, at
|| least the ext2fs filesystem: most other filesystems have a rather stupid
|| version of fsck). 
|| 
|| Of course, if filesystem integrity is important for you, you don't want
|| to use the linux ext2fs.  That isn't what I'm trying to claim.  What I'm
|| saying is that ffs isn't really better in this regard.  If you want
|| filesystem consistency, you have to use some kind of journalling
|| filesystem. 
|| 
|| Alternately you can make a unix-type filesystem and do the disk updates
|| the _right_ way: data blocks first, then indirect blocks (starting from
|| the lowest level indirected blocks), then the inode, and finally the
|| directory entry (and going in the opposite direction when you're
|| deleting a file).  Note that you don't need to do any of these updates
|| synchronously: you only have to make them in the right order. 
|| 
|| The sad thing is that the FFS approach is not just _wrong_, it's also
|| slower than the right way (the partial ordering will still allow quite a
|| lot of re-ordering among non-related updates, so you probably can get
|| reasonably close to the completely asynchronous case).  Linux doesn't do
|| it right either, but at least linux doesn't take the performance hit for
|| no gain. 
|| 
|| 		Linus
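
Linus's "right way" can be checked with a toy model. For a create, it writes data blocks first, then indirect blocks, then the inode, then the directory entry; a delete runs the other way. The sketch below is a hypothetical illustration, not ext2 or FFS code. Each on-disk object is reduced to a name plus the names it points to, and a crash is assumed to be possible after any prefix of the write sequence. It checks that no crash point leaves an on-disk pointer to something that is not (or no longer) on disk. The names `Obj`, `build_file` and `crash_safe` are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    """One on-disk object (data block, indirect block, inode, directory entry)."""
    name: str
    points_to: tuple = ()  # names of the objects this one refers to


def build_file(fname, n_data=2):
    """Objects of one hypothetical file, listed from the leaves up to the directory entry."""
    data = [Obj(f"{fname}.data{i}") for i in range(n_data)]
    indirect = Obj(f"{fname}.ind", tuple(d.name for d in data))  # single indirect level
    inode = Obj(f"{fname}.inode", (indirect.name,))
    dirent = Obj(f"dir:{fname}", (inode.name,))
    return [*data, indirect, inode, dirent]


def create_ops(objs):
    # Data blocks first, then the indirect block, then the inode, finally the directory entry.
    return [("write", o) for o in objs]


def delete_ops(objs):
    # The opposite direction: remove the directory entry first, free the data blocks last.
    return [("free", o) for o in reversed(objs)]


def consistent(disk):
    """True if no on-disk object points to an object that is not on disk."""
    return all(ref in disk for o in disk.values() for ref in o.points_to)


def crash_safe(ops, disk=None):
    """Apply ops one at a time and check the invariant at every possible crash point."""
    disk = dict(disk or {})
    if not consistent(disk):
        return False
    for kind, o in ops:
        if kind == "write":
            disk[o.name] = o
        else:
            del disk[o.name]
        if not consistent(disk):
            return False
    return True


f = build_file("a")
on_disk = {o.name: o for o in f}
print(crash_safe(create_ops(f)))                        # True
print(crash_safe(delete_ops(f), on_disk))               # True
print(crash_safe([("write", o) for o in reversed(f)]))  # False: entry written before its inode
print(crash_safe([("free", o) for o in f], on_disk))    # False: data freed while still referenced
```

The invariant only covers dangling pointers. A crash in the middle of a correctly ordered delete can still leave unreferenced objects behind, which is the kind of leftover fsck can reclaim. Nothing in the model requires the writes to be synchronous: only their order is checked.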

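Linus says the required ordering is only partial and still allows unrelated updates to be reordered. A dependency graph makes this concrete. In this hypothetical sketch, each create-time write depends on the objects it points to. The standard library's `graphlib.TopologicalSorter` (Python 3.9+) then groups the writes into rounds. Writes inside one round, including writes belonging to two unrelated files, may go to disk in any order. `file_deps`, `write_batches` and `respects` are illustrative names, not part of any filesystem.

```python
from graphlib import TopologicalSorter


def file_deps(fname, n_data=2):
    """Hypothetical create dependencies of one file: node -> nodes that must be on disk first."""
    data = [f"{fname}.data{i}" for i in range(n_data)]
    return {
        **{d: () for d in data},
        f"{fname}.ind": tuple(data),
        f"{fname}.inode": (f"{fname}.ind",),
        f"dir:{fname}": (f"{fname}.inode",),
    }


def write_batches(deps):
    """Group writes into rounds; writes within one round have no ordering constraints among them."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        batches.append(ready)
        ts.done(*ready)
    return batches


def respects(order, deps):
    """True if every node appears after all of the nodes it depends on."""
    pos = {n: i for i, n in enumerate(order)}
    return all(pos[p] < pos[n] for n, preds in deps.items() for p in preds)


deps = {**file_deps("a"), **file_deps("b")}
for batch in write_batches(deps):
    print(batch)
# ['a.data0', 'a.data1', 'b.data0', 'b.data1']
# ['a.ind', 'b.ind']
# ['a.inode', 'b.inode']
# ['dir:a', 'dir:b']

# Writing all of file "b" before starting file "a" is also allowed.
b_first = list(file_deps("b")) + list(file_deps("a"))
print(respects(b_first, deps))  # True

# Writing a directory entry before its inode is not.
bad = list(file_deps("a"))
bad[-2], bad[-1] = bad[-1], bad[-2]
print(respects(bad, file_deps("a")))  # False
```

This only shows how much freedom the partial order leaves. It says nothing about how ext2 or FFS actually schedule their I/O.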

Linus took part in discussions on this topic many more times after
that; it flares up again every few months.


> > Maybe you should dig through the archives a bit; that should be
> > quite interesting.
> 
> What is the best way to dig? Can I somehow tell the list
> administrator to send me the mails from the last x months that
> contain the words ext2 and Linus?

That reminds me of Randal Schwartz's answer to the question of how
to determine tomorrow's date with Unix tools:

	echo "What's tomorrow's date?" | mail root

Hmmmm...

How about consulting the Internet search engine of your choice? For
example, search Google for "+meta-data +linux +update +Linus" ...

> > By the way, in the current kernels (2.3.48) something has also
> > changed massively with respect to buffering (which you notice
> > clearly when, e.g., writing a 4 GB file).
> 
> Does the new kernel buffer more or less?

The buffer area no longer grows without bound when you write very
large files. Under 2.2.14 you could practically bring the whole
system to a standstill by writing very large files (only the size
mattered, even at moderate data rates of, e.g., 1 to 2 MB/s). Under
2.3.4x this no longer happens.
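
One way to picture the difference is a write-back cache whose amount of dirty data is capped. Once the cap is reached, the writer of a very large file has to wait for write-back instead of letting dirty buffers pile up. This is a hypothetical toy model. The class `WriteCache`, the 256-block limit and the "flush the oldest half when full" policy are assumptions for illustration, not how the 2.2 or 2.3 kernels actually manage the buffer cache.

```python
class WriteCache:
    """Toy write-back cache (hypothetical; not the Linux buffer cache)."""

    def __init__(self, dirty_limit=None):
        self.dirty_limit = dirty_limit  # None means unbounded
        self.dirty = []                 # blocks written but not yet on disk
        self.peak_dirty = 0
        self.flushed = 0

    def write(self, block):
        self.dirty.append(block)
        self.peak_dirty = max(self.peak_dirty, len(self.dirty))
        # Throttle the writer: once the cap is reached, the oldest half of the
        # dirty blocks is written back before the writer may continue.
        if self.dirty_limit is not None and len(self.dirty) >= self.dirty_limit:
            self.flush(len(self.dirty) // 2)

    def flush(self, n=None):
        """Write back the n oldest dirty blocks (all of them if n is None)."""
        n = len(self.dirty) if n is None else n
        self.flushed += n
        del self.dirty[:n]


blocks = 100_000  # a large file written block by block
unbounded, bounded = WriteCache(), WriteCache(dirty_limit=256)
for b in range(blocks):
    unbounded.write(b)
    bounded.write(b)
print(unbounded.peak_dirty, bounded.peak_dirty)  # 100000 256
```

With the cap, the peak amount of dirty data stays the same however large the file is. Without it, the peak grows with the file size. This fits the observation above that the file size, not the data rate, was what mattered.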

> I mean a web and mail server under high load. No newsgroups.
> Average HTML/graphics file size.

Reiser-FS?

Wolfgang

-- 
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-4596-87  Fax: (+49)-8142-4596-88  Email: wd@denx.de
Mr. Cole's Axiom:
        The sum of the intelligence on the planet is a constant;
        the population is growing.