Showing entries 1 to 16

Displaying posts with tag: linux-kernel

ZFS: could have been the future of UNIX Filesystems

There was a point a few years ago where Sun could have had the next generation UNIX filesystem. It was in Solaris (and people were excited), there was a port to MacOS X (that was quite exciting for people) and there were a couple of ways to run it on Linux (and people were excited). So… instead of the fractured landscape of ext3, HFS+ and (the various variations of) UFS we could have had one file system that was common between all of the commonly used UNIX-like variants. Think of being able to use a file system on a removable drive that isn’t FAT and being able to take it from machine to machine (well… Windows would be a problem, but it always is).

There was some really great work done in OpenSolaris with integration between the

  [Read more...]
Does Linux fallocate() zero-fill?

In an email discussion about pre-allocating binlogs for MySQL (something we’ll likely have to do for Drizzle and replication), Yoshinori brought up the excellent point that in some situations you don’t want to be doing zero-fill, as getting up and running quickly is the most important thing.

So what does Linux do? Does it zero-fill, or behave sensibly and pre-allocate quickly?

Let’s look at the kernel:

Inside the fallocate implementation (fs/open.c):

if (inode->i_op->fallocate)
    ret = inode->i_op->fallocate(inode, mode, offset, len);
else
    ret = -EOPNOTSUPP;

and for ext4:

/*
 * currently supporting (pre)allocate mode for extent-based
 * files _only_
 */
if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
    return -EOPNOTSUPP;
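
The kernel excerpts above show where the request is handed to the filesystem. From user space, a rough way to see the effect is to preallocate a file and look at what comes back. The following is a minimal sketch in Python, not the code under discussion in the email thread: it uses `os.posix_fallocate`, which wraps posix_fallocate(3) rather than calling fallocate(2) directly, and the file name is hypothetical. Note that a preallocated range reads back as zeros whether or not the filesystem physically wrote them, so this only demonstrates the interface; timing the call on a large file is what would hint at zero-filling.

```python
import os
import tempfile


def preallocate(path, length):
    """Reserve `length` bytes for `path` and return the resulting file size."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Ask for the byte range [0, length) to be reserved up front.
        # If the filesystem has no native support, the C library may fall
        # back to writing the blocks itself, which is the slow case.
        os.posix_fallocate(fd, 0, length)
        return os.fstat(fd).st_size
    finally:
        os.close(fd)


def reads_back_as_zeros(path, length):
    """Check that the first `length` bytes of `path` read as zero bytes."""
    with open(path, "rb") as f:
        return f.read(length) == bytes(length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "binlog.000001")  # hypothetical file name
        size = preallocate(path, 1024 * 1024)
        print(size, reads_back_as_zeros(path, size))
```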

XFS has always done

  [Read more...]
default filesystem and disk parameters are for wusses

I can’t remember the last time I used default mkfs or mount options… oh yeah, that’s right - by accident.

Anyway… I did a little experiment today.

The filesystem is my laptop /home - XFS, 100GB, 95% used (so 5-6GB free), rather aged. This is where a lot of my MySQL development is done. Mkfs options: 128MB log, version2 log. Mount options: logbufs=8, logbsize=256k. All of this geared towards increasing metadata performance.

Why metadata performance? Well… source code trees are a lot of metadata :)

So, let’s try some things: cloning a repository and then removing the repository.

Two variables are being tested: mounting the file system with nobarrier (or barrier, the default). Write barriers tell the disk to ensure write order to the platter when write cache is in use. Also testing

  [Read more...]
gah… O_DIRECT…. 2.4…. non xfs… stab stab

* dchinner hands MacPlusG3 a bigger knife….

(on #xfs yesterday)

CREATE, INSERT, SELECT, DROP benchmark

Inspired by PeterZ’s Opening Tables scalability post, I decided to try a little benchmark. This benchmark involved the following:

  • Create 50,000 tables: CREATE TABLE t{$i} (i int primary key)
  • Insert one row into each table
  • select * from each table
  • drop each table

I wanted to test file system impact on this benchmark. So, I created a new LVM volume, 10GB in size. I extracted a ‘make bin-dist’ of a recent MySQL 5.1 tree, did a “mysql-test-run.pl --start-and-exit” and ran my script, timing real time with time.
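
The original script isn’t shown, so here is a minimal sketch in Python of what generating the statements for the steps listed above could look like. Running each phase over all tables before starting the next, the inserted value, and the function name are assumptions; only the CREATE TABLE statement comes from the post.

```python
def benchmark_statements(n_tables):
    """Yield the SQL for each phase of the benchmark, one statement at a time."""
    # Phase 1: create the tables, using the statement given in the post.
    for i in range(n_tables):
        yield f"CREATE TABLE t{i} (i int primary key)"
    # Phase 2: one row per table (the value itself is an arbitrary choice).
    for i in range(n_tables):
        yield f"INSERT INTO t{i} VALUES ({i})"
    # Phase 3: read every table back.
    for i in range(n_tables):
        yield f"SELECT * FROM t{i}"
    # Phase 4: drop everything again.
    for i in range(n_tables):
        yield f"DROP TABLE t{i}"


if __name__ == "__main__":
    # A small run for illustration; the post used 50,000 tables.
    for statement in benchmark_statements(3):
        print(statement + ";")
```

The output could be piped into a client connected to the test server and wrapped in time to get the wall-clock figure.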

    For a default ext3 file system creating MyISAM tables, the test took 15min 8sec.

    For a default xfs file system creating MyISAM tables, the test took 7min

      [Read more...]
    Disk allocation, XFS, NDB Disk Data and more?

    I’ve talked about disk space allocation previously, mainly revolving around XFS (namely because it’s what I use, a sensible choice for large file systems and large files, and it has a nice suite of tools for digging into what’s going on).

    Most people write software that just calls write(2) (or libc things like fwrite or fprintf) to do file IO - including space allocation. Probably 99% of file IO is fine to do like this and the allocators for your file system get it mostly right (some more right than others). Remember, disk seeks are really really expensive so the fewer you have to do, the better (i.e. fragmentation==bad).

    I recently (finally) wrote my patch to use the xfsctl to get better allocation for NDB disk data files (datafiles and undofiles).
    patch at:


      [Read more...]
    Arjen’s MySQL Community Journal - HyperThreading? Not on a MySQL server…

    Arjen’s MySQL Community Journal - HyperThreading? Not on a MySQL server…

    I blame the Linux Process Scheduler. At least it’s better than the earlier 2.6 days where things would get shunted a lot from one “cpu” to the other “cpu” for no real reason.

    Newer kernel versions are probably better… but don’t even think of HT and pre-2.6 - that would be funny.

    DaveM on Ingo’s SMP lock validator

    DaveM talks about Ingo’s new SMP lock validator for the Linux kernel

    A note reminding me to go take a look and see what can be ripped out and placed into various bits of MySQL and NDB. Ideally, of course, it could be turned into an LD_PRELOAD for pthread mutexes.

    Anybody who wants to look deeper into it before I wake up again is welcome to (and tell me what they find)
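
As a rough illustration of the general idea (this is not Ingo’s implementation, and not an LD_PRELOAD shim), here is a small Python sketch of lock-order checking: remember which locks have been taken while which others were held, and complain when two locks are later taken in the opposite order. The class names are hypothetical, and it only catches direct A-then-B versus B-then-A inversions, not longer cycles.

```python
import threading


class LockOrderError(RuntimeError):
    """Raised when two locks are acquired in conflicting orders."""


class OrderCheckedLock:
    _graph_guard = threading.Lock()  # protects _after
    _after = {}                      # name -> names acquired while it was held
    _held = threading.local()        # per-thread list of held lock names

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()

    def _held_names(self):
        if not hasattr(self._held, "names"):
            self._held.names = []
        return self._held.names

    def acquire(self):
        held = self._held_names()
        with self._graph_guard:
            for other in held:
                # We hold `other` and want `self`. If `other` was ever taken
                # while `self` was held, the two orders conflict.
                if other in self._after.get(self.name, set()):
                    raise LockOrderError(
                        f"{self.name} taken after {other}, "
                        f"but {other} was previously taken after {self.name}"
                    )
            for other in held:
                self._after.setdefault(other, set()).add(self.name)
        self._lock.acquire()
        held.append(self.name)

    def release(self):
        self._held_names().remove(self.name)
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
```

The useful property is that the inversion is reported even when no deadlock actually happens on that run, since the check is against recorded history rather than the current state of other threads.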

    Beat on “state of the dolphin” (or: Why Software is never really ready until a .20 release)

    Beat Vontobel blogs about “fuþark: The silence of futhark and the state of the dolphin” which is basically about how he’s found that the 5.0.20 release of MySQL (http://www.mysql.com) is when the 5.0 release is really starting to shine.

    This confirms my theory (that I’ve had for quite a while now… like years) that a software release is never really mature until it hits about .20 (that’s dot twenty, not dot two).

    When something reaches .10 (dot ten) it’s no longer going to be annoying for most uses, but .20 means that you’re going to be happy. Don’t ask me really why this is the case, but it is.

    Think about the 2.6 kernel (yes,

      [Read more...]
    really unstable laptop

    I’m currently getting hard crashes about five times a day.

    I thought it was the sound driver, as I got a crash during dist-upgrade (again) while on console and saw the backtrace. Basically looked like something bad happened when the sound was muted.

    So, running without sound muted - just turned down.

    Well, today, just crashed again. Since running X, no backtrace. ARRRGHHH.

    Also crashed when waking up too. ACPI stuff in the backtrace.

    Not a happy camper at the moment. I have work to do, not futzing around with trying to find out what the fuck is wrong with my laptop (probably software) when I should be running a stable system.

    I’ve already had to re-add all my liferea RSS feeds as liferea obviously isn’t doing the right thing (at least the version shipping with Ubuntu) with regard to writing the feeds file to disk.

    So,

      [Read more...]
    Microsoft’s file system patent upheld: ZDNet Australia: News: Software

    Microsoft’s file system patent upheld: ZDNet Australia: News: Software

    Saying any part of the FAT file system is “novel and non-obvious” is rather like saying being stabbed in the eye with a fork is “novel and a good way to spend a Sunday afternoon”.

    Seriously - what the?

    I’m really glad I work for a company that opposes software patents.

    Thanks to Pia for the links.

    disk space allocation (part 4: allocating an extent)

    For XFS, in normal operation, an extent is only allocated when data has to be written to disk. This is called delayed allocation. If we are extending a file by 50MB, that space is deducted from the total free space on the filesystem, but no decision on where to place that data is made until we start writing it out, either due to memory pressure or because the kernel automatically starts writing the dirty pages out (the sync once every 5 seconds on Linux).

    When an extent needs to be allocated, XFS looks it up in one of two b+trees it has of free space. There is one sorted by starting block number (so you can search for “an extent near here”) and one by size (so you can search for “an extent of x size”).
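
To make the two lookups concrete, here is a minimal Python sketch that keeps the same set of free extents in two orders. Sorted lists with bisect stand in for the B+trees, and the class and method names are hypothetical; this illustrates the two kinds of search, not XFS’s on-disk format or its actual allocation policy.

```python
import bisect


class FreeSpaceIndex:
    """The same free extents, indexed by start block and by size."""

    def __init__(self, extents):
        # Each extent is a (start_block, length) pair.
        self.by_start = sorted(extents)
        self.by_size = sorted((length, start) for start, length in self.by_start)

    def nearest(self, block):
        """Return the free extent whose start is closest to `block`."""
        i = bisect.bisect_left(self.by_start, (block, 0))
        # Only the neighbours on either side of the insertion point can win.
        candidates = self.by_start[max(i - 1, 0):i + 1]
        return min(candidates, key=lambda e: abs(e[0] - block), default=None)

    def best_fit(self, length):
        """Return the smallest free extent of at least `length` blocks."""
        i = bisect.bisect_left(self.by_size, (length, 0))
        if i == len(self.by_size):
            return None
        size, start = self.by_size[i]
        return (start, size)


if __name__ == "__main__":
    index = FreeSpaceIndex([(100, 8), (500, 64), (2000, 16)])
    print(index.nearest(520))   # search "an extent near here"
    print(index.best_fit(10))   # search "an extent of x size"
```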

    The ideal situation being that you want as large an extent as possible as close to the tail end of the file as possible (i.e. just

      [Read more...]
    disk space allocation (part 3: storing extents on disk)

    Here I’m going to talk about how file systems store what part of the disk a part of the file occupies. If your database files are very fragmented, performance will suffer. How much depends on a number of things, however.

    XFS can store some extents directly in the inode (see xfs_dinode.h). If I’m reading things correctly, this can be 2 extents per fork (data fork and attribute fork). If more than this number of extents are needed, a btree is used instead.

    HFS/HFS+ can store up to 8 extents directly in the catalog file entry (see Apple TechNote 1150 - which was updated in March 2004 with information on the journal format). If the file has more than 8 extents, a lookup then needs to be done into the extents overflow file. Interestingly enough, in

      [Read more...]
    disk space allocation (part 2: examining your database files)
    memberdb/log.MYD:
     EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
       0: [0..943]:        5898248..5899191  3 (36536..37479)     944
       1: [944..1023]:     6071640..6071719  3 (209928..210007)    80
       2: [1024..1127]:    6093664..6093767  3 (231952..232055)   104
       3: [1128..1279]:    6074800..6074951  3 (213088..213239)   152
       4: [1280..1407]:    6074672..6074799  3 (212960..213087)   128
       5: [1408..1423]:    6074264..6074279  3 (212552..212567)    16
    memberdb/log.MYI:
     EXT: FILE-OFFSET      BLOCK-RANGE        AG AG-OFFSET        TOTAL
       0: [0..7]:          10165832..10165839  5 (396312..396319)     8
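
A listing like the one above can be boiled down to an extent count and a total size per file. The following Python sketch parses text in exactly the layout shown here; the regular expression and function name are my own, and other xfs_bmap output variants (holes, extra flag columns) are simply ignored rather than handled.

```python
import re

# One extent row: "N: [a..b]:  c..d  AG (e..f)  TOTAL"
EXTENT_RE = re.compile(
    r"^\s*(\d+):\s*\[(\d+)\.\.(\d+)\]:\s*(\d+)\.\.(\d+)"
    r"\s+(\d+)\s+\(\d+\.\.\d+\)\s+(\d+)\s*$"
)

SAMPLE = """\
memberdb/log.MYD:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..943]:        5898248..5899191  3 (36536..37479)     944
   1: [944..1023]:     6071640..6071719  3 (209928..210007)    80
   2: [1024..1127]:    6093664..6093767  3 (231952..232055)   104
   3: [1128..1279]:    6074800..6074951  3 (213088..213239)   152
   4: [1280..1407]:    6074672..6074799  3 (212960..213087)   128
   5: [1408..1423]:    6074264..6074279  3 (212552..212567)    16
memberdb/log.MYI:
 EXT: FILE-OFFSET      BLOCK-RANGE        AG AG-OFFSET        TOTAL
   0: [0..7]:          10165832..10165839  5 (396312..396319)     8
"""


def summarise(text):
    """Return {filename: (extent_count, total_blocks)} for a listing."""
    summary = {}
    current = None
    for line in text.splitlines():
        match = EXTENT_RE.match(line)
        if match and current is not None:
            count, total = summary[current]
            summary[current] = (count + 1, total + int(match.group(7)))
        elif line.strip().endswith(":"):
            # A line such as "memberdb/log.MYD:" starts a new file.
            current = line.strip()[:-1]
            summary[current] = (0, 0)
    return summary


if __name__ == "__main__":
    for name, (count, total) in summarise(SAMPLE).items():
        print(f"{name}: {count} extents, {total} blocks")
```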
    

    The interesting thing about this is that the log table grows very slowly. This table stores a bunch of debugging output for my memberdb application. It should possibly be a partitioned ARCHIVE table (and

      [Read more...]
    disk space allocation (part 1: seeing what’s happened)

    (a little while ago I was writing a really long entry on everything possible. I realised that this would be a long read for people and that fewer people would look at it, so I’ve split it up).

    This sprung out of doing work on the NDB (http://www.mysql.com/cluster) disk data tree. Anything where efficient use of the filesystem is concerned tickles my fancy, so I went to have a look at what was going on.

    Filesystems store what part of the disk belongs to what file in one of two ways. The first is to keep a list of every disk block (typically 4kb) that’s being used by the file. A 400kb file will have 100 block numbers. The second way is to store a range (extent). That is, a 400kb file could use 100 blocks starting at disk block number 1000.
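
The difference between the two representations is easy to see in a few lines of Python. This is only a sketch with a hypothetical function name: it collapses an ordered list of block numbers into (start, length) extents, so the 400kb example becomes a single pair while a fragmented file produces several.

```python
def blocks_to_extents(blocks):
    """Collapse an ordered list of block numbers into (start, length) extents."""
    extents = []
    for block in blocks:
        if extents and extents[-1][0] + extents[-1][1] == block:
            # This block directly follows the current extent: extend it.
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            # Otherwise there is a gap, so a new extent begins here.
            extents.append((block, 1))
    return extents


if __name__ == "__main__":
    contiguous = list(range(1000, 1100))        # 100 blocks from block 1000
    print(len(contiguous), blocks_to_extents(contiguous))
    print(blocks_to_extents([1, 2, 3, 10, 11]))  # a file in two pieces
```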

    XFS has a tool called xfs_bmap. It

      [Read more...]
    LKML: Linus Torvalds: Re: [OT]Linus trademarks Linux?!!

    LKML: Linus Torvalds: Re: [OT]Linus trademarks Linux?!!

    thoughts on the trademark and notes about slashdot being a big public wanking session (which is, if nothing else - quite accurate and quite funny)

    An old year-2000 mail about the same stuff


    Planet MySQL © 1995, 2014, Oracle Corporation and/or its affiliates

    Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.