S3NBD - S3 Network Block Device

Authors

Sound < >

Latest news

Very special thanks to Amazon's new pricing structure: starting in June, this program will cost you your entire fortune. The new pricing charges $0.10/GB for uploaded data plus $0.01 per 1,000 PUT requests. Assuming a block size of 4k, transferring 1GB takes a minimum of 262144 PUT requests (1GB / 4KB). So do the math: 262144 * $0.01/1000 PUT = $2.62, plus $0.10 for the transfer = $2.72 per GB, compared to the $0.20 you paid before. Way to go! Thanks Amazon! Think of what will happen if you use a 1k block size...

There are other major issues as well (see FAQs), and as a result I don't think I'll be developing this program any further. I think I can make good use of the code elsewhere, though (maybe in some cheap command-line client that uploads one file at a time, but wait... there are already projects like that).

Summary

S3NBD is an NBD server for Linux, written in C, that uses Amazon S3 as its storage backend. Amazon S3 provides developers with an API to store and retrieve data in Amazon's distributed storage system over the web. The main benefits of this service are that storage is cheap ($0.15/GB storage, $0.20/GB transfer) and virtually unlimited in size. The idea here is to implement an NBD server using S3 as a backend. This project started out as an S3 FUSE file system driver, but because of the limitations of S3's storage API, it was decided that it would be better implemented as a block device.

Implementation

S3NBD acts as a server that handles read/write block operations for the NBD client, which provides a block device at /dev/nbd0. The server treats S3 as a large sparse disk, storing data as objects in 4k chunks. Only blocks that contain data are stored: writing a block of all zeros deletes its object instead. You can use the resulting block device like any other, formatting it with a filesystem of your choice (ext2, reiser, FAT, ...) and mounting it on a directory.

Benefits

The benefit of S3NBD is that you get a virtually unlimited block device that can be mounted and used as remote backup storage. The block device can also be made part of a RAID array or used as part of an LVM volume. Because it is a block device, S3NBD automatically takes advantage of your OS's file caching to minimize network traffic to S3.

Caveats

Unfortunately, it seems that a network failure has the same effect as a sudden power failure. Data may be lost, and running fsck on a disk stored on S3 can be costly in bandwidth. One possible idea is to add local journalling, but that doesn't guarantee file system consistency. Running a journalled filesystem over S3 may also be costly in bandwidth.

Another problem is that deleting files in a mounted filesystem does not free storage on S3. Deleting a file calls unlink(), which only decrements the file's link count; once it reaches zero, the filesystem marks the file's blocks as free so they can be overwritten in the future, but it never writes to those blocks, so their objects stay on S3. One solution is to overwrite the file with zeros before deleting it (zero blocks are deleted from S3), but this is pretty inconvenient.

Please see FAQs for more caveats.

TODO

FAQs

A special thanks to those who have emailed me about this project. Most have asked why I dropped it. I hope the FAQ below answers some of your questions.

What is NBD? How does it work?

NBD stands for Network Block Device. It lets you use a block device whose data is stored over the network. NBD consists of two parts: the NBD client and the NBD server. The NBD client creates a special block device, /dev/nbd0, which forwards requests to the NBD server, which in turn stores the data for the block device. The client machine can access this device like a regular block device.

How is S3NBD related to NBD?

S3NBD is an NBD server that uses S3 as its backend storage. You run S3NBD on the local machine and have the NBD client on the same machine connect to it. S3NBD (the server) processes NBD requests and forwards them to S3.

Isn't there a race condition with running NBD client and server on the same machine?

Yes. The problem occurs when, while the kernel is flushing cached writes for /dev/nbd0 to the NBD server, it also decides to flush the write cache of a local block device that the NBD server itself uses; the server then waits on the kernel while the kernel waits on the server. Since S3NBD does not access any block devices (i.e. it does not open any local files) and works mainly through socket operations, it theoretically should not suffer from this problem.

How come this project was dropped?

S3 was never designed to be a file system in the first place, and trying to make it behave like one is a challenging task. Although a block device fits S3's object GET and PUT design better, there are still many difficulties and shortcomings due to the distributed, networked nature of S3. Furthermore, the new pricing scheme introduced in June 2007 dramatically increased the cost of using S3 this way.

Why is using S3 as a block device unreliable?

Block devices are very sensitive to errors: a single read error can leave the entire filesystem in an inconsistent state and therefore unusable. The network is unreliable by nature, so fault tolerance becomes a major issue with S3. The usual way to repair a disk in an inconsistent state is to run fsck on the block device. However, fsck is very costly (in price and bandwidth) when run over a networked device.

Can journalling and caching improve the reliability?

Journalling and caching can improve reliability as well as bandwidth efficiency. However, because of the race condition with NBD, it is not possible to implement journalling or caching that uses the local drive as storage. The only option, therefore, is a memory cache, which is less reliable. There is also the issue of data propagation, which affects reliability.
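A memory cache of the kind mentioned above might look like the direct-mapped write-back sketch below. The slot count, the eviction policy and all names are assumptions for illustration, and periodic flushing is left out. The point it shows is that dirty blocks live only in RAM until they are PUT to S3, which is why such a cache is less reliable: a crash loses every dirty slot.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  4096
#define CACHE_SLOTS 256              /* 256 x 4k = 1 MiB; arbitrary choice */

struct cache_slot {
    bool valid;
    bool dirty;                      /* written by the client, not yet PUT to S3 */
    uint64_t block;
    unsigned char data[BLOCK_SIZE];
};

struct block_cache {
    struct cache_slot slots[CACHE_SLOTS];
};

/* Direct-mapped: each block index maps to exactly one slot. */
static struct cache_slot *slot_for(struct block_cache *c, uint64_t block)
{
    return &c->slots[block % CACHE_SLOTS];
}

/* Cached data for `block`, or NULL on a miss (the caller then GETs it from S3). */
static const unsigned char *cache_lookup(struct block_cache *c, uint64_t block)
{
    struct cache_slot *s = slot_for(c, block);
    return (s->valid && s->block == block) ? s->data : NULL;
}

/* Insert or update `block`. Pass dirty = true for client writes and false
 * for data just fetched from S3. If a different dirty block occupied the
 * slot, it is copied to `evicted` / *evicted_block and true is returned:
 * the caller must PUT it to S3, or that write is lost. */
static bool cache_store(struct block_cache *c, uint64_t block,
                        const unsigned char *data, bool dirty,
                        unsigned char *evicted, uint64_t *evicted_block)
{
    struct cache_slot *s = slot_for(c, block);
    bool same = s->valid && s->block == block;
    bool evict = s->valid && s->dirty && !same;

    if (evict) {
        memcpy(evicted, s->data, BLOCK_SIZE);
        *evicted_block = s->block;
    }
    s->dirty = dirty || (same && s->dirty);   /* keep an unflushed write dirty */
    s->valid = true;
    s->block = block;
    memcpy(s->data, data, BLOCK_SIZE);
    return evict;
}
```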

Since using local files is a problem, we can resolve this by running S3NBD on a different machine rather than the local one!

Sure, but S3NBD was designed with simplicity in mind, and requiring two computers seems like more trouble than it's worth.

What is the data propagation problem? How does it affect S3NBD?

The data propagation problem stems from the distributed nature of S3. When you send an object to S3, it takes time for that object to propagate to other servers. This means that if you request the object right after sending it, there is no guarantee that you will receive the same object back. You may get an older copy or an empty object. This affects S3NBD because a block device depends on every read returning the most recently written data. Over time, this can lead to minor corruption of the file system.

So are you working on solutions to overcome these problems?

Yes. A new solution is in the works, but it will not be a block device. I am currently working on a more suitable solution that incorporates some of the benefits of block devices while making filesystem access more usable. There will be tradeoffs, but I think it'll be better than having corrupted or dead filesystems.

Downloads

Please read all of the FAQs before attempting to use S3NBD. By using this program, you accept all risks of data corruption. It remains available on the web for educational purposes and should not be used in production.

Warning: This software may corrupt data. Production use is NOT recommended. Download at your own risk!

git clone https://github.com/soundsrc/s3nbd.git

The last ever release of S3NBD is available as an Ubuntu package. It is the same as the SVN version and therefore should not be used in production either.

Download: s3nbd-20070323.deb

Install

You need nbd-client, libneon, libssl and libfuse installed. If you are building from source, use this command:

./configure --prefix=/usr && make && make install 

The Ubuntu DEB package can be installed with:

dpkg -i s3nbd-20070323.deb 

Configuration

Configuration files? No such thing. I don't like configuration files.

Usage

Make sure you have an Amazon S3 account. You will be prompted for your Amazon ID, secret key and bucket. First, start the server as a regular user. Note that the server, still being in development, does not daemonize. Use Ctrl+C to kill it.

s3nbd

Now, in a separate terminal, use nbd-client to connect to the server we just started. I use a block size of 4096 because that is the optimal block size for the current implementation.

sudo -i nbd-client bs=4096 localhost 5353 /dev/nbd0

If this is a new filesystem, create an ext2 filesystem on the S3 store. Here we need to set the number of blocks manually, because our block device reports a ridiculously large size. You can use resize2fs to expand the filesystem later. For example, a 10 GiB filesystem with 4096-byte blocks needs 10 × 2^30 / 4096 = 2621440 blocks.

mke2fs -b 4096 /dev/nbd0 <nblocks>

Now you may mount it like a filesystem.

mount /dev/nbd0 /your/mount/point

Advanced usage

A few things to know about.

Use the standard --help option to list all options.

s3nbd --help

You can use environment variables to set your Amazon keys so that S3NBD won't prompt for them.

export AWS_ACCESS_ID=xxxxx

export AWS_SECRET_KEY=xxxxxx

export AWS_BUCKET=mybucket

Change the port number from the default 5353 to 12345:

s3nbd -p 12345

Enable encryption (currently not that reliable):

s3nbd -e

If you need to clean out the data S3NBD has left on your S3 account, simply delete the bucket and all objects in it. I recommend this tool for doing so: http://s3sync.net/.