Filesystem sprint
Purpose
Methods for accessing the file system are currently spread "all over the place" (see /WhatSucks ), which makes it very difficult to override behaviour. Overriding behaviour is useful, for instance for mocking in test code, making virtual file systems, etc.
We would like a unified filesystem library, which ultimately should be included in the standard python library. Getting file stats and attributes, opening and reading directories, checking permissions, etc. are within the scope. We don't think we would be able to add significant value to the file object, so accessing file content is out of the scope.
Our first work will focus on making an intuitive and easy-to-use library for accessing the local file system - but it's important to keep in mind that the API should be usable with anything that looks like a filesystem. This includes remote file systems, git, archive files, ISO images, etc.
Participants and planning of the sprint
Tommi Virtanen <tv IS AT eagain.net> Tv
Tobias Brox <tobixen@gmail.com> tobixen on IRC, will be in Vilnius until friday around 14:00. I'm planning to be ready around 10:00 thursday. Call me at 912 or +47-91700050 if I'm still asleep when you want to start.
MikePittaro <mikeyp AT snaplogic.org>
StefanSchwarzer <sschwarzer AT sschwarzer.net>
IRC Channel: #fssprint
Meet up thursday when and where?
We should try to be a bit disciplined: discuss how to do things without wasting too much time on details.
Suggestion for agenda:
- Decide on VCS and testing framework. I have no strong preferences, I suppose we'll use git and nose since that's what Tommi is most comfortable with?
- Set up VCS and test framework
- Discuss the overall design, and agree on something without wasting too much time on details
- Make a complete list of all functionality the library should provide on this wiki page.
- Start writing test code
- Review and discuss test code
- Make the tests pass
- Review and discuss code
- Iterate
Other libraries
Participants for the sprint are strongly advised to look through path.py, twisted.python.filepath, twisted.vfs and the relevant functions in os.path.* and os.*. In addition there are bzrlib.transport, shutil and ftputil - http://ftputil.sschwarzer.net/trac.
Features
- Wrapping of relevant existing functions, builtins, etc.
- Some permission checking; for example, the library should be able to disallow access to parent paths and to symlinks pointing out of a file system.
- Possibility to "mount" another file system object into an existing file system object, and to make a union between two file systems.
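The "mount" idea could look roughly like the sketch below. All names here (DictFS, MountFS, mount, exists) are hypothetical and not an agreed API; the sketch only shows dispatching on the first path segment, and does not attempt the union case.

```python
class DictFS(object):
    """Hypothetical in-memory file system: maps relative paths to contents."""

    def __init__(self, entries):
        self._entries = dict(entries)

    def exists(self, path):
        return path in self._entries


class MountFS(object):
    """Hypothetical container that dispatches on the first path segment."""

    def __init__(self):
        self._mounts = {}

    def mount(self, name, filesystem):
        # Attach another file system object under /<name>/
        self._mounts[name] = filesystem

    def _resolve(self, path):
        # "/docs/readme.txt" -> (file system mounted as "docs", "readme.txt")
        name, _, rest = path.lstrip('/').partition('/')
        if name not in self._mounts:
            raise KeyError('nothing mounted at %r' % name)
        return self._mounts[name], rest

    def exists(self, path):
        filesystem, rest = self._resolve(path)
        return filesystem.exists(rest)


root = MountFS()
root.mount('docs', DictFS({'readme.txt': 'hello'}))
root.mount('tmp', DictFS({}))
```

Because MountFS exposes the same exists() method as the objects it contains, a MountFS could itself be mounted into another one.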
Design
Some entities:
- File System
- Path (A path can be a file or a directory or a block special, etc)
- Directory (I think a directory should either be a subclass of a file system, or contain the same methods as a file system)
- File (but we'd rather use existing objects than reinventing our own)
- inode?
Some functionality that should be covered:
- Opening a directory, allowing ...
iterations (__iter__)
"in" operator (__contains__)
access as dict (__getitem__)
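A minimal sketch of the three protocols listed above, for the local file system only. The class name Directory and the choice to return another Directory from __getitem__ are assumptions made for illustration; what the lookup should return was not decided.

```python
import os
import tempfile


class Directory(object):
    """Hypothetical directory object backed by the local file system."""

    def __init__(self, path):
        self._path = path

    def __iter__(self):
        # Yields entry names; os.listdir raises OSError if this is not a directory.
        return iter(os.listdir(self._path))

    def __contains__(self, name):
        # lexists also reports dangling symlinks as present.
        return os.path.lexists(os.path.join(self._path, name))

    def __getitem__(self, name):
        # Dict-style access: KeyError for missing entries.
        if name not in self:
            raise KeyError(name)
        # Assumption: the child is wrapped in the same class, whatever its type.
        return Directory(os.path.join(self._path, name))


# Small demonstration in a scratch directory.
demo_root = tempfile.mkdtemp()
open(os.path.join(demo_root, 'a.txt'), 'w').close()
os.mkdir(os.path.join(demo_root, 'sub'))
d = Directory(demo_root)
names = sorted(d)
```

Note that this sketch does not validate the name, so a name containing a path separator would escape the directory; a real implementation would need the kind of checking discussed for p.child().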
Other Notes
Slides of the original talk: http://eagain.net/talks/pythonic-fs/
Random note dump from the open space session:
import fs
f = fs.open('foo')
MyThingie(fs=MockFS())
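The point of passing the file system in as an argument is that tests can substitute a fake one. A sketch under assumptions: LocalFS, the open() method signature and first_line() are invented here for illustration; only the MyThingie(fs=MockFS()) call shape comes from the notes.

```python
import errno
import io


class LocalFS(object):
    """Hypothetical default: delegates to the real file system."""

    def open(self, path, mode='r'):
        return open(path, mode)


class MockFS(object):
    """Hypothetical read-only, in-memory stand-in for tests."""

    def __init__(self, files=None):
        self._files = dict(files or {})

    def open(self, path, mode='r'):
        # The mode is ignored in this sketch; every file is opened for reading.
        if path not in self._files:
            raise IOError(errno.ENOENT, 'No such file or directory', path)
        return io.StringIO(self._files[path])


class MyThingie(object):
    def __init__(self, fs=None):
        # Fall back to the real file system when nothing is injected.
        self.fs = fs if fs is not None else LocalFS()

    def first_line(self, path):
        with self.fs.open(path) as f:
            return f.readline().rstrip('\n')


thingie = MyThingie(fs=MockFS({'foo': u'hello\nworld\n'}))
```

Here thingie.first_line('foo') reads from the in-memory dict without touching the disk.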
http = HTTPClientFS(
    credentials={
        'example.com': {'user': 'foo', 'pass': 'bar'},
    },
)
ftp = FTPClientFS()
multiplex = MultiplexFS(http=http, ftp=ftp)
/http/fgdfgfdgfd
<scheme> ":"
file() vs. open()
openat(fd, filename)
orig = os.getcwd()
os.chdir(tmp)
...
os.chdir(orig)

f = os.open('.', os.O_RDONLY)
os.chdir(tmp)
...
os.fchdir(f)
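The second variant can be wrapped in a context manager so that the original directory is always restored. This is a sketch only: the name working_directory is made up, and os.fchdir is only available on Unix-like systems.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory(path):
    # Keep a descriptor on the current directory so we can return to it
    # even if it is renamed while we are away.
    fd = os.open('.', os.O_RDONLY)
    try:
        os.chdir(path)
        yield
    finally:
        os.fchdir(fd)
        os.close(fd)


before = os.getcwd()
with working_directory(tempfile.gettempdir()):
    inside = os.getcwd()
after = os.getcwd()
```

Unlike the getcwd()/chdir(orig) variant, this does not depend on the original path string still being valid when it is time to go back.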
p.child() protects you against "GET /../../../etc/passwd HTTP/1.0"
if you want to support "..", do p.join(trusted_path) (can also do multiple segments)
trap: os.path.join('foo', '/bar') => '/bar'
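A rough sketch of the child()/join() split described above. The class names Path and InsecurePath and the exact set of rejected segments are assumptions for illustration, not a settled design.

```python
import os


class InsecurePath(ValueError):
    """Hypothetical error for a segment that could escape the parent."""


class Path(object):
    def __init__(self, path):
        self._path = path

    def child(self, segment):
        # For untrusted input: exactly one plain segment, nothing else.
        if (segment in ('', '.', '..') or '/' in segment
                or os.sep in segment or '\0' in segment):
            raise InsecurePath(segment)
        return Path(os.path.join(self._path, segment))

    def join(self, *segments):
        # For trusted input only: ".." and multiple segments are allowed.
        return Path(os.path.normpath(os.path.join(self._path, *segments)))


root = Path(os.path.join(os.sep, 'srv', 'www'))
page = root.child('index.html')
logs = root.join('..', 'logs')
```

With this split, a request for "/../../../etc/passwd" fails in child() instead of silently resolving outside the root, and the os.path.join('foo', '/bar') trap cannot be triggered through child() because a segment containing a separator is rejected.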
pay cost only when mandatory: fs.path('foo', 'bar').isfile()
IOError vs OSError; can they be merged?
class EExistsOSError(OSError):
    pass
would rather see
class ExistsError(FilesystemError):
    pass
or not? base class needed/wanted?
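def mkdir(*a, **kw):
    try:
        os.mkdir(*a, **kw)
    except OSError as e:
        # assumed reading of the note: ignore only "already exists"
        if e.errno != errno.EEXIST:
            raise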
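One way the proposed exception classes could be wired up is to translate errno values at the library boundary. This is a sketch under assumptions: FilesystemError and ExistsError are the names from the notes, while NotFoundError and the mapping table are added here for illustration. For comparison, Python 3.3 and later merged IOError into OSError (PEP 3151) and added FileExistsError and FileNotFoundError as OSError subclasses.

```python
import errno
import os
import tempfile


class FilesystemError(Exception):
    """Hypothetical common base class for the library's errors."""


class ExistsError(FilesystemError):
    pass


class NotFoundError(FilesystemError):
    pass


# Illustrative mapping; a real library would cover more errno values.
_ERRNO_TO_ERROR = {
    errno.EEXIST: ExistsError,
    errno.ENOENT: NotFoundError,
}


def mkdir(path):
    try:
        os.mkdir(path)
    except OSError as e:
        error_class = _ERRNO_TO_ERROR.get(e.errno)
        if error_class is None:
            raise  # unknown errno: let the original OSError through
        raise error_class(path)


demo_root = tempfile.mkdtemp()
target = os.path.join(demo_root, 'new')
mkdir(target)
try:
    mkdir(target)
    outcome = 'created twice'
except ExistsError:
    outcome = 'exists'
```

Callers can then catch ExistsError specifically, or FilesystemError for anything the library raises, without inspecting errno themselves.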
if filename in directory
for filename in directory: vs for filename in directory.listdir():
->
class path(object):
    def __iter__(self):
        # will raise OSError with ENOTDIR if not a dir
        return iter(os.listdir(self._path))
fs_root.child('foo')
classes:
- path = filesystem path, might point to file or dir, or not exist
Do we need separate directory etc. classes? E.g. readlink is only valid for symlinks; just have class path with a method readlink that raises if the path happens not to be a symlink? Or p.as_symlink().readlink()? Does just putting methods in class path generalize to all of them?

