Skip to content

WIP/ENH: Make S3File iterable, readline #18

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

TomAugspurger
Copy link
Contributor

Adds a readline method to S3File, which is used to make them
iterable via py2 style .next and py3 style __next__ methods.

Adds a `readline` method to S3File, which is used to make them
iterable via py2 style `.next` and py3 style `__next__` methods.
@TomAugspurger TomAugspurger changed the title ENH: Make S3File iterable, readline WIP/ENH: Make S3File iterable, readline Mar 27, 2016
expected = csv_files['2014-01-01.csv'].split(b'\n')[0] + b'\n'
with s3.open(test_bucket_name + '/2014-01-01.csv') as f:
result = next(f)
assert result == expected
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to try __iter__ as well. Something like the following:

assert list(f) == csv_files['2014-01-01.csv'].split(b'\n')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I pushed a second commit after you made this note testing something like that a27d291

@TomAugspurger
Copy link
Contributor Author

d5ab2df should take care of the blocksize issue.

@TomAugspurger
Copy link
Contributor Author

The last commit added some file-like properties readable, writeable, and seekable. These were enough to pass most of the pandas test-suite (which doesn't have too many s3 tests admittedly). I was briefly looking into making S3File a proper subclass of one of the AbstractBaseClasses in io, but this seems to be enough for now.

There are a couple more pandas issues I'm working around.

  • We had some tests for s3n and s3a URLs. I'm not sure whether pandas was properly handling those in the past, we probably just treated them like regular s3 urls.
  • Maybe something pandas dealt with here. Basically a public bucket that has a public objects sitting next to private objects. When we go to list the objects in the directory, we get a permissions exception.

Will followup tonight hopefully. I can keep throwing changes in the PR if you want, or followup with another.

@mrocklin
Copy link
Collaborator

Whatever makes you happier. I'm really quite glad that you in particular
are handling this.

On Mon, Mar 28, 2016 at 5:38 AM, Tom Augspurger [email protected]
wrote:

The last commit added some file-like properties readable, writeable, and
seekable. These were enough to pass most of the pandas test-suite (which
doesn't have too many s3 tests admittedly). I was briefly looking into
making S3File a proper subclass of one of the AbstractBaseClasses in io,
but this seems to be enough for now.

There are a couple more pandas issues I'm working around.

Will followup tonight hopefully. I can keep throwing changes in the PR if
you want, or followup with another.


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#18 (comment)

martindurant pushed a commit that referenced this pull request Mar 28, 2016
Special atention to rerely-used length parameter
@martindurant
Copy link
Member

Please compare with my version in PR #19.
The readline function there is somewhat simpler, and I believe copes with cases where a seek precedes a readline.
Interesting to see the requirements for readable etc.

@TomAugspurger
Copy link
Contributor Author

The need for readable, seekable, etc. are probably peculiar to pandas. We've got the C parser, and a fallback parser written in Python. For the Python parser, I needed to wrap the byte stream from here in a TextIOWrapper, which required those methods to be present.

Close this one in favor of #19?

@martindurant
Copy link
Member

If you're happy with #19, then yes.

martindurant added a commit that referenced this pull request Mar 28, 2016
Readline support with pandas additions from PR #18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants