Change: Replace Peewee with SQLAlchemy/Alembic by washort · Pull Request #1417 · getredash/redash

washort · 2016-11-21T15:40:11Z

arikfr

Exciting 😃

arikfr · 2016-11-21T15:44:45Z

redash/handlers/base.py

-        return fn(*args, **kwargs)
-    except DoesNotExist:
+    rv = fn(*args, **kwargs)
+    if rv is None:


SQLA will never raise an exception for missing rows?

Its .get() and .first() methods return None when there's no matching row, so I believe that's correct.

Maybe we can get rid of this helper, as it seems that Flask-SQLAlchemy has its own: first_or_404 and get_or_404.

SQLA will never raise an exception for missing rows?

See also .one() and .one_or_none()
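A minimal sketch of the semantics discussed above, using plain SQLAlchemy (1.4+) and an in-memory SQLite database rather than Flask-SQLAlchemy's `Model.query`. The `Organization` model, the `NotFound` exception, and the `get_object_or_404` helper are hypothetical stand-ins modeled on the `rv is None` check in the diff. Flask-SQLAlchemy's own `first_or_404()` / `get_or_404()` would abort with a real HTTP 404 instead.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.orm.exc import NoResultFound

Base = declarative_base()


class Organization(Base):
    # Stand-in model; not Redash's actual schema.
    __tablename__ = "organizations"
    id = Column(Integer, primary_key=True)
    slug = Column(String(32))


class NotFound(Exception):
    """Hypothetical stand-in for the HTTP 404 a Flask handler would return."""


def get_object_or_404(fn, *args, **kwargs):
    # Hypothetical helper mirroring the diff: SQLA returns None, so check for it.
    rv = fn(*args, **kwargs)
    if rv is None:
        raise NotFound()
    return rv


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)
session.add(Organization(slug="default"))
session.commit()

missing = session.query(Organization).filter(Organization.slug == "missing")

# .first() and .one_or_none() return None when no row matches...
first_result = missing.first()
one_or_none_result = missing.one_or_none()

# ...while .one() raises NoResultFound instead.
try:
    missing.one()
    one_raised = False
except NoResultFound:
    one_raised = True

# Session.get() (the 1.4+ spelling of Query.get()) also returns None.
get_result = session.get(Organization, 999)
```

So the `is None` check is the right translation for `.get()`/`.first()`, and `.one()` is the variant that behaves like Peewee's `DoesNotExist`.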

arikfr · 2016-11-21T15:50:42Z

requirements.txt

 Flask-RESTful==0.3.5
 Flask-Login==0.3.2
 Flask-OAuthLib==0.9.2
+Flask-SQLAlchemy==2.1


In another project I used alchy, which also has Flask-Alchy, a drop-in replacement for Flask-SQLAlchemy. The benefit is that we won't need a Flask session whenever we're using the DB. The downside is that alchy never seemed to gain mind share, unlike Flask-SQLAlchemy.

TL;DR: we can keep Flask-SQLA and see if it adds too much boilerplate to the jobs code. If it doesn't, keep it. Otherwise consider alchy.

My suspicion is that it won't add boilerplate; calling create_app() should be most of it.

Re: alchy, it seems the author has mostly moved on to https://github.com/dgilland/sqlservice#history

arikfr · 2016-11-21T15:54:32Z

redash/models.py

+    data_source_id = Column(db.Integer, db.ForeignKey("data_sources.id"), nullable=True)
+    data_source = db.relationship(DataSource)
+    latest_query_data_id = Column(db.Integer, db.ForeignKey("query_results.id"), nullable=True)
+    latest_query_data = db.relationship(QueryResult)


Why do we need both latest_query_data_id and latest_query_data? (This applies to all similar fields.) Doesn't SQLA have a convenience method to get the object id without loading the object itself?

SQLA is a bit more explicit about this stuff; the foo_id field is the actual db column, and foo is the attribute for the related ORM object.
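A small sketch of that distinction, assuming plain SQLAlchemy 1.4+, in-memory SQLite, and stand-in models loosely based on the diff: reading `data_source_id` only uses the column value already loaded with the row, while accessing `data_source` triggers a lazy load of the related object. So the `_id` column is effectively the "get the id without loading the object" convenience.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, inspect
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class DataSource(Base):
    __tablename__ = "data_sources"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))


class Query(Base):
    # Stand-in for the Redash model: a real FK column plus an ORM relationship.
    __tablename__ = "queries"
    id = Column(Integer, primary_key=True)
    data_source_id = Column(Integer, ForeignKey("data_sources.id"), nullable=True)
    data_source = relationship(DataSource)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Query(data_source=DataSource(name="pg")))
    session.commit()

with Session(engine) as session:
    q = session.get(Query, 1)
    ds_id = q.data_source_id  # plain column value, no extra SELECT
    still_unloaded = "data_source" in inspect(q).unloaded  # relationship untouched
    ds_name = q.data_source.name  # this access lazy-loads the DataSource row
```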

jeffwidman

Thanks for working on this! I added a few comments, hopefully they're helpful... didn't have time to review it all.

jeffwidman · 2016-11-23T16:33:05Z

redash/cli/database.py

+    from redash.models import db, create_db, init_db
+    create_db(True, True)
    init_db()
+    db.session.commit()


Why do you need the db.session.commit()?

I assume create_db or init_db calls Flask-SQLAlchemy's create_all(), which will implicitly call a commit:
http://stackoverflow.com/questions/34410091/flask-sqlalchemy-how-can-i-call-db-create-all-and-db-drop-all-without-trigg

jeffwidman · 2016-11-23T16:37:42Z

redash/models.py

+class TimestampMixin(object):
+    updated_at = Column(db.DateTime(True), default=db.func.now(),
+                           onupdate=db.func.now(), nullable=False)
+    created_at = Column(db.DateTime(True), default=db.func.now(),


Maybe this should use server_default?

http://stackoverflow.com/a/33532154/770425
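To illustrate the difference (a sketch compiled against SQLAlchemy's PostgreSQL dialect without a live database; the table and column names are made up): `default=func.now()` is applied by SQLAlchemy when it emits the INSERT and leaves the column DDL without a default, whereas `server_default=func.now()` puts a `DEFAULT` clause into `CREATE TABLE`, so rows inserted outside the ORM also get a timestamp.

```python
from sqlalchemy import Column, DateTime, Integer, MetaData, Table, func
from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import CreateTable

metadata = MetaData()

events = Table(
    "events",
    metadata,
    Column("id", Integer, primary_key=True),
    # Python-side default: only applies to INSERTs issued through SQLAlchemy.
    Column("client_ts", DateTime(timezone=True), default=func.now()),
    # Server-side default: becomes part of the table definition itself.
    Column("server_ts", DateTime(timezone=True), server_default=func.now()),
)

# Render the DDL for PostgreSQL to see which column carries a DEFAULT clause.
ddl = str(CreateTable(events).compile(dialect=postgresql.dialect()))
print(ddl)
```

Switching to `server_default` changes the schema, so it would need its own migration.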

jeffwidman · 2016-11-23T16:40:10Z

redash/models.py

    @classmethod
    def get_by_id_and_org(cls, object_id, org):
-        return cls.get(cls.id == object_id, cls.org == org)
+        return cls.query.filter(cls.id == object_id, cls.org == org).first()


Will this query ever return more than one result? If not, it should probably use one() or one_or_none().

jeffwidman · 2016-11-23T16:41:26Z

redash/models.py

    @classmethod
    def get_by_slug(cls, slug):
-        return cls.get(cls.slug == slug)
+        return cls.query.filter(cls.slug == slug).first()


I suspect this should also be a one() or one_or_none() as I think you want errors if more than one result is returned for a given slug.
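A sketch of how the two behave when a slug is duplicated, assuming plain SQLAlchemy 1.4+ and in-memory SQLite. The `Dashboard` model is a stand-in with the uniqueness constraint deliberately omitted, and the session is passed explicitly instead of using Flask-SQLAlchemy's `cls.query`. `.first()` silently returns one of the duplicates, while `.one_or_none()` raises `MultipleResultsFound`.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.orm.exc import MultipleResultsFound

Base = declarative_base()


class Dashboard(Base):
    __tablename__ = "dashboards"
    id = Column(Integer, primary_key=True)
    # No unique constraint here, purely so duplicates can be demonstrated.
    slug = Column(String(140))

    @classmethod
    def get_by_slug(cls, session, slug):
        # one_or_none(): None for zero rows, the row for exactly one,
        # MultipleResultsFound for more than one.
        return session.query(cls).filter(cls.slug == slug).one_or_none()


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all([Dashboard(slug="sales"), Dashboard(slug="dup"), Dashboard(slug="dup")])
session.commit()

sales = Dashboard.get_by_slug(session, "sales")
missing = Dashboard.get_by_slug(session, "nope")

# .first() adds LIMIT 1 and hides the duplicate.
first_dup = session.query(Dashboard).filter(Dashboard.slug == "dup").first()

try:
    Dashboard.get_by_slug(session, "dup")
    dup_raised = False
except MultipleResultsFound:
    dup_raised = True
```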

jeffwidman · 2016-11-23T16:42:10Z

redash/models.py

+    name = Column(db.String(100))
+    permissions = Column(postgresql.ARRAY(db.String(255)),
+                         default=DEFAULT_PERMISSIONS)
+    created_at = Column(db.DateTime(True), default=db.func.now())


Probably want server_default()

Probably, but I don't want to change the schema until after I get things working as-is. (Hi Jeff! It's been a while! Never expected to see someone from SFSH commenting on my code :-)

Now I'm curious what SFSH is :-)

jeffwidman · 2016-11-23T16:45:14Z

Fixes #1124

arikfr · 2016-11-28T09:04:34Z

@washort I finished with my work on the frontend (for now) and want to give you a hand here. I rebased your branch on the latest master & fixed an issue with the settings/DATABASE_URL. Do you have some unpushed work, or can I do a force push with these changes?

arikfr · 2016-11-28T12:37:40Z

I updated the test code and now we get real failures, but many tests still fail just because the database runs out of connections. I tried to compare how we manage the connection/session in setUp/tearDown with other projects and couldn't spot an obvious issue. :-(

Calling engine#dispose (9f43542) seems to fix this, but is it the right usage? Why haven't I seen this in any other example?
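A sketch of test setup/teardown with explicit connection cleanup, assuming plain SQLAlchemy 1.4+ with a `scoped_session` as a stand-in for Flask-SQLAlchemy's `db.session` (where the equivalents would be `db.session.remove()` and `db.engine.dispose()`). `remove()` closes the session and returns its connection to the pool; `dispose()` then closes the pooled connections themselves. Why the dispose call was needed here is not established in this thread; one possibility, which is only an assumption, is that engines or pools created during the test run keep idle connections open until they are garbage collected.

```python
import unittest

from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import declarative_base, scoped_session, sessionmaker

Base = declarative_base()


class Widget(Base):
    # Hypothetical model used only by these tests.
    __tablename__ = "widgets"
    id = Column(Integer, primary_key=True)


engine = create_engine("sqlite://")
db_session = scoped_session(sessionmaker(bind=engine))


class BaseTestCase(unittest.TestCase):
    def setUp(self):
        Base.metadata.create_all(engine)

    def tearDown(self):
        # Close the session so its connection goes back to the pool.
        db_session.remove()
        Base.metadata.drop_all(engine)
        # Close every pooled connection so none survive into the next test.
        engine.dispose()


class WidgetTest(BaseTestCase):
    def test_insert(self):
        db_session.add(Widget())
        db_session.commit()
        self.assertEqual(db_session.query(Widget).count(), 1)

    def test_starts_empty(self):
        self.assertEqual(db_session.query(Widget).count(), 0)
```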

arikfr · 2016-11-28T17:08:13Z

Another SQLA question:

    @classmethod
    def get_by_id_and_org(cls, visualization_id, org):
        return cls.query.join(Query).filter(cls.id == visualization_id, Query.org == org).one()

With Peewee I could pass such a method either an Organization object or just an Organization id. But in SQLA it seems that for Query.org I can only use an Organization object, and for Query.org_id I can only use integers.

Any middle ground?

washort · 2016-11-28T20:08:54Z

No, you have to match up the right value with the right attribute. This is only really an issue in unit tests though, I think, because as far as I can tell, in the rest of the code you should only be using object ids when they're in query parameters; the rest of the time you can just pass around objects.
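To make the distinction concrete (a sketch with plain SQLAlchemy 1.4+ and stand-in models): the relationship attribute `Query.org` compares against an `Organization` instance, the column `Query.org_id` compares against an integer, and both select the same rows. SQLAlchemy has no built-in coercion between the two; `org_criterion` below is a hypothetical helper, not part of Redash or SQLAlchemy, showing the kind of shim a test suite could use if it really wanted to accept either.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Organization(Base):
    __tablename__ = "organizations"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))


class Query(Base):
    # Stand-in for Redash's Query model; only the org link matters here.
    __tablename__ = "queries"
    id = Column(Integer, primary_key=True)
    org_id = Column(Integer, ForeignKey("organizations.id"), nullable=False)
    org = relationship(Organization)


def org_criterion(org):
    """Hypothetical helper: build the filter from an Organization or a bare id."""
    if isinstance(org, Organization):
        return Query.org == org  # relationship comparison takes an object
    return Query.org_id == org  # column comparison takes an integer


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
# expire_on_commit=False keeps acme's attributes loaded for this short demo.
session = Session(engine, expire_on_commit=False)
acme = Organization(name="acme")
session.add_all([Query(org=acme), Query(org=Organization(name="other"))])
session.commit()

by_object = session.query(Query).filter(Query.org == acme).all()
by_id = session.query(Query).filter(Query.org_id == acme.id).all()
via_helper = [session.query(Query).filter(org_criterion(o)).all() for o in (acme, acme.id)]
```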

arikfr · 2016-11-28T21:16:56Z

Working today on this branch was a reminder of why I never liked SQLAlchemy in the first place :-\ It's very powerful, but why is the simple stuff so hard and verbose?

arikfr · 2016-11-29T10:12:56Z

@washort
When I created the getredash version of your branch, it somehow got messed up and included lots of unrelated commits. I tried rebasing, which resulted in a lot of wasted time and a weird result. Eventually I just cherry-picked our commits on top of the latest master and it worked.

But I had to force push the result over your branch... I hope you didn't have anything uncommitted.

washort · 2016-12-01T04:17:04Z

Looks like we're experiencing test failures due to webpack not running in CircleCI.

arikfr · 2016-12-01T07:03:57Z

Replace GFKBase usage with bridge tables

I'm not 100% sure about this. How will it work? Any reference implementation/documentation?

arikfr · 2016-12-01T07:15:19Z

All tests pass now (I changed configuration to run Webpack) 💯

But I grepped the code for things like update_instance and save and still found usages of them, mainly in the CLI but also in places where we were too lazy to write tests ;-)

washort · 2016-12-01T14:53:27Z

That's why I fixed the tests first -- to see where it'd be profitable to write more tests :-)

Re bridge tables - this is what I was looking at: http://docs.sqlalchemy.org/en/latest/_modules/examples/generic_associations/table_per_related.html

arikfr · 2016-12-04T08:25:11Z

Re bridge tables - this is what I was looking at: http://docs.sqlalchemy.org/en/latest/_modules/examples/generic_associations/table_per_related.html

This will work for AccessPermission, but won't work for ApiKey. For ApiKey we need to do the opposite lookup - given an api key, find the object associated with it.

arikfr · 2016-12-04T11:45:12Z

redash/models.py

+                           nullable=False)


 class ChangeTrackingMixin(object):


How about implementing this with a before_update or after_update event?

It seems that SQLA has the tools to determine if something was changed in the event. I tried to experiment with it, but couldn't get the event to trigger. :\

I found out why the event didn't trigger and implemented it: e8739b3

If you have no comments, I will push this change to your branch.

The part I'm not happy about is how we deduce the user who changed the object, but I'm not sure it's that bad. Eventually this code is only relevant for the API, which is Flask based... and I added some safeguards to make sure it doesn't cause harm outside of Flask context.

Another issue with the way I set who changed the object is that it can't be changed :-\ Not a huge deal, mainly an issue in tests at the moment but feels wrong.

I'll have a look at this next. What was the issue with the way it's done now?

Mainly the fact that you need to call record_changes and the manual "calculation" of what changed.

But apparently doing it in the after_insert/after_update events is wrong (SQLA complains about using Session.add there) and using before_flush also introduces its own challenges.
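A minimal sketch of the event-based approach under discussion, assuming plain SQLAlchemy 1.4+: a `before_update` mapper event reads each column's attribute history to see what changed. `Widget` and the in-memory `CHANGES` list are hypothetical. The listener only records data and deliberately does not call `Session.add`, since adding objects from inside flush-time mapper events is the problem described above.

```python
from sqlalchemy import Column, Integer, String, create_engine, event, inspect
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Widget(Base):
    # Hypothetical model standing in for a change-tracked Redash model.
    __tablename__ = "widgets"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
    text = Column(String(100))


# Hypothetical in-memory change log; a real version would persist it after
# the flush instead of calling Session.add from inside the event.
CHANGES = []


@event.listens_for(Widget, "before_update")
def capture_changes(mapper, connection, target):
    state = inspect(target)
    changed = {}
    for attr in mapper.column_attrs:
        hist = state.attrs[attr.key].history
        # before_update fires for every dirty instance, even with no net change.
        if hist.has_changes():
            old = hist.deleted[0] if hist.deleted else None
            new = hist.added[0] if hist.added else None
            changed[attr.key] = (old, new)
    if changed:
        CHANGES.append((target.id, changed))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

# expire_on_commit=False keeps old values loaded, so history can report them.
session = Session(engine, expire_on_commit=False)
widget = Widget(name="draft", text="SELECT 1")
session.add(widget)
session.commit()  # INSERT: before_update does not fire

widget.name = "final"
session.commit()  # UPDATE: the listener records the name change
```

Note that with the default `expire_on_commit=True`, an attribute set on an expired instance may not have its old value in the history, which is one of the subtleties that makes the "magic" version harder than it looks.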

If I don't find a solution for this today, I will revert to your version, apply record_changes where needed, and revisit this in the future.

Ah. Yeah I didn't want to try to get too magic at this point, might be interesting to investigate later.

I always try to maintain balance between "magic" and "hassle" :-) At first it seemed like a good balance point here, but as this starts to become too complex, I think I will revert to the explicit version you had.


arikfr · 2016-12-07T13:56:46Z

@washort I did some updates to the CLI tests:

- Switch to using the "main" manager instance (the FlaskGroup object), to ensure that the Flask app is set (some tests were failing because they expected the FLASK_APP env variable).
- Added missing Session.commit calls in the CLI commands.
- Switch to using flask.cli.AppGroup to create the subcommands, so we don't have to wrap everything with the with_app_context decorator.

There are still some (4?) tests failing because the CLI creates its own app_context/db session. I'm not sure how to solve this :-( One option is to create our own with_app_context decorator that will reuse an existing app context if one is available, but it feels wrong to change production code to accommodate tests...
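A sketch of the option mentioned above, assuming Flask 1.0+. `with_existing_or_new_app_context` and the `create_app` factory below are hypothetical names, not Flask or Redash APIs. If an app context is already active (as in a test that owns the DB session), the wrapped function runs inside it; otherwise it creates an app and pushes a fresh context.

```python
import functools

from flask import Flask, current_app, has_app_context


def with_existing_or_new_app_context(create_app):
    """Hypothetical decorator: reuse an active app context, else push a new one."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if has_app_context():
                # e.g. a test already pushed a context and owns the DB session
                return fn(*args, **kwargs)
            with create_app().app_context():
                return fn(*args, **kwargs)
        return wrapper
    return decorator


def create_app():
    # Hypothetical factory; Redash's real create_app() does much more.
    return Flask("fresh_app")


@with_existing_or_new_app_context(create_app)
def current_app_name():
    return current_app.name


standalone = current_app_name()  # no context active: builds fresh_app
outer = Flask("outer_app")
with outer.app_context():
    reused = current_app_name()  # reuses outer_app's context
```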

Also moved old migrations to old_migrations folder (before deleting them entirely).

arikfr · 2016-12-07T16:37:05Z

Added Alembic (with Flask-Migrate). Updated the tasks to reflect integration status.

washort · 2016-12-07T18:44:19Z

redash/cli/data_sources.py

 from redash.utils.configuration import ConfigurationContainer

-manager = click.Group(help="Data sources management commands.")
+manager = AppGroup(help="Data sources management commands.")


Oh nice, I didn't see that.
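For reference, a sketch of what the switch buys, assuming Flask 1.0+ (the `ds` group and `app-name` command are made up for illustration): commands registered through `AppGroup.command` are wrapped in `with_appcontext` automatically, so `current_app` is available without decorating each command by hand.

```python
import click
from flask import Flask, current_app
from flask.cli import AppGroup

# Hypothetical command group, modeled on redash/cli/data_sources.py.
manager = AppGroup("ds", help="Data sources management commands.")


@manager.command("app-name")
def app_name():
    # AppGroup.command wraps this in with_appcontext, so current_app works here.
    click.echo(current_app.name)


app = Flask("demo")
app.cli.add_command(manager)

# Flask's test CLI runner supplies the app, much like FLASK_APP would.
result = app.test_cli_runner().invoke(args=["ds", "app-name"])
```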

arikfr · 2016-12-08T14:11:11Z

Aside from more fixes to broken functionality (if there is anything left) the only thing I want to add in this branch before merging is "Replace MeteredModel with SQLAlchemy timing events". All the rest can be a follow up IMO.

washort · 2016-12-09T18:52:34Z

Had a look at the query-timing stuff available in SQLAlchemy - the main difficulty with replicating the current behavior is that by the time queries are executed, the only information available is the query text and its parameters; model class, method name etc. aren't accessible.

~~Any thoughts on how you want to handle this? Obviously we can parse the SQL to retrieve table names and operation type, if that's the path you prefer.~~

Never mind, I was looking at after_cursor_execute, but after_execute gives access to the query before it's rendered to a string.

arikfr · 2016-12-09T19:37:38Z

Let's keep the functionality of overall timing per request and # of queries executed. So far this is the only information I've actually used.
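A minimal sketch of those two metrics (query count and total query time), using SQLAlchemy's `before_cursor_execute` / `after_cursor_execute` engine events. This follows the query-timing recipe from the SQLAlchemy docs rather than the `after_execute` hook mentioned above. The module-level `STATS` dict is a hypothetical stand-in: in a Flask app it would be reset per request (for example on `flask.g`) and reported to statsd afterwards.

```python
import time

from sqlalchemy import create_engine, event, text

engine = create_engine("sqlite://")

# Hypothetical per-request accumulator; a real app would reset it per request.
STATS = {"count": 0, "total_ms": 0.0}


@event.listens_for(engine, "before_cursor_execute")
def start_timer(conn, cursor, statement, parameters, context, executemany):
    # A stack on conn.info tolerates nested executions on one connection.
    conn.info.setdefault("query_start_time", []).append(time.perf_counter())


@event.listens_for(engine, "after_cursor_execute")
def stop_timer(conn, cursor, statement, parameters, context, executemany):
    started = conn.info["query_start_time"].pop()
    STATS["count"] += 1
    STATS["total_ms"] += (time.perf_counter() - started) * 1000.0


with engine.connect() as conn:
    # Reset after connecting so only the statements below are counted.
    STATS.update(count=0, total_ms=0.0)
    for _ in range(3):
        conn.execute(text("SELECT 1"))
```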


washort · 2016-12-10T00:23:27Z

In that case I think we're done. I still want to change the schema a bit but that can happen in a new branch.

arikfr · 2016-12-11T13:42:39Z

I've added some more metrics and .... it's merged! :)

jeffwidman · 2016-12-12T06:32:49Z

Major congrats you two, this was a lot of work!

PS: @washort indeed surprised to see you on here. Hope you and fam are well. This is kinda like the inverse of when an IRL coworker told me he was googling something and found what he needed on SO, then realized I'd written the answer.

@arikfr SFSH is a mailing list of a loosely affiliated group of folks, many (but not all) of whom attend gracepres.com.

arikfr reviewed Nov 21, 2016


washort mentioned this pull request Nov 21, 2016

full-text search for query text, name, description #1400 (closed)

jeffwidman reviewed Nov 23, 2016


washort force-pushed the sqlalchemy branch from e263745 to 0737bda on November 23, 2016, 18:37

arikfr force-pushed the sqlalchemy branch from b4e3387 to c5686da on November 29, 2016, 10:10

arikfr mentioned this pull request Dec 1, 2016

properly handle view_only permission in groups API #1418 (closed)

arikfr reviewed Dec 4, 2016


Allen Short and others added 11 commits December 7, 2016 02:13

schema for sqlalchemy, basic test support

24217d9

properly handle view_only permission in groups API

d2aef54

test_models passes

ea16666

auth tests wip

f00d77d

Make draft status for queries and dashboards toggleable.

982667f

Fix: fix database URL

f55b836

Close DB connection between tests.

b390cd2

Otherwise we were running out of connections.

Update all tests to use the same test_client

2bff12b

Use db.drop_all/create_all directly

55cb374

Fix: connections leaking during tests.

04447e0

Start fixing visualizations tests

2a52521

arikfr added 4 commits December 7, 2016 15:11

Add missing db.session.commit calls in CLI

2b33963

Switch to flask.cli.AppGroup instead of flask.cli.with_app_context

2d206ef

Upgrade setuptools to install mock

abf57e4

Fix users CLI tests

74e6ef5

arikfr added 2 commits December 7, 2016 17:59

Add Flask-Migrate to the project

70d5454

Also moved old migrations to old_migrations folder (before deleting them entirely).

Add migration for the is_draft column

923c463

washort commented Dec 7, 2016


Allen Short and others added 8 commits December 7, 2016 13:18

Fix test_cli tests

da31d98

fix basic query execution from UI

4ba399a

fix cleanup_query_results task

4945d0b

Add warning script to migrations folder

3cce4d0

Stamp database on first creation

12cbfe1

Keep same logging format in Alembic

106c743

Fix cases where we used User.groups instead of User.group_ids

c380596

Fix tests that used Query.all_queries

1d18109

Add shell_context_processor to inject models to shell.

81fb139

Measure query time with statsd

e524db0

washort force-pushed the sqlalchemy branch from 76bd880 to e524db0 on December 10, 2016, 00:22

arikfr added 2 commits December 11, 2016 15:11

Use group ids instead of groups in Queries.search/recent

1978e07

Bring back support for total query time/queries count

b3bfc3b

arikfr changed the title ~~WIP: Replace Peewee with SQLAlchemy/Alembic~~ Change: Replace Peewee with SQLAlchemy/Alembic Dec 11, 2016

arikfr merged commit 19960ee into getredash:master Dec 11, 2016

