SQLAlchemy: Applying Filtering, Ordering and Grouping to Queries
In this chapter, you will write queries that allow you to filter, order, and count your data. All within the Pythonic framework provided by SQLAlchemy.
Filtering and Targeting Data
Where clauses
- Restrict data returned by a query based on Boolean conditions
- Compare a column against a value or another column
#census is a table name
stmt = select([census])
stmt = stmt.where(census.columns.state== ‘California’)
results = connection.execute(stmt).fetchall()
for result in results:
print(result.state, result.age)
- Expressions
- in_(), like(), between(), and_(), any_()
- available as method on a column
stmt = select([census])
stmt = stmt.where(census.columns.state.startswith(‘New’))
for result in connection.execute(stmt):
print(result.state, result.pop2000)
#Define a list of states for which we want results
states = [‘New York’, ‘California’, ‘Texas’]
stmt = select([census])
stmt = stmt.where(census.columns.state.in_(states))
for result in connection.execute(stmt):
print(result.state, result.pop2000)
→Texas 158372
Texas 156767
Texas 156778
Texas 156625
Texas 147729
Texas 144433
Texas 147865
You might have noticed that we didn’t use any of fetch methods to retrieve a ResultSet. If you are only interested in manipulating one record at a time, you can iterate over the ResultProxy directly.
2. Conjunctions
- allow us to have multiple criteria in a where clause
- and_(), or_(), not_()
from sqlalchemy import or_
stmt = select([census])
stmt = stmt.where(or_(census.columns.state == ‘California’,
census.columns.state == ‘New York’))
for result in connection.execute(stmt):
print(result.state, result.sex)
Connecting to a PostgreSQL database
#Import create_engine function
from sqlalchemy import create_engine
#Create an engine to the census PostgreSQL database hosted on the cloud via AWS; when connecting to a PostgreSQL database, many prefer to use the psycopg2 database driver as it supports practically all of PostgreSQL’s features efficiently and is the standard dialect for PostgreSQL in SQLAlchemy. The connection strings have the form “dialect+driver://username:password@host:port/database”
engine = create_engine(‘postgresql+psycopg2://student:datacamp@postgresql.csrrinzqubik.us-east-1.rds.amazonaws.com:5432/census’)
print(engine.table_names())
→[‘census’, ‘state_fact’, ‘vrska’, ‘census1’, ‘data’, ‘data1’, ‘employees3’, ‘users’, ‘employees’, ‘employees_2’]
Ordering query results
stmt = select([census.columns.state])
stmt = stmt.order_by(census.columns.state)
results = connection.execute(stmt).fetchall()
print(results[:10])
→[(‘Alabama’,), (‘Alabama’,), (‘Alabama’,), (‘Alabama’,), (‘Alabama’,), (‘Alabama’,), (‘Alabama’,), (‘Alabama’,), (‘Alabama’,), (‘Alabama’,)]
from sqlalchemy import desc
stmt = select([census.columns.state])
rev_stmt = stmt.order_by(desc(census.columns.state))
rev_results = connection.execute(rev_stmt).fetchall()
print(rev_results[:10])
→ [(‘Wyoming’,), (‘Wyoming’,), (‘Wyoming’,), (‘Wyoming’,), (‘Wyoming’,), (‘Wyoming’,), (‘Wyoming’,), (‘Wyoming’,), (‘Wyoming’,), (‘Wyoming’,)]
Ordering by multiple columns
Each column in the .order_by() method is fully sorted from left to right. This means that the first column is completely sorted, and then within each matching group of values in the first column, it’s sorted by the next column in the .order_by() method.
stmt = select([census.columns.state, census.columns.age])
stmt = stmt.order_by(census.columns.state, desc(census.columns.age))
results = connection.execute(stmt).fetchall()
print(results[:20])
→ [(‘Alabama’, 85), (‘Alabama’, 85), (‘Alabama’, 84), (‘Alabama’, 84), (‘Alabama’, 83), (‘Alabama’, 83), (‘Alabama’, 82), (‘Alabama’, 82), (‘Alabama’, 81), (‘Alabama’, 81), (‘Alabama’, 80), (‘Alabama’, 80), (‘Alabama’, 79), (‘Alabama’, 79), (‘Alabama’, 78), (‘Alabama’, 78), (‘Alabama’, 77), (‘Alabama’, 77), (‘Alabama’, 76), (‘Alabama’, 76)]
Counting, summing, and grouping data
from sqlalchemy import func
stmt = select([func.sum(census.columns.pop2008)])
results = connection.execute(stmt).scalar()
print(results)
→302876613
stmt = select([census.columns.sex, census.columns.age, func.sum(census.columns.pop2008)])
stmt = stmt.group_by(census.columns.sex, census.columns.age)
results = connection.execute(stmt).fetchall()
print(results)
- SQLAlchemy auto generates “column names” for functions in the ResultSet
- The column names are often func_# such as count_1
stmt = select([census.columns.sex, func.sum(census.columns.pop2008).label(‘pop2008_sum’)])
stmt = stmt.group_by(census.columns.sex)
results = connection.execute(stmt).fetchall()
print(results[0].keys())
→[‘sex’, ‘pop2008_sum’]