SQLAlchemy: Applying Filtering, Ordering and Grouping to Queries

Yixuan Zhou
3 min readSep 10, 2020

In this chapter, you will write queries that allow you to filter, order, and count your data. All within the Pythonic framework provided by SQLAlchemy.

Filtering and Targeting Data

Where clauses

  • Restrict data returned by a query based on Boolean conditions
  • Compare a column against a value or another column

#census is a table name

stmt = select([census])

stmt = stmt.where(census.columns.state== ‘California’)

results = connection.execute(stmt).fetchall()

for result in results:

print(result.state, result.age)

  1. Expressions
  • in_(), like(), between(), and_(), any_()
  • available as method on a column

stmt = select([census])

stmt = stmt.where(census.columns.state.startswith(‘New’))

for result in connection.execute(stmt):

print(result.state, result.pop2000)

#Define a list of states for which we want results

states = [‘New York’, ‘California’, ‘Texas’]

stmt = select([census])

stmt = stmt.where(census.columns.state.in_(states))

for result in connection.execute(stmt):

print(result.state, result.pop2000)

→Texas 158372
Texas 156767
Texas 156778
Texas 156625
Texas 147729
Texas 144433
Texas 147865

You might have noticed that we didn’t use any of fetch methods to retrieve a ResultSet. If you are only interested in manipulating one record at a time, you can iterate over the ResultProxy directly.

2. Conjunctions

  • allow us to have multiple criteria in a where clause
  • and_(), or_(), not_()

from sqlalchemy import or_

stmt = select([census])

stmt = stmt.where(or_(census.columns.state == ‘California’,

census.columns.state == ‘New York’))

for result in connection.execute(stmt):

print(result.state, result.sex)

Connecting to a PostgreSQL database

#Import create_engine function

from sqlalchemy import create_engine

#Create an engine to the census PostgreSQL database hosted on the cloud via AWS; when connecting to a PostgreSQL database, many prefer to use the psycopg2 database driver as it supports practically all of PostgreSQL’s features efficiently and is the standard dialect for PostgreSQL in SQLAlchemy. The connection strings have the form “dialect+driver://username:password@host:port/database”

engine = create_engine(‘postgresql+psycopg2://student:datacamp@postgresql.csrrinzqubik.us-east-1.rds.amazonaws.com:5432/census’)

print(engine.table_names())

→[‘census’, ‘state_fact’, ‘vrska’, ‘census1’, ‘data’, ‘data1’, ‘employees3’, ‘users’, ‘employees’, ‘employees_2’]

Ordering query results

stmt = select([census.columns.state])

stmt = stmt.order_by(census.columns.state)

results = connection.execute(stmt).fetchall()

print(results[:10])

→[(‘Alabama’,), (‘Alabama’,), (‘Alabama’,), (‘Alabama’,), (‘Alabama’,), (‘Alabama’,), (‘Alabama’,), (‘Alabama’,), (‘Alabama’,), (‘Alabama’,)]

from sqlalchemy import desc

stmt = select([census.columns.state])

rev_stmt = stmt.order_by(desc(census.columns.state))

rev_results = connection.execute(rev_stmt).fetchall()

print(rev_results[:10])

→ [(‘Wyoming’,), (‘Wyoming’,), (‘Wyoming’,), (‘Wyoming’,), (‘Wyoming’,), (‘Wyoming’,), (‘Wyoming’,), (‘Wyoming’,), (‘Wyoming’,), (‘Wyoming’,)]

Ordering by multiple columns

Each column in the .order_by() method is fully sorted from left to right. This means that the first column is completely sorted, and then within each matching group of values in the first column, it’s sorted by the next column in the .order_by() method.

stmt = select([census.columns.state, census.columns.age])

stmt = stmt.order_by(census.columns.state, desc(census.columns.age))

results = connection.execute(stmt).fetchall()

print(results[:20])

→ [(‘Alabama’, 85), (‘Alabama’, 85), (‘Alabama’, 84), (‘Alabama’, 84), (‘Alabama’, 83), (‘Alabama’, 83), (‘Alabama’, 82), (‘Alabama’, 82), (‘Alabama’, 81), (‘Alabama’, 81), (‘Alabama’, 80), (‘Alabama’, 80), (‘Alabama’, 79), (‘Alabama’, 79), (‘Alabama’, 78), (‘Alabama’, 78), (‘Alabama’, 77), (‘Alabama’, 77), (‘Alabama’, 76), (‘Alabama’, 76)]

Counting, summing, and grouping data

from sqlalchemy import func

stmt = select([func.sum(census.columns.pop2008)])

results = connection.execute(stmt).scalar()

print(results)

→302876613

stmt = select([census.columns.sex, census.columns.age, func.sum(census.columns.pop2008)])

stmt = stmt.group_by(census.columns.sex, census.columns.age)

results = connection.execute(stmt).fetchall()

print(results)

  • SQLAlchemy auto generates “column names” for functions in the ResultSet
  • The column names are often func_# such as count_1

stmt = select([census.columns.sex, func.sum(census.columns.pop2008).label(‘pop2008_sum’)])

stmt = stmt.group_by(census.columns.sex)

results = connection.execute(stmt).fetchall()

print(results[0].keys())

→[‘sex’, ‘pop2008_sum’]

--

--