Pagination#
FOLIO APIs return paginated results for large datasets. FolioClient provides intelligent pagination handling that optimizes performance based on the query type and data characteristics.
Understanding FOLIO Pagination#
FOLIO uses two pagination strategies:
Offset-based pagination: Traditional page-based approach
ID-based pagination: More efficient for large result sets
FolioClient automatically chooses the best strategy based on your query.
Basic Pagination#
Manual Pagination#
For fine-grained control, use manual pagination:
from folioclient import FolioClient
client = FolioClient(...)
# Get first page
users = client.folio_get("/users", "users", limit=10, offset=0)
print(f"Page 1: {len(users)} users")
# Get second page
users = client.folio_get("/users", "users", limit=10, offset=10)
print(f"Page 2: {len(users)} users")
# Continue until no more results
offset = 0
limit = 10
while True:
    users = client.folio_get("/users", "users", limit=limit, offset=offset)
    if not users:
        break
    print(f"Processing {len(users)} users at offset {offset}")
    for user in users:
        process_user(user)
    offset += len(users)
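The manual loop can be wrapped in a generator so that calling code sees a flat stream of records. The following is a minimal sketch, not part of FolioClient: iter_offset_pages, fake_fetch, and FAKE_USERS are hypothetical names, and the in-memory list stands in for a real endpoint so the example runs without a FOLIO server.

```python
from typing import Callable, Iterator, List

def iter_offset_pages(
    fetch_page: Callable[[int, int], List[dict]], limit: int = 10
) -> Iterator[dict]:
    """Yield records one at a time, requesting the next page when needed."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        if not page:
            break  # an empty page means there are no more results
        yield from page
        offset += len(page)

# Hypothetical stand-in for a FOLIO endpoint: 25 fake user records in memory.
FAKE_USERS = [{"id": f"user-{n:03d}"} for n in range(25)]

def fake_fetch(limit: int, offset: int) -> List[dict]:
    return FAKE_USERS[offset:offset + limit]

fetched = list(iter_offset_pages(fake_fetch, limit=10))
print(f"Fetched {len(fetched)} records")
```

With a real client, fetch_page could be a small function that calls client.folio_get("/users", "users", limit=limit, offset=offset).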
Automatic Pagination#
Use folio_get_all() for automatic, efficient pagination over large datasets:
# Process all users automatically
for user in client.folio_get_all("/users", "users"):
    print(f"Processing: {user['username']}")
# With query filtering
for active_user in client.folio_get_all("/users", "users", query="active==true"):
    print(f"Active user: {active_user['username']}")
Attention
Endpoints that do not support paging using offset and limit are not currently supported by folio_get_all.
Pagination Strategies#
Offset-Based Pagination#
Traditional pagination using offset and limit:
# Manual offset-based pagination
limit = 100
offset = 0
all_users = []
while True:
    batch = client.folio_get("/users", "users", limit=limit, offset=offset)
    all_users.extend(batch)
    offset += len(batch)
    # A short (or empty) batch means the last page has been reached
    if len(batch) < limit:
        break
print(f"Loaded {len(all_users)} total users")
Attention
Offset-based pagination is fine for querying most record types in FOLIO. However, if you query a large dataset (e.g., Instances, Holdings, or Items), each successive offset/page of records will be slower to retrieve and will cause more resource contention. For such large sets, ID-based pagination provides more reliable performance.
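The slowdown can be illustrated with a rough cost model. This sketch assumes that the database must step over every skipped row to serve an offset, which is typical of SQL OFFSET but has not been verified against any specific FOLIO module. Both function names are hypothetical.

```python
def rows_touched_offset(total: int, limit: int) -> int:
    """Rows read across all pages if each page re-reads the rows it skips."""
    return sum(min(offset + limit, total) for offset in range(0, total, limit))

def rows_touched_by_id(total: int) -> int:
    """Rows read if each page starts directly after the last ID seen."""
    return total

total, limit = 1_000_000, 1000
print(rows_touched_offset(total, limit))  # 500500000
print(rows_touched_by_id(total))          # 1000000
```

Under this simplified model, paging through one million records with offsets reads about 500 times as many rows as paging by ID.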
ID-Based Pagination#
More efficient for large datasets when sorted by ID:
# FolioClient automatically uses ID-based pagination when appropriate
# This happens when your query is sorted by 'id'
query = "active==true sortBy id"
for user in client.folio_get_all("/users", "users", query=query):
    print(f"User ID: {user['id']}")
The client detects ID-sorted queries and automatically switches to ID-based pagination for better performance.
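The general technique, often called keyset pagination, can be sketched as follows. This is not FolioClient's actual implementation: is_id_sorted, fetch_after, and iter_by_id are hypothetical names, and the in-memory RECORDS list stands in for a FOLIO endpoint. The idea is that each request asks for records whose ID is greater than the last ID seen, so the server never has to skip rows.

```python
import re
from typing import Iterator, List, Optional

def is_id_sorted(query: str) -> bool:
    """Hypothetical check: does the CQL query end with 'sortBy id'?"""
    return re.search(r"\bsortBy\s+id\s*$", query, re.IGNORECASE) is not None

# Stand-in data set, already ordered by ID.
RECORDS = [{"id": f"{n:04d}"} for n in range(12)]

def fetch_after(last_id: Optional[str], limit: int) -> List[dict]:
    """Stand-in for a request such as: id > last_id, sorted by id, limit N."""
    matching = [r for r in RECORDS if last_id is None or r["id"] > last_id]
    return matching[:limit]

def iter_by_id(limit: int = 5) -> Iterator[dict]:
    last_id = None
    while True:
        page = fetch_after(last_id, limit)
        if not page:
            break
        yield from page
        last_id = page[-1]["id"]  # the next page starts after this ID

print(is_id_sorted("active==true sortBy id"))
print(len(list(iter_by_id(limit=5))))
```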
Advanced Pagination Options#
Custom Batch Sizes#
Control pagination batch size for optimal performance:
# Small batches for memory-constrained environments
for user in client.folio_get_all("/users", "users", limit=10):
    process_user(user)
# Large batches for better throughput
for user in client.folio_get_all("/users", "users", limit=1000):
    process_user(user)
# Default batch size is 10, which works well for most cases
Query-Specific Pagination#
Different queries may benefit from different approaches:
# For ID-sorted queries, FolioClient uses optimized ID-based pagination
id_sorted_query = "metadata.createdDate>2024-01-01 sortBy id"
for user in client.folio_get_all("/users", "users", query=id_sorted_query):
    print(f"User created: {user['metadata']['createdDate']}")
# For other sorts, it uses offset-based pagination
name_sorted_query = "active==true sortBy personal.lastName"
for user in client.folio_get_all("/users", "users", query=name_sorted_query):
    print(f"User: {user['personal']['lastName']}")
Async Pagination#
folio_get_all has an async counterpart, folio_get_all_async:
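import asyncio
from folioclient import FolioClient
# Assumes these exception classes are exposed by folioclient.exceptions
from folioclient.exceptions import FolioConnectionError, FolioHTTPError
async def process_all_users_async():
    async with FolioClient(...) as client:
        try:
            async for user in client.folio_get_all_async("/users", "users"):
                print(f"Processing: {user['username']}")
                await process_user_async(user)
        except (FolioConnectionError, FolioHTTPError) as e:
            # A connection error may carry no response, so fall back to the exception itself
            response = getattr(e, "response", None)
            print(f"Error retrieving {e.request.url}", getattr(response, "text", e))
asyncio.run(process_all_users_async())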
Performance Optimization#
Choosing Batch Size#
Optimal batch size depends on several factors:
# Small records, network is fast -> larger batches
for item in client.folio_get_all("/items", "items", limit=1000):
    process_small_item(item)
# Large records, limited memory -> smaller batches
for user in client.folio_get_all("/users", "users", limit=25):
    process_large_user_record(user)
# For most use cases, the default batch size works well
for record in client.folio_get_all("/endpoint", "key"):
    process_record(record)
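One of those factors is the number of HTTP round trips, which is simple arithmetic. The helper below is a hypothetical illustration, not a FolioClient function, and it gives a lower bound: a pagination loop may issue one extra request to confirm that no results remain.

```python
import math

def requests_needed(total_records: int, limit: int) -> int:
    """Minimum number of page requests to retrieve total_records."""
    return math.ceil(total_records / limit)

for limit in (10, 25, 1000):
    print(f"limit={limit}: {requests_needed(50_000, limit)} requests")
```

Larger batches cut the number of requests, but each response is bigger and up to limit records are held in memory at once.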
Monitoring Progress#
Track pagination progress for long-running operations:
import time
def process_with_progress():
    client = FolioClient(...)
    start_time = time.time()
    processed = 0
    for user in client.folio_get_all("/users", "users", limit=100):
        process_user(user)
        processed += 1
        # Progress report every 1000 records
        if processed % 1000 == 0:
            elapsed = time.time() - start_time
            rate = processed / elapsed
            print(f"Processed {processed} users ({rate:.1f} users/sec)")
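The same reporting logic can be factored into a reusable wrapper around any iterable of records. This is a sketch under stated assumptions: with_progress is a hypothetical helper, not part of FolioClient, and a plain range stands in for the record stream.

```python
import time
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def with_progress(
    records: Iterable[T], every: int = 1000, label: str = "records"
) -> Iterator[T]:
    """Pass records through unchanged, printing a rate report periodically."""
    start = time.monotonic()
    for count, record in enumerate(records, start=1):
        yield record
        # Runs when the caller asks for the next record, i.e. after it has
        # finished processing the current one.
        if count % every == 0:
            elapsed = time.monotonic() - start
            rate = count / elapsed if elapsed > 0 else float("inf")
            print(f"Processed {count} {label} ({rate:.1f}/sec)")

seen = list(with_progress(range(5), every=2, label="items"))
```

With a real client, the wrapper could be applied as with_progress(client.folio_get_all("/users", "users"), label="users").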
Error Handling in Pagination#
Handle errors gracefully during pagination:
import httpx
from folioclient.exceptions import FolioClientException
def robust_pagination():
    client = FolioClient(...)
    processed = 0
    errors = 0
    try:
        for user in client.folio_get_all("/users", "users"):
            try:
                process_user(user)
                processed += 1
            except Exception as e:
                print(f"Error processing user {user.get('id', 'unknown')}: {e}")
                errors += 1
    except httpx.HTTPStatusError as e:
        print(f"HTTP error during pagination: {e.response.status_code}")
    except FolioClientException as e:
        print(f"FolioClient error: {e}")
    print(f"Processed: {processed}, Errors: {errors}")
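When failures need to be retried or reported later, it can help to record which records failed rather than only counting them. A minimal sketch, assuming each record is a dict with an "id" key; process_all and require_username are hypothetical names, and a small list stands in for the paginated stream.

```python
from typing import Callable, Iterable, List, Tuple

def process_all(
    records: Iterable[dict], handler: Callable[[dict], None]
) -> Tuple[int, List[str]]:
    """Run handler on every record; return (processed count, failed IDs)."""
    processed = 0
    failed_ids: List[str] = []
    for record in records:
        try:
            handler(record)
            processed += 1
        except Exception as exc:
            # Keep going: one bad record should not stop the whole run.
            print(f"Error processing {record.get('id', 'unknown')}: {exc}")
            failed_ids.append(record.get("id", "unknown"))
    return processed, failed_ids

def require_username(user: dict) -> None:
    if "username" not in user:
        raise ValueError("missing username")

sample = [{"id": "1", "username": "a"}, {"id": "2"}, {"id": "3", "username": "c"}]
processed, failed_ids = process_all(sample, require_username)
print(f"Processed: {processed}, failed: {failed_ids}")
```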
Working with Filtered Results#
Pagination with CQL Queries#
# Filter and paginate efficiently
query = "active==true and personal.lastName=Smith*"
for user in client.folio_get_all("/users", "users", query=query):
    print(f"Active Smith: {user['personal']['lastName']}")
# Complex queries with sorting for optimal pagination
complex_query = """
active==true and
metadata.createdDate>2024-01-01
sortBy id
"""
for user in client.folio_get_all("/users", "users", query=complex_query):
    print(f"Recent user: {user['username']}")
Counting Results#
Get total count before processing:
# Get total count first
total_count = client.folio_get("/users", "totalRecords", limit=0)
print(f"Total users: {total_count}")
Note
limit=0 is the idiomatic way to get an accurate record count from most FOLIO modules. Some Spring-based modules may return an error with limit=0, so you may need to experiment.
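One way to cope with modules that reject limit=0 is to fall back to limit=1. This is a sketch under two assumptions that should be checked per module: that the fallback response still reports totalRecords, and that catching a broad Exception is acceptable for your use. count_records and fake_fetch_total are hypothetical names.

```python
from typing import Callable

def count_records(fetch_total: Callable[[int], int]) -> int:
    """Try limit=0 first; if the module rejects it, retry with limit=1."""
    try:
        return fetch_total(0)
    except Exception:
        return fetch_total(1)

def fake_fetch_total(limit: int) -> int:
    """Stand-in for a module that rejects limit=0."""
    if limit == 0:
        raise ValueError("limit must be greater than 0")
    return 4200

total = count_records(fake_fetch_total)
print(f"Total records: {total}")
```

With a real client, fetch_total could be a function that returns client.folio_get("/users", "totalRecords", limit=limit).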
Best Practices#
Use appropriate batch sizes - Start with the default and adjust based on performance
Implement progress monitoring for long-running operations
Handle errors gracefully - Don’t let one bad record stop the entire process
Consider memory usage - Use streaming for very large datasets
Use async for concurrency - When processing multiple datasets
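The last point can be sketched with asyncio.gather, which runs several paginated reads concurrently. Here fake_get_all is a hypothetical async generator standing in for client.folio_get_all_async, and the dataset names and sizes are invented for illustration. Whether a single client can safely be shared across concurrent reads is not covered here.

```python
import asyncio
from typing import AsyncIterator, Dict

async def fake_get_all(name: str, total: int) -> AsyncIterator[dict]:
    """Stand-in for an async paginated read of one dataset."""
    for n in range(total):
        await asyncio.sleep(0)  # yield control, as a network call would
        yield {"id": f"{name}-{n}"}

async def count_dataset(name: str, total: int) -> int:
    count = 0
    async for _ in fake_get_all(name, total):
        count += 1
    return count

async def main() -> Dict[str, int]:
    datasets = {"users": 3, "groups": 2}
    # Start one task per dataset and wait for all of them to finish.
    counts = await asyncio.gather(
        *(count_dataset(name, total) for name, total in datasets.items())
    )
    return dict(zip(datasets, counts))

result = asyncio.run(main())
print(result)
```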