This guide provides step-by-step instructions to verify the batch feature extraction implementation.
.env file configured (or use defaults)

# Start all services
docker compose up -d
# Wait for services to be healthy (30-60 seconds)
docker compose ps
# Check logs
docker compose logs -f backend | grep "batch feature"
Check that the batch extraction task is scheduled:
# Inspect Celery Beat schedule
docker compose exec backend celery -A src.main:celery_app inspect scheduled
# Expected output should include:
# - 'backend.tasks.batch_feature_extraction'
# - scheduled every 300 seconds (5 minutes)
curl -X GET http://localhost:8001/api/v1/admin/features/stats
# Expected response:
# {
# "total_recordings": 0,
# "recordings_with_features": 0,
# "recordings_without_features": 0,
# "total_measurements": 0,
# "coverage_percent": 0.0
# }
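As a quick sanity check, the counts in this payload should be internally consistent. A minimal sketch (assuming the field names shown in the expected response above):

```python
def coverage_percent(stats: dict) -> float:
    """Recompute coverage from the raw counts."""
    total = stats["total_recordings"]
    if total == 0:
        return 0.0  # an empty database reports 0.0, matching the example
    return round(100.0 * stats["recordings_with_features"] / total, 1)

def check_stats(stats: dict) -> None:
    """Assert the payload is internally consistent."""
    assert (stats["recordings_with_features"]
            + stats["recordings_without_features"]) == stats["total_recordings"]
    assert abs(coverage_percent(stats) - stats["coverage_percent"]) < 0.05

# The empty-database example from above:
check_stats({
    "total_recordings": 0,
    "recordings_with_features": 0,
    "recordings_without_features": 0,
    "total_measurements": 0,
    "coverage_percent": 0.0,
})
```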
curl -X POST http://localhost:8001/api/v1/admin/features/batch-extract \
-H "Content-Type: application/json" \
-d '{"batch_size": 10, "max_batches": 2}'
# Expected response:
# {
# "task_id": "abc123...",
# "message": "Batch extraction started (batch_size=10, max_batches=2)",
# "status": "queued"
# }
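When scripting this check, a tiny hedged helper can decide whether the response means the task was actually accepted (field names follow the expected response above):

```python
def is_queued(resp: dict) -> bool:
    """True when the admin API reports the task was accepted for execution."""
    return bool(resp.get("task_id")) and resp.get("status") == "queued"

print(is_queued({"task_id": "abc123", "status": "queued"}))  # True
print(is_queued({"detail": "Internal Server Error"}))        # False
```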
curl -X POST http://localhost:8001/api/v1/admin/features/backfill-all
# Expected response:
# {
# "task_id": "def456...",
# "message": "Full backfill started - this may take hours",
# "status": "queued"
# }
Watch the backend logs for batch extraction activity:
docker compose logs -f backend | grep -E "(batch feature|Backfill)"
# Expected logs (every 5 minutes):
# Starting batch feature extraction (batch_size=50, max_batches=5)
# Found 0 recordings without features
# No recordings without features
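If you want to verify the scheduled parameters programmatically, the log line can be parsed; the regex below is a sketch tied to the exact wording shown above:

```python
import re

# Matches "Starting batch feature extraction (batch_size=50, max_batches=5)"
BATCH_LOG = re.compile(r"batch feature extraction \(batch_size=(\d+), max_batches=(\d+)\)")

def parse_batch_params(line: str):
    """Return (batch_size, max_batches), or None for unrelated log lines."""
    m = BATCH_LOG.search(line)
    return (int(m.group(1)), int(m.group(2))) if m else None

print(parse_batch_params(
    "Starting batch feature extraction (batch_size=50, max_batches=5)"))
# (50, 5)
```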
To test with actual recordings:
# 1. Create a recording session
curl -X POST http://localhost:8001/api/v1/acquisition/acquire \
-H "Content-Type: application/json" \
-d '{
"frequency_mhz": 145.5,
"duration_seconds": 10,
"start_time": "'$(date -u +%Y-%m-%dT%H:%M:%S)'"
}'
# 2. Wait for acquisition to complete (~70 seconds)
# 3. Check stats again
curl http://localhost:8001/api/v1/admin/features/stats
# Should show:
# - total_recordings: 1
# - recordings_without_features: 1 (if extraction hasn't run yet)
# 4. Wait 5 minutes for automatic batch extraction
# OR trigger manually:
curl -X POST http://localhost:8001/api/v1/admin/features/batch-extract
# 5. Check stats again after extraction completes
# Should show:
# - recordings_with_features: 1
# - coverage_percent: 100.0
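Instead of rechecking by hand, the steps above can be automated with a polling loop. This sketch takes an injected `fetch_stats` callable (e.g. a function that GETs the stats endpoint and returns the parsed JSON), so the helper itself stays network-agnostic:

```python
import time

def wait_for_full_coverage(fetch_stats, timeout_s=600.0, poll_s=15.0):
    """Poll until coverage_percent reaches 100, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        stats = fetch_stats()
        if stats["coverage_percent"] >= 100.0:
            return stats
        if time.monotonic() >= deadline:
            raise TimeoutError("feature extraction did not finish in time")
        time.sleep(poll_s)
```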
Connect to PostgreSQL and verify the LEFT JOIN query:
docker compose exec postgres psql -U heimdall_user -d heimdall
# Run the query manually:
SELECT
rs.id as session_id,
rs.created_at,
rs.status,
COUNT(m.id) as num_measurements,
mf.recording_session_id as has_features
FROM heimdall.recording_sessions rs
LEFT JOIN heimdall.measurements m
ON m.created_at >= rs.session_start
AND (rs.session_end IS NULL OR m.created_at <= rs.session_end)
AND m.iq_data_location IS NOT NULL
LEFT JOIN heimdall.measurement_features mf
ON mf.recording_session_id = rs.id
WHERE rs.status = 'completed'
GROUP BY rs.id, rs.created_at, rs.status, mf.recording_session_id
ORDER BY rs.created_at DESC
LIMIT 10;
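The query returns one row per completed session; a session still needs extraction when it has IQ measurements but `has_features` comes back NULL. A sketch of that classification (field names match the SELECT aliases above):

```python
def needs_extraction(row: dict) -> bool:
    """A completed session with measurements but no row in
    measurement_features still needs feature extraction."""
    return row["num_measurements"] > 0 and row["has_features"] is None

print(needs_extraction({"num_measurements": 3, "has_features": None}))    # True
print(needs_extraction({"num_measurements": 3, "has_features": "a1b2"}))  # False
```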
# Run unit tests
docker compose exec backend pytest tests/unit/test_batch_feature_extraction.py -v
# Run integration tests
docker compose exec backend pytest tests/integration/test_admin_endpoints.py -v
# Expected: All tests pass
✅ Celery Beat schedule includes batch-feature-extraction task (300s interval)
✅ Admin endpoints respond correctly to all requests
✅ Coverage statistics endpoint returns valid data
✅ Manual batch extraction queues tasks successfully
✅ Automatic batch extraction runs every 5 minutes
✅ LEFT JOIN query correctly identifies recordings without features
✅ Feature extraction tasks are queued and processed
✅ All tests pass
Troubleshooting
Solution: Check Celery Beat logs:
docker compose logs backend | grep -i "beat"
Solution: Check database connection and pool initialization:
docker compose logs backend | grep -E "(pool|database)"
Solution: Verify recordings exist:
docker compose exec postgres psql -U heimdall_user -d heimdall \
-c "SELECT COUNT(*) FROM heimdall.recording_sessions WHERE status = 'completed';"
Solution: Check RabbitMQ connectivity:
docker compose logs backend | grep -i "rabbitmq\|celery"
The query finds recordings without features by:
- recording_sessions with measurements (recordings)
- measurement_features (extractions)
- mf.recording_session_id IS NULL (no features)

The beat schedule in main.py defines:
"batch-feature-extraction": {
"task": "backend.tasks.batch_feature_extraction",
"schedule": 300.0, # Every 5 minutes
"kwargs": {
"batch_size": 50,
"max_batches": 5
}
}
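Back-of-the-envelope arithmetic on the schedule above: each run processes at most batch_size × max_batches recordings, and runs fire every 300 seconds.

```python
# Throughput implied by the beat schedule above.
batch_size = 50
max_batches = 5
interval_s = 300.0

per_run = batch_size * max_batches           # up to 250 recordings per run
per_hour = per_run * int(3600 / interval_s)  # up to 3000 recordings per hour
print(per_run, per_hour)  # 250 3000
```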
This ensures: