Acoustic Event Detection for Emergency Telephony

Streaming audio event detection with a custom PyTorch model: nine acoustic classes, log-mel feature extraction, voice activity detection (VAD), caption generation (SRT/XML/JSON), and Dockerized deployment.

Technologies Used

Deep Learning:

PyTorch for model development
Custom neural network architecture
Real-time inference pipeline

Audio Processing:

Log-mel spectrogram extraction
Voice Activity Detection (VAD)
Streaming audio processing
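
To make the log-mel step concrete, here is a minimal NumPy sketch of the usual recipe: frame the signal, apply a window, take the power spectrum, and project it onto triangular mel filters. The project's actual parameters are not stated, so the 16 kHz sample rate, 25 ms window, 10 ms hop, 64 mel bands, and the function names are all assumptions chosen for illustration.

```python
import numpy as np


def hz_to_mel(hz):
    """Convert Hz to mel (HTK-style formula)."""
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    # n_mels + 2 band edges, evenly spaced on the mel scale
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def log_mel(signal, sr=16000, n_fft=400, hop=160, n_mels=64, eps=1e-10):
    """Return a (n_frames, n_mels) log-mel spectrogram (assumed parameters)."""
    signal = np.asarray(signal, dtype=float)
    if signal.size < n_fft:
        signal = np.pad(signal, (0, n_fft - signal.size))
    n_frames = 1 + (signal.size - n_fft) // hop
    # Index matrix that slices the signal into overlapping frames
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + eps)
```

Calling `log_mel(samples)` on one second of 16 kHz audio yields a 98 x 64 array, one row per 10 ms hop, which is the kind of input a PyTorch classifier would typically consume.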

Deployment:

Docker containerization
Production-ready deployment
Scalable architecture

Output Formats:

SRT (SubRip) subtitle format
XML structured output
JSON for API integration
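
As a rough illustration of how one list of detected events could be serialized into all three formats, the sketch below uses only the Python standard library. The `Event` fields, the `siren` label in the usage note, and the XML element and attribute names are hypothetical; the project's real schema is not documented here.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict


@dataclass
class Event:
    label: str    # hypothetical class name
    start: float  # seconds from the start of the stream
    end: float    # seconds from the start of the stream
    score: float  # model confidence in [0, 1]


def srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(events):
    """One numbered SRT cue per event, separated by blank lines."""
    blocks = [
        f"{i}\n{srt_time(e.start)} --> {srt_time(e.end)}\n[{e.label}]\n"
        for i, e in enumerate(events, start=1)
    ]
    return "\n".join(blocks)


def to_json(events):
    """JSON document with an 'events' array (assumed layout)."""
    return json.dumps({"events": [asdict(e) for e in events]}, indent=2)


def to_xml(events):
    """XML document with one <event> element per event (assumed layout)."""
    root = ET.Element("events")
    for e in events:
        ET.SubElement(
            root, "event",
            label=e.label,
            start=f"{e.start:.3f}",
            end=f"{e.end:.3f}",
            score=f"{e.score:.2f}",
        )
    return ET.tostring(root, encoding="unicode")
```

For example, `to_srt([Event("siren", 1.5, 3.25, 0.91)])` produces a single cue running from `00:00:01,500` to `00:00:03,250` with the text `[siren]`.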

Key Features

Real-Time Detection:

Streaming audio processing
Low-latency event detection
Continuous monitoring capability
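
One common way to run a fixed-input model on a continuous stream is to buffer incoming chunks and emit overlapping windows as soon as enough samples arrive. The sketch below shows that idea with the standard library only; the `WindowBuffer` class and its parameters are hypothetical and are not taken from the project.

```python
from collections import deque
from itertools import islice


class WindowBuffer:
    """Collects sample chunks and emits fixed-size, overlapping windows."""

    def __init__(self, window_size, hop_size):
        if not 0 < hop_size <= window_size:
            raise ValueError("hop_size must be in (0, window_size]")
        self.window_size = window_size
        self.hop_size = hop_size
        self._samples = deque()
        self._offset = 0  # stream index of the first buffered sample

    def push(self, chunk):
        """Add a chunk; return a list of (start_index, window) pairs now complete."""
        self._samples.extend(chunk)
        ready = []
        while len(self._samples) >= self.window_size:
            window = list(islice(self._samples, self.window_size))
            ready.append((self._offset, window))
            # Advance by one hop, keeping the overlap for the next window
            for _ in range(self.hop_size):
                self._samples.popleft()
            self._offset += self.hop_size
        return ready
```

Latency in this scheme is bounded by the window length plus model inference time, so shorter windows and hops trade context for faster detection.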

Multi-Class Classification:

Nine distinct acoustic event classes
Emergency-specific sound detection
High-accuracy classification

Audio Analysis:

Log-mel feature extraction
Voice Activity Detection
Background noise handling
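
The VAD method used in the project is not specified. As a stand-in, the sketch below shows the simplest variant, an energy threshold on frame RMS level, assuming samples scaled to [-1, 1]. The function name, frame sizes, and the -40 dBFS threshold are illustrative assumptions; a production system would more likely use a trained or adaptive detector.

```python
import numpy as np


def energy_vad(signal, frame_len=400, hop=160, threshold_db=-40.0):
    """Return a boolean array, True where a frame's RMS level exceeds threshold_db.

    Assumes samples are scaled to [-1, 1], so levels are in dBFS.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        return np.zeros(0, dtype=bool)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Index matrix that slices the signal into overlapping frames
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    rms = np.sqrt(np.mean(signal[idx] ** 2, axis=1))
    # Floor the RMS to avoid log10(0) on digital silence
    level_db = 20.0 * np.log10(np.maximum(rms, 1e-10))
    return level_db > threshold_db
```

Frames flagged `False` can be skipped before feature extraction, which reduces both compute and false detections on silent stretches of a call.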

Caption Generation:

Automated event timestamping
Multiple output format support (SRT/XML/JSON)
Structured event metadata
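
Event timestamping can be sketched as a post-processing step: given one predicted label per frame, merge runs of identical labels into intervals and convert frame indices to seconds. The `frames_to_events` helper, the `background` label, and the class names in the usage note are hypothetical; the nine real classes are not listed in this description.

```python
from itertools import groupby


def frames_to_events(frame_labels, hop_seconds, background="background"):
    """Merge runs of identical per-frame labels into (label, start, end) tuples.

    Frames labeled `background` are skipped. Times are in seconds.
    """
    events = []
    position = 0  # index of the first frame in the current run
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        if label != background:
            start = position * hop_seconds
            end = (position + length) * hop_seconds
            events.append((label, start, end))
        position += length
    return events
```

With a 0.5 s hop, the sequence of three `background` frames, four `siren` frames, one `background` frame, and two `alarm` frames gives `[("siren", 1.5, 3.5), ("alarm", 4.0, 5.0)]`.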

Production Deployment:

Dockerized for easy deployment
Scalable processing pipeline
Monitoring and logging

Use Cases

Emergency telephony systems
Automated call analysis
Safety monitoring applications
Event logging and documentation