Professional Summary
Accomplished Data Engineer and AI/ML Researcher with demonstrated expertise in architecting large-scale data infrastructure and applying artificial intelligence to solve complex information retrieval and analysis problems.
Specialized in scalable ETL pipeline design, database modernization, and AI/ML applications in Natural Language Processing, with proven success migrating 1.5 million+ historical metadata records to modern cloud databases and improving data discoverability.
Education
Master of Science in Electrical and Computer Engineering
Tennessee Tech University
Cookeville, Tennessee
Professional Experience
Data Engineer
Vanderbilt University Television News Archive
Nashville, Tennessee
- Architected and deployed mission-critical ETL pipelines supporting a large-scale television news archive, enabling researchers to analyze broadcast media content spanning multiple decades.
- Led database modernization initiative migrating 1.5 million+ historical metadata records from legacy systems to AWS Aurora MySQL with zero data loss and significant query performance improvements.
- Developed AI-enhanced metadata curation system using Databricks and NLP that reduced manual quality assurance time while improving data discoverability.
- Engineered scalable data warehousing solutions on AWS utilizing S3, Glue, Lambda, and Aurora to support large video archive storage and metadata processing.
Graduate Research Assistant & Adjunct Lecturer
Tennessee Tech University
Cookeville, Tennessee
- Conducted research on IoT and wireless systems addressing critical infrastructure applications.
- Designed and implemented experimental IoT sensor networks for real-time environmental monitoring.
- Taught Industrial Electronics course as Adjunct Lecturer, developing hands-on curriculum.
- Led laboratory sessions for undergraduate engineering courses.
Selected Publications & Talks
Metadata Matters: Modernizing the Vanderbilt Television News Archive Database
Coalition for Networked Information (CNI) Project Briefing Series, Winter 2026
Democratizing Access to Library Data Assets: AI-Enhanced Curation Model using Databricks
Southeast Data Librarian Symposium 2025
The Lakehouse for Research: Why Databricks? Augmenting R/Python for Petabyte-Scale Analytics and Reproducibility
Intro to Databricks & Social Media Research Roundtable, hosted by the McGee Applied Research Center for Narrative Studies, 2025
Sample Calculation of Link Power Budget and Effective SNR in NB-IoT
2nd International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud), Palladam, India, 2018, pp. 30–32
Current Research & Manuscripts
Data Descriptor: A Standardized Longitudinal Corpus of U.S. Broadcast News Transcripts (1968–Present) with PBCore Metadata and AI-Enhanced Validation for Scholarly Use
This descriptor details a curated corpus of American broadcast news transcripts, spanning 1968 to the present, derived from VTNA content using scalable cloud-native ASR pipelines (AWS Transcribe with serverless architectures and custom post-processing). The corpus (the tvn_transcripts Delta table) is standardized with PBCore metadata, IPTC topics, linked data, AI-generated descriptions, and accuracy, versioning, and SLO metrics (WER < 10% on validated samples); it complies with U.S. copyright law (§ 107 fair use for scholarly transformation; § 108(f)(3) exceptions) and is accessible via institutional channels for non-commercial research. Usage notes cover access protocols, noise caveats, and interoperability for AI-driven studies of media narratives, framing, and misinformation, advancing national priorities in digital preservation and computational humanities.
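As an illustration of how such a corpus might be consumed, the following is a minimal sketch assuming a Databricks/PySpark environment; the column names used (broadcast_date, network, transcript_text) are illustrative assumptions, not the published schema of the tvn_transcripts table.

# Illustrative sketch only: pull a one-year longitudinal slice of transcripts.
# Column names are assumed for illustration; tvn_transcripts is the Delta table
# named in the descriptor above.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

transcripts = spark.read.table("tvn_transcripts")

sample = (
    transcripts
    .filter(F.col("broadcast_date").between("1985-01-01", "1985-12-31"))
    .select("broadcast_date", "network", "transcript_text")
)
sample.show(5, truncate=80)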
Secure Persistent Identification and Machine-Actionable Metadata: Modernizing the Vanderbilt Television News Archive for AI-Driven Historiography
This paper addresses the technical bottleneck of scaling legacy media archives for computational research. Using the Vanderbilt Television News Archive (VTNA) as a case study, we document the migration from a monolithic on-premise database to a cloud-native AWS Aurora architecture. We introduce a novel implementation of Nano IDs—decentralized, collision-resistant, and non-sequential identifiers—to replace legacy serial numbering, thereby eliminating enumeration vulnerabilities and ensuring citation persistence in distributed environments. The framework integrates the PBCore metadata standard with AI-generated transcripts (ASR) and IPTC Media Topics, transforming unstructured broadcast video into a machine-actionable dataset. By implementing a granular versioning schema for AI outputs, we provide the scholarly provenance required for reproducible AI/ML research. This hybrid engineering and librarianship model offers a scalable blueprint for modernizing global broadcast repositories into secure, citable, and computationally tractable research hubs.
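For illustration, a minimal sketch of the identifier approach described above, using the open-source nanoid package for Python; the alphabet and length shown are assumptions for the sketch, not the archive's production configuration.

# Illustrative sketch only: non-sequential, collision-resistant identifiers
# that cannot be enumerated from prior values. Alphabet and length are assumed.
from nanoid import generate  # pip install nanoid

ALPHABET = "0123456789abcdefghijkmnpqrstuvwxyz"  # assumed URL-safe, look-alike-free alphabet
ID_LENGTH = 12                                   # assumed length; longer IDs lower collision risk

def new_asset_id() -> str:
    """Return a decentralized, non-enumerable identifier for a broadcast item."""
    return generate(ALPHABET, ID_LENGTH)

print(new_asset_id())  # consecutive calls share no ordering, unlike serial numbers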
Large-Scale PBCore Adaptation with AI/ML for Archival TV News: Infrastructure Modernization and Implications for Computational Research at Vanderbilt
This paper presents the large-scale adaptation of the PBCore metadata standard to the Vanderbilt Television News Archive (VTNA), a 58-year collection encompassing over 1.4 million news segments and commercial breaks (1968–present). The modernization effort integrates PBCore as the core metadata framework with AWS Aurora database migration, AI/ML-driven enhancements including automated speech recognition (ASR) transcripts, AI-generated titles and descriptions, and linked data structures to improve semantic interoperability and search capabilities. Databricks-enabled data lakes facilitate scalable processing, versioning, and service level objectives (SLOs) for secure, compliant computational access. Governance and stewardship protocols ensure adherence to U.S. copyright law, specifically fair use under 17 U.S.C. § 107 for transformative non-commercial scholarly research and § 108(f)(3) exceptions for audiovisual news programs. Empirical results demonstrate significant improvements in query latency (up to 70% reduction), text/data mining readiness, and researcher accessibility. The implications for computational research are substantial: the infrastructure enables advanced longitudinal studies of U.S. broadcast media narratives, framing analysis, public discourse evolution, and misinformation detection—contributing to national priorities in digital cultural heritage preservation, open science, AI ethics, and media literacy.
AI-Enhanced Data Curation Model for Licensed or Restricted Datasets: Governance, Stewardship, and Automation in Research Collections
Building on foundational work presented at SEDLS 2025 ("Democratizing Access to Library Data Assets: AI-Enhanced Curation Model using Databricks"), this paper proposes a novel AI-enhanced curation model specifically designed for licensed or restricted datasets and collections. The model leverages Databricks-enabled data lakes for scalable processing, integration of automated transcription outputs (e.g., ASR-derived content), AI-generated titles and descriptions, and automated workflows for versioning, service level objectives (SLOs), and access controls. Governance and stewardship protocols address ethical handling, compliance verification, and responsible use under applicable legal frameworks (including U.S. copyright fair use under 17 U.S.C. § 107 for transformative non-commercial scholarly research and relevant exceptions for archival/educational purposes). Validation through applied case studies demonstrates improved query efficiency, text/data mining readiness, and support for academic and computational applications in areas such as narrative analysis, pattern detection, and knowledge discovery, contributing to broader priorities in open science, responsible AI, and equitable access to restricted data resources.
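To make the versioning and SLO bookkeeping concrete, here is a minimal sketch of the kind of per-field provenance record such a model might maintain; the field names and the 10% WER ceiling are assumptions for illustration, not a production schema.

# Illustrative sketch only: one versioned AI-generated value for a single
# metadata field, with the provenance needed for reproducible reuse.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class AiFieldVersion:
    asset_id: str                      # persistent identifier of the archived item
    field_name: str                    # e.g. "title" or "description"
    value: str                         # the AI-generated text
    model: str                         # model or pipeline that produced the value
    version: int                       # increments each time the field is regenerated
    word_error_rate: Optional[float] = None  # populated only for ASR-derived fields
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def meets_accuracy_slo(self, max_wer: float = 0.10) -> bool:
        # Assumed 10% WER ceiling; fields without a WER pass by default.
        return self.word_error_rate is None or self.word_error_rate <= max_wer

record = AiFieldVersion("abc123", "description", "AI-generated summary of an evening newscast segment", "asr-post-v2", 3, 0.07)
assert record.meets_accuracy_slo()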
Technical Skills
Programming
- Python (Advanced)
- SQL (PostgreSQL, MySQL, Aurora)
- R (Statistical Computing)
- Scala (Spark Applications)
Data Engineering
- Apache Spark & PySpark
- ETL Pipeline Development
- Data Warehousing
- Databricks Platform
AI/ML & Analytics
- Natural Language Processing
- Machine Learning (scikit-learn, TensorFlow)
- Sentiment Analysis
- Statistical Analysis
Cloud Platforms
- AWS (S3, Glue, Lambda, Aurora)
- Microsoft Azure
- Docker & Containers
- CI/CD Pipelines
Visualization
- Tableau
- Power BI
- Matplotlib
Data Librarianship
- Metadata Standards & Schema Design (Dublin Core, MODS)
- Metadata Curation & Enrichment
- Digital Preservation Practices
- Cataloging, Controlled Vocabularies & Authority Files (IPTC, PBCore)
Certifications
AWS Certified Solutions Architect - Associate
Amazon Web Services
AWS Certified Data Engineer - Associate
Amazon Web Services