Projects

Climate Informatics
Goal: Forge collaborations between machine learning and climate science, in order to accelerate progress in answering pressing questions in climate science.
The threat of climate change is one of the greatest challenges currently facing society. Given the profound impact machine learning has made on the natural sciences to which it has been applied,
Project Twiki link: https://power.ldeo.columbia.edu/twiki/bin/view/Main/ClimateInformatics
LOQUI
Goal: Machines that speak with us (Spoken Dialogue Systems) rely disproportionately on accurate transcription of the speech signal into readable text. When the system has low confidence in the automatic speech recognition (ASR) of a caller's utterance, a typical dialogue strategy requires the system to repeat its best guess and ask for confirmation. This leads to unnatural interactions and dissatisfied callers. Our novel methodology, wizard ablation, collects simulated human-system dialogues that vary in controlled ways, in order to investigate the problem-solving strategies people would use if their abilities and options were restricted to be more like a machine's. Our testbed application, the CheckItOut dialog system, is modeled on a corpus of telephone transactions between patrons and librarians that we collected at New York City's Andrew Heiskell Braille & Talking Book Library. (Loqui is a Latin phrase meaning "I speak"; because the "I" in the case of an ablated wizard is neither the wizard nor the system, we like the alliterative allusion to Loki (lo-kee), the Norse god of mischief.)
For Spoken Dialogue Systems (SDS), investigate human strategies for handling system errors.
An Advanced Learning Paradigm: Learning Using Hidden Information
Goal: Develop algorithms in the SVM family that allow extra information to be used effectively during training, with the understanding that this extra information will not be available during actual operation.
Learning from extra information, such as structural homologies between proteins, in a system designed to predict structure from amino acid sequences.
An 'Early Warning' Device to Allow Epilepsy Patients to Live a More Normal Life
Goal: To develop a wearable 'early warning' device for epilepsy patients using advanced machine learning technology.
The goal of the proposed research is to develop a wearable "early warning" device attached to an implantable microelectrode array that will give otherwise untreatable epilepsy patients enough time to take a medicine or prepare for the seizure (e.g. get out of the pool, pull the car over to the side of the road, or get off a ladder or stairs). The device would use detector software based on advanced machine learning technology to detect an impending seizure. The learning system would be trained with data from the implanted
Online High Frequency Oscillation Detection
Goal: To develop a combination of hardware and software to automatically detect High Frequency Oscillations (HFOs) in real time and in a clinical setting.
High frequency oscillations (HFOs), or brief bursts in the high gamma band (80-500 Hz), have been studied as potential biomarkers of epileptic activity. Since the early 1990s, it has been recognized that increased high gamma power is present within the epileptogenic region at seizure onset in adults (Allen, Fish et al. 1992; Alarcon, Binnie et al. 1995) and children (Fisher, Webber et al. 1992; Traub, Whittington et al. 2001). Interictal fast ripples (Figure 1) have been detected almost exclusively in epileptogenic regions (Staba, Wilson et al. 2002; Jacobs, Levan et al.
Estimation of Mean Time Between Failures (MTBF) of Electrical Feeders and Related Components
The project aims to estimate the time between (and to) failures of primary distribution feeders and their components (such as sections and joints). In the New York City power grid, electricity is transmitted via primary distribution feeders between the high-voltage transmission system and the household-voltage secondary system. These feeders are susceptible to different kinds of failures, such as emergency isolation caused by automatic substation relays (Open Autos), failing on test, maintenance crews noticing problems, and scheduled work on different sections of the feeder.
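As a rough illustration of the quantity being estimated, the sketch below computes a naive empirical MTBF for a single feeder as the average gap between consecutive failure times. It is not the project's actual method, which is not described here; the function name `mean_time_between_failures` and the failure log are hypothetical, and the sketch ignores repair durations, failure types, and feeders that have not yet failed.

```python
from datetime import datetime
from statistics import mean


def mean_time_between_failures(failure_times):
    """Return the mean gap, in hours, between consecutive failures."""
    ordered = sorted(failure_times)
    if len(ordered) < 2:
        raise ValueError("need at least two failures to measure a gap")
    # Gap between each failure and the next one, converted to hours.
    gaps = [
        (later - earlier).total_seconds() / 3600.0
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return mean(gaps)


# Hypothetical failure log for one feeder (illustrative, not real data).
feeder_failures = [
    datetime(2010, 1, 1, 0, 0),
    datetime(2010, 1, 11, 0, 0),
    datetime(2010, 1, 31, 0, 0),
]
mtbf_hours = mean_time_between_failures(feeder_failures)
print(f"Naive MTBF: {mtbf_hours:.1f} hours")
```

A simple average like this only uses observed gaps. Estimating the time to the next failure for feeders with few or no recorded failures would need a model that handles such incomplete histories, which this sketch does not attempt.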
EEGMine: A Distributed Framework for Learning on iEEG Data
Goal: This project aims to develop a distributed framework for data management and machine learning on iEEG data obtained from epilepsy patients.
Distributed Data Mining (DDM) on iEEG data
Project Twiki link: https://power.ldeo.columbia.edu/twiki/bin/view/SeizurePrediction/EEGMineDocs
CADIM: Columbia Arabic Dialect Modeling
Arabic Dialect Modeling for Speech and Natural Language Processing
Con Edison Projects - Secondary Events
An actionable, real-world machine learning project that uses NLP, data cleaning, and data management, and that involves specific domain knowledge.
Our goal is to rank the electricity service structures (manholes and service boxes) in Manhattan, Brooklyn, the Bronx, and Queens according to their vulnerability to serious manhole events, such as manhole fires, explosions, and smoking manholes.


