class: center, middle, light

# Home automation with Python, Raspberry Pi, and voice recognition

### Mauricio Collazos

.footnote[]

---
class: center

[Repository on GitHub](https://github.com/contraslash/automatizacion-hogar-raspberry-python)

![https://contraslash.github.io/automatizacion-hogar-raspberry-python/](img/qr.png)

---

# Disclaimer

- This is not an IoT workshop

---
class: center

# Block model of a conversational assistant

![Conversational assistant](img/conversational_assistant.png)

---

# Automatic speech recognition (ASR)

- 1952 Audrey by Bell Labs (digit recognition)
- 1961 Shoebox by IBM (digits and arithmetic operators)
- 1972 Dynamic Time Warping (Sakoe, Chiba, and Vintsyuk)
- 1980 Hidden Markov Models
- 1990 Gaussian Mixture Models
- 2000 Robust Speech Recognition
- 2012 Context-Dependent Deep Neural Network Hidden Markov Models

---

# Why is it hard?

- Variability
- Continuity
- Noise
- Speaker characteristics
- Vocabulary size

---
class: center

# Audio representation

![Sampling](http://glasnost.itcarlow.ie/~powerk/audio/images/sampling1.gif)

Sampling

---
class: center

# Feature extraction

![Spectrogram](img/freq-vs-spectre.png)

Frequency vs. spectrum

---
class: center

# Feature extraction

![Feature extraction](img/fe.png)

Partitioning and feature extraction

---

# Types of features

- Mel Frequency Cepstral Coefficients (MFCC)
- Perceptual Linear Prediction (PLP)
- Linear Frequency Cepstral Coefficients (LFCC)
- Wavelet-packet features (WPF)
- Subband-based cepstral parameters (SBC)
- Mixed wavelet packet advanced combinational encoder (MWP-ACE)

---
class: center

# Basic units of language

![International Phonetic Alphabet Consonants](img/ipa.png)

International Phonetic Alphabet: consonants

---
class: center

# Basic units of language

![International Phonetic Alphabet Vowels](img/ipa_vow.png)

International Phonetic Alphabet: vowels

---
class: center

# Acoustic modeling

![Hidden
Markov Models](img/hmm.png)

Hidden Markov Models

---
class: center

# Language modeling

![ngrams](img/ngrams.png)

N-grams

---

# [Speech Recognition](https://pypi.org/project/SpeechRecognition/)

Wrapper for:

- [CMU Sphinx](http://cmusphinx.sourceforge.net/wiki/) (works offline)
- [Google Cloud Speech API](https://cloud.google.com/speech/)
- [Wit.ai](https://wit.ai/)
- [Microsoft Bing Voice Recognition](https://www.microsoft.com/cognitive-services/en-us/speech-api)
- [Houndify API](https://houndify.com/)
- [IBM Speech to Text](http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text.html)
- [Snowboy Hotword Detection](https://snowboy.kitt.ai/) (works offline)

Microphone input requires [PyAudio](https://pypi.org/project/PyAudio/)

---

# Google Speech

```python
import speech_recognition as sr

# Capture audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something")
    audio = r.listen(source)

# Call the Google Cloud Speech recognizer
GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""JSON_CREDENTIALS"""
try:
    print(
        r.recognize_google_cloud(
            audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS
        )
    )
except sr.UnknownValueError:
    print("Google Cloud Speech could not recognize your audio")
except sr.RequestError as e:
    print("Could not request results from Google Cloud Speech; {0}".format(e))
```

Install google-api-python-client and oauth2client

```bash
pip install google-api-python-client oauth2client
```

---

# Reading from an audio file

```python
import speech_recognition as sr

# Load the audio from a file instead of the microphone
AUDIO_FILE = "absolute_or_relative_path"

r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # read the entire audio file

GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""CREDENTIALS"""
try:
    print(
        "Google Cloud Speech thinks you said "
        + r.recognize_google_cloud(
            audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS
        )
    )
except sr.UnknownValueError:
    print("Google Cloud Speech could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Google Cloud Speech service; {0}".format(e))
```

---

# Snowboy

```python
import speech_recognition as sr

# Placeholders: the Snowboy folder and a list of hotword model files
SNOWBOY_CONFIGURATION = ("path_to_snowboy_folder", ["hotword_file"])

r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say the hotword")
    # listen() waits for the hotword, then records the phrase that follows it
    audio = r.listen(source, snowboy_configuration=SNOWBOY_CONFIGURATION)

print("Hotword detected")
```

---

# CMU Sphinx

```python
import speech_recognition as sr

# Capture audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something")
    audio = r.listen(source)

# Call the Sphinx recognizer with a custom model
try:
    print(
        r.recognize_sphinx(
            audio,
            language=(
                "path_to_acoustic_model",
                "path_to_language_model",
                "path_to_dictionary",
            ),
        )
    )
except sr.UnknownValueError:
    print("Sphinx could not recognize your audio")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))
```

---

# Recognition with other backends

[Wit.ai](https://wit.ai/)

```python
WIT_AI_KEY = "32_character_wit_ai_key"
r.recognize_wit(audio, key=WIT_AI_KEY)
```

[Microsoft Bing Voice Recognition](https://www.microsoft.com/cognitive-services/en-us/speech-api)

```python
BING_KEY = "32_character_bing_key"
r.recognize_bing(audio, key=BING_KEY)
```

[IBM Speech to Text](http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text.html)

```python
IBM_USERNAME = "IBM Speech to Text username XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
IBM_PASSWORD = "IBM user password"
r.recognize_ibm(audio, username=IBM_USERNAME, password=IBM_PASSWORD)
```

---
class: center

# Connections between the relay and the Raspberry Pi

![Schematics](https://i1.wp.com/electronicshobbyists.com/wp-content/uploads/2018/02/relay-AC.png)

[Controlling AC Devices with Raspberry Pi | Raspberry Pi Relay Control](https://electronicshobbyists.com/controlling-ac-devices-with-raspberry-pi-raspberry-pi-relay-control/)

---

# Turning on the light bulb

```python
import RPi.GPIO as GPIO
def configure(pin):
    # Use Broadcom SoC channel numbering
    GPIO.setmode(GPIO.BCM)
    # Configure the pin as an output
    GPIO.setup(pin, GPIO.OUT)


def change_state(pin, state):
    # The pin must already be configured as an output; state must be 0 or 1
    GPIO.output(pin, state)


def cleanup():
    GPIO.cleanup()
```

[GPIO official documentation](https://www.raspberrypi.org/documentation/usage/gpio/)

---
class: center

# Sonoff

![Sonoff](https://cdn3.efectoled.com/46645/interruptor-wifi-simple-remoto-sonoff-rf.jpg)

---

# DIY setup

- [FTDI 232](https://www.amazon.com/HiLetgo-FT232RL-Converter-Adapter-Breakout/dp/B00IJXZQ7C/)
- [Sonoff Basic](https://www.itead.cc/smart-home/sonoff-wifi-wireless-switch.html)
- 4x1 Female Header
- Soldering iron
- Jumpers
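---

# From transcript to relay: a sketch

The recognizers return plain text and the GPIO helpers take a 0 or 1, so some glue is needed between the two. The block below is one possible way to write that glue, not the approach used in the repository. The word lists and the names `normalize`, `parse_command`, and `handle_transcript` are hypothetical, and the hardware call is passed in as a function so the logic can be tried without a Raspberry Pi.

```python
import unicodedata

# Hypothetical vocabulary for illustration; adjust to the phrases you expect
ON_WORDS = {"enciende", "encender", "prende", "on"}
OFF_WORDS = {"apaga", "apagar", "off"}


def normalize(text):
    """Lowercase, strip accents, and turn punctuation into spaces."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    # Drop combining marks (accents) left behind by the decomposition
    no_accents = "".join(
        c for c in decomposed if unicodedata.category(c) != "Mn"
    )
    return "".join(c if c.isalnum() else " " for c in no_accents)


def parse_command(text):
    """Return 1 (on), 0 (off), or None when no known word is present."""
    words = set(normalize(text).split())
    if words & ON_WORDS:
        return 1
    if words & OFF_WORDS:
        return 0
    return None


def handle_transcript(text, set_state):
    """Apply the command found in `text` by calling `set_state(0 or 1)`.

    Returns True when a command was applied and False otherwise.
    """
    state = parse_command(text)
    if state is None:
        return False
    set_state(state)
    return True
```

On the Raspberry Pi, `set_state` could be something like `lambda state: change_state(PIN, state)` after calling `configure(PIN)`, where `PIN` is whichever BCM pin drives the relay, and `text` would come from any of the `recognize_*` calls.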