Final Exercise: Building a Speech Recognizer and integrating it with the chatbot…!

Update: You can access all the code for the workshop at:

Final Exercise

Use the chatbot, template_matching and speech_processing notebooks to create a voice-activated chatbot that answers yes/no questions.

Solution:

  • Use the Bot class and the yes_no_processor to get a ready made chatbot
  • Create a new speech_source for your Bot instance
  • Use the AudioManager from speech_processing to record audio
  • Extract MFCCs for the audio clips corresponding to yes and no
  • Use the Trellis idea from template_matching to recognize yes/no
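
The last two steps amount to nearest-template classification: score the recording against every stored template and return the word with the lowest average score. A minimal sketch of that idea, using made-up one-dimensional "features" and a toy frame-wise distance in place of MFCCs and the Trellis matcher (`toy_distance`, `classify` and the template values are all hypothetical, not part of the workshop code):

```python
def toy_distance(template, features):
    """Stand-in for Trellis.match: mean absolute difference over aligned frames."""
    n = min(len(template), len(features))
    return sum(abs(a - b) for a, b in zip(template[:n], features[:n])) / n


def classify(features, templates, distance=toy_distance):
    """Return the word whose templates are closest to `features` on average."""
    best_word, best_score = None, float("inf")
    for word, word_templates in templates.items():
        avg = sum(distance(t, features) for t in word_templates) / len(word_templates)
        if avg < best_score:
            best_word, best_score = word, avg
    return best_word


# Hypothetical templates: each word maps to a list of feature sequences.
templates = {
    "yes": [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9]],
    "no": [[3.0, 2.0, 1.0]],
}
print(classify([1.0, 2.2, 3.1], templates))
```

Averaging over several templates per word tends to make the decision less sensitive to one bad recording, which is why the answer below keeps a list of templates for each word.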

 Answer

The imports given below are from the previous exercises. You can view all the posts on this topic here.

from collections import defaultdict

import importer
from chatbot import StatementProcessor, get_yes_no_processor, get_keyboard_source, Bot
from template_matching import Trellis
from speech_processing import AudioManager

from python_speech_features import mfcc
from python_speech_features.base import delta
import numpy as np
import pickle
import os
importing notebook from chatbot.ipynb
importing notebook from template_matching.ipynb
importing notebook from speech_processing.ipynb
if __name__ == "__main__":
    # Install python_speech_features that contains a routine to extract mfcc
    !pip install -U python_speech_features
class TemplateManager:

    @staticmethod
    def build_templates(words=["test", "hello", "welcome", "goodbye"],
                        no_templates=1,
                        output_file="templates.out"):

        audioManager = AudioManager()

        templates = defaultdict(list)

        for word in words:
            for ii in range(no_templates):
                ok = 'n'
                # Re-record until the user accepts the take
                while ok.lower() == 'n':
                    print("%d/%d Say %s" % (ii, no_templates, word))
                    samples = audioManager.record(2, filter_silence=False)
                    audioManager.play(samples)
                    #ok = raw_input("OK?") # python2
                    ok = input("OK?") # python3
                # Store only the accepted take as a template
                templates[word].append(feature_extractor(samples))
        with open(output_file, "wb") as f:
            pickle.dump(templates, f)
    
    @staticmethod
    def get_templates(filename):
        if os.path.exists(filename):
            # Load from the file that was asked for, not the global output_file
            with open(filename, "rb") as f:
                return pickle.load(f)
        else:
            print("Template file not found.")
def feature_extractor(samples):
    samples = np.concatenate(samples)
    samples = samples/np.abs(samples).max()
    samples = samples - samples.mean()
    mfcc_features = mfcc(samples, samplerate=8000, winlen=0.032, winstep=0.016, numcep=13, appendEnergy=True, preemph=0)
    #features = np.vstack((mfcc_features, delta(mfcc_features, 1)))
    features = mfcc_features
    return features
def scoring_func(x, y):
    #print(x.shape, y.shape)
    #print(x, y)
    return np.abs(x - y).sum()


def get_speech_source(filename):
    # Load speech templates
    # Return a function that can detect speech
    audioManager = AudioManager()
    trellis = Trellis(match_weight=1.0, delete_weight=1.0, add_weight=1.0, scoring_func=scoring_func)
    
    templates_dict = TemplateManager.get_templates(filename)
    statement_processor = StatementProcessor()
    
    def speech_source():
        # Record some audio, match it against every template using the Trellis,
        # and return the best-scoring word
        #inp = raw_input("Start recording?") # python2
        inp = input("Start recording?") # python3
        if len(inp)>0 and inp[0] == "/":
            return inp
        samples = audioManager.record(2, wait_for_kb=False)
        features = feature_extractor(samples)
        
        min_score = 1e9
        min_word = ""
        for word, word_templates in templates_dict.items():
            avg_score = 0.0
            for word_template in word_templates:
                score, bp = trellis.match(word_template, features)
                avg_score += score
            avg_score = avg_score / float(len(word_templates))
            #print(word, avg_score)
            if avg_score < min_score:
                min_score = avg_score
                min_word = word
        print("YOU>> ", min_word)
        # The word with the lowest average distance is the recognized answer
        return min_word
    return speech_source
words = ["yes", "no"]
no_templates = 1
output_file = "templates.out"
TemplateManager.build_templates(words, no_templates, output_file)
0/1 Say yes
Press Enter to start recording...
* recording
* done recording
OK?
0/1 Say no
Press Enter to start recording...
* recording
* done recording
OK?
chatbot = Bot(statement_processor=StatementProcessor(statement_logic=get_yes_no_processor()),
             input_source=get_speech_source(output_file))
chatbot.start_bot()
Start recording?
* recording
* done recording
before 125
after 125
YOU>>  yes
[ 0 ] Poincare >>  Is it raining?
[ 0 ] Poincare >>  Give the right answer.
Start recording?

Speech processing with Python: Basics…


Introduction to Speech Processing

What is a signal?

What can we use speech signals for?

import pyaudio
import threading
import numpy as np

class AudioManager:
    def __init__(self,chunk=128, fmt=pyaudio.paInt16, channels=1, rate=8000):
        self.chunk = chunk
        self.fmt = fmt
        self.channels = channels
        self.rate = rate
        self.energy_th = 0

    def build_silence_model(self, duration=1, factor=1.5):
        print("Please stay quiet. Measuring ambient noise...")
        frames = self.record(duration, filter_silence=False, wait_for_kb=False)
        es = []
        for f in frames:
            energy = self.energy(f)
            es.append(energy)
        es = np.array(es)
        self.energy_th = es.mean() + factor*es.std()

    def energy(self, frame):
        return sum([abs(v) for v in frame])/len(frame)
    
    def record(self, duration=1, filter_silence=True, wait_for_kb=True):
        if wait_for_kb:
            #x = raw_input("Press Enter to start recording...") # Python2
            x = input("Press Enter to start recording...") # Python3
        p = pyaudio.PyAudio()
        stream = p.open(format=self.fmt,
                        channels=self.channels,
                        rate=self.rate,
                        input=True,
                        frames_per_buffer=self.chunk)
        print("* recording")
        frames = []
        starting_silence = True
        silence_frame_cnt = 0
        for i in range(0, int((self.rate / self.chunk) * duration)):
            data = stream.read(self.chunk)
            d = np.frombuffer(data, dtype=np.int16) # np.fromstring is deprecated for binary data
            if filter_silence:
                energy = self.energy(d)

                if energy < self.energy_th:
                    if starting_silence:
                        continue
                    else:
                        silence_frame_cnt += 1
                        if silence_frame_cnt == int(self.rate/self.chunk):
                            break
                else:
                    starting_silence = False
                    silence_frame_cnt = 0
            frames.append(d)
        print("* done recording")
        
        stream.stop_stream()
        stream.close()
        p.terminate()
        
        if filter_silence:
            print("before", len(frames))
            term = len(frames)
            for ii in range(len(frames)-1, -1, -1):
                e = self.energy(frames[ii])
                if e < self.energy_th:
                    term = ii
                else:
                    break
            frames = frames[:term]
            print("after", len(frames))
        return frames

    def play(self, frames):
        
        p = pyaudio.PyAudio()  
        #open stream
        stream = p.open(format = self.fmt,
                        channels = self.channels,
                        rate = self.rate,
                        output = True)
        
        if type(frames) is list:
            frames = list(frames)
            b = np.zeros(frames[0].shape, dtype=np.int16)
            frames.insert(0, b)
            frames.append(b)
            frames = np.concatenate(frames)
        
        stream.write(frames.tobytes()) # tostring() is the deprecated name of tobytes()
        
        #stop stream  
        stream.stop_stream()  
        stream.close()  
        #close PyAudio  
        p.terminate()  

def plot_fft(y, fs):

    n = len(y) # length of the signal
    k = np.arange(n)
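    T = 2*n/float(fs) # DCT bin k corresponds to k*fs/(2n) Hz
    frq = k/T # frequency of each DCT bin
    frq = frq[range(int(n/2))] # keep the lower half of the bins (0 to fs/4)

    Y = fftpack.dct(y) # note: a DCT (type II), not a complex FFT
    Y = Y[:int(n/2)]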
    plt.plot(frq, abs(Y))
    plt.show()

 

from scipy import signal, fftpack
import matplotlib.pyplot as plt
import numpy as np
import math

if __name__=="__main__":
    # Concept of normalized time and frequency...
    fs = 8000
    Ts = 1/float(fs)
    
    t = np.arange(0, 1, Ts)
    y = np.sin(2*np.pi*200*t) + 0.25*np.sin(2*np.pi*500*t)#+ np.tan(t + 0.5) + 2*np.cos(t + 0.5)
    plt.plot(t, y)
    plt.show()
    plot_fft(y, fs)
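
`plot_fft` above uses a discrete cosine transform (`fftpack.dct`) to display the spectrum. As a cross-check, here is a minimal sketch that locates the same two tones with NumPy's real FFT instead; `dominant_frequencies` is a hypothetical helper name, not part of the workshop code:

```python
import numpy as np


def dominant_frequencies(y, fs, count=2):
    """Return the `count` strongest frequencies (in Hz) of a real signal."""
    spectrum = np.abs(np.fft.rfft(y))            # magnitude of the one-sided FFT
    freqs = np.fft.rfftfreq(len(y), d=1.0 / fs)  # frequency of each FFT bin in Hz
    strongest = np.argsort(spectrum)[-count:]    # indices of the largest magnitudes
    return sorted(float(f) for f in freqs[strongest])


fs = 8000
t = np.arange(fs) / fs  # one second of samples at 8 kHz
y = np.sin(2 * np.pi * 200 * t) + 0.25 * np.sin(2 * np.pi * 500 * t)
print(dominant_frequencies(y, fs))
```

With one second of signal the FFT bins are 1 Hz apart, so the 200 Hz and 500 Hz components each fall on a single bin.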
if __name__=="__main__":
    audioManager = AudioManager()
    audioManager.build_silence_model(factor=1.1)
    samples = audioManager.record(1, filter_silence=False)
    audioManager.play(samples)
    plt.plot(np.concatenate(samples))
    plt.show()
Please stay quiet. Measuring ambient noise...
* recording
* done recording
Press Enter to start recording...
* recording
* done recording
 
if __name__=="__main__":
    # Speech spectrogram
    # Short window: good time resolution, coarse frequency resolution
    f, t, Sxx = signal.spectrogram(np.concatenate(samples),
                                   fs=audioManager.rate,
                                   window=signal.windows.gaussian(audioManager.chunk//2, audioManager.chunk/8),
                                   nperseg=audioManager.chunk//2)
    plt.pcolormesh(t, f, Sxx)
    plt.show()

    # Long window: finer frequency resolution, coarser time resolution
    f, t, Sxx = signal.spectrogram(np.concatenate(samples),
                                   fs=audioManager.rate,
                                   window=signal.windows.gaussian(audioManager.chunk*2, audioManager.chunk),
                                   nperseg=audioManager.chunk*2)
    plt.pcolormesh(t, f, Sxx)
    plt.show()
if __name__=="__main__":
    print(len(samples))
    print("Voiced Region (Vowel)")
    plt.plot(samples[45])
    plt.show()
    plot_fft(samples[45], fs=8000)
    print("Noise")
    plt.plot(samples[5])
    plt.show()
    plot_fft(samples[5], fs=8000)
62
Voiced Region (Vowel)
Comment -- bad example. will fix later :)
Noise

Exercise 4: Extending the bot to do QA


Exercise 4

  • Re-formulate the chatbot to ask Yes/No questions that are read from a file.
  • State if the user’s answer is correct or wrong.
  • Make the source of input to the bot configurable. For now, it will come from the keyboard. Soon, we’ll use our voice.
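
The answer below expects each line of the question file to hold a question and its expected answer separated by the marker `#<>#`. A small sketch of that parsing step, run on an in-memory string rather than a real file (the sample questions and the `parse_questions` helper are made up for illustration):

```python
def parse_questions(text, separator="#<>#"):
    """Split each non-empty line into a (question, expected_answer) pair."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        question, answer = line.split(separator)
        pairs.append((question.strip(), answer.strip().lower()))
    return pairs


sample = "Is the sky blue?#<>#yes\n\nIs fire cold?#<>#no\n"
for question, answer in parse_questions(sample):
    print(question, "->", answer)
```

Lower-casing the expected answer is a choice made in this sketch; the bot in the answer compares the user's input to the stored answer exactly as written in the file.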

 Answer

This code makes use of the “Context” class created in Exercise 3. Click here to view Exercise 3

from collections import deque
from random import choice

class StatementProcessor:
    
    def __init__(self, N=10, statement_logic=lambda context, x: (True, ["OK"])):
        self.context = Context(N)
        self.statement_logic = statement_logic
    
    def process_statement(self, x):
        context = self.context
        if x.startswith("/"): # also safe when x is an empty string
            cont, response = self.process_command(x[1:])
        else:
            cont, response = self.statement_logic(context, x)
        return cont, response

    def process_command(self, x):
        context = self.context
        parts = x.split()
        cmd = parts[0]
        args = parts[1:]
        
        cont = False
        response = []
        
        if cmd == "quit":
            cont = False
            response = ["Goodbye!"]
        elif cmd == "clearcontext":
            context.clear()
            cont = True
            response = ["Cleared context."]
        elif cmd == "printcontext":
            response = context.get()
            response.append("Context length: %d" % (len(context)))
            cont = True
        elif cmd == "resizecontext":
            if len(args) == 1:
                try:
                    context.resize(int(args[0]))
                    cont = True
                    response = ["Resized context to %s" % (args[0])]
                except ValueError: # int("abc") raises ValueError, not TypeError
                    cont = True
                    response = ["Context size should be int"]
            else:
                cont = True
                response = ["resizecontext requires new size (int)"]
        else:
            cont = True
            response = ["Invalid Command"]
        context.add(x, response ,cont)
        return cont, response

    
def get_yes_no_processor(filename="binary_questions.txt"):
    # Each non-empty line of the file holds "question#<>#expected answer"
    with open(filename) as f:
        qas = [v.split("#<>#") for v in filter(None, f.read().split("\n"))]

    def statement_logic(context, x):
        prev_context = context.get()
        response = ["OK."]
        cont = True
        if len(prev_context) > 0:
            prev_context = prev_context[-1]
        else:
            prev_context = {}
        # Was the bot's previous turn a yes/no question?
        question = prev_context.get("tags", {}).get("question", {})
        if question.get("qtype") == "binary":
            if question["expected_response"] == x:
                response = ["Correct."]
            else:
                response = ["Wrong.", "Right answer is %s" % (question["expected_response"])]
            # Argument order must match Context.add(x, response, cont)
            context.add(x, response, cont)
        else:
            qa_current = choice(qas)
            response = [qa_current[0]]
            response.append("Give the right answer.")
            context.add(x, response, cont, question={"expected_response": qa_current[1], "qtype": "binary"})
        return cont, response
    return statement_logic


def get_keyboard_source():
    def read_keyboard():
        #return raw_input("You>> ") # Python2
        return input("You>> ")
    return read_keyboard
    
class Bot:
    def __init__(self, name='Poincare', statement_processor=StatementProcessor(), input_source=get_keyboard_source()):
        self.name = name
        self.statement_processor = statement_processor
        self.input_source = input_source

    def start_bot(self):
        cont = True
        chat_cnt = 0

        while cont:
            x = self.input_source()
            cont, response = self.statement_processor.process_statement(x)
            for r in response:
                print("[", chat_cnt, "]", self.name, ">> ", r)
            chat_cnt += 1
 Running the chatbot
if __name__=="__main__":
    bot = Bot(statement_processor=StatementProcessor(statement_logic=get_yes_no_processor()))
    bot.start_bot()
 Output
You>> Hi
[ 0 ] Poincare >>  Is it cold outside?
[ 0 ] Poincare >>  Give the right answer.
You>> no
[ 1 ] Poincare >>  Correct.
You>> /quit
[ 2 ] Poincare >>  Goodbye!

Exercise 3: Basic chatbot…


Exercise 3

Chatbot dialog management

Your job is to create a chatbot… As a first step, you will have to define the chatbot’s framework.

In this exercise, you will have to write the necessary code to:

  1. Read input from a user, as a chatbot would, and display a simple response to each input. This should look like a conversation happening on a messenger.
  2. Maintain dialogue state, i.e. the past “N” inputs from the user to the bot and the bot’s responses
  3. Support a few housekeeping commands for the bot:
    • Clear the context (/clearcontext)
    • Print out the context (/printcontext)
    • Configure the size of the context (/resizecontext N)
    • Quit the conversation (/quit)
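
For the dialogue state in step 2, one option is a `collections.deque` with a `maxlen`, which drops the oldest entry automatically once it is full. A minimal sketch (the `history` name and the sample turns are illustrative only):

```python
from collections import deque

history = deque(maxlen=2)  # keep only the last 2 exchanges

for user_input in ["hi", "hello", "how are you?"]:
    history.append({"x": user_input, "response": ["OK."]})

print([turn["x"] for turn in history])  # the oldest turn ("hi") has been dropped

# A deque's maxlen cannot be changed in place, so resizing means
# building a new deque from the old contents.
history = deque(history, maxlen=1)
print([turn["x"] for turn in history])
```

This is the same mechanism the answer below wraps in its `Context` class, where `resize` rebuilds the deque with the new size.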

 Answer

from collections import deque

def process_statement_basic(x):
    if x.startswith("/"): # also safe when x is an empty string
        cont, response = process_command(x[1:])
    else:
        response = ["OK."]
        cont = True
    return cont, response


def process_command(x):
    parts = x.split()
    cmd = parts[0]
    args = parts[1:]
    
    
    global context
    if cmd == "quit":
        return False, ["Goodbye!"]
    elif cmd == "clearcontext":
        context.clear()
        return True, ["Cleared context."]
    elif cmd == "printcontext":
        response = context.get()
        response.append("Context length: %d" % (len(context)))
        return True, response
    elif cmd == "resizecontext":
        if len(args) == 1:
            
            try:
                context.resize(int(args[0]))
                return True, ["Resized context to %s" % (args[0])]
            except ValueError: # int("abc") raises ValueError, not TypeError
                return True, ["Context size should be int"]
        else:
            return True, ["resizecontext requires new size (int)"]
    else:
        return True, ["Invalid Command"]

class Context:
    
    def __init__(self, N):
        self.N = N
        self.context = None
        self.init()
    
    def __len__(self):
        return self.N
    
    def init(self):
        if self.context is None:
            self.context = deque(list(), self.N)
        else:
            self.context = deque(list(self.context), self.N)
        
    def add(self, x, response, cont, **kwargs):
        self.context.append({"x": x, "response": response, "cont": cont, 'tags': kwargs})
        
    def clear(self):
        self.context = None
        self.init()
    
    def resize(self, N):
        self.N = N
        self.init()

    def get(self):
        return list(self.context)
        
context = None

def start_chatbot(N = 10, name = 'Poincare'):
    global context
    context = Context(N)
    
    cont = True
    chat_cnt = 0
    
    while cont:
        x = input("You>> ")
        cont, response = process_statement_basic(x)
        for r in response:
            print("[", chat_cnt, "]", name, ">> ", r)
        context.add(x, response ,cont)
        chat_cnt += 1
Running the chatbot…
if __name__=="__main__":
    # Start the bot...  
    start_chatbot()
 Output:
You>> hi
[ 0 ] Poincare >>  OK.
You>> hello
[ 1 ] Poincare >>  OK.
You>> /resizecontext 2
[ 2 ] Poincare >>  Resized context to 2
You>> /printcontext
[ 3 ] Poincare >>  {'x': 'hello', 'tags': {}, 'cont': True, 'response': ['OK.']}
[ 3 ] Poincare >>  {'x': '/resizecontext 2', 'tags': {}, 'cont': True, 'response': ['Resized context to 2']}
[ 3 ] Poincare >>  Context length: 2
You>> /quit
[ 4 ] Poincare >>  Goodbye!

Exercise 2: Spelling correction with Minimum-Edit Distance


We’ll be using the “Trellis” class created in Exercise one to implement a simple program to correct spellings. Check here for the “Trellis” class.

Exercise-2

Use the matching algorithm from Exercise 1 to correct the spelling of words input through the keyboard, i.e. create your own spell checker (albeit quite an inefficient one…)!

A list of English words has been given to you in words2.txt
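
The idea can be tried out without the word list or the “Trellis” class. The sketch below is an assumption-laden stand-in: it uses a plain, unweighted Levenshtein distance and a four-word dictionary in place of words2.txt, and the names `edit_distance` and `closest_word` are made up for illustration:

```python
from functools import lru_cache


def edit_distance(a, b):
    """Plain Levenshtein distance: each insert, delete or substitute costs 1."""

    @lru_cache(maxsize=None)
    def dist(i, j):
        if i == 0:
            return j  # insert the remaining j characters of b
        if j == 0:
            return i  # delete the remaining i characters of a
        substitution = dist(i - 1, j - 1) + (a[i - 1] != b[j - 1])
        return min(substitution, dist(i - 1, j) + 1, dist(i, j - 1) + 1)

    return dist(len(a), len(b))


def closest_word(word, dictionary):
    """Return the dictionary entry with the smallest edit distance to `word`."""
    return min(dictionary, key=lambda candidate: edit_distance(candidate, word))


words = ["hello", "help", "case", "world"]  # tiny stand-in for words2.txt
print(closest_word("hsllo", words))
```

Every query is compared against every dictionary word, which is why the exercise calls this approach inefficient.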

if __name__=="__main__":
    dictionary = list(filter(None, open("words2.txt","r").read().split("\n")))

    trellis = Trellis(lambda x, y: 0.0 if x == y else 1.0, delete_weight=4.0)
    print("Enter /quit to quit")
    while True:
        x = input("word>> ")
        x = x.lower()
        if x == "/quit":
            print("Goodbye!")
            break
        if x in dictionary:
            print("word found")
            continue
        min_sc = 1e9
        match = x
        for el in dictionary:
            sc = trellis.match(el, x, normalize_score=False)[0]
            if sc < min_sc:
                min_sc = sc
                match = el
        print("closest match: ", match)
Enter /quit to quit
word>> hsllo
closest match:  hello
word>> vase
closest match:  case
word>> /quit
Goodbye!

Exercise 1: Minimum-Edit-Distance (using Dynamic Programming)


Exercise-1

Create a class Trellis that

  • takes in four arguments: match_weight, delete_weight, add_weight, and scoring_func.
    • scoring_func is a function that computes the distance or score between two values.
    • match_weight, delete_weight, and add_weight are floats that weight diagonal, horizontal, and vertical transitions, respectively.
  • contains a method match(X, Y), where X and Y are arrays of values (characters, scalars, or even vectors), that returns the minimum-edit-distance/matching score between X and Y together with the shortest path (as an array of 2-tuples).
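
Before reading the answer, it may help to see the textbook form of the recurrence on a full cost matrix. This is a sketch under my own assumptions, not the “Trellis” class itself: the function name `min_edit_path` is hypothetical, `delete_weight` is charged for skipping an element of X, `add_weight` for skipping an element of Y, and `scoring_func` is applied only on diagonal steps:

```python
def min_edit_path(X, Y, scoring_func, match_weight=0.0, delete_weight=1.0, add_weight=1.0):
    """Return (score, path) for the cheapest alignment of X and Y.

    cost[i][j] is the cheapest way to align X[:i] with Y[:j];
    back[i][j] remembers which neighbouring cell that cost came from.
    """
    rows, cols = len(X) + 1, len(Y) + 1
    cost = [[float("inf")] * cols for _ in range(rows)]
    back = [[None] * cols for _ in range(rows)]
    cost[0][0] = 0.0

    for i in range(rows):
        for j in range(cols):
            candidates = []
            if i > 0 and j > 0:  # diagonal: pair X[i-1] with Y[j-1]
                step = match_weight + scoring_func(X[i - 1], Y[j - 1])
                candidates.append((cost[i - 1][j - 1] + step, (i - 1, j - 1)))
            if i > 0:  # skip X[i-1]
                candidates.append((cost[i - 1][j] + delete_weight, (i - 1, j)))
            if j > 0:  # skip Y[j-1]
                candidates.append((cost[i][j - 1] + add_weight, (i, j - 1)))
            if candidates:
                cost[i][j], back[i][j] = min(candidates, key=lambda c: c[0])

    # Follow the back pointers from the last cell to the origin.
    path, node = [], (len(X), len(Y))
    while node is not None:
        path.append(node)
        node = back[node[0]][node[1]]
    path.reverse()
    return cost[-1][-1], path


def mismatch(x, y):
    return 0.0 if x == y else 1.0


score, path = min_edit_path("geek", "gesek", mismatch)
print(score, path)
```

The answer below keeps only two rows of scores at a time, which saves memory at the price of more involved back-pointer bookkeeping.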

Answer

The “Trellis” Class:

import numpy as np
from copy import deepcopy

class Trellis:
    
    def __init__(self, scoring_func, match_weight=1.0, delete_weight=1.0, add_weight=1.0):
        self.scoring_func = scoring_func
        self.match_weight = match_weight
        self.delete_weight = delete_weight
        self.add_weight = add_weight
    
    def match(self, X, Y, normalize_score=True):
        scoring_func = self.scoring_func
        match_weight, delete_weight, add_weight = self.match_weight, self.delete_weight, self.add_weight
    
        score_rows = np.zeros((2, len(X)+1))
        path_counts = np.zeros((2, len(X)+1))
        back_pointers = []
        for ii in range(len(X)):
            back_pointers.append([])
        score_rows[0, 1:] = 1e9
        score_rows[1:, 0] = 1e9
        
        
        jj = 1
        while jj < len(Y) + 1:
            back_pointer_before_iteration = deepcopy(back_pointers)
            for ii in range(1, len(X)+1):
                diag_score = score_rows[0, ii-1] + match_weight    # diagonal transition
                vert_score = score_rows[0, ii] + add_weight        # vertical transition
                horiz_score = score_rows[1, ii-1] + delete_weight  # horizontal transition
                min_score = min(diag_score, vert_score, horiz_score)
                if min_score == diag_score:
                    back_pointers[ii-1] = list(back_pointer_before_iteration[ii-2])
                    back_pointers[ii-1].append( (jj-2, ii -2) )
                    #print("DIAG", ii-1, back_pointers)
                elif min_score == vert_score:
                    back_pointers[ii-1].append( (jj-2, ii-1) )
                    #print("VERT", ii-1, back_pointers)
                else:
                    back_pointers[ii-1] = list(back_pointers[ii-2])
                    back_pointers[ii-1].append( (jj-1, ii-2) )
                    #print("HORIZ", ii-1, jj-1, back_pointers)
                node_total = min_score + scoring_func(X[ii-1], Y[jj-1])
                score_rows[1, ii] = node_total
            score_rows[0, :] = score_rows[1, :]
            score_rows[1, 1:] = 0
            jj += 1
            #print(back_pointers)
        return score_rows[0, -1], back_pointers[-1]
Testing the “Trellis” class
if __name__=="__main__":
    trellis = Trellis(match_weight=0.0, scoring_func=lambda x, y: 0.0 if x == y else 1.0)

    test_cases = [
        ['TEST', 'TES'],
        ['geek', 'gesek'],
        ['ISLANDER', 'SLANDER'],
        ['MART', 'KARMA'],
        ['TEST', "TEST"]
    ]

    for case in test_cases:
        print(case, trellis.match(case[0], case[1], normalize_score=True)[1])
 Output:
['TEST', 'TES'] [(-1, -1), (0, 0), (1, 1), (2, 2)]
['geek', 'gesek'] [(-1, -1), (0, 0), (1, 1), (2, 1), (3, 2)]
['ISLANDER', 'SLANDER'] [(-1, -1), (0, 0), (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
['MART', 'KARMA'] [(-1, -1), (0, 0), (1, 1), (2, 2), (3, 2)]
['TEST', 'TEST'] [(-1, -1), (0, 0), (1, 1), (2, 2)]

Python tutorial


Introduction to Python


Difficulty level: undergraduate

Objectives:

  1. Learn about Python
  2. Get up to speed with writing programs in Python

Introducing Python

Python is:

  • A high-level programming language
  • Dynamically and strongly typed
  • For general-purpose computing
  • Interpreted
  • Automatic/Internal Memory Management
  • Object-oriented, Structural, a “little” Functional and Aspect-Oriented

CPython is the open-source reference implementation of Python.

Python was first written in the late 1980s by Guido van Rossum.


Advantages of learning Python (over other high-level languages)

  • Generally uses fewer lines of code to express concepts
  • Has libraries for literally anything you can think of (Web, Mobile, Scientific, Data Storage, etc.)
  • Dynamically and strongly typed, with optional support for type hints
  • More productive; can be used for small to large products; prototyping to production-grade systems
  • Has an open-source implementation
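
On the typing point: Python checks types at run time, and type hints are optional annotations that external tools such as static checkers can read; the interpreter stores them but does not enforce them. A small illustration (the function name `repeat` is made up):

```python
def repeat(text: str, times: int = 2) -> str:
    """Return `text` repeated `times` times."""
    return text * times


print(repeat("ab", 3))
print(repeat.__annotations__)  # the hints are stored, not enforced

# Strong typing: mixing str and int is an error, not a silent conversion.
try:
    "5" + 2
except TypeError as err:
    print("TypeError:", err)
```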

Most Linux distributions ship with Python 2.7.

In this course, we’re going to use Python 3.4+. The Python 2 series is scheduled to reach end of life in 2020.


Contents

Basics

  • The “hello world”
  • Basics
    • Variable Assignments
    • Function calls
    • Variable scope
    • Input and Output
    • Exception handling
    • Loops and conditional statements
  • Data Types
    • Mutable and immutable types
    • Iterators and Generators
    • Closures
    • Decorators
  • Packaged utilities
    • Collections
    • Itertools
  • A bit of OOP
  • Using external libraries
  • Basic packaging

Application Development Examples

  • A web server
  • Data crunching with numpy and pandas

The “Hello World”


In [2]:
if __name__ == "__main__":
    
    print("Hello World!")
Hello World!

Basics


Variable Assignments

Python uses duck-typing…
It is dynamically, but strongly typed.
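
Duck typing means a function cares about what an object can do, not what class it belongs to. A short sketch (the `Playlist` class and `describe` function are invented for illustration):

```python
class Playlist:
    """Not a list, but it supports len() by defining __len__."""

    def __init__(self, songs):
        self.songs = songs

    def __len__(self):
        return len(self.songs)


def describe(container):
    # No type check: any object that works with len() is accepted.
    return "%d item(s)" % len(container)


print(describe([1, 2, 3]))
print(describe("hello"))
print(describe(Playlist(["a", "b"])))
```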


In [3]:
x = 2 # int
y = '5' # or y = "5"; String
z = 2.0 # float
t = [1, 2] # This is a list - a resizable, ordered sequence (more on this later)

print(x + z) # Involves automatic type conversion to float, since z is a float

del x # Deleting a variable

try:
    print(y + z) # Raises a TypeError: str and float are not converted implicitly
except TypeError:
    print("TypeError occurred")
4.0
TypeError occurred

Function Calls

Built-in functions listed at: https://docs.python.org/3/library/functions.html

In [82]:
# defining a function

print ("defining a function")
def foo(x, y, a=5): # x and y are positional parameters, a has a default value of 5
    print(x, y, a)

foo(1, 2, 2)
foo(1, 2)
print("--")

# defining a function with variable number of arguments
print ("defining a function with arbitrary number of arguments")
def foo(*args):
    for a in args:
        print(a)

foo(1)
print("--")
foo(1,2,3)

# defining a function with variable number of arguments after compulsory arguments
print ("defining a function with arbitrary number of arguments after compulsory arguments")
def foo(x, y, *args):
    print(x, y, args)
    
foo(1, 2, 3, 4, 5)

# calling a function using a dictionary of values
print ("calling a function using a dictionary of values that pose as arguments")
foo(**{"x": 1, "y": 2})

# everything in one function
print("a function with arguments of all kinds")
def foo(x, *args, y=1, **kwargs):
    print(x, y, args, kwargs)

foo(1, 2, 3, 4, y=2, z=5)

# anonymous functions
print("anonymous function")
sqr = lambda x: x*x
print(sqr(2))

print("examples of builtin functions")
# some useful built-in functions
print("sorted")
print(sorted([10, 2, 22, 1, 33, 44, 11, 23])) # sorted
print("filter")
print(list(filter(lambda x: x > 10, [10, 2, 22, 1, 33, 44, 11, 23]))) # filter
print("length")
print(len([1,2,3])) # len
print("max")
print(max([1,2,3])) # max

# checkout the documentation for other such built-ins
defining a function
1 2 2
1 2 5
--
defining a function with arbitrary number of arguments
1
--
1
2
3
defining a function with arbitrary number of arguments after compulsory arguments
1 2 (3, 4, 5)
calling a function using a dictionary of values that pose as arguments
1 2 ()
a function with arguments of all kinds
1 2 (2, 3, 4) {'z': 5}
anonymous function
4
examples of builtin functions
sorted
[1, 2, 10, 11, 22, 23, 33, 44]
filter
[22, 33, 44, 11, 23]
length
3
max
3

Variable Scope

In [5]:
x = 0

def foo():
    x = 1
    print(x)

def foo2():
    global x
    x = 1
    print(x)
    
print(x)
foo()
print(x)
foo2()
print(x)
0
1
0
1
1

Input and Output

In [70]:
# Output
x = 2
print(x) # Output to screen

# Formatting output
print("This is a %s; number %d; float %f" % ("test", 1, 2.0))
print("This is a {}; number {}; float {}".format("test", 1, 2.0))

with open('test.txt', 'w') as f: # Output to file (closed automatically)
    f.write(str(x))

# Input
x = input('Enter a number: ') # Read from screen
print(x, type(x))

with open('test.txt', 'r') as f: # Read from file
    x = f.read()
print(x)

# Type casting
x = '2'
print(int(x) + 1)
try:
    print(x + 1) # Will throw an error
except TypeError:
    print("TypeError occurred")
2
This is a test; number 1; float 2.000000
This is a test; number 1; float 2.0
Enter a number: 10
10 <class 'str'>
2
3
TypeError occurred

Exception Handling

In [67]:
import traceback # this provides functions to get the error trace

# Example of exception handling...

try: # try statement
    x = int('a') # throws an exception as the string a cannot be cast to int
except ValueError as err: # catches an exception
    print("ValueError occurred")
    traceback.print_exc() # prints details
    print("Error returned...", err)
finally:
    print("This gets executed irrespective of whether an exception occurred")
ValueError occurred
Error returned... invalid literal for int() with base 10: 'a'
This gets executed irrespective of whether an exception occurred
Traceback (most recent call last):
  File "", line 6, in 
    x = int('a') # throws an exception as the string a cannot be cast to int
ValueError: invalid literal for int() with base 10: 'a'

Loops and Conditional Statements

In [80]:
# if statement
print("if statement")
x = 5
if x == 5:
    print("x is 5")
elif x > 5: # else if
    print("x > 5")
else:
    print("x < 5")

# for loop
print("for loop")
for ii in range(10): # range(10) --> iterable yielding 0 ... 9
    print(ii)

print("for loop: multiple vars")
for a, b in zip([1, 2, 3], [4, 5, 6]): # iterating over multiple variables at once
    print(a, b)

# while loop
print("while loop")
c = 0
while c < 3:
    print(c)
    c += 1
    
print("for loop with pass statement")
for ii in range(10):
    pass # similar to nop in assembly code

print("for loop with break statement")
for ii in range(10):
    print(ii)
    break # break from loop
    
print("for loop with continue statement")
for ii in range(10):
    if ii < 5:
        continue # skip loop to next iteration
    print(ii)

print("in-line for loop")
print([v * v for v in [1, 2, 3] if v >= 2]) # list comprehension (an in-line for loop)
if statement
x is 5
for loop
0
1
2
3
4
5
6
7
8
9
for loop: multiple vars
1 4
2 5
3 6
while loop
0
1
2
for loop with pass statement
for loop with break statement
0
for loop with continue statement
5
6
7
8
9
in-line for loop
[4, 9]

Data Types


Mutable and immutable types

Mutable objects: can be modified after instantiation
Immutable objects: can’t!

Python’s built-in data types:

  • str, int, float, complex, frozenset, tuple, bytes, and bool are immutable
  • bytearray, list, set, and dict are mutable
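
The practical difference shows up when two names refer to the same object. A brief sketch (all variable names are illustrative):

```python
a = [1, 2]
b = a            # b is another name for the same list object
b.append(3)      # mutates the list in place
print(a)         # the change is visible through both names

s = "abc"
t = s
t = t + "d"      # builds a new string; s is untouched
print(s, t)

point = (1, 2)
try:
    point[0] = 10    # tuples do not support item assignment
except TypeError as err:
    print("TypeError:", err)

# Built-in mutable containers are unhashable, so they cannot be dict keys.
lookup = {(1, 2): "tuple key works"}
try:
    lookup[[1, 2]] = "list key fails"
except TypeError as err:
    print("TypeError:", err)
```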

Some examples of data structures are given below.

Documentation: https://docs.python.org/3/library/datatypes.html

In [7]:
# List (A heterogeneous collection of elements)
x = [1, int(2), 3, 'a', "bcd", 2.0, complex(2, 3), 5 + 6j, [5, 6]]
print(x)

# Adding an element
x.append(3)
print(x)

# Adding many elements
x.extend([1, 2, 3])
print(x)

# Removing (the first matching) element
x.remove(1)
print(x)

# Reversing a list
x.reverse()
print(x)

# Remove all elements
x.clear()
print(x)

# Inline list comprehension
x = [1, 2, 3, 4, 5, 6]
print("Even numbers in x", [v for v in x if v % 2 == 0])
[1, 2, 3, 'a', 'bcd', 2.0, (2+3j), (5+6j), [5, 6]]
[1, 2, 3, 'a', 'bcd', 2.0, (2+3j), (5+6j), [5, 6], 3]
[1, 2, 3, 'a', 'bcd', 2.0, (2+3j), (5+6j), [5, 6], 3, 1, 2, 3]
[2, 3, 'a', 'bcd', 2.0, (2+3j), (5+6j), [5, 6], 3, 1, 2, 3]
[3, 2, 1, 3, [5, 6], (5+6j), (2+3j), 2.0, 'bcd', 'a', 3, 2]
[]
Even numbers in x [2, 4, 6]
In [8]:
# Dict (hashmap)

x = {} # or x = dict()

# Adding elements
x['test'] = 1
x[1] = 100
x[2.0] = 2.35
print(x)

# Alternative initialization
x = {'test': 1, 1: 100, 2.0: 2.35}
print(x)

# Removing an element
del x['test']

# To find out the methods available:
print(dir(x))

# Getting help
help(x)
{'test': 1, 1: 100, 2.0: 2.35}
{'test': 1, 1: 100, 2.0: 2.35}
['__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'items', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']
Help on dict object:

class dict(object)
 |  dict() -> new empty dictionary
 |  dict(mapping) -> new dictionary initialized from a mapping object's
 |      (key, value) pairs
 |  dict(iterable) -> new dictionary initialized as if via:
 |      d = {}
 |      for k, v in iterable:
 |          d[k] = v
 |  dict(**kwargs) -> new dictionary initialized with the name=value pairs
 |      in the keyword argument list.  For example:  dict(one=1, two=2)
 |  
 |  Methods defined here:
 |  
 |  __contains__(self, key, /)
 |      True if D has a key k, else False.
 |  
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __le__(self, value, /)
 |      Return self<=value.
 |  
 |  __len__(self, /)
 |      Return len(self).
 |  
 |  __lt__(self, value, /)
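 |      Return self<value.
 |  
 |  __ne__(self, value, /)
 |      Return self!=value.
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |  
 |  __repr__(self, /)
 |      Return repr(self).
 |  
 |  __setitem__(self, key, value, /)
 |      Set self[key] to value.
 |  
 |  __sizeof__(...)
 |      D.__sizeof__() -> size of D in memory, in bytes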
 |  
 |  clear(...)
 |      D.clear() -> None.  Remove all items from D.
 |  
 |  copy(...)
 |      D.copy() -> a shallow copy of D
 |  
 |  fromkeys(iterable, value=None, /) from builtins.type
 |      Returns a new dict with keys from iterable and values equal to value.
 |  
 |  get(...)
 |      D.get(k[,d]) -> D[k] if k in D, else d.  d defaults to None.
 |  
 |  items(...)
 |      D.items() -> a set-like object providing a view on D's items
 |  
 |  keys(...)
 |      D.keys() -> a set-like object providing a view on D's keys
 |  
 |  pop(...)
 |      D.pop(k[,d]) -> v, remove specified key and return the corresponding value.
 |      If key is not found, d is returned if given, otherwise KeyError is raised
 |  
 |  popitem(...)
 |      D.popitem() -> (k, v), remove and return some (key, value) pair as a
 |      2-tuple; but raise KeyError if D is empty.
 |  
 |  setdefault(...)
 |      D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D
 |  
 |  update(...)
 |      D.update([E, ]**F) -> None.  Update D from dict/iterable E and F.
 |      If E is present and has a .keys() method, then does:  for k in E: D[k] = E[k]
 |      If E is present and lacks a .keys() method, then does:  for k, v in E: D[k] = v
 |      In either case, this is followed by: for k in F:  D[k] = F[k]
 |  
 |  values(...)
 |      D.values() -> an object providing a view on D's values
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __hash__ = None

Iterators and Generators

  • Generators are iterators: they are written as functions that use the yield keyword to hand back one partial result at a time
  • Iterators are iterables: they implement the __next__() and __iter__() methods (called through the built-ins next() and iter())
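
As a rough sketch of the two points above, the same countdown can be written once as a hand-written iterator class and once as a generator. The names Countdown and countdown are hypothetical and used only for illustration.

```python
class Countdown:
    """Hand-written iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals that the iteration is over
        value = self.current
        self.current -= 1
        return value


def countdown(start):
    """Generator version: yield does the bookkeeping for us."""
    while start > 0:
        yield start
        start -= 1


from_class = list(Countdown(3))
from_generator = list(countdown(3))
print(from_class, from_generator)

# A list is an iterable but not an iterator: iter() gives back a new object.
nums = [1, 2, 3]
list_is_own_iterator = iter(nums) is nums
# A generator is its own iterator.
gen = countdown(2)
generator_is_own_iterator = iter(gen) is gen
print(list_is_own_iterator, generator_is_own_iterator)
```

Both versions produce the same values; the generator simply needs less code because Python builds the __iter__ and __next__ methods for it.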
In [9]:
# Example of a generator

def foo():
    for ii in range(4):
        yield ii # this gives out a partial result.

t = foo()
print(t) # --> Shows that the return value is a generator object
from collections.abc import Iterator; isinstance(t, Iterator) # --> This is an iterator object
# (OOP concepts in python will be covered later...)

for partial_result in t: # Iterating thru' a generator
    print(partial_result)

t = foo()

#Another way to iterate thru' a generator
while True:
    try:
        print(next(t))
    except StopIteration:
        break
        
# generator expressions
t = (x*x for x in range(10))
print(t) # --> t is a generator!
0
1
2
3
0
1
2
3
<generator object <genexpr> at 0x7f092cb78e60>

Closures

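A closure enables us to abstract out the state of a function: the nested function remembers the variables of the enclosing function even after the enclosing function has returned.

A closure is formed when the following three conditions are satisfied: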

  • there must be nested functions
  • the nested function must use the variables defined in the enclosing function
  • the enclosing function should return the nested function
In [11]:
def thresholder(n=10):
    def thresholding_function(lst):
        return list(filter(lambda x: x > n, lst))
    return thresholding_function

x = [1, 2, 3, 4, 5, 6, 11, 12, 13]
thresholder_10 = thresholder(10)
print(thresholder_10(x))
thresholder_5 = thresholder(5)
print(thresholder_5(x))
[11, 12, 13]
[6, 11, 12, 13]
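
A closure can also update the state it captured, not just read it. The sketch below assumes a hypothetical make_counter helper and uses the nonlocal keyword so that the nested function can rebind the enclosing variable.

```python
def make_counter(start=0):
    count = start  # state captured by the nested function

    def increment(step=1):
        nonlocal count  # rebind the enclosing variable instead of creating a local one
        count += step
        return count

    return increment


counter_a = make_counter()
counter_b = make_counter(100)

first = counter_a()
second = counter_a()
other = counter_b(5)  # counter_b has its own, independent state
print(first, second, other)

# The captured value lives in the function's closure cells.
captured = counter_a.__closure__[0].cell_contents
print(captured)
```

Each call to make_counter creates a fresh count variable, so the two counters never interfere with each other.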

Decorators

Decorators are a syntactic convenience for wrapping a function: they let us define what should happen before and after the function is called, or what should be done to its output, without changing the function itself.

In [17]:
# Example of an in-built decorator
class Foo:
    
    @property # --> equivalent to state = property(state)
    def state(self):
        return True

foo = Foo()
print(foo.state)

# Example of a custom decorator
import time
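def timer(func):
    def time_func(*args, **kwargs):
        start_time = time.time()
        result = func(*args, **kwargs) # keep the wrapped function's return value
        print("Function '%s' took %3.4f seconds." %(func.__name__, time.time() - start_time))
        return result # pass the return value back to the caller
    return time_func

@timer
def add(x, y):
    return x + y

total = add(2, 3) # total is 5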
True
Function 'add' took 0.0000 seconds.
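
A decorator replaces the original function with its wrapper, so the wrapper should forward the return value and, ideally, keep the original name and docstring. The sketch below uses functools.wraps from the standard library for the latter. The names timed and multiply are hypothetical.

```python
import functools
import time


def timed(func):
    @functools.wraps(func)  # copy func's __name__ and __doc__ onto the wrapper
    def wrapper(*args, **kwargs):
        start_time = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start_time
        print("Function '%s' took %.4f seconds." % (func.__name__, elapsed))
        return result  # forward the wrapped function's return value
    return wrapper


@timed  # same as: multiply = timed(multiply)
def multiply(x, y):
    """Return the product of x and y."""
    return x * y


product = multiply(2, 3)
print(product)
print(multiply.__name__, "-", multiply.__doc__)
```

Without functools.wraps, multiply.__name__ would report "wrapper", which makes debugging and introspection harder.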

Collections

Python offers a built-in module called collections that has several useful data structures such as namedtuple, defaultdict, OrderedDict, deque, and Counter.
A few basic examples are given below…

Documentation https://docs.python.org/3/library/collections.html

In [10]:
from collections import defaultdict

x = defaultdict(int) # an element that does not exist in the dictionary (hashmap) 
                     # will be assumed to be a 0 (Since int() returns 0)

print(x['test'])
x['abc'] = 2
print(x)

from collections import OrderedDict

x = OrderedDict() # Stores items in the order of insertion
x['a'] = 1
x['b'] = 2

print(x)

from collections import Counter
x = Counter() # A counter
x['a'] = 10
x['b'] = 20
print(x)
print(x.most_common(1))
print(x - x)
print(x + x)

# Deque is left as an exercise
0
defaultdict(<class 'int'>, {'test': 0, 'abc': 2})
OrderedDict([('a', 1), ('b', 2)])
Counter({'b': 20, 'a': 10})
[('b', 20)]
Counter()
Counter({'b': 40, 'a': 20})
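
namedtuple is listed above but not demonstrated, so here is a brief sketch. The Point type and its fields are hypothetical, chosen only for illustration.

```python
from collections import namedtuple

# A namedtuple is a tuple whose positions also have names.
Point = namedtuple("Point", ["x", "y"])

p = Point(x=1, y=2)
print(p)
coordinate_sum = p.x + p[1]  # access by field name or by index

# Like any tuple it is immutable; _replace returns a modified copy.
moved = p._replace(x=10)
print(moved)

try:
    p.x = 5
    assignment_worked = True
except AttributeError:
    assignment_worked = False

as_dict = dict(p._asdict())  # convert to a plain dict
print(coordinate_sum, assignment_worked, as_dict)
```

A namedtuple is a lightweight alternative to a small class when all that is needed is a fixed set of named, read-only fields.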

Packaged Utilities

Itertools

An efficient set of functions for building iterators, inspired by constructs from other languages…

Documentation: https://docs.python.org/3/library/itertools.html

In [11]:
from itertools import count, cycle, repeat, accumulate, groupby

def exec_func(func, x=10):
    c = 0
    for ii in func(x):
        print(ii)
        c += 1
        if c == 5:
            print("Break")
            break

# count
print("Count")
exec_func(count, 10)

#cycle
print("Cycle")
exec_func(cycle, "ABC")

#repeat
print("Repeat")
exec_func(repeat, 10)

# accumulate
print("Accumulate")
for entry in accumulate(range(0,10)):
    print(entry)

# Groupby
print("Groupby")
x = [1,1,2,2,3,3,3,3,5,5,5,5,5,5,1,1,1,3,5,1,1,3,5,5,2]
fd = [[a, len(list(b))] for a, b in groupby(sorted(x))] # Computing the frequency distribution
print(fd)
Count
10
11
12
13
14
Break
Cycle
A
B
C
A
B
Break
Repeat
10
10
10
10
10
Break
Accumulate
0
1
3
6
10
15
21
28
36
45
Groupby
[[1, 7], [2, 3], [3, 6], [5, 9]]
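
The sorted() call in the groupby example matters because groupby only groups consecutive equal elements. The following sketch, on a small made-up list, contrasts the unsorted and sorted behaviour and cross-checks the result against collections.Counter.

```python
from collections import Counter
from itertools import groupby

data = [1, 1, 2, 1, 1, 3, 3, 2]

# Without sorting, groupby reports runs of consecutive equal values.
runs = [(key, len(list(group))) for key, group in groupby(data)]
print(runs)

# After sorting, each value forms exactly one run, giving a frequency distribution.
freq = [(key, len(list(group))) for key, group in groupby(sorted(data))]
print(freq)

# Counter gives the same counts without needing a sort.
counts = Counter(data)
print(sorted(counts.items()))
```

For plain frequency counting, Counter is usually the simpler choice; groupby is more useful when the runs themselves are of interest.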

A bit of OOP

Python supports object-oriented programming

Defining object oriented concepts

In [24]:
# defining a class

class Foo:
    
    svar = 25 # class variable (shared by all instances)
    
    def __init__(self, x): # Constructor; self refers to the instance
        self.x = x # instance variable
        self.__x = x # "private" variable (name-mangled to _Foo__x)
    
    @property
    def state(self):
        return self.x

    def add(self, a, b): # method
        return a + b
    
    def __private_method(self, a): # "private" method (name-mangled to _Foo__private_method)
        print(a)
    
    @staticmethod
    def t():
        print("This is a static method")

foo = Foo(5) # instantiating a class
print(foo.state)
Foo.t() # calling a static method
print(Foo.svar) # accessing a class variable
5
This is a static method
25
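
Two details of the class above are easy to miss: double-underscore attributes are renamed rather than truly hidden, and class variables are shared until an instance shadows them. The sketch below illustrates both with a hypothetical Account class.

```python
class Account:
    interest_rate = 0.05  # class variable, shared by all instances

    def __init__(self, balance):
        self.balance = balance
        self.__pin = 1234  # stored on the instance as _Account__pin


acct = Account(100)

# Outside the class the name is not mangled, so plain access fails.
try:
    acct.__pin
    direct_access_worked = True
except AttributeError:
    direct_access_worked = False

# The attribute is still reachable under its mangled name.
mangled_pin = acct._Account__pin
print(direct_access_worked, mangled_pin)

other = Account(50)
Account.interest_rate = 0.10  # changes the value seen by every instance
acct.interest_rate = 0.20     # creates an instance attribute that shadows the class one
print(other.interest_rate, acct.interest_rate, Account.interest_rate)
```

Name mangling mainly protects against accidental clashes in subclasses; it is not an access-control mechanism.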

Inheritance, polymorphism, encapsulation

  • The general philosophy is that data is never strictly hidden; by convention, a leading “_” or “__” marks a variable as private.
  • Polymorphism is achieved through method overriding and duck typing; since methods cannot be overloaded by signature, arbitrary positional and keyword arguments (*args, **kwargs) are used to accept varying call forms.
In [31]:
# Example of inheritance

class Foo:
    def __init__(self, x):
        self.x = x
    
    @property
    def state(self):
        return self.x

    def method(self):
        print("This is a Foo method")
    
class Bar(Foo): # Bar inherits Foo
    def __init__(self):
        Foo.__init__(self, 10)

    def method(self): # Overriding
        print("This is a Bar method")
        super().method() # Calling method of super class
        
bar = Bar()
print(bar.state)
bar.method()
10
This is a Bar method
This is a Foo method
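
Polymorphism in Python commonly relies on duck typing: any object with the right method can be used, whether or not the classes are related. The sketch below uses hypothetical classes (Dog, Robot, Puppy) and a hypothetical describe function that accepts arbitrary positional and keyword arguments.

```python
class Dog:
    def speak(self):
        return "Woof"


class Robot:  # unrelated to Dog, but offers the same method
    def speak(self):
        return "Beep"


class Puppy(Dog):  # overrides speak and reuses the parent implementation
    def speak(self):
        return super().speak() + " (squeaky)"


def describe(*things, **options):
    # Accepts any number of objects; only requires that each has speak().
    sep = options.get("sep", ", ")
    return sep.join(thing.speak() for thing in things)


all_sounds = describe(Dog(), Robot(), Puppy())
custom_sep = describe(Dog(), Robot(), sep=" | ")
print(all_sounds)
print(custom_sep)
```

describe never checks the type of its arguments, which is what lets Robot take part without inheriting from Dog.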

Using external libraries

Python has a large ecosystem of community-maintained libraries.
To use them, you can use one of the existing “package managers” such as pip (the standard installer) or the older easy_install (now deprecated).

First, you have to install the package manager; in a debian-based system, it amounts to:

sudo apt-get install python3-pip

or

sudo apt-get install python-setuptools

After that, you can install a package of your choice using:

pip3 install <package-name>

example: pip3 install web.py installs web.py, which is a simple web framework for Python

or easy_install <package-name>

Here are a few popular libraries:

  • Data Analysis: numpy, scipy, pandas, jupyter-notebook
  • Web Development: tornado, gunicorn, flask, web.py, web2py, django
  • Mobile Development: kivy
  • Desktop Application Development: pyqt, pygtk
  • Machine Learning: scikit-learn, scikit-image
  • NLP: nltk, spacy
  • DevOps: fabric

There are many, many more…!

Application Development Examples

Web Service

An example is given below using bottle.py

Documentation: http://bottlepy.org/docs/dev/index.html

In [37]:
! pip3 install -U bottle

from bottle import route, run, template

@route('/hello/<name>')
def index(name):
    return template('Hello {{name}}!', name=name)

run(host='localhost', port=9000)
Collecting bottle
Installing collected packages: bottle
Successfully installed bottle-0.12.13
You are using pip version 8.1.1, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Bottle v0.12.13 server starting up (using WSGIRefServer())...
Listening on http://localhost:9000/
Hit Ctrl-C to quit.

Data crunching with numpy and pandas

numpy and pandas have to be separately installed using pip/easy_install

Read numpy documentation at: http://www.numpy.org/
Read pandas documentation at: http://pandas.pydata.org/

The functionalities are too large to fall within the scope of the current discussion.

For completeness and to give you a taste of how it is to use these libraries, a few (very basic) examples are given below…

In [59]:
# Examples of numpy
import numpy as np

x = np.array([1, 2, 3]) # one of the basic data types in numpy is the array (ndarray)
y = np.array([2, 3, 4])
print(x.dot(y)) # dot product of two vectors
y = np.array([[5, 6, 7], [3, 4, 5]])
print(np.matmul(x, y.T)) # matrix multiplication
z = np.array([[1, 2, 3], [1, 5, 6], [7, 6, 9]])
print(np.linalg.inv(z)) # matrix inverse

# Examples of pandas
import pandas as pd
x = pd.DataFrame(z) # one of the basic data types in pandas is a DataFrame
print(x)
x.iloc[1] # indexing 2nd row
20
[38 26]
[[ -7.50000000e-01  -1.26882631e-16   2.50000000e-01]
 [ -2.75000000e+00   1.00000000e+00   2.50000000e-01]
 [  2.41666667e+00  -6.66666667e-01  -2.50000000e-01]]
   0  1  2
0  1  2  3
1  1  5  6
2  7  6  9
Out[59]:
0    1
1    5
2    6
Name: 1, dtype: int64