Lab 8: Symbolic Algebra
If you are a current student, please Log In for full access to the web site.
Note that this link will take you to an external site (https://shimmer.mit.edu) to authenticate, and then you will be redirected back to this page.
Table of Contents
 1) Preparation
 2) Introduction
 3) Basic Symbols
 4) Binary Operations
 5) Display
 6) Using Python Operators with Symbolic Expressions
 7) Derivatives
 8) Simplification
 9) Evaluation
 10) Parsing Symbolic Expressions
 11) Code Submission
 12) Checkoff
 13) Additional (Optional) Extensions
1) Preparation§
You may wish to read the whole lab before implementing any code, so that you can make a more complete plan before diving in. Try making a plan for your code, including the following:
 Which classes are you going to implement? Which classes are subclasses of which classes?
 What attributes are stored in each class?
 What methods does each class have?
As you work through, be on the lookout for opportunities to take advantage of inheritance to avoid repetitious code!
Throughout the lab, you should not use Python's eval
,
exec
, isinstance
, or type
builtins, nor should you be explicitly
checking for the type of any particular object, except where we have indicated
that it is okay to do so. Instead of explicitly checking the type of our
objects and determining our behavior based on that result, our goal is to
use Python's own mechanism of method/attribute lookup to implement these
different behaviors without the need to do any type checking of our own.
This lab assumes you have Python 3.6 or later installed on your machine (3.9 recommended).
The following file contains code and other resources as a starting point for this lab: lab8.zip
Most of your changes should be made to lab.py
, which you will submit
at the end of this lab. Importantly, you should not add any imports
to the file.
You can also see and participate in online discussion about this lab in the "Lab 8" Category in the forum.
This lab is worth a total of 4 points. Your score for the lab is based on:
 passing the test cases from
test.py
under the time limit (1.5 points), and  a brief "checkoff" conversation with a staff member to discuss your code (2.5 points).
For this lab, you will only receive credit for a test case if it runs to completion under the time limit on the server.
Please also review the collaboration policy before continuing.
This lab does not have any questions that are due before lecture, but you are strongly encouraged to read the specification, and perhaps to get started with trying to implement some of the code, earlier than that. The questions below are all due on Friday, 23 April at 5pm Eastern.
2) Introduction§
In this lab, we will develop a Python framework for symbolic algebra. In such a system, algebraic expressions including variables and numbers are not immediately evaluated but rather are stored in symbolic form.
We'll start by implementing support for basic arithmetic (+
, 
, *
, and
/
) on variables and numbers, and then we'll add support for simplification
and differentiation of these symbolic expressions. Ultimately, this system
will be able to support interactions such as the following:
>>> x = Var('x')
>>> y = Var('y')
>>> print(x + y)
x + y
>>> z = x + 2*x*y + x
>>> print(z)
x + 2 * x * y + x
>>> print(z.deriv('x')) # derivative of z with respect to x
1 + 2 * x * 0 + y * (2 * 1 + x * 0) + 1
>>> print(z.deriv('x').simplify())
1 + y * 2 + 1
>>> print(z.deriv('y')) # derivative of z with respect to y
0 + 2 * x * 1 + y * (2 * 0 + x * 0) + 0
>>> print(z.deriv('y').simplify())
2 * x
>>> z.eval({'x': 3, 'y': 7}) # evaluate an expression with particular values for each variable
48
3) Basic Symbols§
In this week's code distribution, we have provided a very minimal skeleton, containing a small definition for three classes:
Symbol
will be our base class; all other classes we create will inherit from this class, and any behavior that is common between all expressions (that is, all behavior that is not unique to a particular kind of symbolic expression) should be implemented here. Instances of
Var
represent variables (such as x or y).  Instances of
Num
represent numbers within symbolic expressions.
Take a look at the Var
and Num
classes. Note that each has __init__
,
__repr__
, and __str__
defined for you already. Importantly, while you
are welcome to add additional machinery to these objects if you want to, it
will be important that the existing instance variables continue to work as
they do in the provided skeletons (our test suite relies on this behavior).
4) Binary Operations§
So far, our system is somewhat uninteresting. It lets us represent variables
and numbers, but in order to be able to represent meaningful expressions, we
also need ways of combining these primitive symbols together. In particular,
we will represent these kinds of combinations as binary operations. You
should implement a class called BinOp
to represent a binary operation.
Because it is a type of symbolic expression, BinOp
should be a subclass of
Symbol
.
We will implement four subclasses of BinOp
:
Add
, to represent an additionSub
, to represent a subtractionMul
, to represent a multiplicationDiv
, to represent a division
By virtue of being binary operations, each instance of any of these classes should have two instance variables:
left
: aSymbol
instance representing the lefthand operandright
: aSymbol
instance representing the righthand operand
For example, Add(Add(Var('x'), Num(3)), Num(2))
represents the symbolic
expression x + 3 + 2. The left
attribute of this instance should be the
Add
instance variable Add(Var('x'), Num(3))
, and the right
attribute
should be the Num
instance Num(2)
.
Importantly, instances of BinOp
(or subclasses thereof) should only have
these two instance attributes. You are welcome to store additional
information in class attributes, but each instance should only store left
and
right
.
The structure of each of these classes' __init__
methods is likely to be
almost the same (if not exactly the same). What does that suggest about
how/where you should implement __init__
?
These constructors should also accept integers, floats, or strings as their arguments.
Add(2, 'x')
, for example, should create an instance Add(Num(2), Var('x'))
.
It is okay to use isinstance
or type
in this context, to check if the
arguments passed to the constructor are strings or numbers.
5) Display§
As of right now, attempting to print an instance of BinOp
(or a subclass
thereof) will not really tell us much useful information. In this section,
we'll improve on Python's ability to display these objects.
Python has two different ways to get a representation of an object as
a string. First, repr(obj)
(which calls obj.__repr__()
underthehood) should
produce a string containing a representation that, when passed back into
Python, would evaluate to an equivalent object. Second, str(obj)
(which calls
obj.__str__()
underthehood) should produce a humanreadable string
representation.
Take a look at the __repr__
and __str__
methods in Var
and Num
. What
is the difference between them?
Implement __repr__
and __str__
in appropriate places (avoiding repeating
code where possible), such that, for any symbolic expression sym
, repr(sym)
will produce a string containing a Python expression that, when evaluated,
will result in a symbolic expression equivalent to sym
. Similarly, str(sym)
should produce a humanreadable representation, given by
left_string + ' ' + operand + ' ' + right_string
, where left_string
and right_string
are the
string representations of the left
and right
attributes, respectively, and
operand
is a string representation of the operand associated with the
specific class in question. For example:
>>> z = Add(Var('x'), Sub(Var('y'), Num(2)))
>>> repr(z) # notice that this result, if fed back into Python, produces an equivalent object.
"Add(Var('x'), Sub(Var('y'), Num(2)))"
>>> str(z) # this result cannot necessarily be fed back into Python, but it looks nicer.
'x + y  2'
As always, try to avoid repetitious code when implementing! You may wish to
store additional class variables in the subclasses of BinOp
in order to
accomplish this. For example, we need a way to look up a string representing
the associated operator for each of the supported operations.
5.1) Parenthesization§
Note that, while a __repr__
implementation that follows the rules described
above works well for complicated expressions (as seen in the expressions
in the last example above), a __str__
implementation that
follows those rules results in some possible ambiguities or erroneous
interpretations. In particular, consider the expression
Mul(Var('x'), Add(Var('y'), Var('z')))
. According to the rules above for
__str__
, that
expression's string representation would be "x * y + z"
, but the internal
structure of the expression suggests something different! It would be nice for
the string representation instead to be "x * (y + z)"
, to better align with the
actual structure represented by the expression.
To address this, we add rules for parenthesization as follows, where B
is the BinOp
instance whose string representation we are finding^{1}:
 If
B.left
and/orB.right
themselves represent expressions with lower precedence thanB
, wrap their string representations in parentheses (here, precedence is defined using the standard "PEMDAS" ordering).  As a special case, if
B
represents a subtraction or a division andB.right
represents an expression with the same precedence asB
, wrapB.right
's string representation in parentheses.
Individual numbers or variables should never be wrapped in parentheses.
Think about the rules for parenthesization described above in terms of algebraic expressions and work through parenthesizing some example expressions by hand to get a feel for how these rules work. Do these rules seem to work in a general sense  will they always work across different operations and across different levels of expression complexity? Why do they work? Why are subtraction and division treated differently from addition and multiplication?
Importantly, you should implement this behavior without explicitly checking
the type of self
, self.left
, or self.right
.
Doing so will require having a way to get information about an instance's precedence. How can you store this information? And where should it be stored?
6) Using Python Operators with Symbolic Expressions§
Entering expressions of the form Add(Var('x'), Add(Num(3), Num(2)))
can get a
little bit tedious. It would be much nicer, for example, to be able to enter
that expression as Var('x') + Num(3) + Num(2)
.
Add methods to appropriate class(es) in your file such that, for any
arbitrary symbolic expressions E1
and E2
:
E1 + E2
results in an instanceAdd(E1, E2)
(note: you can override the behavior of+
with the__add__
and__radd__
"dunder" methods)E1  E2
results in an instanceSub(E1, E2)
(note: you can override the behavior of
with the__sub__
and__rsub__
"dunder" methods)E1 * E2
results in an instanceMul(E1, E2)
(note: you can override the behavior of*
with the__mul__
and__rmul__
"dunder" methods)E1 / E2
results in an instanceDiv(E1, E2)
(note: you can override the behavior of/
with the__truediv__
and__rtruediv__
"dunder" methods)
Try to avoid duplicating code when implementing these behaviors! In what
class(es) should you implement __add__
?
These behaviors should work so long as at least one of E1
and E2
is a
symbolic expression and the other is either a symbolic expression, a number,
or a string.
For example:
>>> Var('a') * Var('b')
Mul(Var('a'), Var('b'))
>>> 2 + Var('x')
Add(Num(2), Var('x'))
>>> Num(3) / 2
Div(Num(3), Num(2))
>>> Num(3) + 'x'
Add(Num(3), Var('x'))
We have seen __add__
, __sub__
, __mul__
, and __truediv__
before, but
what about the versions with the r
on the front of them? They represent
the same operations, but with the operands swapped.
It turns out that Python has a small set of rules for deciding which of
these methods to call in various situations. For example, consider the small
expression x + y
. In order to decide what to do here, Python will:
 If
x
is not a builtin type, and it has an__add__
method, invokex.__add__(y)
.  Otherwise, if
x
andy
are of different types, and ify
has an__radd__
method, invokey.__radd__(x)
.
In the context of our program, implementing both __add__
and __radd__
will allow both of the following expressions to work:
>>> x = Var('x')
>>> x + 2 # here, __add__ will be called
Add(Var('x'), Num(2))
>>> 2 + x # here, __radd__ will be called
Add(Num(2), Var(x))
Feel free to ask us if you have questions about this! The page of the Python documentation about the Python Data Model has more information about these methods as well.
7) Derivatives§
Next, we'll make the computer do our 18.01 homework for us. Well, not quite. But we'll implement support for symbolic differentiation. In particular, we would like to implement the following rules for partial derivatives (where x is an arbitrary variable, c is a constant or a variable other than x, and u and v are arbitrary expressions):
Even though it may not be obvious from looking at first glance, these mathematical definitions are recursive! That is to say, partial derivatives of compound expressions are defined in terms of partial derivatives of component parts, which may suggest strategies for implementing these rules in your program!
Implement differentiation by adding a method called deriv
to your classes.
This method should take a single argument (a string containing the name of the
variable with respect to which we are differentiating), and it should return a
symbolic expression representing the result of the differentiation. For example:
>>> x = Var('x')
>>> y = Var('y')
>>> z = 2*x  x*y + 3*y
>>> print(z.deriv('x')) # unsimplified, but the following gives us (2  y)
2 * 1 + x * 0  (x * 0 + y * 1) + 3 * 0 + y * 0
>>> print(z.deriv('y')) # unsimplified, but the following gives us (x + 3)
2 * 0 + x * 0  (x * 1 + y * 0) + 3 * 1 + y * 0
Importantly, we would like computing any derivative to result in a symbolic expression. Make sure that your code has this property!
8) Simplification§
The above code works, but it leads to output that is not very readable (it is
very difficult to see, for example, that the above examples correspond to 2y
and x+3, respectively). To help with this, implement a method called
simplify
to your class(es). simplify
should take no arguments, and it
should return a simplified form of the expression, according to the following
rules:
 Any binary operation on two numbers should simplify to a single number containing the result.
 Adding 0 to (or subtracting 0 from) any expression E should simplify to E.
 Multiplying or dividing any expression E by 1 should simplify to E.
 Multiplying any expression E by 0 should simplify to 0.
 Dividing 0 by any expression E should simplify to 0.
 A single number or variable always simplifies to itself.
Think about the simplification rules described above in terms of algebraic expressions and work through simplifying some example expressions by hand to get a feel for how these rules work. Do these rules seem to work in a general sense  will they always work across our different operations and across different levels of expression complexity, producing sensible simplifications? Why?
Within simplify
, you are welcome to use isinstance
or type
to check
whether both subexpressions of a binary operation are instances of Num
,
but you should not do any other explicit type checking.
Your simplification method does not need to handle cases like 3 + (x + 2), where the terms that could be combined are separated from each other in the tree. In fact, our test cases are expecting only the above rules to be implemented. Implementing extra rules here can be a great opportunity for extra practice, but it will cause some test cases to fail (so maybe best to implement such things only after submitting!).
For example, continuing from above:
>>> z = 2*x  x*y + 3*y
>>> print(z.simplify())
2 * x  x * y + 3 * y
>>> print(z.deriv('x'))
2 * 1 + x * 0  (x * 0 + y * 1) + 3 * 0 + y * 0
>>> print(z.deriv('x').simplify())
2  y
>>> print(z.deriv('y'))
2 * 0 + x * 0  (x * 1 + y * 0) + 3 * 1 + y * 0
>>> print(z.deriv('y').simplify())
0  x + 3
>>> Add(Add(Num(2), Num(2)), Add(Var('x'), Num(0))).simplify()
Var('x')
Note, also, that your simplify
method should work by creating a new instance
to represent the simplified form, rather than mutating the instance from which
simplify
was called.
And, importantly, we would like the result of any simplification to result in a symbolic expression. Make sure that your code has this property!
9) Evaluation§
Next, we'll add support for evaluating expressions for particular values of
variables. Add method(s) to your class(es) such that, for any symbolic
expression sym
, sym.eval(mapping)
will find a numerical value for the given
expression. mapping
should be a dictionary mapping variable names to values.
For example:
>>> z = Add(Var('x'), Sub(Var('y'), Mul(Var('z'), Num(2))))
>>> z.eval({'x': 7, 'y': 3, 'z': 9})
8
>>> z.eval({'x': 3, 'y': 10, 'z': 2})
9
In the test cases we'll run, the given dictionary will always contain all of the bindings needed to evaluate the expression fully.
What should your code do in the case where the expression contains a variable
not present in the given mapping
dictionary?
Think through some of the different options here, make a decision about how you want your program to behave in that situation, and implement it. Be prepared to discuss your decision, as well as your code for implementing it, during the checkoff.
10) Parsing Symbolic Expressions§
Finally, we would like to support parsing strings into symbolic expressions (to provide yet another means of input). For example, we would like to do something like:
>>> sym('(x * (2 + 3))')
Mul(Var('x'), Add(Num(2), Num(3)))
Define a standalone function called sym
, which takes a single string as input. This string should contain either:
 a single variable name,
 a single number, or
 a fully parenthesized expression of the form
(E1 op E2)
, representing a binary operation (whereE1
andE2
are themselves strings representing expressions, andop
is one of+
,
,*
, or/
).
You may assume that the string is always wellformed and fully parenthesized (you do not need to handle erroneous input), but it should work for arbitrarily deep nesting of expressions.
This process is often broken down into two pieces: tokenizing (to break the input string into meaningful units) and parsing (to build our internal representation from those units).
10.1) Tokenizing§
A good helper function to write is tokenize
, which should take a string as
described above as input and should output a list of meaningful tokens
(parentheses, variable names, numbers, or operands).
For our purposes, you may assume that variables are always singlecharacter alphabetic characters, that all numbers are integers, and you may also assume that there are spaces separating operands and operators.
As an example:
>>> tokenize("(x * (2 + 3))")
['(', 'x', '*', '(', '2', '+', '3', ')', ')']
Note that your code should also be able to handle numbers with more than one
digit and negative numbers! A number like 200, for example, should be
represented by a single token '200'
.
10.2) Parsing§
Another helper function, parse
, could take the output of tokenize
and
convert it into an appropriate instance of Symbol
(or some subclass thereof).
For example:
>>> tokens = tokenize("(x * (2 + 3))")
>>> parse(tokens)
Mul(Var('x'), Add(Num(2), Num(3)))
Our "little language" for representing symbolic expressions can be parsed using
a recursivedescent parser. One way to structure parse
is as follows:
def parse(tokens):
def parse_expression(index):
pass # your code here
parsed_expression, next_index = parse_expression(0)
return parsed_expression
The function parse_expression
is a recursive function that takes as an argument an integer into the tokens
list and returns a pair of values:
 the expression found starting at the location given by
index
(an instance of one of theSymbol
subclasses), and  the index beyond where this expression ends (i.e., if the expression ends at the token with index
6
in thetokens
list, then the returned value should be7
.
In the definition of this procedure, we make sure that we call it with the value index
corresponding to the start of an expression. So, we need to handle three cases. Let token
be the token at location index
; the cases are:

Number: If
token
represents an integer, then make a correspondingNum
instance and return that, paired withindex + 1
(since a number is represented by a single token). 
Variable: If
token
represents a variable name (a single alphabetic character), then make a correspondingVar
instance and return that, paired withindex + 1
(since a variable is represented by a single token). 
Operation: Otherwise, the sequence of tokens starting at
index
must be of the form:(E1 op E2)
. Therefore,token
must be(
. In this case, we need to recursively parse the two subexpressions, combine them into an appropriate instance of a subclass ofBinOp
(determined byop
), and return that instance, along with the index of the token beyond the final right parenthesis.
Implement the sym
function in your code (possibly using the helper functions
described above). Your implementation of the function sym
should not use
Python's builtin eval
, exec
, type
, or isinstance
functions.
11) Code Submission§
12) Checkoff§
Once you are finished with the code, please come to office hours and add yourself to the queue asking for a checkoff. You must be ready to discuss your code and test cases in detail before asking for a checkoff.
You should be prepared to demonstrate your code (which should be wellcommented, should avoid repetition, and should make good use of helper functions). In particular, be prepared to discuss:
 Your implementation of
deriv
, including using inheritance to avoid repetitious code.  A brief explanation of the rules for parenthesization and simplification and why those rules work.
 Your implementation of parenthesization, including using inheritance to avoid repetitious code.
 Your implementation of
simplify
, including using inheritance to avoid repetitious code.  Your implementation of
eval
, including using inheritance to avoid repetitious code.  Your implementation of
sym
.  A demonstration of using your code to simulate some symbolicalgebra problems of your own choosing.
12.1) Grade§
13) Additional (Optional) Extensions§
We think the program we've built here is pretty impressive! But there are many opportunities to improve and/or extend its behavior to do even more impressive things. If you have the time and interest, there are a number of ways that you could expand on this framework, including (but not limited to):

introducing trig functions like \sin and \cos into your framework, including support for derivatives, simplification, etc

introducing exponentiation into your framework, including support for derivatives, simplification, etc

adding additional rules for simplification similar to those we've seen above (for example, for any expression u, u/u could simplify to 1).

modifying your code to be able to handle situations like 3 + (x+2), where simplification is possible in theory, but where the methods described above won't fully simplify the expression

adding more simplification rules relating addition to multiplication, or multiplication to exponentiation. For example, could we simplify x + x + x to 3x, or (3y)(4y) to 12y^2 ?

adding representations for polynomial expressions, which may provide additional opportunities for simplification
Footnotes
^{1}Note that these rules should be sufficient for the four operations we have implemented. If we wanted to add additional operations, we may need to reconsider these rules.