IP University - POPL - Unit 3 Notes
This post gives a quick review on the third unit of the syllabus for the Principles of Progamming Languages (POPL) subject in the IP University syllabus.
Storage Management
Static and Dynamic control
Stack vs Heap
Stack
- follows a LIFO (Last in first out) strategy
- block in memory is of fixed size
- last entry in the stack is accessible at any moment
- contiguous memory with pointers named as stack base, top of the stack
- support function calls (with stack frames / activation record)
Heap
- does not follow any definite approach
- allows random assignment and deassignment of the memory
- process accesses the allocated memory area through the pointer
- deallocation is carried through the deallocation request
- develops holes in the memory allocation when data structures are built and freed
- used at the runtime.
Differences
- In a stack, the allocation and deallocation is done by CPU and is automatic whereas, in heap, it needs to be done by the programmer manually.
- Heap frame handling is costlier than stack frame handling.
- Implementation of a stack is complex. As against, implementation of a heap is simple.
- A function call in stack takes O(N) time. In contrast, it takes O(1) time in a heap.
- Stack implementation mainly suffers from the memory shortage problem. On the contrary, the main issue in a heap is fragmentation.
- Access to a stack frame is easier than the heap as the stack is confined to the small region of memory and it always hit the cache, but heap frames are dispersed throughout the memory so the memory accessing can cause more cache misses.
- Stack is not flexible, the memory size allotted cannot be changed. On the other hand, a heap is flexible, and the allotted memory can be altered.
- A heap takes more accessing time than a stack.
Sequence Control
Control structure in a PL provides the basic framework within which operations and data are combined into a program and sets of programs
- Sequence Control -> Control of the order of execution of the operations
- Data Control -> Control of transmission of data among subprograms of program
Sequence Control may be categorized into four groups:
- Expressions – They form the building blocks for statements
- Statements – The statements (conditional & iterative) determine how control flows from one part of program to another
- Declarative Programming – This is an execution model of program which is independent of the program statements
- Subprograms – In structured programming, program is divided into small sections and each section is called subprogram
2 types:
- Implicit SC - those defined by the programming language itself. These structures can be modified explicitly by the programmer
- Explicit SC - those that programmer may optionally use to modify the implicit sequence of operations defined by the language. eg. (parentheses within expressions, or goto statements and labels)
Sequence Control within Expressions
- Arithmetic Expressions
- evaluated from left to right and using the rules of precedence of operators
- if expression involves parentheses, the expression inside parentheses is evaluated first
- Relational Expressions
- The relational operators <, >, <=, >= are given the first priority
- operators (== and != ) are given the second priority
- arithmetic operators have higher priority over relational operators
- resulting expression will be of integer type, true = 1, false = 0
- Logical Expressions
- result of a logical expression is either true or false
- expression involving NOT is evaluated first
- then expression with AND
- finally the expression having OR is evaluated
Concepts:
- Controlling the evaluation of expressions
- Precedence - the operator at higher level of precedence is evaluated first
- Associativity - operators of the same precedence are evaluated either from left to right or from right to left depending on the level
- Expression tree
- Syntax for expressions
- Prefix or Polish notation
- Postfix or reverse polish
- Infix notation
- Execution-Time representation Translators evaluate the expression using a method so as to get efficient result (optimum value at optimum time with optimum use of memory and processor) Two phases:
- the basic tree control structure for expression is established
- stage whole evaluation process takes place Following methods are used for translation of expression:
- Machine code sequences
- Tree structure
- Prefix or postfix form
Problems with Evaluation of Expressions:
- Uniform Evaluation Code
- Eager Evaluation Rule
- for each operation node, first evaluate each of the operands, then apply the operation to the evaluated operands
- Lazy Evaluation Rule
- never evaluate operands before applying the operation
- pass the operands unevaluated and let the operation decide if evaluation is needed
- impractical to implement the same in many cases as it requires substantial software simulation
- Eager Evaluation Rule
- Side effects
c / func(y) + c
- if fun(y) has the side effect of modifying the value of c, the order of evaluation is critical
- Short-circuit Boolean Expression
- the left expression in the boolean expression is evaluated first and second expression is evaluated only when needed
Sequential Control within Statement
The kinds of control flow statements supported by different languages vary, but can be categorized by their effect:
- Continuation at a different statement (unconditional branch or jump)
- Executing a set of statements only if some condition is met (choice - i.e., conditional branch)
- Executing a set of statements zero or more times, until some condition is met (i.e., loop - the same as conditional branch)
- Executing a set of distant statements, after which the flow of control usually returns (subroutines, coroutines, and continuations)
- Stopping the program, preventing any further execution (unconditional halt)
Sequential Control Statements:
- Basic statements
- assignment statement
- input and output statements
- declaration
- goto
- break, continue
- Statement level sequence control
- implicit sequence control
- composition type - standard form of implicit sequence, statements placed in order of execution
- alternation type - there are two alternate statement sequence in the program, the program chooses any of the sequence but not both at same time
- iteration type - normal sequence is given to statements but the sequence repeats itself for more than one time
- explicit sequence control goto, break etc.
- implicit sequence control
- Structured sequence control
- compound statement
- conditional statement
- iteration statements
Subprogram Control
Suprogram sequence control
- how one subprogram invokes another and called subprogram returns to the first.
Simple Call-Return Subprograms
- program is composed of single main program
- calls various subprograms which may call other subprograms and so on to any depth
- each subprogram returns the control to the program/subprogram after execution
- execution of calling program is temporarily stopped during execution of the subprogram
- after the subprogram is completed, execution of the calling program resumes at the point immediately following the call
- Copy Rule - effect of the call statement is the same as would be if the call statement is replaced by body of subprogram (with suitable substitution of parameters)
Assumptions:
- Subprogram can not be recursive
- Explicit call statements are required
- Subprograms must execute completely at call
- Immediate transfer of control at point of call or return
- Single execution sequence for each subprogram
Implementation:
- Difference:
- Subprogram definition – The written program which is translated into a template
- Subprogram activation – Created each time a subprogram is called using the template created from the definition
- An activation is implemented as two parts
- Code Segment – contains executable code and constants
- Activation record – contains local data, parameters & other data items
- The code segment is invariant during execution. It is created by translator and stored statically in memory. They are never modified. Each activation uses the same code segment.
- A new activation record is created each time the subprogram is called and is destroyed when the subprogram returns. The contents keep on changing while subprogram is executing
- Current Instruction Pointer (CIP)
- pointer which points to the instruction in the code segment that is currently being executed (or just about to be) by the hardware or software interpreter.
- Current Environment Pointer (CEP)
- activation record contains its set of local variables
- it represents the “referencing environment” of the subprogram
- pointer to current activation record is CEP
- Execution of Program
- activation for the main program is created and CEP is assigned to it
- CIP is assigned to a pointer to the first instruction of the code segment for the subprogram
- when a subprogram is called, new assignments are set to the CIP and CEP for the first instruction of the code segment of the subprogram and the activation of the subprogram
- To return correctly from the subprogram, values of CEP and CIP are stored before calling the subprogram
- when return instruction is reached, it terminates the activation of subprogram, the old values of CEP and CIP that were saved at the time of subprogram call are retrieved and reinstated
- Recursive Subprograms
- In Recursive subprogram calls A subprogram may call any other subprogram including A itself, a subprogram B that calls A or so on
- difference between a recursive call and an ordinary call is that the recursive call creates a second activation of the subprogram during the lifetime of the first activation.
- If execution of program results in chain such that \(k\) recursive calls of subprogram occur before any return is made. Thus \(k+1\) activation of subprogram exist before the return from \(k^{th}\) recursive call
- both CIP and CEP are used to implement recursive subprogram
Data Control & Referencing Environments
The main elements of data control are referencing environments, visibility, dynamic scope, referencing operations, and reference locality.
Referencing environments may be local, nonlocal, global, or predefined. The reference environment of a statement is the collection of all names that are visible in the statement. Referencing environments are generally invariant during program execution, i.e., values may change but name associations do not.
- local : associations created on entry to that subprogram (formal parameters, local variables, etc)
- nonlocal : associations not created on entry to that subprogram
- global : associations created at the start of execution of that main program
- predefined : defined in the language definition
Visibility refers to the associations a subprogram has access to during its execution. Redefining a variable within a subprogram “hides” the original association during the execution of the subprogram.
There are 2 ways through which a data object can be made available as an operand for an operation:
- direct transmission : object computed at one time during an operation may be transmitted directly to another operation as an operand
- referencing through a named data object: named when created, and that name may be used to refer it in an operand for an operation
Static and Dynamic Scoping
In static scope, if a variable name’s scope is a certain function, then its scope is the program text of the function definition: within that text, the variable name exists, and is bound to the variable’s value, but outside that text, the variable name does not exist. A block defines a new scope. Variables can be declared in that scope, and aren’t visible from the outside. However, variables outside the scope – in enclosing scopes – are visible unless they are overridden.
- (Algol, Ada, C, Pascal, Scheme, and Haskell)
In dynamic scope (or dynamic scoping), if a variable name’s scope is a certain function, then its scope is the time-period during which the function is executing: while the function is running, the variable name exists, and is bound to its value, but after the function returns, the variable name does not exist. Using this scoping rule, we first look for a local definition of a variable. If it isn’t found, we look up the calling stack for a definition.
- (Lisp, older interpreted languages such as SNOBOL and APL)
This means that if function f
invokes a separately defined function g
, then under lexical scope, function g
does not have access to f
’s local variables (assuming the text of g
is not inside the text of f
), while under dynamic scope, function g
does have access to f
’s local variables (since g
is invoked during the invocation of f
).
Parameter Passing
The formal parameters are the parameters associated with the function, and the actual parameters are the parameters that are used to call the function.
Methods:
- call by value: copy going into the procedure. This is the mechanism used by both C and Java. Note that this mechanism is used for passing objects, where a reference to the objected is passed by value
- call by result: copy going out of the procedure, the formal parameters are copied back into the actual parameters. (Note that this only makes sense if the actual parameter is a variable, or has a l-value, such as an array element.)
- call by value result: copy going in, and again going out
- call by reference: The actual parameters and formal parameters are identified. The natural mechanism for this is to pass a pointer to the actual parameter, and indirect through the pointer
- call by name: re-evaluate the actual parameter on every use. For actual parameters that are simple variables, this is the same as call by reference. For actual parameters that are expressions, the expression is re-evaluated on each access
Block Structure
- a block or code block is a lexical structure of source code which is grouped together
- consist of one or more declarations and statements
- fundamental to structured programming, where control structures are formed from blocks
- In a block-structured programming language, the objects named in outer blocks are visible inside inner blocks, unless they are masked by an object declared with the same name.
The semantic meaning of a block is twofold.
- it provides the programmer with a way for creating arbitrarily large and complex structures that can be treated as units
- it enables the programmer to limit the scope of variables and sometimes other objects that have been declared