Intro To Programming In Science

Setup

Octave (aka free matlab)
https://www.gnu.org/software/octave/download

R studio
https://cloud.r-project.org/

PyCharm
https://www.jetbrains.com/pycharm/download/

Julia
https://julialang.org/downloads/

Codeblocks
https://www.codeblocks.org/downloads/binaries/

Part I - Background

Scientific programming

What is it?

Types of people learning


Scientists? Sometimes all levels…

Need for formal training?

Depends…

Hardest Parts

  1. Finding 'homework'/Spending time coding
  2. Getting over the perception that it's hard: That…
  1. Amount of content
  2. Learning how to do things "beautifully"

Part II - Computer Fundamentals

Computer Simplified

Single Thread


Thread = single stream to and from CPU

Multiple Threads


Until much much later, don't worry about multiple streams

Machine Code


Your CPU only speaks a specific dialect of
binary (machine code).
All machine codes must be very simple and exact.

Memory

Types

Memory Allocation

Registers


Efficiency

Machine Dialects


Extreme efficiency + innovation
= thousands of dialects

Problem: we want computers to do the same thing

Human Code

Computers -> simple encoding
Humans -> complex coding

Fewer characters = longer code
Binary is simple, but verbose

Machine Code summary

Solution: interface for machine code

Interface

An interface maps low-level features
to high-level abstractions:

(Complex transmission state to "1")

Interface codes

Non-machine codes interface with machine
codes through compilation or interpretation
Both compilation and interpretation* transform
high-level code to machine-specific code.

Interpretation

interpretation = compile + immediate execution

Directly compiled - Before run-time
Interpreted - Compilation at initial language install
Hybrid (JIT) - During run-time

The Trade-off

Compiled languages

Interpreted languages

JIT

Part III - Which Language?

What kind of interface do you want?
There is no ultimate language
Different languages for different purposes
Most languages can do anything you want,
but may not be the best choice for the problems

Brief History

Fortran      1957  Designed for scientific computing
Unix*        1973  Solidified conventions for personal computers
C            1978  Solidified conventions for programming
Matlab       1984  Designed for numeric computing
Perl         1987  Designed for general use by power-users
Bash Shell*  1989
Python       1991  Designed for learning how to program
LAPACK*      1992  SciPy, Matlab
R            1993  Designed for statisticians
Julia        2012  Designed for scientific computing

Current state

Strong majorities with:

Julia catching up
Some people use

Comparison

C Perl python Matlab R julia
*Community 2 3 3 1 2
*Data-tables x x x
*Notebook x (x) x
*Conventions 2 3 3 1-3 3
*OOP support 1 2 3 1-2 2
*Free (Freedom) (x) x x x
Free (Beer) x x x x
compiled x
interpreted x x x (x)
Linear Algebra
Tables
Package management 2 3 3 1 3
Syntax simplicity 1 1 2 3 1
Syntax efficiency 1 3 2 2 2
Builtin-Editor x
Dynamic Interpreter (x) x x x
GPU x x x
General Scripting (x) x x (x)
documentation 2 3 3 1-3 2-3
C interface 3 ? 2-3 2 3
Live Interpreter
Style

1=worst
3 = best

Matlab fails some of the important ones

Comparison part2

https://hyperpolyglot.org/numerical-analysis

Conventions for science

Very loose
Good - Freedom!
Bad - Freedom!

Subfield preference

Features

Goal

Styles

Syntax

Builtin tools

What can a language do out of the box?
What setup is required?

Libraries/Community

Library/package
A language may be technically inferior in all regards,
but may be the right choice based upon community
and available packages/libraries

Languages I use

Matlab - Psychtoolbox, sandbox
Python - Citation management
R - Regression
Stan - Advanced stats
Julia - Machine learning
C/C++ - Matlab + speed
eLisp - Text editor customization
Unix scripting (e.g. Bash, Zsh) (aka Terminal)
Perl - Text processing and database management

The Secret

No secret.

Differences are semantic
If you learn one language in and out,
it is much easier to learn another

Recommendation

  1. Get your feet wet with Octave
  2. Learn C
  3. Learn OOP

Part IV - Workspaces

Interfaces for machine code interfaces

Builtin environments

Matlab/Octave
R (Rstudio)

Editors/IDEs

Language specific

IDE features

Integrated Development Environment
Lots of features including

Interpreters

Run Script

python mypythonscript.py
perl myperlscript.pl
Rscript myRfile.r
julia myjuliascript.jl
matlab -nodisplay -nosplash -nodesktop -r "run('myscript.m'); exit;"

Run code

python -c "print(3+3)"
perl -e "print(3+3)"
Rscript -e "print(3+3)"
julia -e "print(3+3)"
matlab -nodisplay -nosplash -nodesktop -r "display(3+3); exit;"

C*
https://root.cern.ch/root/html534/guides/users-guide/CINT.html

REPLs

Read-eval-print-loop

RUN SOME RAW CODE
python
ipython
R
julia
matlab -nodisplay -nosplash -nodesktop

Package managers

External
pip

Internal
install.packages("")
Pkg.add("")

NONE
matlab

Compilers

C

Documentation

help('cmd')    (Python)
help('cmd')    (R)
?cmd           (Julia)
help cmd       (Matlab)

https://docs.python.org/3/
https://www.rdocumentation.org/
https://docs.julialang.org/en/v1/
https://www.mathworks.com/help/matlab/

Part V - Programming Fundamentals

Line by line, left to right, top to bottom
Runs until it finishes or does something it can't do

Single thread = Linear = easy!

Statements

Statement = Atomic unit of code
Statement = a single command/line of code

Types

Declarations - create a label, assigning a type
Executions - simplify/do
Assignments - group/label things
(Or some combination of the three)

These are sequential

Examples

#include <stdio.h>

int main(void)
{
    double pi;                 // declaration
    pi = 3.1415926535;         // assignment
    2*3.1415926535;            // execution (the result is discarded)
    printf("%f\n", pi);

    {
        double pi_two = 3.1415926535*2;   // all three in one statement
        printf("%f\n", pi_two);
    }
    return 0;
}

Octave/Matlab example

pi_two=3.1415926535*2;
disp(pi_two);
disp(class(pi_two));
disp(class(3));

python/matlab/R -> implicit declaration
julia -> optional declarations

Statement Elements

Variables - labeled/grouped data
Operations - interactions between variables
Routines - labeled/grouped code

Hierarchy
Operations act on variables
Routines act on variables and Operations

examples

int square(int in) {
    int out = in*in;
    return out;
}

int main(void) {
    int A = 3;
    int B = square(A);   // B is 9
    return 0;
}

octave/matlab


function out=square(in)
    out=in*in;
end
disp(square(2));

Element properties

Binding - its name, as an abstraction (LHS)
Assignment - its contents (RHS)
Scope - what it can see
Namespace - where others can see it
(extent of binding + assignment)

Examples

  1. example 1

    One variable, two different assignments, single namespace

    {
      int A=3;
      A=4;
    }
  2. example 2

    Two different variables with the same bindings and assignment, two namespaces

    int return_A (){
      int A = 3;
      return A;
    }
    
    {
      int A = 3;
    }
  3. example 3

    One variable, two different assignments

    A=uint8(3);
    A=uint8(4);

    Two variables…

    A=uint8(3);
    A=3;

Suitcase analogy

examples

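int add_three(int in) {
    int A = 3;              // local A, separate from the caller's A
    return in + A;
}

int main(void) {
    int A = 10;
    int C = add_three(A);   // C is 13; the caller's A is still 10
    return 0;
}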

octave/matlab

function out=add_three(in)
    A=3;
    out=in+A;
end

A = 10;
C=add_three(A);

Note

Custom operator bindings are not typically done unless using OOP (session 3)

Part VI - Basics

You can only
Create statements
(Declare, Execute, and/or Assign )
with statement elements
(Variables, Routines, or Operations)

Variables

Declaration

Declaration/type assignment/memory allocation

I need to create a variable
Where should I put it?
I need to find it again
How much space is it going to take up?
I don't want any of it to get overwritten

creating a prototype

  1. Primitive C types

    Assigned to stack

    boolean/logical
    integer (signed/unsigned)

    float/double (always signed)

    char
    void - an empty type
    pointer - points to an address to somewhere else in memory
    reference (C++ only) - like a constant pointer, cannot be reassigned

  2. Aggregate types

    Two parts:

    1. A variable in stack with a pointer to
    2. data stored in heap

    String
    has pointer to "string" of characters in heap
    Arrays
    has pointer to pointers

    1. example

      #include <stdio.h>
      
      int main() {
          char *myStr = "This is a string";
          printf("%s\n", myStr );
      
      }
      // * indicates a pointer, char defines the size of data being pointed to
  3. Deletion

    Aka garbage-collection (GC)

    Stack variables automatically get deleted at the end of their scope
    e.g. if a variable was declared in a routine, it will be deleted at the end of the routine
    Heap variables do not; they must be freed explicitly

    1. example

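      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      int main(void) {
          char *myStr = malloc(17);            // heap: 16 characters + '\0'
          strcpy(myStr, "This is a string");
          printf("%s\n", myStr );
          free(myStr);                         // only heap memory may be freed
      }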

How do high level languages compare?

Abstracted away from memory

Julia has these features optionally


General rules for variables

Operators

Mathematical

% Anything after a percent sign is a comment - it will not be evaluated
8+5                          % + is an operator
8*5                          % Multiplication
(5+3)/2                      % Order of operations apply!
a=4
b=3
b^2                          % Exponent
ans + 3                      % 'ans' is an automatically assigned variable

Logical

b=3
a=-3
b > abs(a)                   % Is b greater than abs(a)?
b < abs(a)                   % Is b less than abs (a)?
b == 3                       % Is b equal to 3? Note two equal signs instead of one.
b ~= 3                       % Is b not equal to 3?
b <= 4                       % Is b less than or equal to 4?
b >= 4                       % Is b greater than or equal to 4?
true==1                      % true and 1 are the same
false==0                     % false and 0 are the same
~true==false                 % ~ takes the complement of a test (negates it)
c = b==4                     % Assigning result of logical test

Combined logic

~(b <= 4)                    % Is b greater than 4? (negation of b <= 4)
b==5 | abs(a)==3             % OR
b==5 & abs(a)==3             % AND
(b==3 & abs(a)==3 ) | c      % Grouping

Routines

My terminology
Generalization of a mathematical function (Things go in, something comes out)
Routines don't need to have things go in or out, just that something happens somewhere

Types

Script/File
Blocks (C)
Loops* (Language dependent)
Functions
Package/module

  1. functions

    sqrt(5)                      % Functions have parentheses
    abs(-3)
    abs(3-7)                     % Inputs can be simplified before being evaluated in functions

    Many ways to 'round' a number

    round(7/2)                   % Normal rounding
    floor(7/2)                   % Round down
    ceil(7/2)                    % Round up
    fix(7/2)                     % Round towards zero

    arguments are inputs to a function

    mod(4,2)
    pi()
  2. Scripts

  3. Loops

    disp(10)
    pause(1)
    disp(9)
    pause(1)
    disp(8)
    pause(1)
    disp(7)
    % ...
    disp(1)

    % The same countdown with a loop
    r=10:-1:1;
    for i = r
        disp(i)
        pause(1)
    end
    

Scope

Routines create/define new local/lexical namespaces, but usually have scope that extends beyond that namespace.
Scope has an order for namespaces

Many bindings may exist
Name resolution is determining which one to apply

  1. Function scope

    x=3;
    plusOne=@(x) x + 1;
    plusOne(x);
    %two namespaces for x
    TWO=2;
    
    
    [out1,out2]=plusTwoAndThree( plusOne(x) );
    function [out1,out2] = plusTwoAndThree(x)
        out1=x+2;
        plusThree(x);
    
        function plusThree(x)
            out2=x+3;
        end
    end
    % four namespaces
    % the scope of out2 is in plusThree and plusTwoAndThree
    % Script or Block scope is handled differently than function scope
  2. Loop scope

    s = 0 # new local
    for i = 1:n
        t = s + i # new local `t`
        s = t # assign existing local `s`
    end
    print(s)
    print(i)
    print(t)
    # the scope of i & t is only within the loop
  3. Routine Name resolution

    Scope includes all parents
    Name resolution bottom-to-top

  4. Path

    Your path consists of

    1. Current directory
    2. Directories that are listed in your path variable

    Add directories to your path under HOME-> ENVIRONMENT-> PATH
    Any file/directory in your path can be autocompleted and loaded without specifying its full path

    WARNING: If you have multiple files of the same name, Matlab will use the first one it finds without any warning

    which <filename>    % Lists which one matlab will use

Functions

  1. Subtypes

    Anonymous
    Methods/Subroutines

  2. Parts

    Arguments and types - Input
    Return value - Output
    Signature
    Some combination of what types are expected as arguments

Review

You can only