MATMEK-4270
High performance computing platform for solving PDEs by the spectral Galerkin method. Written in Python (Cython). https://github.com/spectralDNS/shenfun
Also important stuff, but less so as I will try to put all really important stuff in the lecture notes
\[ \frac{du(t)}{dt} \approx \frac{u(t+\Delta t) - u(t)}{\Delta t} \]
\[ \int_{\Omega} u'' v d\Omega = -\int_{\Omega} u' v' d\Omega + \int_{\Gamma} u'v d\Gamma \]
We will use both approaches to first consider function approximations and then the approximation of equations.
numpy
, scipy
, matplotlib
, sympy
, shenfun
, …ODE problem
\[ u'=-au,\quad u(0)=I,\ t\in (0,T] \]
where \(a>0\) is a constant and \(u(t)\) is the time-dependent solution.
Solving a differential equation by a finite difference method consists of four steps:
The time domain \([0,T]\) is represented by a mesh: a finite number of \(N_t+1\) points
\[ 0 = t_0 < t_1 < t_2 < \cdots < t_{N_t-1} < t_{N_t} = T \]
\(u^n\) is a mesh function, defined at the mesh points \(t_n\), \(n=0,\ldots,N_t\) only.
Can extend the mesh function to yield values between mesh points by linear interpolation:
\[ \begin{equation} u(t) \approx u^n + \frac{u^{n+1}-u^n}{t_{n+1}-t_n}(t - t_n) \end{equation} \]
\[ u'(t_n) = -au(t_n),\quad n=1,\ldots,N_t \]
Now it is time for the finite difference approximations of derivatives:
\[ u'(t_n) \approx \frac{u^{n+1}-u^{n}}{t_{n+1}-t_n} \]
Inserting the finite difference approximation in
\[ u'(t_n) = -au(t_n) \]
gives
\[ \begin{equation} \frac{u^{n+1}-u^{n}}{t_{n+1}-t_n} = -au^{n},\quad n=0,1,\ldots,N_t-1 \end{equation} \]
(Known as discrete equation, or discrete problem, or finite difference method/scheme)
How can we actually compute the \(u^n\) values?
In general: we have \(u^n\) and seek \(u^{n+1}\)
The Forward Euler scheme
Solve wrt \(u^{n+1}\) to get the computational formula: \[ u^{n+1} = u^n - a(t_{n+1} -t_n)u^n \]
Assume constant time spacing: \(\Delta t = t_{n+1}-t_n=\mbox{const}\) such that \(u^{n+1} = u^n (1- a \Delta t)\)
\[ \begin{align*} u^0 &= I,\\ u^1 & = I(1-a\Delta t),\\ u^2 & = I(1-a\Delta t)^2,\\ &\vdots\\ u^{N_t} &= I(1-a\Delta t)^{N_t} \end{align*} \]
Ooops - we can find the numerical solution by hand (in this simple example)! No need for a computer (yet)…
Here is another finite difference approximation to the derivative (backward difference):
\[ u'(t_n) \approx \frac{u^{n}-u^{n-1}}{t_{n}-t_{n-1}} \]
Inserting the finite difference approximation in \(u'(t_n)=-au(t_n)\) yields the Backward Euler (BE) scheme:
\[ \frac{u^{n}-u^{n-1}}{t_{n}-t_{n-1}} = -a u^n \]
Solve with respect to the unknown \(u^{n+1}\):
\[ u^{n+1} = \frac{1}{1+ a(t_{n+1}-t_n)} u^n \]
Note
We use \(u^{n+1}\) as unknown and rename \(u^n \longrightarrow u^{n+1}\) and \(u^{n-1} \longrightarrow u^{n}\)
Centered differences are better approximations than forward or backward differences.
Idea 1: let the ODE hold at \(t_{n+\scriptstyle\frac{1}{2}}\). With \(N_t+1\) points, that is \(N_t\) equations for \(n=0, 1, \ldots N_t-1\)
\[ u'(t_{n+\scriptstyle\frac{1}{2}}) = -au(t_{n+\scriptstyle\frac{1}{2}}) \]
Idea 2: approximate \(u'(t_{n+\scriptstyle\frac{1}{2}})\) by a centered difference
\[ u'(t_{n+\scriptstyle\frac{1}{2}}) \approx \frac{u^{n+1}-u^n}{t_{n+1}-t_n} \]
Problem: \(u(t_{n+\scriptstyle\frac{1}{2}})\) is not defined, only \(u^n=u(t_n)\) and \(u^{n+1}=u(t_{n+1})\)
Solution (linear interpolation):
\[ u(t_{n+\scriptstyle\frac{1}{2}}) \approx \frac{1}{2} (u^n + u^{n+1}) \]
Result:
\[ \frac{u^{n+1}-u^n}{t_{n+1}-t_n} = -a\frac{1}{2} (u^n + u^{n+1}) \]
Solve wrt to \(u^{n+1}\):
\[ u^{n+1} = \frac{1-\frac{1}{2} a(t_{n+1}-t_n)}{1 + \frac{1}{2} a(t_{n+1}-t_n)}u^n \] This is a Crank-Nicolson (CN) scheme or a midpoint or centered scheme.
The Forward Euler, Backward Euler, and Crank-Nicolson schemes can be formulated as one scheme with a varying parameter \(\theta\):
\[ \frac{u^{n+1}-u^{n}}{t_{n+1}-t_n} = -a (\theta u^{n+1} + (1-\theta) u^{n}) \]
\(u^n\) is known, solve for \(u^{n+1}\):
\[ u^{n+1} = \frac{1 - (1-\theta) a(t_{n+1}-t_n)}{1 + \theta a(t_{n+1}-t_n)} u^n \]
Very common assumption (not important, but exclusively used for simplicity hereafter): constant time step \(t_{n+1}-t_n\equiv\Delta t\)
Summary of schemes for constant time step \[ \begin{align} u^{n+1} &= (1 - a\Delta t )u^n \quad (\hbox{FE}) \\ u^{n+1} &= \frac{1}{1+ a\Delta t} u^n \quad (\hbox{BE}) \\ u^{n+1} &= \frac{1-\frac{1}{2} a\Delta t}{1 + \frac{1}{2} a\Delta t} u^n \quad (\hbox{CN})\\ u^{n+1} &= \frac{1 - (1-\theta) a\Delta t}{1 + \theta a\Delta t}u^n \quad (\theta-\hbox{rule}) \end{align} \]
Model:
\[ u'(t) = -au(t),\quad t\in (0,T], \quad u(0)=I \]
Numerical method:
\[ u^{n+1} = \frac{1 - (1-\theta) a\Delta t}{1 + \theta a\Delta t}u^n \]
for \(\theta\in [0,1]\). Note
import numpy as np
def solver(I, a, T, dt, theta):
"""Solve u'=-a*u, u(0)=I, for t in (0, T] with steps of dt."""
Nt = int(T/dt) # no of time intervals
T = Nt*dt # adjust T to fit time step dt
u = np.zeros(Nt+1) # array of u[n] values
t = np.linspace(0, T, Nt+1) # time mesh
u[0] = I # assign initial condition
for n in range(0, Nt): # n=0,1,...,Nt-1
u[n+1] = (1 - (1-theta)*a*dt)/(1 + theta*dt*a)*u[n]
return u, t
u, t = solver(I=1, a=2, T=8, dt=0.8, theta=1)
# Write out a table of t and u values:
for i in range(len(t)):
print(f't={t[i]:6.3f} u={u[i]:g}')
import numpy as np
def solver(I, a, T, dt, theta):
"""Solve u'=-a*u, u(0)=I, for t in (0, T] with steps of dt."""
Nt = int(T/dt) # no of time intervals
T = Nt*dt # adjust T to fit time step dt
u = np.zeros(Nt+1) # array of u[n] values
t = np.linspace(0, T, Nt+1) # time mesh
u[0] = I # assign initial condition
for n in range(0, Nt): # n=0,1,...,Nt-1
u[n+1] = (1 - (1-theta)*a*dt)/(1 + theta*dt*a)*u[n]
return u, t
u, t = solver(I=1, a=2, T=8, dt=0.8, theta=1)
# Write out a table of t and u values:
for i in range(len(t)):
print(f't={t[i]:6.3f} u={u[i]:g}')
t= 0.000 u=1
t= 0.800 u=0.384615
t= 1.600 u=0.147929
t= 2.400 u=0.0568958
t= 3.200 u=0.021883
t= 4.000 u=0.00841653
t= 4.800 u=0.00323713
t= 5.600 u=0.00124505
t= 6.400 u=0.000478865
t= 7.200 u=0.000184179
t= 8.000 u=7.0838e-05
We will also learn about plotting. It is very important to present data in a clear and consise manner. It is very easy to generate a naked plot
But you should always add legends, titles, exact solution, etc. Make the plot nice:-)
u_exact = lambda t, I, a: I*np.exp(-a*t)
u, t = solver(I=I, a=a, T=T, dt=0.8, theta=1)
te = np.linspace(0, T, 1000)
ue = u_exact(te, I, a)
fig = plt.figure(figsize=(6, 4))
plt.plot(t, u, 'bs-', te, ue, 'r')
plt.title('Decay')
plt.legend(['numerical', 'exact'])
plt.xlabel('Time'), plt.ylabel('u(t)');
What is exact?
There is a difference between exact numerical solution and exact solution!
Repeated use of the \(\theta\)-rule gives exact numerical solution: \[ \begin{align*} u^0 &= I,\\ u^1 &= Au^0 = AI\\ u^n &= A^nu^{n-1} = A^nI \end{align*} \]
Exact solution on the other hand:
\[ u(t) = \exp(-a t), \quad u(t_n) = \exp(-a t_n) \]
The exact discrete solution is
\[ u^n = IA^n \]
Test if your solver gives
\[ \max_n |u^n - IA^n| < \epsilon\sim 10^{-15} \]
for a few precalculated steps.
Tip
Make sure you understand what \(n\) in \(u^n\) and in \(A^n\) means! \(n\) is not used as a power in \(u^n\), but it is a power in \(A^n\)!
Use a calculator (\(I=0.1\), \(\theta=0.8\), \(\Delta t =0.8\)):
\[ A\equiv \frac{1 - (1-\theta) a\Delta t}{1 + \theta a \Delta t} = 0.298245614035 \]
\[ \begin{align*} u^1 &= AI=0.0298245614035,\\ u^2 &= Au^1= 0.00889504462912,\\ u^3 &=Au^2= 0.00265290804728 \end{align*} \]
def test_solver_three_steps(solver):
"""Compare three steps with known manual computations."""
theta = 0.8
a = 2
I = 0.1
dt = 0.8
u_by_hand = np.array([I,
0.0298245614035,
0.00889504462912,
0.00265290804728])
Nt = 3 # number of time steps
u, t = solver(I=I, a=a, T=Nt*dt, dt=dt, theta=theta)
tol = 1E-14 # tolerance for comparing floats
diff = abs(u - u_by_hand).max()
success = diff < tol
assert success, diff
test_solver_three_steps(solver)
Note
We do not use the exact solution because the numerical solution will not equal the exact!
Norms of a function \(f(t)\):
\[ \begin{align} ||f||_{L^2} &= \left( \int_0^T f(t)^2 dt\right)^{1/2} \\ ||f||_{L^1} &= \int_0^T |f(t)| dt \\ ||f||_{L^\infty} &= \max_{t\in [0,T]}|f(t)| \end{align} \]
The Trapezoidal rule:
\[ ||f^n|| = \left(\Delta t\left(\scriptstyle\frac{1}{2}(f^0)^2 + \scriptstyle\frac{1}{2}(f^{N_t})^2 + \sum_{n=1}^{N_t-1} (f^n)^2\right)\right)^{1/2} \]
Common simplification yields the \(\ell^2\) norm of a mesh function:
\[ ||f^n||_{\ell^2} = \left(\Delta t\sum_{n=0}^{N_t} (f^n)^2\right)^{1/2} \]
\[ E = ||e^n||_{\ell^2} = \sqrt{\Delta t\sum_{n=0}^{N_t} (e^n)^2} \]
Python code for the norm:
The code is naive and not very efficient. It is not vectorized!
Vectorization
Vectorization refers to the process of converting iterative operations on individual elements of an array (or other data structures) into batch operations on entire arrays.
For example, you have three arrays
\[ \boldsymbol{u} = (u_i)_{i=0}^N, \boldsymbol{v} = (v_i)_{i=0}^N, \boldsymbol{w} = (w_i)_{i=0}^N \]
Now compute
\[ w_i = u_i \cdot v_i, \quad \forall \, i=0, 1, \ldots, N \]
The code is naive and not very efficient. It is not vectorized!
Vectorization
Vectorization refers to the process of converting iterative operations on individual elements of an array (or other data structures) into batch operations on entire arrays.
Regular (scalar) implementation:
Vectorized:
Numpy is heavily vectorized! So much so that mult, add, div, etc are vectorized by default!
The code is naive and not very efficient. It is not vectorized!
Vectorization
Vectorization refers to the process of converting iterative operations on individual elements of an array (or other data structures) into batch operations on entire arrays.
Vectorization warning
Pretty much all the code you will see and get access to in this course will be vectorized!
Get rid of the for-loop!
u[0] = I # assign initial condition
for n in range(0, Nt): # n=0,1,...,Nt-1
u[n+1] = (1 - (1-theta)*a*dt)/(1 + theta*dt*a)*u[n]
How? Difficult because it is a recursive update and not regular elementwise multiplication. But remember
\[ A = (1 - (1- \theta) a \Delta t)/(1 + \theta \Delta t a) \]
\[ \begin{align*} u^1 & = A u^0,\\ u^2 & = A u^1,\\ &\vdots\\ u^{N_t} &= A u^{N_t-1} \end{align*} \]
u[0] = I # assign initial condition
for n in range(0, Nt): # n=0,1,...,Nt-1
u[n+1] = (1 - (1-theta)*a*dt)/(1 + theta*dt*a)*u[n]
Can be implemented as
def f0(u, I, theta, a, dt):
u[0] = I
u[1:] = (1 - (1-theta)*a*dt)/(1 + theta*dt*a)
u[:] = np.cumprod(u)
return u
def f1(u, I, theta, a, dt):
u[0] = I
for n in range(0, len(u)-1):
u[n+1] = (1 - (1-theta)*a*dt)/(1 + theta*dt*a)*u[n]
return u
I, a, T, dt, theta = 1, 2, 8, 0.8, 1
u, t = solver(I, a, T, dt, theta)
assert np.allclose(f0(u.copy(), I, theta, a, dt),
f1(u.copy(), I, theta, a, dt))
Lets try some timings!
Lets try some timings:
<TimeitResult : 2.32 µs ± 94 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)>
<TimeitResult : 2.26 µs ± 68.3 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)>
<TimeitResult : 2.86 µs ± 1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>
<TimeitResult : 20.6 µs ± 517 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)>
Even longer array:
Length of u = 1001
<TimeitResult : 3.86 µs ± 2.34 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>
<TimeitResult : 220 µs ± 2.68 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>
Vectorized code takes the same time! Only overhead costs, not the actual computation.
Time it once
<TimeitResult : 48.1 µs ± 115 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>
%%cython -a
#cython: boundscheck=False, wraparound=False, cdivision=True
cpdef void f3(double[::1] u, int I, double theta, double a, double dt):
cdef int n
cdef int N = u.shape[0]
u[0] = I
for n in range(0, N-1):
u[n+1] = (1 - (1-theta)*a*dt)/(1 + theta*dt*a)*u[n]
return
Generated by Cython 3.0.10
Yellow lines hint at Python interaction.
Click on a line that starts with a "+
" to see the C code that Cython generated for it.
+1: #cython: boundscheck=False, wraparound=False, cdivision=True
__pyx_t_7 = __Pyx_PyDict_NewPresized(0); if (unlikely(!__pyx_t_7)) __PYX_ERR(0, 1, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_7); if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_7) < 0) __PYX_ERR(0, 1, __pyx_L1_error) __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0;
+2: cpdef void f3(double[::1] u, int I, double theta, double a, double dt):
static PyObject *__pyx_pw_54_cython_magic_56487b4d6695ae1d6d1ee138204b704501527e38_1f3(PyObject *__pyx_self, #if CYTHON_METH_FASTCALL PyObject *const *__pyx_args, Py_ssize_t __pyx_nargs, PyObject *__pyx_kwds #else PyObject *__pyx_args, PyObject *__pyx_kwds #endif ); /*proto*/ static void __pyx_f_54_cython_magic_56487b4d6695ae1d6d1ee138204b704501527e38_f3(__Pyx_memviewslice __pyx_v_u, int __pyx_v_I, double __pyx_v_theta, double __pyx_v_a, double __pyx_v_dt, CYTHON_UNUSED int __pyx_skip_dispatch) { int __pyx_v_n; int __pyx_v_N; /* … */ /* function exit code */ __pyx_L0:; } /* Python wrapper */ static PyObject *__pyx_pw_54_cython_magic_56487b4d6695ae1d6d1ee138204b704501527e38_1f3(PyObject *__pyx_self, #if CYTHON_METH_FASTCALL PyObject *const *__pyx_args, Py_ssize_t __pyx_nargs, PyObject *__pyx_kwds #else PyObject *__pyx_args, PyObject *__pyx_kwds #endif ); /*proto*/ static PyMethodDef __pyx_mdef_54_cython_magic_56487b4d6695ae1d6d1ee138204b704501527e38_1f3 = {"f3", (PyCFunction)(void*)(__Pyx_PyCFunction_FastCallWithKeywords)__pyx_pw_54_cython_magic_56487b4d6695ae1d6d1ee138204b704501527e38_1f3, __Pyx_METH_FASTCALL|METH_KEYWORDS, 0}; static PyObject *__pyx_pw_54_cython_magic_56487b4d6695ae1d6d1ee138204b704501527e38_1f3(PyObject *__pyx_self, #if CYTHON_METH_FASTCALL PyObject *const *__pyx_args, Py_ssize_t __pyx_nargs, PyObject *__pyx_kwds #else PyObject *__pyx_args, PyObject *__pyx_kwds #endif ) { __Pyx_memviewslice __pyx_v_u = { 0, 0, { 0 }, { 0 }, { 0 } }; int __pyx_v_I; double __pyx_v_theta; double __pyx_v_a; double __pyx_v_dt; #if !CYTHON_METH_FASTCALL CYTHON_UNUSED Py_ssize_t __pyx_nargs; #endif CYTHON_UNUSED PyObject *const *__pyx_kwvalues; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("f3 (wrapper)", 0); #if !CYTHON_METH_FASTCALL #if CYTHON_ASSUME_SAFE_MACROS __pyx_nargs = PyTuple_GET_SIZE(__pyx_args); #else __pyx_nargs = PyTuple_Size(__pyx_args); if (unlikely(__pyx_nargs < 0)) return NULL; #endif #endif __pyx_kwvalues = __Pyx_KwValues_FASTCALL(__pyx_args, __pyx_nargs); { PyObject **__pyx_pyargnames[] = {&__pyx_n_s_u,&__pyx_n_s_I,&__pyx_n_s_theta,&__pyx_n_s_a,&__pyx_n_s_dt,0}; PyObject* values[5] = {0,0,0,0,0}; if (__pyx_kwds) { Py_ssize_t kw_args; switch (__pyx_nargs) { case 5: values[4] = __Pyx_Arg_FASTCALL(__pyx_args, 4); CYTHON_FALLTHROUGH; case 4: values[3] = __Pyx_Arg_FASTCALL(__pyx_args, 3); CYTHON_FALLTHROUGH; case 3: values[2] = __Pyx_Arg_FASTCALL(__pyx_args, 2); CYTHON_FALLTHROUGH; case 2: values[1] = __Pyx_Arg_FASTCALL(__pyx_args, 1); CYTHON_FALLTHROUGH; case 1: values[0] = __Pyx_Arg_FASTCALL(__pyx_args, 0); CYTHON_FALLTHROUGH; case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = __Pyx_NumKwargs_FASTCALL(__pyx_kwds); switch (__pyx_nargs) { case 0: if (likely((values[0] = __Pyx_GetKwValue_FASTCALL(__pyx_kwds, __pyx_kwvalues, __pyx_n_s_u)) != 0)) { (void)__Pyx_Arg_NewRef_FASTCALL(values[0]); kw_args--; } else if (unlikely(PyErr_Occurred())) __PYX_ERR(0, 2, __pyx_L3_error) else goto __pyx_L5_argtuple_error; CYTHON_FALLTHROUGH; case 1: if (likely((values[1] = __Pyx_GetKwValue_FASTCALL(__pyx_kwds, __pyx_kwvalues, __pyx_n_s_I)) != 0)) { (void)__Pyx_Arg_NewRef_FASTCALL(values[1]); kw_args--; } else if (unlikely(PyErr_Occurred())) __PYX_ERR(0, 2, __pyx_L3_error) else { __Pyx_RaiseArgtupleInvalid("f3", 1, 5, 5, 1); __PYX_ERR(0, 2, __pyx_L3_error) } CYTHON_FALLTHROUGH; case 2: if (likely((values[2] = __Pyx_GetKwValue_FASTCALL(__pyx_kwds, __pyx_kwvalues, __pyx_n_s_theta)) != 0)) { (void)__Pyx_Arg_NewRef_FASTCALL(values[2]); kw_args--; } else if (unlikely(PyErr_Occurred())) __PYX_ERR(0, 2, __pyx_L3_error) else { __Pyx_RaiseArgtupleInvalid("f3", 1, 5, 5, 2); __PYX_ERR(0, 2, __pyx_L3_error) } CYTHON_FALLTHROUGH; case 3: if (likely((values[3] = __Pyx_GetKwValue_FASTCALL(__pyx_kwds, __pyx_kwvalues, __pyx_n_s_a)) != 0)) { (void)__Pyx_Arg_NewRef_FASTCALL(values[3]); kw_args--; } else if (unlikely(PyErr_Occurred())) __PYX_ERR(0, 2, __pyx_L3_error) else { __Pyx_RaiseArgtupleInvalid("f3", 1, 5, 5, 3); __PYX_ERR(0, 2, __pyx_L3_error) } CYTHON_FALLTHROUGH; case 4: if (likely((values[4] = __Pyx_GetKwValue_FASTCALL(__pyx_kwds, __pyx_kwvalues, __pyx_n_s_dt)) != 0)) { (void)__Pyx_Arg_NewRef_FASTCALL(values[4]); kw_args--; } else if (unlikely(PyErr_Occurred())) __PYX_ERR(0, 2, __pyx_L3_error) else { __Pyx_RaiseArgtupleInvalid("f3", 1, 5, 5, 4); __PYX_ERR(0, 2, __pyx_L3_error) } } if (unlikely(kw_args > 0)) { const Py_ssize_t kwd_pos_args = __pyx_nargs; if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_kwvalues, __pyx_pyargnames, 0, values + 0, kwd_pos_args, "f3") < 0)) __PYX_ERR(0, 2, __pyx_L3_error) } } else if (unlikely(__pyx_nargs != 5)) { goto __pyx_L5_argtuple_error; } else { values[0] = __Pyx_Arg_FASTCALL(__pyx_args, 0); values[1] = __Pyx_Arg_FASTCALL(__pyx_args, 1); values[2] = __Pyx_Arg_FASTCALL(__pyx_args, 2); values[3] = __Pyx_Arg_FASTCALL(__pyx_args, 3); values[4] = __Pyx_Arg_FASTCALL(__pyx_args, 4); } __pyx_v_u = __Pyx_PyObject_to_MemoryviewSlice_dc_double(values[0], PyBUF_WRITABLE); if (unlikely(!__pyx_v_u.memview)) __PYX_ERR(0, 2, __pyx_L3_error) __pyx_v_I = __Pyx_PyInt_As_int(values[1]); if (unlikely((__pyx_v_I == (int)-1) && PyErr_Occurred())) __PYX_ERR(0, 2, __pyx_L3_error) __pyx_v_theta = __pyx_PyFloat_AsDouble(values[2]); if (unlikely((__pyx_v_theta == (double)-1) && PyErr_Occurred())) __PYX_ERR(0, 2, __pyx_L3_error) __pyx_v_a = __pyx_PyFloat_AsDouble(values[3]); if (unlikely((__pyx_v_a == (double)-1) && PyErr_Occurred())) __PYX_ERR(0, 2, __pyx_L3_error) __pyx_v_dt = __pyx_PyFloat_AsDouble(values[4]); if (unlikely((__pyx_v_dt == (double)-1) && PyErr_Occurred())) __PYX_ERR(0, 2, __pyx_L3_error) } goto __pyx_L6_skip; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("f3", 1, 5, 5, __pyx_nargs); __PYX_ERR(0, 2, __pyx_L3_error) __pyx_L6_skip:; goto __pyx_L4_argument_unpacking_done; __pyx_L3_error:; { Py_ssize_t __pyx_temp; for (__pyx_temp=0; __pyx_temp < (Py_ssize_t)(sizeof(values)/sizeof(values[0])); ++__pyx_temp) { __Pyx_Arg_XDECREF_FASTCALL(values[__pyx_temp]); } } __PYX_XCLEAR_MEMVIEW(&__pyx_v_u, 1); __Pyx_AddTraceback("_cython_magic_56487b4d6695ae1d6d1ee138204b704501527e38.f3", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; __pyx_r = __pyx_pf_54_cython_magic_56487b4d6695ae1d6d1ee138204b704501527e38_f3(__pyx_self, __pyx_v_u, __pyx_v_I, __pyx_v_theta, __pyx_v_a, __pyx_v_dt); int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; /* function exit code */ __PYX_XCLEAR_MEMVIEW(&__pyx_v_u, 1); { Py_ssize_t __pyx_temp; for (__pyx_temp=0; __pyx_temp < (Py_ssize_t)(sizeof(values)/sizeof(values[0])); ++__pyx_temp) { __Pyx_Arg_XDECREF_FASTCALL(values[__pyx_temp]); } } __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyObject *__pyx_pf_54_cython_magic_56487b4d6695ae1d6d1ee138204b704501527e38_f3(CYTHON_UNUSED PyObject *__pyx_self, __Pyx_memviewslice __pyx_v_u, int __pyx_v_I, double __pyx_v_theta, double __pyx_v_a, double __pyx_v_dt) { PyObject *__pyx_r = NULL; __Pyx_XDECREF(__pyx_r); if (unlikely(!__pyx_v_u.memview)) { __Pyx_RaiseUnboundLocalError("u"); __PYX_ERR(0, 2, __pyx_L1_error) } __pyx_f_54_cython_magic_56487b4d6695ae1d6d1ee138204b704501527e38_f3(__pyx_v_u, __pyx_v_I, __pyx_v_theta, __pyx_v_a, __pyx_v_dt, 0); if (unlikely(PyErr_Occurred())) __PYX_ERR(0, 2, __pyx_L1_error) __pyx_t_1 = __Pyx_void_to_None(NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; /* function exit code */ __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("_cython_magic_56487b4d6695ae1d6d1ee138204b704501527e38.f3", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* … */ __pyx_tuple__20 = PyTuple_Pack(5, __pyx_n_s_u, __pyx_n_s_I, __pyx_n_s_theta, __pyx_n_s_a, __pyx_n_s_dt); if (unlikely(!__pyx_tuple__20)) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_GOTREF(__pyx_tuple__20); __Pyx_GIVEREF(__pyx_tuple__20); /* … */ __pyx_t_7 = __Pyx_CyFunction_New(&__pyx_mdef_54_cython_magic_56487b4d6695ae1d6d1ee138204b704501527e38_1f3, 0, __pyx_n_s_f3, NULL, __pyx_n_s_cython_magic_56487b4d6695ae1d6d, __pyx_d, ((PyObject *)__pyx_codeobj__21)); if (unlikely(!__pyx_t_7)) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_7); if (PyDict_SetItem(__pyx_d, __pyx_n_s_f3, __pyx_t_7) < 0) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0;
3: cdef int n
+4: cdef int N = u.shape[0]
__pyx_v_N = (__pyx_v_u.shape[0]);
+5: u[0] = I
__pyx_t_1 = 0; *((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_u.data) + __pyx_t_1)) )) = __pyx_v_I;
+6: for n in range(0, N-1):
__pyx_t_2 = (__pyx_v_N - 1); __pyx_t_3 = __pyx_t_2; for (__pyx_t_4 = 0; __pyx_t_4 < __pyx_t_3; __pyx_t_4+=1) { __pyx_v_n = __pyx_t_4;
+7: u[n+1] = (1 - (1-theta)*a*dt)/(1 + theta*dt*a)*u[n]
__pyx_t_1 = __pyx_v_n; __pyx_t_5 = (__pyx_v_n + 1); *((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_u.data) + __pyx_t_5)) )) = (((1.0 - (((1.0 - __pyx_v_theta) * __pyx_v_a) * __pyx_v_dt)) / (1.0 + ((__pyx_v_theta * __pyx_v_dt) * __pyx_v_a))) * (*((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_u.data) + __pyx_t_1)) )))); }
+8: return
goto __pyx_L0;
<TimeitResult : 1.28 µs ± 21.6 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)>
Cython and Numba are both fast!
Cython and Numba are both as fast as pure C. Either one can be used to speed up critical routines with very little additional effort!
Note
Cython is very easy to use in notebooks, but requires some additional steps to be compiled used as extension modules with regular python programs.