Tag Archives: Scientific ML

ModelingToolkit, Modelica, and Modia: The Composable Modeling Future in Julia

By: Christopher Rackauckas

Re-posted from: http://www.stochasticlifestyle.com/modelingtoolkit-modelica-and-modia-the-composable-modeling-future-in-julia/

Let me take a bit of time here to write out a complete canonical answer to ModelingToolkit and how it relates to Modia and Modelica. This question comes up a lot: why does ModelingToolkit exist instead of building on tooling for Modelica compilers? I’ll start out by saying I am a huge fan of Martin and Hilding’s work, I respect them a ton, and they have made major advances in this space. But I think ModelingToolkit tops what they have developed in a not-so-subtle way. And it all comes down to the founding principle, the foundational philosophy, of what a modeling language needs to do.

Composable Abstractions for Model Transformations

There is a major philosophical difference which is seen in both the development and usage of the tools. Everything in the SciML organization is built around a principle of confederated modular development: let other packages influence the capabilities of your own. This is highlighted in a paper about the package structure of DifferentialEquations.jl. The underlying principle is that not everyone wants or needs to be a developer of the package, but still may want to contribute. For example, it’s not uncommon that a researcher in ODE solvers wants to build a package that adds one solver to the SciML ecosystem. Doing this in their own package for their own academic credit, but with the free bonus that now it exists in the multiple dispatch world. In the design of DifferentialEquations.jl, solve(prob,IRKGL16()) now exists because of their package, and so we add it to the documentation. Some of this work is not even inside the organization, but we still support it. The philosophy is to include every researcher as a budding artist in the space of computational research, including all of the possible methods, and building an infrastructure that promotes a free research atmosphere in the methods. Top level defaults and documentation may lead people to the most stable aspects of the ecosystem, but with a flip of a switch you can be testing out the latest research.

When approaching modeling languages like Modelica, I noticed this idea was completely foreign to modeling languages. Modelica is created by a committee, but the implementations that people use are closed like Dymola, or monolithic like OpenModelica. This is not a coincidence but instead a fact of the design of the language. In the Modelica language, there is no reference to what transformations are being done to your models in order to make them “simulatable”. People know about Pantelides algorithm, and “singularity elimination”, but this is outside the language. It’s something that the compiler maybe gives you a few options for, but not something the user or the code actively interacts with. Every compiler is different, advances in one compiler do not help your model when you use another compiler, and the whole world is siloed. By this design, it is impossible for an external user to write compiler passes in Modelica which effects this model lowering process. You can tweak knobs, or write a new compiler. Or fork OpenModelica and hack on the whole compiler to just do the change you wanted.

I do not think that the symbolic transformations that are performed by Modelica are the complete set that everyone will need for all models for all time. I think in many cases you might want to write your own. For example, on SDEs there’s a Lamperti transformation which can exist which transforms general SDEs to SDEs with additive noise. It doesn’t always apply, but when it does it can greatly enhance solver speed and stability. This is niche enough that it’ll never be in a commercial Modelica compiler (in fact, they don’t even have SDEs), but it’s something that some user might want to be able to add to the process.

ModelingToolkit: Opening the Development Process

So the starting goal of ModelingToolkit is to give an open and modular transformation system on which a whole modeling ecosystem can thrive. My previous blog post exemplified how unfamiliar use cases for code transformations can solve many difficult mathematical problems, and my goal is to give this power to the whole development community. `structural_simplify` is something built into ModelingToolkit to do “the standard transformations” on the standard systems, but the world of transformations is so much larger. Log-transforming a few variables? Exponentiating a few to ensure positivity? Lamperti transforms of SDEs? Transforming to the sensitivity equations? And not just transformations, but functionality for inspecting and analyzing models. Are the equations linear? Which parameters are structurally identifiable?

From that you can see that Modia was a major inspiration for ModelingToolkit, but Modia did not go in this direction of decomposing the modeling language: it essentially is a simplified Modelica compiler in Julia. But ModelingToolkit is a deconstruction of what a modeling language is. It pulls it down to its component pieces and then makes it easy to build new modeling languages like Catalyst.jl which internally use ModelingToolkit for all of the difficult transformations. The deconstructed form is a jumping point for building new domain-based languages, along with new transformations which optimize the compiler for specific models. And then in the end, everybody who builds off of it gets improved stability, performance, and parallelism as the core MTK passes improve.

Bringing the Power to the People

Now there’s two major aspects that need to be handle to fully achieve such a vision though. If you want people to be able to reuse code between transformations, what you want is to expose how you are changing code. To achieve this goal, a new Computer Algebra System (CAS), Symbolics.jl, was created for ModelingToolkit.jl. The idea being, if we want everyone writing code transformations, they should all have easy access to a general mathematical toolset for doing such code transformations. We shouldn’t have everyone building a new code for differentiation, simplify, and substitution. And we shouldn’t have everyone relying on undocumented internals of ModelingToolkit.jl either: this should be something that is open, well-tested, documented, and a well-known system so that everyone can easily become a “ModelingToolkit compiler developer”. By building a CAS and making it a Julia standard, we can bridge that developer gap because now everyone knows how to easily manipulate models: they are just Symbolics.jl expressions.

The second major aspect is to achieve a natural embedding into the host language. Modelica is not a language in which people can write compiler passes, which introduces a major gap between the modeler and the developer of extensions to the modeling language. If we want to bridge this gap, we need to ensure the whole modeling language is used from a host which is a complete imperative programming language. And you need to do so in a language that is interactive, high performance, and has a well-developed ecosystem for modeling and simulation. Martin and Hilding had seen this fact as the synthesis for Modia with how Julia uniquely satisfies this need, but I think we need to take it a step further. To really make the embedding natural, you should be able to on the fly automatically convert code to and from the symbolic form. In the previous blog post I showcased how ModelingToolkit.jl could improve people’s code by automatically parallelizing it and performing index reduction even if the code was not written in ModelingToolkit.jl. This grows the developer audience of the transformation language from “anyone who wants to transform models” to “anyone who wants to automate improving models and general code”. This expansion of the audience is thus pulling in developers who are interested in things like automating parallelism and GPU codegen and bringing them into the MTK developer community.

Intern, since all of these advances then apply to the MTK internals and code generation tools such as Symbolics.jl’s build_function, new features are coming all of the time because of how the community is composed. The CTarget build_function was first created to transpile Julia code to C, and thus ModelingToolkit models can generate C outputs for compiling into embedded systems. This is serendipity when seeing one example, but it’s design when you notice that this is how the entire system is growing so fast.

But Can Distributed Development Be As Good As Specialized Code?

Now one of the questions we received early on was, won’t you not be able to match the performance a specialized compiler which was only made to work on Modelica, right? While at face value it may seem like hyperspecialization could be beneficial, the true effect of hyperspecialization is that algorithms are simply less efficient because less work has been put into them. Symbolics.jl has become a phenomenon of its own, with multiple different hundred comment threads digging through many aspects of the pros and cons of its designs, and that’s not even including the 200 person chat channel which has had tens of thousands of messages in the less than 2 months since the CAS was released. Tons of people are advising how to improve every single plus and multiply operation.

So it shouldn’t be a surprise that there are many details that have quickly been added which are still years away from a Modelica implementation. It automatically multithreads tree traversals and rewrite rules. It automatically generates fast parallelized code, and can do so in a way that composes with tearing of nonlinear equations. It lets users define their own high-performance and parallelized functions, register them, and stick them into the right hand side. And that is even excluding the higher level results, like the fact that it is fully differentiable and thus allows training neural networks decomposed within the models, and automatically discover equations from data.

Just at the very basic level we can see that the CAS is transforming the workflows of scientists and engineers in many aspects of the modeling process. By distributing the work of improving symbolic computing, we have already taken examples which were essentially not obtainable and making them instant with Symbolics.jl:

We are building out a full benchmarking system for the symbolic ecosystem to track performance over time and ensure it reaches the top level. It’s integrating pieces from The OSCAR project, getting lots of people tracking performance in their own work, and building a community. Each step is another major improvement and this ecosystem is making these steps fast. It will be hard for a few people working on the internals of a single Modelica compiler to keep up with such an environment, let alone repeating this work to every new Modelica-based project.

But How Do You Connect To Modelica?

This is a rather good question because there are a lot of models already written in Modelica, and it would be a shame for us to not be able to connect with that ecosystem. I will hint that there is coming tooling as part of JuliaSim for connecting to many pre-existing model libraries. In addition, we hope to make use of tooling like Modia.jl and TinyModia.jl will help us make a bridge.

Conclusion: Designing Around the Developer Community Has Many Benefits

The composability and distributed development nature of ModelingToolkit.jl is its catalyst. This is why ModelingToolkit.jl looks like it has rocket shoes on: it is fast and it is moving fast. And it’s because of the thought put into the design. It’s because ModelingToolkit.jl is including the entire research community as its asset instead of just its user. I plan to keep moving forward from here, looking back to learn from the greats, but building it in our own image. We’re taking the idea of a modeling language, distributing it throughout one of the most active developer communities in modeling and simulation, in a language which is made to build fast and parallelized code. And you’re invited.

PS: what about Simulink?

I’m just going to post a self-explanatory recent talk by Jonathan at the NASA Launch Services Program who saw a 15,000x acceleration by moving from Simulink to ModelingToolkit.jl.

Enough said.

The post ModelingToolkit, Modelica, and Modia: The Composable Modeling Future in Julia appeared first on Stochastic Lifestyle.

Generalizing Automatic Differentiation to Automatic Sparsity, Uncertainty, Stability, and Parallelism

By: Christopher Rackauckas

Re-posted from: http://www.stochasticlifestyle.com/generalizing-automatic-differentiation-to-automatic-sparsity-uncertainty-stability-and-parallelism/

Automatic differentiation is a “compiler trick” whereby a code that calculates f(x) is transformed into a code that calculates f'(x). This trick and its two forms, forward and reverse mode automatic differentiation, have become the pervasive backbone behind all of the machine learning libraries. If you ask what PyTorch or Flux.jl is doing that’s special, the answer is really that it’s doing automatic differentiation over some functions.

What I want to dig into in this blog post is a simple question: what is the trick behind automatic differentiation, why is it always differentiation, and are there other mathematical problems we can be focusing this trick towards? While very technical discussions on this can be found in our recent paper titled “ModelingToolkit: A Composable Graph Transformation System For Equation-Based Modeling” and descriptions of methods like intrusive uncertainty quantification, I want to give a high-level overview that really describes some of the intuition behind the technical thoughts. Let’s dive in!

What is the trick behind automatic differentiation? Abstract interpretation

To understand automatic differentiation in practice, you need to understand that it’s at its core a code transformation process. While mathematically it comes down to being about Jacobian-vector products and Jacobian-transpose-vector products for forward and reverse mode respectively, I think sometimes that mathematical treatment glosses over the practical point that it’s really about code.

Take for example . If we want to take the derivative of this, then we could do , but this misses the information that we actually know analytically how to define the derivative! Using the principle that algorithm efficiency comes from problem information, we can improve this process by directly embedding that analytical solution into our process. So we come to the first principle of automatic differentiation:

If you know the analytical solution to the derivative, then replace the function with its derivative

So if you see and someone calls “derivative(f,x)“, you can do a quick little lookup to a table of rules, known as primitives, and if it’s in your table then boom you’re done. Swap it in, call it a day.

This already shows you that, with automatic differentiation, we cannot think of as just a function, just a thing that takes in values, but we have to know something about what it means semantically. We have to look at it and identify “this is sin” in order to know “replace it with cos”. This is the fundamental limitation of automatic differentiation: it has to know something about your code, more information than it takes to call or run your code. This is why many automatic differentiation libraries are tied to specific implementations of underlying numerical primitives. PyTorch understands “torch.sin“ as , but it does not understand “tf.sin“ as , which is why if you place a TensorFlow function into a PyTorch training loop you will get an error thrown about the derivative calculation. This semantic mapping is the reason for libraries like ChainRules.jl which define semantic mappings for the Julia Base library and allows extensions: by directly knowing this mapping on all of standard Julia Base, you can cover the language and achieve “differentiable programming”, i.e. all programs automatically can get derivatives.

But we’re not done. Let’s say we have . The answer is not to add this new function to the table by deriving it by hand: instead we have to come up with a way to make a function generate a derivative code whenever and are in our lookup table. The answer comes from the chain rule. I’m going to describe the forward application of the chain rule as it’s a bit simpler to derive, but a full derivation of how this is done in the reverse form is described in these lecture notes. The chain rule tells us that . Thus in order to calculate , we need to know two things: and . If we calculate both of these quantities at every stage of our code, it doesn’t matter how deep the composition goes, we will have all of the information that is required to reconstruct the result of the chain rule.

What this means is that automatic differentiation on this function can be thought of as the following translation process:

  1. Transform to and evaluate at
  2. Transform to and evaluate at
  3. Transform to . Now the second portion is the solution to the derivative

This translation process is “transform every primitive function into a tuple of (function,derivative), and transform every other function into a chain rule application using the two pieces” is abstract interpretation. This is the process where an interpreter of a code or language runs under different semantics. An interpreter written to do this process acts on the same code but interprets it differently: it changes each operation to a tuple of the solution and its derivative, instead of just the solution .

Thus the abstract interpretation version of the problem of calculating derivatives is to reimagine the problem as “at this step of the code, how should I be transforming it so that I have the information to calculate derivatives”? There are many ways to do this abstract interpretation process: operator overloading, prior static analysis to generate a new source code, etc. But there’s one question we should bring up.

Why do we see it always on differentiation? Why is there no automatic integration?

One way to start digging into this question is to answer a related question people pose to me often: if we have automatic differentiation, why do we not have automatic integration? While at face value it seems like the two should be analogues, digging deeper exposes what’s special about differentiation. If we wanted to do the integral of , then yes we can replace this with . The heart of the question is to ask about the chain rule: what’s the derivative of ? It turns out that there is no general rule for the “anti-chain rule”. A commonly known result is that the standard Gaussian probability distribution, , does not have an analytical solution to its antiderivative, and that’s just the case of and . While that is true, I don’t think that captures all that is different about integrals.

When I said “we can replace this with ” I was actually wrong: the antiderivative of is not , it’s . There is no unique solution without imposing some external context or some global information like “and F(x)=0”. Differentiation is special because it’s purely local: only knowing the value of I can know the derivative of . Integration is a well-known example of a non-local operation in mathematics: in order to know the anti-derivative at a value of , you might need to know information about some value , and sometimes it’s not necessarily obvious what that value should even be. This nonlocality manifests in other ways as well: while is not integrable, is easy to solve via a u-substitution, making and cancelling out the in-front of the . So there is no chain rule not because some things don’t have an antiderivative, but because you have nonlocality, so can be non-integrable while is. There is no chain rule because you can’t look at small pieces and transform them, instead you have to look at the problem holistically.

But this gives us a framework in order to judge whether a mathematical problem is amenable to being solved in the framework of abstract interpretation: it must be local so that we can define a step-by-step transformation algorithm, or we need to include/impose some form of context if we have alternative information.

Let’s look at a few related problems that can be solved with this trick.

Automatic Sparsity Detection in Jacobians

Recall that the Jacobian is the matrix of partial derivatives, i.e. for where and are vectors, it’s the matrix of terms . This matrix shows up in tons of mathematical algorithms, and in many cases it’s sparse, so it’s common problem to try and compute the sparsity pattern of a Jacobian. But what does this sparsity pattern mean? If you write out the analytical solution to , a zero in the Jacobian means that is not a function of . In other words, has no influence on . For an arbitrary program , can we use abstract interpretation to calculate whether influences ?

It turns out that if we make this question a little simpler then it has a simple solution. Let’s instead ask, can we use abstract interpretation to calculate whether can influence ? The reason for this change is because the previous question was non-local: is programmatically dependent on , but mathematically so you could cancel it out if you have good global knowledge of the program. So “does it influence” this output is hard, but “can it influence” the output is easy. “Can influence” is just the question of “does show up in the calculation at all?”

So we can come up with an abstract interpretation formulation to solve this problem. Instead of computing values, we can compute “influencer sets”. The output of is influenced by . The output of is influenced by . For , the influencer set of is the same as the influencer set of . So our abstract interpretation is to replace variables by influencer sets, and whenever the collide by a binary function like multiplication, we make the new influencer set be the union of the two. Otherwise we keep propagating it forward. The result of this way of running the program is that output that say “these are all of the variables which can be influencing this output variable”. If never shows up at any stage of the computation of , then there is no way it could ever influence it, and therefore . So the sparsity pattern is bounded by the influencer set.

This is the process behind the the automated sparsity tooling of the Julia programming language, which are now embedded as part of Symbolics.jl. There is a bit more you have to do if you see branching: i.e. you have to take all branches and take the union of the influencer sets (so it’s clear this cannot be generally solved with just operator overloading because you need to take non-local control of control flow). Details on the full process are described in Sparsity Programming: Automated Sparsity-Aware Optimizations in Differentiable Programming, along with extensions to things like Hessians which is all about tracking sets of linear dependence.

Automatic Uncertainty Quantification

Let’s say we didn’t actually know the value to put into our program, and instead it was some . What could we do then? In a standard physics class you probably learned a few rules for uncertainty propagation: and so on. It’s good to understand where this all comes from. If you said that was a random variable from a normal distribution with mean and standard deviation , and was an independent random variable from a normal distribution with mean and standard deviation , then would have mean and standard deviation . This means there are some local rules for propagating normal distributed random variables! What do you do for for a normally distributed input? You could approximate with its linear approximation: (this is another way to state that the derivative is the tangent vector at ). At a given value of then we just have a linear equation for some scalar , in which case we use the rule . This gives an abstract interpretation implementation to approximately computing with normal distributions! Now all we have to do is replace function calls with automatic differentiation around the mean and then propagate forward our error bounds.

This is a very crude description which you can expand to linear error propagation theory where you can more accurately treat the nonlinear propagation of variance. However, this is still missing out on whether two variables are dependent: , there’s no uncertainty there, so you need to treat that in a special way! If you think about it, “dependence propagation” is very similar to “propagating influencer sets”, which you can use to even more accurately propagate the variance terms. This gives rise to the package Measurements.jl which transforms code to make it additionally do propagation of normally distributed uncertainties.

I note in passing that Interval Arithmetic is very similarly formulated as an alternative interpretation of a program. David Sanders has a great tutorial on what this is all about and how to make use of it, so check that out for more information.

Using Context Information: Automatic Index Reduction in DAEs and Parallelism

Now let’s look at solving something a little bit deeper: simulating a pendulum. I know you’ll say “but I solved pendulum equations by hand in physics” but sorry to break it to you: you didn’t. In an early physics class you will say “all pendulums are one dimensional”, and “the angle is small, so sin(x) is approximately x” and arrive a beautiful linear ODE that you analytically solve. But the world isn’t that beautiful. So let’s look at the full pendulum. Instead you have the location of the swinger and its velocity . But you also have non-constant tension , and if we have a rigid rod we know that the distance of the swinger to the origin is constant. So the evolution of the system is in full given by:





There are differential equation solvers that can handle constraint equations, these are methods for solving differential-algebraic equations (DAEs), But if you take this code and give it to pretty much any differential equation solver it will fail. It’s not because the differential equation solver is bad, but because of the properties of this equation. See, the derivative of the last equation with respect to is zero, so you end up getting a singularity in the Newton solve that makes the stepping unstable. This singularity of the algebraic equation with respect to the algebraic variables (i.e. the ones not defined by derivative terms) is known as “higher index”. DAE solvers generally only work on index-1 DAEs, and this is an index-3 DAE. What does index-3 mean? It means if you take the last equation, differentiate it twice, and then do a substitution, you get:





This is mathematically the same system of equations, but this formulation of the equations doesn’t have the index issue, and so if you give this to a numerical solver it will work.

It turns out that you can reimagine this algorithm to be something that is also solved by a form of abstract interpretation. This is one of the nice unique features of ModelingToolkit.jl, spelled out in its recent paper. While algorithms have been written before for symbolic equation-based modeling systems, it turns out you can use an abstract interpretation process to extract the formulation of the equations, solve a graph algorithm to determine how many times you should differentiate which equations, do the differentiation using rules and structures from automatic differentiation libraries, and then regenerate the code for “the better version”. As shown in a ModelingToolkit.jl tutorial on this feature, if you do this on the pendulum equations, you can change a code that is too unstable to solve into an equation that is easy enough to solve by the simplest solvers.

And then you can go even further. As I described in the JuliaCon 2020 talk on automated code optimization, now that one is regenerating the code, you can step in and construct a graph of dependencies and automatically compute independent portions simultaneously in the generated code. Thus with no cost to the user, an abstract interpretation into symbolic graph building can be used to reconstruct and automatically parallelize code. The ModelingToolkit.jl paper takes this even further by showing how a code which is not amenable to parallelism can, after context-specific equation changes like the DAE index reduction, be transformed into an alternative variant that is suddenly embarrassingly parallel.

All of these features require a bit of extra context as to “what equations you’re solving” and information like “do you care about maintaining the same exact floating point result”, but by adding in the right amount of context, we can extend mathematical abstract interpretation to solve these alternative problems in new domains.

By the way, if this excites you and you want to see more updates like this, please star ModelingToolkit.jl.

Final Trick: Constructing a PDE Solver Out Of An ODE Solver with Abstract Interpretation

Let me end by going a little bit more mathematical. You can transform code about scalars into code about functions by using the vector-space interpretation of a function. In mathematical terms, functions are vectors in Banach spaces. Or, in the space of functions, functions are points. if you have a function and a function , then is a function too, and so is . You can do computation on these “points” by working out their expansions. For example, you can write and in their Fourier series expansions: . Approximating with finitely many expansion terms, you can represent via [a[1:n];b[1:n]], and same for . The representation of can be worked out from the finite truncation (just add the coefficients), and so can . So you can transform your computation about “functions” to “arrays of coefficients representing functions”, and derive the results for what , , etc. do on these values. This is an abstract interpretation of a program that transforms it into an equivalent program about function and measures as inputs.

Now it turns out you can formally use this to do cool things. A partial differential equation (PDE) is an ODE where where instead of your values being scalars at each time , your values are now functions at each time . So what if you represent those “functions” as “scalars” via their representation in the Sobolev space, and then put those “scalars” into the ODE solver? You automatically transform your ODE solver code into a PDE solver code. Formally, this is using a branch of PDE theory known as semigroup theory and making it a computable object.

In turns out this is something you can do. ApproxFun.jl defines types “Fun“ which represent the functions as scalars in a Banach space, and defines a bunch of operations that are allowed for such functions. I showed in a previous talk that you can slap these into DifferentialEquations.jl to have it reinterpret into this function-based differential equation solver, and then start to solve PDEs via this representation.

Conclusion: Abstract Interpretation is Powerful for Mathematics

Automatic differentiation gets all of the attention, but its really this idea of abstract interpretation that we should be looking at. “How do I change the semantics of this program to solve another problem?” is a very powerful approach. In computer science it’s often used for debugging: recompile this program into the debugging version. And in machine learning it’s often used to recompile programs into derivative calculators. But uncertainty quantification, fast sparsity detection, automatic stabilization and parallelization of differential-algebraic equations, and automatic generation of PDE solvers all arise from the same little trick. That can’t be all there is out there: the formalism and theory of abstract interpretation seems like it could be a much richer field.

Bibliography

  1. ModelingToolkit: A Composable Graph Transformation System For Equation-Based Modeling
  2. Sparsity Programming: Automated Sparsity-Aware Optimizations in Differentiable Programming
  3. Uncertainty propagation with functionally correlated quantities

The post Generalizing Automatic Differentiation to Automatic Sparsity, Uncertainty, Stability, and Parallelism appeared first on Stochastic Lifestyle.

COVID-19 Epidemic Mitigation via Scientific Machine Learning (SciML)

By: Christopher Rackauckas

Re-posted from: http://www.stochasticlifestyle.com/covid-19-epidemic-mitigation-via-scientific-machine-learning-sciml/

Chris Rackauckas
Applied Mathematics Instructor, MIT
Senior Research Analyst, University of Maryland, Baltimore School of Pharmacy

This was a seminar talk given to the COVID modeling journal club on scientific machine learning for epidemic modeling.

Resources:

https://sciml.ai/
https://diffeqflux.sciml.ai/dev/
https://datadriven.sciml.ai/dev/
https://docs.sciml.ai/latest/
https://safeblues.org/

The post COVID-19 Epidemic Mitigation via Scientific Machine Learning (SciML) appeared first on Stochastic Lifestyle.