Author Archives: Julia Developers

Announcing support for complex-domain linear programs in Convex.jl

By: Julia Developers

Re-posted from: http://feedproxy.google.com/~r/JuliaLang/~3/MY_pv6X_SOQ/announcing-support-for-complex-domain-linear-programs-in-Convex.jl

I am pleased to announce support for complex-domain linear programs (LPs) in Convex.jl. As one of the Google Summer of Code students under The Julia Language, I proposed to implement support for complex semidefinite programming. In the first phase of the project, I started by tackling complex-domain LPs. In the first subphase, I announced support for complex coefficients at JuliaCon’16, and I now take this opportunity to announce support for complex variables in LPs.

Complex-domain LPs consist of a real linear objective function, real linear inequality constraints, and real and complex linear equality constraints.

In order to enable complex-domain LPs, we came up with these ideas:

  1. We redefined the conic_form! of every affine atom to accept complex arguments.
  2. Every complex variable z is internally represented as z = z1 + i*z2, where z1 and z2 are real.
  3. We introduced two new affine atoms, real and imag, which return the real and imaginary parts of a complex variable, respectively.
  4. transpose and ctranspose behave differently on complex variables, so a new atom, CTransposeAtom, was created.
  5. A complex equality constraint LHS = RHS can be decomposed into two corresponding real equality constraints, real(LHS) = real(RHS) and imag(LHS) = imag(RHS).
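
As a rough illustration of ideas 2 and 5, the sketch below uses plain Julia arrays (not Convex.jl internals) to rewrite a complex equality A*z = b as a real system in the stacked unknowns z1 and z2. The helper name real_system and the sample data are made up for this example, and the syntax targets current Julia releases rather than the version used elsewhere in this post.

```julia
using LinearAlgebra

# Hypothetical helper (not part of Convex.jl): turn the complex system
# A*z == b into an equivalent real system M*[z1; z2] == rhs, where z = z1 + im*z2.
function real_system(A::AbstractMatrix{<:Complex}, b::AbstractVector{<:Complex})
    M = vcat(hcat(real.(A), -imag.(A)),   # real part:      Re(A)*z1 - Im(A)*z2
             hcat(imag.(A),  real.(A)))   # imaginary part: Im(A)*z1 + Re(A)*z2
    rhs = vcat(real.(b), imag.(b))
    return M, rhs
end

# Made-up data for the illustration.
A = [1+2im 3-1im;
     0+1im 2+0im]
z = [1.0+0.5im, -2.0+1.0im]
b = A * z

M, rhs = real_system(A, b)
stacked = vcat(real.(z), imag.(z))

# The real system is satisfied exactly when the complex one is.
residual = norm(M * stacked - rhs)
```

The two block rows of M correspond to the real and imaginary equalities of idea 5, which is the same split used by hand in the second use case below.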

After these changes were made to the codebase, we wrote two use cases to demonstrate the usability and correctness of our approach, which I present below:

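# Importing Packages
# Clone the fork, then switch to the gsoc2 development branch
# (a URL ending in /tree/gsoc2 is a web page, not a clonable repository)
Pkg.clone("https://github.com/Ayush-iitkgp/Convex.jl.git")
Pkg.checkout("Convex", "gsoc2")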
using Convex
 
# Complex LP with real variable
n = 10 # variable dimension (parameter)
m = 5 # number of constraints (parameter)
xo = rand(n)
A = randn(m,n) + im*randn(m,n)
b = A * xo 
# Declare a real variable
x = Variable(n)
p1 = minimize(sum(x), A*x == b, x>=0) 
# Note that A*x == b is a complex equality constraint
solve!(p1)
x1 = x.value

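# Let's now solve the same problem by decomposing the complex equality constraint into its real and imaginary parts.
p2 = minimize(sum(x), real(A)*x == real(b), imag(A)*x==imag(b), x>=0)
solve!(p2)
x2 = x.value
# Compare up to solver tolerance (the 1e-4 threshold is an assumption); exact == rarely holds for solver output
isapprox(x1, x2; atol=1e-4) # should return true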


# Let's now consider an example using a complex variable
# Complex LP with complex variable
n = 10 # variable dimension (parameter)
m = 5 # number of constraints (parameter)
xo = rand(n)+im*rand(n)
A = randn(m,n) + im*randn(m,n)
b = A * xo

# Declare a complex variable
x = ComplexVariable(n)
p1 = minimize(real(sum(x)), A*x == b, real(x)>=0, imag(x)>=0)
solve!(p1)
x1 = x.value

xr = Variable(n)
xi = Variable(n)
p2 = minimize(sum(xr), real(A)*xr-imag(A)*xi == real(b), imag(A)*xr+real(A)*xi == imag(b), xr>=0, xi>=0)
solve!(p2)
# Compare up to solver tolerance (the 1e-4 threshold is an assumption)
isapprox(x1, xr.value + im*xi.value; atol=1e-4) # should return true

The full list of affine atoms is as follows:

  1. addition, subtraction, multiplication, division
  2. indexing and slicing
  3. k-th diagonal of a matrix
  4. construct diagonal matrix
  5. transpose and ctranspose
  6. stacking
  7. sum
  8. trace
  9. conv
  10. real and imag
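
For readers less familiar with complex arrays, here is a small plain-Julia sketch (ordinary matrices, not Convex.jl expressions) of why items 5 and 10 matter for complex data. It uses current Julia names: ctranspose was renamed adjoint in Julia 0.7. The matrix values are made up.

```julia
using LinearAlgebra

# Made-up complex matrix for the illustration.
Z = [1+2im 3-4im;
     5+0im 0-6im]

T = transpose(Z)   # swaps rows and columns only
H = adjoint(Z)     # swaps rows and columns and conjugates (same as Z')

# The two agree on the real part and differ in sign on the imaginary part.
same_real = real.(T) == real.(H)
opposite_imag = imag.(T) == -imag.(H)

# Splitting into real and imaginary parts and recombining recovers Z.
roundtrip = real.(Z) + im * imag.(Z) == Z
```

For a real matrix the two operations coincide, which is presumably why a separate atom only becomes necessary once complex variables are allowed.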

I am now working towards implementing complex-domain second-order conic programming. Meanwhile, I invite the Julia community to play around with complex-domain LPs. The link to the development branch is here.

Looking forward to your suggestions!

Special thanks to my mentors Madeleine Udell and Dvijotham Krishnamurthy!

An invitation to JuliaCon 2016

By: Julia Developers

Re-posted from: http://feedproxy.google.com/~r/JuliaLang/~3/Nm293iA2_vU/juliacon-invitation

For the third year in a row, we are happy to invite you to JuliaCon,
the annual meeting of the Julia programming language community.
JuliaCon 2016 will be held at the Massachusetts Institute of Technology from
June 21st to 25th and, as a first, this year we will have several high-profile
keynote speakers, as well as the top-notch tutorials and talks you have come to
expect over the years.
Please purchase your tickets before May 13th to take advantage of the
early-bird pricing and we look forward to seeing you in June!


JuliaCon, just like the Julia language, has come a long way over
the last three years.
In 2014 we were roughly 75 attendees meeting in a medium-sized conference room
at the University of Chicago, to great success; in 2015 we had about 225
attendees and enough content to cover four full days at the Massachusetts
Institute of Technology; and this year we hope that you will join us for the
greatest JuliaCon yet!

From June 21st to the 25th JuliaCon 2016 will be held at the Massachusetts
Institute of Technology, for a full five days of Julia-related content.
On Tuesday 21st we will hold several workshops, on topics ranging
from intermediate Julia programming to more advanced topics such as writing
high-performance code and parallel programming.
From Wednesday 22nd to Friday 24th we will start each day with a keynote by a
high-profile speaker, followed by talks on a great variety of subjects:
macroeconomics, machine learning, astrophysics, visualisation, and more!
On Saturday 25th, the final day of the conference, we will hold a hackathon
where attendees are encouraged to team up based on personal interests to either
create new Julia projects or contribute to existing ones. All these details are
now in the JuliaCon poster.

Without further ado, please allow us to introduce our keynote speakers:

  • Timothy E. Holy is an Associate Professor of Neuroscience at Washington
    University in St. Louis.
    In 2009 he received the NIH Director’s Pioneer award for innovations in
    optics and microscopy.
    His lab, which studies how the brain detects pheromones and develops new
    optical methods for imaging neuronal activity, was one of the first to adopt
    Julia for scientific research.
    He is a long-time Julia contributor and a lead developer of Julia’s
    multidimensional array capabilities as well as the author of far too many
    Julia packages.
  • Thomas J. Sargent is a Professor of Economics at New York University and
    Senior Fellow at the Hoover Institution.
    In 2011 the Royal Swedish Academy of Sciences awarded him the Nobel Memorial
    Prize in Economic Sciences for his work on macroeconomics.
    Together with John Stachurski, he founded quant-econ.net, a Julia and Python
    based learning platform for quantitative economics focusing on algorithms
    and numerical methods for studying economic problems as well as coding
    skills.
  • Guy L. Steele, Jr. is a Software Architect for Oracle Labs and Principal
    Investigator of the Programming Language Research project.
    In 1994, he was made a fellow of the Association for Computing Machinery
    after receiving the Grace Murray Hopper Award in 1988.
    He is an experienced designer of programming languages, like Scheme,
    Fortress and Java, and many of his ideas have had an impact on the design
    of Julia.

We hope that our invitation entices you to join us – new, intermediate, and
experienced Julia users – for five days of fun at MIT this June. Remember to
purchase your tickets before May 13th to receive a 33% early-bird
discount!


We need your help to spread this message far and wide! Post the
JuliaCon poster and
this blog post to your local email lists. Print the poster and post it
on your local message board. In addition, please tweet, retweet, post
on Facebook and LinkedIn and other social media. This is the biggest
JuliaCon ever, and we need your help in making it a huge success.

BioJulia Project in 2016

By: Julia Developers

Re-posted from: http://feedproxy.google.com/~r/JuliaLang/~3/uCHtOhVfiW0/biojulia2016

I am pleased to announce that the next phase of BioJulia is starting! In the next several months, I’m going to implement many crucial features for bioinformatics that will motivate you to use Julia and BioJulia libraries in your work. But before going into the details of the project, let me briefly introduce the BioJulia project. This project is supported by the Moore Foundation and the Julia project.

The BioJulia project is a collaborative open source project to create an infrastructure for bioinformatics in the Julia programming language. It aims to provide fast and accessible software libraries. Julia’s Just-In-Time (JIT) compiler makes this ambitious goal achievable without resorting to other compiled languages like C/C++. The central package developed under the project is Bio.jl, which provides fundamental features including biological symbols/sequences, file format parsers, alignment algorithms, wrappers for external software, etc. It also supports several common file formats such as FASTA, FASTQ, BED, PDB, and so on. Last year, as a Julia Summer of Code (JSoC) student, I made the FMIndexes.jl package to build a full-text search index for large genomes, and we released the first development version of Bio.jl. While the BioJulia project is getting more active and the number of contributors is growing, we still lack some important features for realistic applications. Filling the gaps between our current libraries and actual use cases is the purpose of my new project.

So, what will be added in it? Here is the summary of my plan:

  • Sequence analysis:
    • Online sequence search algorithms
    • Data structure for reference genomes
    • Error-correcting algorithms for DNA barcodes
    • Parsers for BAM and CRAM file formats
  • Integration with data viewers and databases:
    • Genome browser backend
    • Parsers for GFF3 and VCF/BCF
    • Database access through web APIs

These things are of crucial importance for writing analysis programs because they connect software components (e.g. programs, archives, databases, viewers, etc.); data analysis software in bioinformatics usually reads/writes formatted data from/to other software. The figure below shows a common workflow for detecting genetic variants; underlined deliverables will connect software, archives and databases so that you can write your analysis software in the Julia language.

[Figure: schema of the variant-detection workflow]

Sequence Analysis

The online sequence search algorithms will come in three flavors: exact, approximate, and regular expression search. The exact sequence search finds the positions in a sequence that exactly match a query sequence. The approximate search is similar to the exact search but allows up to a specified number of errors: mismatches, insertions, and deletions. The regular expression search accepts a query written as a regular expression, which enables flexible description of query patterns such as motifs. For these algorithms, there are already half-done pull requests I’m working on: #152, #153, #143.
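
To make the three flavors concrete, here is a minimal sketch on plain Julia strings. It is not the Bio.jl implementation from the pull requests above: the function names exact_search and approx_search are hypothetical, the exact search is deliberately naive, and the approximate search is a textbook dynamic-programming scan (Sellers' algorithm) that reports end positions of matches within k edits.

```julia
# Exact search: start positions where `query` occurs in `text` (naive scan).
function exact_search(query::AbstractString, text::AbstractString)
    q, t = collect(query), collect(text)
    m, n = length(q), length(t)
    return [i for i in 1:(n - m + 1) if t[i:i+m-1] == q]
end

# Approximate search: end positions j such that some substring of `text`
# ending at j is within edit distance k (mismatches, insertions, deletions)
# of `query`. Only one column of the DP table is kept in memory.
function approx_search(query::AbstractString, text::AbstractString, k::Integer)
    q, t = collect(query), collect(text)
    m = length(q)
    col = collect(0:m)            # col[i+1] = distance for the prefix q[1:i]
    hits = Int[]
    for (j, c) in enumerate(t)
        prev_diag = col[1]        # value from the previous column, row i-1
        col[1] = 0                # a match may start anywhere in the text
        for i in 1:m
            cost = q[i] == c ? 0 : 1
            best = min(prev_diag + cost,   # match or mismatch
                       col[i + 1] + 1,     # extra character in the text
                       col[i] + 1)         # query character left unmatched
            prev_diag = col[i + 1]
            col[i + 1] = best
        end
        col[m + 1] <= k && push!(hits, j)
    end
    return hits
end

# Regular expression search with Julia's built-in regex support.
motif_starts = [m.offset for m in eachmatch(r"AC[GT]", "ACGTACTT")]

exact_hits = exact_search("ACG", "ACGTACGTTACG")
approx_hits = approx_search("ACGT", "AAGT", 1)
```

With k = 0 the approximate search reduces to exact matching, reported by end position instead of start position.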

Since the last release, Bio.jl v0.1.0, the sequence data structure has been significantly rewritten to make biological sequence types coherent and extensible. But because we chose an encoding that requires 4 bits per base to represent DNA sequences, the DNA sequence type consumes more memory than necessary to store a reference genome, which is usually composed of four kinds of DNA nucleotides (denoted by A/C/G/T) and a relatively small number of consecutive undetermined nucleotides (denoted by N). After trying some data structures, I found that the memory used for N positions can be dramatically reduced using IndexableBitVectors.jl, a package I created in JSoC 2015. I’m developing a separate package for reference genomes, ReferenceSequences.jl, and am going to improve its functionality and performance to handle huge genomes like the human genome.
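
The idea of storing A/C/G/T in 2 bits and tracking N positions separately can be sketched as follows. This is only an illustration under stated assumptions: the PackedDNA type and the pack/unpack functions are hypothetical and do not reflect the actual ReferenceSequences.jl layout, and a plain BitVector stands in for the N mask, whereas the post describes using IndexableBitVectors.jl to store that information far more compactly.

```julia
# Hypothetical type for illustration only.
struct PackedDNA
    data::Vector{UInt64}   # 2 bits per base, 32 bases per 64-bit word
    nmask::BitVector       # true where the base is undetermined (N)
    len::Int
end

const CODES = Dict('A' => 0x00, 'C' => 0x01, 'G' => 0x02, 'T' => 0x03)
const BASES = ('A', 'C', 'G', 'T')

function pack(seq::AbstractString)
    n = length(seq)
    data = zeros(UInt64, cld(n, 32))
    nmask = falses(n)
    for (i, base) in enumerate(seq)
        if base == 'N'
            nmask[i] = true                      # bits in `data` stay 00
        else
            word, offset = divrem(i - 1, 32)
            # CODES[base] throws a KeyError for anything other than A/C/G/T.
            data[word + 1] |= UInt64(CODES[base]) << (2 * offset)
        end
    end
    return PackedDNA(data, nmask, n)
end

function unpack(p::PackedDNA)
    chars = Vector{Char}(undef, p.len)
    for i in 1:p.len
        word, offset = divrem(i - 1, 32)
        code = Int((p.data[word + 1] >> (2 * offset)) & 0x03)
        chars[i] = p.nmask[i] ? 'N' : BASES[code + 1]
    end
    return String(chars)
end

packed = pack("ACGTNNACGT")
restored = unpack(packed)
```

Because N runs in a reference genome are few and long, the mask is mostly zeros with a handful of long runs of ones, which is the kind of structure a compressed bit vector can exploit.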

If you are a researcher or an engineer who handles high-throughput sequencing data, BAM and CRAM parsers are probably the most eagerly awaited additions in the list. BAM is the de facto standard file format for aligned sequences, and most sequence mappers generate alignments in this format. CRAM is a storage-efficient alternative to BAM and is gaining popularity, reflecting the explosion of accumulated sequence data. Since these files contain massive amounts of DNA sequences from high-throughput sequencing machines, high-speed parsing is highly desirable in practice. I’m going to concentrate on speed through careful tuning and multi-threaded parallel computation, which is planned to be introduced in the next Julia release.

Integration with Data Viewers and Databases

Genome browsers make it possible to interactively visualize genetic features found in individuals and/or populations. For example, using the UCSC Genome Browser, you can investigate genetic regions along with sequence annotations around the ABO gene in a window. A genome browser is one of the most common visualization tools, and hence lots of software has been developed, but unfortunately there is no standardized interface. So, we will need to select a promising one that is open source and supports interaction with other software. The first candidate is JBrowse, which is built with modern JavaScript and HTML5 technologies. It also supports RESTful APIs and hence it can fetch data from a backend server via HTTP. I’m planning to make an API server that responds to queries from a genome browser to interactively visualize in-memory data.

Many databases distribute their data in standardized file formats. For genetic annotations and variants, GFF3 and VCF are probably the most common formats. If you are using data from human or mouse, you should know that various annotations are available from the GENCODE project, which offers data in GTF or GFF3 file formats. NCBI provides human variation sets in VCF file format here. These file formats are text, so you may think it is trivial to write parsers when you need them. That is partially true, if you don’t care about completeness and performance. Parsing a text file format in a naive way (for example, splitting a line on tab characters) allocates many temporary objects and often degrades performance, while careful tuning of a parser leads to complicated code that is hard to maintain. @dcjones took on this problem and did great work adding Julia support to Ragel, which generates Julia code that executes a finite state machine. Daniel’s JuliaCon 2015 talk is helpful if you are interested in the details.
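
To make the trade-off concrete, here is what the naive approach might look like for a single GFF3 line. This is a sketch only: the GFF3Record type and parse_gff3_line function are hypothetical names, not Bio.jl's API, and the input line is made up. Every call to split allocates a fresh vector of substrings, which is the kind of temporary-object overhead a generated state machine avoids.

```julia
# Hypothetical record type covering the nine tab-separated GFF3 columns.
struct GFF3Record
    seqid::String
    source::String
    feature::String
    start::Int
    stop::Int
    score::Union{Float64,Nothing}
    strand::Char
    phase::Union{Int,Nothing}
    attributes::Dict{String,String}
end

# Naive parser: easy to write, but allocates temporaries at every step.
# It assumes well-formed input (e.g. every attribute contains an '=').
function parse_gff3_line(line::AbstractString)
    cols = split(chomp(line), '\t')                     # temporary vector
    length(cols) == 9 || error("expected 9 columns, got $(length(cols))")
    attrs = Dict{String,String}()
    for pair in split(cols[9], ';'; keepempty=false)    # more temporaries
        key, value = split(pair, '='; limit=2)
        attrs[key] = value
    end
    return GFF3Record(
        cols[1], cols[2], cols[3],
        parse(Int, cols[4]), parse(Int, cols[5]),
        cols[6] == "." ? nothing : parse(Float64, cols[6]),
        first(cols[7]),
        cols[8] == "." ? nothing : parse(Int, cols[8]),
        attrs,
    )
end

# Made-up example line, not taken from any real annotation file.
rec = parse_gff3_line("chr1\texample\tgene\t100\t200\t.\t+\t.\tID=gene1;Name=demo")
```

A parser like this is fine for small files; the concern raised above is what happens when it runs over millions of lines.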

Sometimes you may need only part of the data provided by a database. In such a case, web-based APIs are handy for fetching the necessary data on demand. BioMart Central Portal offers a unified access point to a range of biological databases that is programmatically accessible via REST and SOAP APIs. A Julia wrapper for BioMart will make it much easier to access data by automatically converting responses to Julia objects. In the R language, the biomaRt package is one of the most downloaded Bioconductor packages: https://www.bioconductor.org/packages/stats/.

Try BioJulia!

We need users of and collaborators on our libraries. Feedback from users in the real world is the most valuable input for improving the quality of our libraries. We welcome feature requests and discussions that will make bioinformatics easier and faster. Tools for phylogenetics and structural biology, which I didn’t mention in this post, are also under active development. You can post issues here: https://github.com/BioJulia/Bio.jl/issues; if you want to get in touch with us more casually, this Gitter room may be more convenient: https://gitter.im/BioJulia/Bio.jl.