Solving Chain Multiplication

This summer I've been working on optimizing compilation for a linear algebra DSL. This is an extension of Jeremy Siek's work on Built-to-Order BLAS functions. Often times it's more efficient to have a specialized function which fuses two or more BLAS functions. The idea behind BTO is that we'd like to specify these functions at a high level (i.e., with liner algebra expressions) and then automatically perform the optimizing transformations which have made BLAS such a central component of linear algebra computations.

The current/prior version of BTO already handles loop fusion, memory bandwidth constraints, and more. However, it is not currently aware any high-level algebraic laws such as the fact that matrix multiplication is associative, addition is associative and commutative, transposition reverse-distributes over multiplication, etc. My goal is to make it aware of these sort of things.

Along the way, one thing to do is solve the chain multiplication problem: given an expression like ∏[x1,x2...xN] figure out the most efficient associativity for implementing it via binary multiplication. The standard solution is to use a CKY-like dynamic programming algorithm to construct a tree covering the sequence [x1,x2...xN]. This is easy to implement, but it takes O(n^3) time and O(n^2) space.

I found a delicious alternative algorithm which solves the problem in O(n*log n) time and O(n) space! The key to this algorithm is to view the problem as determining a triangulation of convex polygons. That is, we can view [x0,x1,x2...xN] as the edges of a convex polygon, where x0 is the result of computing ∏[x1,x2...xN]. This amazing algorithm is described in the tech report by Hu and Shing (1981a), which includes a reference implementation in Pascal. Unfortunately the TR contains a number of typos and typesetting issues, but it's still pretty legible. A cleaner version of Part I is available here. And pay-walled presumably-cleaner versions of Part I and Part II are available from SIAM.

Hu and Shing (1981b) also have an algorithm which is simpler to implement and returns a heuristic answer in O(n) time, with the error ratio bounded by 15%. So if compile times are more important than running times, then you can use this version as well. A pay-walled version of the article is available from Elsevier.

comments

Solving Chain Multiplication

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112