Python API Reference

ProGraML is a graph-based program representation for data flow analysis and compiler optimizations.

The API is divided into three types of operations: graph creation, graph transformation, and graph serialization, all available under the programl namespace.

ProGraML was first described in this paper:

Cummins, C., Fisches, Z., Ben-Nun, T., Hoefler, T., O’Boyle, M., and Leather, H. “ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations.” In 38th International Conference on Machine Learning (ICML), 2021.

Document contents:

Graph Creation Ops
- LLVM / Clang
- XLA
Graph Transform Ops
- DGL
- NetworkX
- Graphviz
- JSON
Graph Serialization
- File
- Byte Array
- String

Graph Creation Ops

Graph creation operations are used to construct Program Graphs from source code or compiler intermediate representations (IRs).

LLVM / Clang

programl.from_cpp(srcs: Union[str, Iterable[str]], copts: Optional[List[str]] = None, system_includes: bool = True, language: str = 'c++', version: str = '10', timeout=300, executor: Optional[programl.util.py.executor.ExecutorLike] = None, chunksize: Optional[int] = None) → Union[programl.proto.program_graph_pb2.ProgramGraph, Iterable[programl.proto.program_graph_pb2.ProgramGraph]]

Construct a Program Graph from a string of C/C++ code.

This is a convenience function for generating graphs of simple single-file code snippets. For example:

>>> programl.from_cpp("""
... #include <stdio.h>
...
... int main() {
...   printf("Hello, ProGraML!");
...   return 0;
... }
... """)

This is equivalent to invoking clang with input over stdin:

cat <<EOF | clang -xc++ - -c -o -
#include <stdio.h>

int main() {
    printf("Hello, ProGraML!");
    return 0;
}
EOF

For more control over the clang invocation, see from_clang().

Parameters

srcs – A string of C / C++, or an iterable sequence of strings of C / C++.
copts – A list of additional command line arguments to pass to clang.
system_includes – Detect and pass -isystem arguments to clang using the default search path of the system compiler. See get_system_includes() for details.
language – The programming language of srcs. Must be either c++ or c.
version – The version of clang to use. See programl.CLANG_VERSIONS for a list of available versions.
timeout – The maximum number of seconds to wait for an individual clang invocation before raising an error. If multiple srcs inputs are provided, this timeout is per-input.
executor – An executor object with a submit(callable, *args, **kwargs) method that returns a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch jobs for execution locally, on a cluster, or with multithreading, depending on the implementation (e.g. concurrent.futures.ThreadPoolExecutor). Defaults to single-threaded execution. This is only used when multiple inputs are given.
chunksize – The number of inputs to read and process at a time. A larger chunksize improves parallelism but increases memory consumption as more inputs must be stored in memory.

Returns

If srcs is a single string, returns a single programl.ProgramGraph instance. Otherwise, returns a generator over programl.ProgramGraph instances.

Raises

UnsupportedCompiler – If the requested compiler version is not supported.
GraphCreationError – If compilation of the input fails.
TimeoutError – If the specified timeout is reached.
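
The executor parameter accepts any object with the shape described above, so a custom dispatcher can stand in for concurrent.futures.ThreadPoolExecutor. The following is a minimal sketch and not part of ProGraML: InlineExecutor is a hypothetical class that runs each job immediately in the calling thread and wraps the outcome in a standard concurrent.futures.Future, which provides done() and result().

```python
from concurrent.futures import Future


class InlineExecutor:
    """Hypothetical executor-like object that runs jobs in the calling thread."""

    def submit(self, fn, *args, **kwargs) -> Future:
        future = Future()
        try:
            # Run the job right away and store its return value.
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:
            # Store the failure so that result() re-raises it for the caller.
            future.set_exception(exc)
        return future
```

Assuming the interface is as documented, an instance could be passed as executor=InlineExecutor(). A real implementation would hand each job to a thread pool or a cluster scheduler rather than running it inline.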

programl.from_clang(args: Union[List[str], Iterable[List[str]]], system_includes: bool = True, version: str = '10', timeout=300, executor: Optional[programl.util.py.executor.ExecutorLike] = None, chunksize: Optional[int] = None) → Union[programl.proto.program_graph_pb2.ProgramGraph, Iterable[programl.proto.program_graph_pb2.ProgramGraph]]

Run clang and construct a Program Graph from the output.

Example usage:

>>> programl.from_clang(["/path/to/my/app.c", "-DMY_MACRO=3"])

This is equivalent to invoking clang as:

clang -c /path/to/my/app.c -DMY_MACRO=3

Multiple inputs can be passed in a single invocation to be batched and processed in parallel. For example:

>>> with concurrent.futures.ThreadPoolExecutor(max_workers=16) as executor:
...     programl.from_clang(
...         [
...             ["a.cc", "-DMY_MACRO=3"],
...             ["b.cpp"],
...             ["c.c", "-O3", "-std=c99"],
...         ],
...         executor=executor,
...     )

Parameters

args – A list of arguments to pass to clang, or an iterable sequence of arguments to pass to clang.
system_includes – Detect and pass -isystem arguments to clang using the default search path of the system compiler. See get_system_includes() for details.
version – The version of clang to use. See programl.CLANG_VERSIONS for a list of available versions.
timeout – The maximum number of seconds to wait for an individual clang invocation before raising an error. If multiple inputs are provided, this timeout is per-input.
executor – An executor object with a submit(callable, *args, **kwargs) method that returns a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch jobs for execution locally, on a cluster, or with multithreading, depending on the implementation (e.g. concurrent.futures.ThreadPoolExecutor). Defaults to single-threaded execution. This is only used when multiple inputs are given.
chunksize – The number of inputs to read and process at a time. A larger chunksize improves parallelism but increases memory consumption as more inputs must be stored in memory.

Returns

If args is a single list of arguments, returns a single programl.ProgramGraph instance. Else returns a generator over programl.ProgramGraph instances.

Raises

UnsupportedCompiler – If the requested compiler version is not supported.
GraphCreationError – If compilation of the input fails.
TimeoutError – If the specified timeout is reached.
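
When many files share the same flags, the per-input argument lists can be built programmatically before being handed to from_clang(). The sketch below is illustrative only: make_clang_jobs and graphs_from_files are hypothetical helper names, and it assumes, per the signature above, that an iterable of argument lists yields one graph per input.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Sequence


def make_clang_jobs(
    sources: Iterable[str], common_flags: Sequence[str] = ()
) -> List[List[str]]:
    """Build one clang argument list per source file."""
    return [[str(src), *common_flags] for src in sources]


def graphs_from_files(
    sources: Iterable[str], common_flags: Sequence[str] = (), max_workers: int = 4
):
    """Compile each source file to a Program Graph using a thread pool."""
    # Imported here so that make_clang_jobs() works without ProGraML installed.
    import programl

    jobs = make_clang_jobs(sources, common_flags)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Consume the generator inside the block, while the pool is still alive.
        return list(programl.from_clang(jobs, executor=executor))
```

For example, graphs_from_files(["a.c", "b.c"], ["-O2"]) would submit the argument lists ["a.c", "-O2"] and ["b.c", "-O2"].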

programl.from_llvm_ir(irs: Union[str, Iterable[str]], timeout=300, version: str = '10', executor: Optional[programl.util.py.executor.ExecutorLike] = None, chunksize: Optional[int] = None) → Union[programl.proto.program_graph_pb2.ProgramGraph, Iterable[programl.proto.program_graph_pb2.ProgramGraph]]

Construct a Program Graph from a string of LLVM-IR.

This takes as input one or more LLVM-IR strings as generated by llvm-dis from a bitcode file, or from clang using arguments: -emit-llvm -S.

Example usage:

>>> programl.from_llvm_ir("""
... source_filename = "-"
... target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
... target triple = "x86_64-apple-macosx11.0.0"
...
... ; ...
... """)

Multiple inputs can be passed in a single invocation to be batched and processed in parallel. For example:

>>> with concurrent.futures.ThreadPoolExecutor(max_workers=16) as executor:
...     graphs = programl.from_llvm_ir(llvm_ir_strings, executor=executor)

Parameters

irs – A string of LLVM-IR, or an iterable sequence of LLVM-IR strings.
version – The version of LLVM to use. See programl.LLVM_VERSIONS for a list of available versions.
timeout – The maximum number of seconds to wait for an individual graph construction invocation before raising an error. If multiple inputs are provided, this timeout is per-input.
executor – An executor object with a submit(callable, *args, **kwargs) method that returns a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch jobs for execution locally, on a cluster, or with multithreading, depending on the implementation (e.g. concurrent.futures.ThreadPoolExecutor). Defaults to single-threaded execution. This is only used when multiple inputs are given.
chunksize – The number of inputs to read and process at a time. A larger chunksize improves parallelism but increases memory consumption as more inputs must be stored in memory.

Returns

If irs is a single IR, returns a single programl.ProgramGraph instance. Else returns a generator over programl.ProgramGraph instances.

Raises

UnsupportedCompiler – If the requested LLVM version is not supported.
GraphCreationError – If graph construction fails.
TimeoutError – If the specified timeout is reached.
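
The textual LLVM-IR expected by from_llvm_ir() can be produced by running clang with -emit-llvm -S, as noted above. The following sketch shows one way to do that from Python. The helpers emit_llvm_command and graph_from_source_file are hypothetical, a clang binary on PATH is assumed, and the IR it emits must be readable by the LLVM version that ProGraML is asked to use.

```python
import subprocess
from typing import List


def emit_llvm_command(source_path: str, clang: str = "clang") -> List[str]:
    """Return a clang command that prints textual LLVM-IR to stdout."""
    return [clang, "-emit-llvm", "-S", "-o", "-", source_path]


def graph_from_source_file(source_path: str, clang: str = "clang"):
    """Compile a source file to LLVM-IR text, then build a Program Graph."""
    import programl  # Imported lazily; only needed for the final step.

    proc = subprocess.run(
        emit_llvm_command(source_path, clang),
        capture_output=True,
        text=True,
        check=True,  # Raises CalledProcessError if clang fails.
    )
    return programl.from_llvm_ir(proc.stdout)
```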

programl.util.py.cc_system_includes.get_system_includes() → List[pathlib.Path]

Determine the system include paths for C/C++ compilation jobs.

This uses the system compiler to determine the search paths for C/C++ system headers. By default, c++ is invoked. This can be overridden by setting os.environ["CXX"].

Returns: A list of paths to system header directories.
Raises: OSError – If the compiler fails, or if the search paths cannot be determined.
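
One common way to discover these search paths is to run the compiler in verbose preprocessing mode and read the list it prints to stderr. The sketch below illustrates that technique; it is not necessarily how get_system_includes() is implemented, and parse_search_paths and discover_system_includes are hypothetical names.

```python
import os
import subprocess
from pathlib import Path
from typing import List


def parse_search_paths(compiler_stderr: str) -> List[Path]:
    """Extract the <...> include search list from verbose compiler output."""
    paths = []
    in_list = False
    for line in compiler_stderr.splitlines():
        if line.startswith("#include <...> search starts here:"):
            in_list = True
        elif line.startswith("End of search list."):
            break
        elif in_list:
            # macOS compilers append " (framework directory)" to some entries.
            entry = line.strip().replace(" (framework directory)", "")
            paths.append(Path(entry))
    return paths


def discover_system_includes() -> List[Path]:
    """Ask the system C++ compiler for its header search paths."""
    compiler = os.environ.get("CXX", "c++")
    # Preprocess an empty translation unit from stdin with verbose output.
    proc = subprocess.run(
        [compiler, "-xc++", "-v", "-E", "-"],
        input="",
        capture_output=True,
        text=True,
        check=False,
    )
    if proc.returncode != 0:
        raise OSError(f"{compiler} exited with status {proc.returncode}")
    paths = parse_search_paths(proc.stderr)
    if not paths:
        raise OSError("Could not determine system include search paths")
    return paths
```

A missing compiler surfaces as FileNotFoundError, which is a subclass of OSError, so callers can handle both failure modes with a single except OSError clause.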

XLA

programl.from_xla_hlo_proto(hlos: Union[programl.third_party.tensorflow.xla_pb2.HloProto, Iterable[programl.third_party.tensorflow.xla_pb2.HloProto]], timeout=300, executor: Optional[programl.util.py.executor.ExecutorLike] = None, chunksize: Optional[int] = None) → Union[programl.proto.program_graph_pb2.ProgramGraph, Iterable[programl.proto.program_graph_pb2.ProgramGraph]]

Construct a Program Graph from an XLA HLO protocol buffer.

Parameters

hlos – A HloProto, or an iterable sequence of HloProto instances.
timeout – The maximum number of seconds to wait for an individual graph construction invocation before raising an error. If multiple inputs are provided, this timeout is per-input.
executor – An executor object with a submit(callable, *args, **kwargs) method that returns a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch jobs for execution locally, on a cluster, or with multithreading, depending on the implementation (e.g. concurrent.futures.ThreadPoolExecutor). Defaults to single-threaded execution. This is only used when multiple inputs are given.
chunksize – The number of inputs to read and process at a time. A larger chunksize improves parallelism but increases memory consumption as more inputs must be stored in memory.

Returns

If hlos is a single input, returns a single programl.ProgramGraph instance. Else returns a generator over programl.ProgramGraph instances.

Raises

GraphCreationError – If graph construction fails.
TimeoutError – If the specified timeout is reached.

exception programl.GraphCreationError: Exception raised if a graph creation op fails.

exception programl.UnsupportedCompiler: Exception raised if the requested compiler is not supported.

Graph Transform Ops

The graph transform ops are used to modify or convert Program Graphs to another representation.

DGL

programl.to_dgl(graphs: Union[programl.proto.program_graph_pb2.ProgramGraph, Iterable[programl.proto.program_graph_pb2.ProgramGraph]], timeout: int = 300, executor: Optional[programl.util.py.executor.ExecutorLike] = None, chunksize: Optional[int] = None) → Union[dgl.heterograph.DGLHeteroGraph, Iterable[dgl.heterograph.DGLHeteroGraph]]

Convert one or more Program Graphs to DGLGraphs.

Parameters

graphs – A Program Graph, or a sequence of Program Graphs.
timeout – The maximum number of seconds to wait for an individual graph conversion before raising an error. If multiple inputs are provided, this timeout is per-input.
executor – An executor object with a submit(callable, *args, **kwargs) method that returns a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch jobs for execution locally, on a cluster, or with multithreading, depending on the implementation (e.g. concurrent.futures.ThreadPoolExecutor). Defaults to single-threaded execution. This is only used when multiple inputs are given.
chunksize – The number of inputs to read and process at a time. A larger chunksize improves parallelism but increases memory consumption as more inputs must be stored in memory. This is only used when multiple inputs are given.

Returns

If a single input is provided, returns a single dgl.DGLGraph. Otherwise, returns an iterable sequence of dgl.DGLGraph instances.

Raises

GraphTransformError – If graph conversion fails.
TimeoutError – If the specified timeout is reached.

NetworkX

programl.to_networkx(graphs: Union[programl.proto.program_graph_pb2.ProgramGraph, Iterable[programl.proto.program_graph_pb2.ProgramGraph]], timeout: int = 300, executor: Optional[programl.util.py.executor.ExecutorLike] = None, chunksize: Optional[int] = None) → Union[networkx.classes.multidigraph.MultiDiGraph, Iterable[networkx.classes.multidigraph.MultiDiGraph]]

Convert one or more Program Graphs to NetworkX MultiDiGraphs.

Parameters

graphs – A Program Graph, or a sequence of Program Graphs.
timeout – The maximum number of seconds to wait for an individual graph conversion before raising an error. If multiple inputs are provided, this timeout is per-input.
executor – An executor object with a submit(callable, *args, **kwargs) method that returns a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch jobs for execution locally, on a cluster, or with multithreading, depending on the implementation (e.g. concurrent.futures.ThreadPoolExecutor). Defaults to single-threaded execution. This is only used when multiple inputs are given.
chunksize – The number of inputs to read and process at a time. A larger chunksize improves parallelism but increases memory consumption as more inputs must be stored in memory. This is only used when multiple inputs are given.

Returns

If a single input is provided, returns a single nx.MultiDiGraph. Otherwise, returns an iterable sequence of nx.MultiDiGraph instances.

Raises

GraphTransformError – If graph conversion fails.
TimeoutError – If the specified timeout is reached.

Graphviz

programl.to_dot(graphs: Union[programl.proto.program_graph_pb2.ProgramGraph, Iterable[programl.proto.program_graph_pb2.ProgramGraph]], timeout: int = 300, executor: Optional[programl.util.py.executor.ExecutorLike] = None, chunksize: Optional[int] = None) → Union[str, Iterable[str]]

Convert one or more Program Graphs to DOT Graph Description Language.

This produces a DOT source string representing the input graph. This can then be rendered using the graphviz command line tools, or parsed using pydot.

Parameters

graphs – A Program Graph, or a sequence of Program Graphs.
timeout – The maximum number of seconds to wait for an individual graph conversion before raising an error. If multiple inputs are provided, this timeout is per-input.
executor – An executor object with a submit(callable, *args, **kwargs) method that returns a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch jobs for execution locally, on a cluster, or with multithreading, depending on the implementation (e.g. concurrent.futures.ThreadPoolExecutor). Defaults to single-threaded execution. This is only used when multiple inputs are given.
chunksize – The number of inputs to read and process at a time. A larger chunksize improves parallelism but increases memory consumption as more inputs must be stored in memory. This is only used when multiple inputs are given.

Returns

A graphviz dot string when a single input is provided, else an iterable sequence of graphviz dot strings.

Raises

GraphTransformError – If graph conversion fails.
TimeoutError – If the specified timeout is reached.
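
A DOT string can be written to disk and rendered with the Graphviz dot command line tool. The sketch below assumes dot is installed and on PATH. The names write_dot, render_command, and render are hypothetical helpers, and the DOT text would come from programl.to_dot().

```python
import subprocess
from pathlib import Path
from typing import List


def write_dot(dot_source: str, path: Path) -> Path:
    """Write a DOT source string to a file and return its path."""
    path.write_text(dot_source, encoding="utf-8")
    return path


def render_command(dot_path: Path, image_path: Path, fmt: str = "png") -> List[str]:
    """Build the Graphviz command that renders a DOT file to an image."""
    return ["dot", f"-T{fmt}", str(dot_path), "-o", str(image_path)]


def render(dot_source: str, dot_path: Path, image_path: Path) -> None:
    """Write the DOT source to disk, then invoke Graphviz to render it."""
    write_dot(dot_source, dot_path)
    subprocess.run(render_command(dot_path, image_path), check=True)
```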

JSON

programl.to_json(graphs: Union[programl.proto.program_graph_pb2.ProgramGraph, Iterable[programl.proto.program_graph_pb2.ProgramGraph]], timeout: int = 300, executor: Optional[programl.util.py.executor.ExecutorLike] = None, chunksize: Optional[int] = None) → Union[Dict[str, Any], Iterable[Dict[str, Any]]]

Convert one or more Program Graphs to JSON node-link data.

Parameters

graphs – A Program Graph, or a sequence of Program Graphs.
timeout – The maximum number of seconds to wait for an individual graph conversion before raising an error. If multiple inputs are provided, this timeout is per-input.
executor – An executor object with a submit(callable, *args, **kwargs) method that returns a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch jobs for execution locally, on a cluster, or with multithreading, depending on the implementation (e.g. concurrent.futures.ThreadPoolExecutor). Defaults to single-threaded execution. This is only used when multiple inputs are given.
chunksize – The number of inputs to read and process at a time. A larger chunksize improves parallelism but increases memory consumption as more inputs must be stored in memory. This is only used when multiple inputs are given.

Returns

If a single input is provided, returns a single JSON dictionary. Otherwise, returns an iterable sequence of JSON dictionaries.

Raises

GraphTransformError – If graph conversion fails.
TimeoutError – If the specified timeout is reached.
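
Node-link data is a plain dictionary, so it can be inspected and serialized with the standard json module. The sketch below assumes the dictionary follows the NetworkX node-link layout, with a "nodes" list and a "links" list whose entries carry "source" and "target" keys. The sample dictionary is hand-written for illustration and is not real ProGraML output, and summarize_node_link is a hypothetical helper.

```python
import json
from typing import Any, Dict


def summarize_node_link(data: Dict[str, Any]) -> Dict[str, int]:
    """Count the nodes and links in a node-link dictionary."""
    return {
        "nodes": len(data.get("nodes", [])),
        "links": len(data.get("links", [])),
    }


# Hand-written stand-in for the output of programl.to_json().
sample = {
    "directed": True,
    "multigraph": True,
    "graph": {},
    "nodes": [{"id": 0}, {"id": 1}],
    "links": [{"source": 0, "target": 1, "key": 0}],
}

# The dictionary survives a round trip through JSON text unchanged.
encoded = json.dumps(sample, sort_keys=True)
decoded = json.loads(encoded)
```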

exception programl.GraphTransformError: Exception raised if a graph transform op fails.

Graph Serialization

Graph serialization ops are used for storing or transferring Program Graphs.

File

programl.save_graphs(path: pathlib.Path, graphs: Iterable[programl.proto.program_graph_pb2.ProgramGraph], compression: Optional[str] = 'gz') → None

Save a sequence of program graphs to a file.

Parameters

path – The file to write.
graphs – A sequence of Program Graphs.
compression – Either gz for GZip compression (the default), or None for no compression. Compression increases the cost of serializing and deserializing but can greatly reduce the size of the serialized graphs.

Raises

TypeError – If an unsupported compression is given.

programl.load_graphs(path: pathlib.Path, idx_list: Optional[List[int]] = None, compression: Optional[str] = 'gz') → List[programl.proto.program_graph_pb2.ProgramGraph]

Load program graphs from a file.

Parameters

path – The file to read from.
idx_list – A zero-based list of graph indices to return. If not provided, all graphs are loaded.
compression – Either gz for GZip compression (the default), or None for no compression. Compression increases the cost of serializing and deserializing but can greatly reduce the size of the serialized graphs.

Returns

A sequence of Program Graphs.

Raises

TypeError – If an unsupported compression is given.
GraphCreationError – If deserialization fails.

Byte Array

programl.to_bytes(graphs: Iterable[programl.proto.program_graph_pb2.ProgramGraph], compression: Optional[str] = 'gz') → bytes

Serialize a sequence of Program Graphs to a byte array.

Parameters

graphs – A sequence of Program Graphs.
compression – Either gz for GZip compression (the default), or None for no compression. Compression increases the cost of serializing and deserializing but can greatly reduce the size of the serialized graphs.

Returns

The serialized program graphs.

programl.from_bytes(data: bytes, idx_list: Optional[List[int]] = None, compression: Optional[str] = 'gz') → List[programl.proto.program_graph_pb2.ProgramGraph]

Deserialize Program Graphs from a byte array.

Parameters

data – The serialized Program Graphs.
idx_list – A zero-based list of graph indices to return. If not provided, all graphs are returned.
compression – Either gz for GZip compression (the default), or None for no compression. Compression increases the cost of serializing and deserializing but can greatly reduce the size of the serialized graphs.

Returns

A list of Program Graphs.

Raises

GraphCreationError – If deserialization fails.
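
Serialization and deserialization must agree on the compression setting. The sketch below round-trips a sequence of graphs through a byte array using the same compression value on both sides. The helper name roundtrip is hypothetical, and the calls assume the to_bytes() and from_bytes() signatures documented above.

```python
from typing import Optional


def roundtrip(graphs, compression: Optional[str] = "gz"):
    """Serialize graphs to bytes, then deserialize with matching compression."""
    # Imported lazily so that this module can be loaded without ProGraML.
    import programl

    data = programl.to_bytes(graphs, compression=compression)
    # Pass the same compression value that was used to produce the bytes.
    restored = programl.from_bytes(data, compression=compression)
    return data, restored
```

Comparing len(data) for compression="gz" and compression=None on a representative set of graphs is a simple way to judge whether the size reduction justifies the extra serialization cost.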

String

programl.to_string(graphs: Iterable[programl.proto.program_graph_pb2.ProgramGraph]) → str

Serialize a sequence of Program Graphs to a human-readable string.

The generated string has a JSON-like syntax that is designed for human readability. This is the least compact form of serialization.

Parameters: graphs – A sequence of Program Graphs.
Returns: The serialized program graphs.

programl.from_string(string: str, idx_list: Optional[List[int]] = None) → List[programl.proto.program_graph_pb2.ProgramGraph]

Deserialize Program Graphs from a human-readable string.

Parameters

string – The serialized Program Graphs.
idx_list – A zero-based list of graph indices to return. If not provided, all graphs are returned.

Returns

A list of Program Graphs.

Raises

GraphCreationError – If deserialization fails.