Python API Reference

ProGraML is a graph-based program representation for data flow analysis and compiler optimizations.

The API is divided into three types of operations: graph creation, graph transformation, and graph serialization, all available under the programl namespace.

ProGraML was first described in this paper:

Cummins, C., Fisches, Z., Ben-Nun, T., Hoefler, T., O’Boyle, M., and Leather, H. “ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations.” In 38th International Conference on Machine Learning (ICML).

Graph Creation Ops

Graph creation operations are used to construct Program Graphs from source code or compiler intermediate representations (IRs).

LLVM / Clang

programl.from_cpp(srcs: Union[str, Iterable[str]], copts: Optional[List[str]] = None, system_includes: bool = True, language: str = 'c++', version: str = '10', timeout=300, executor: Optional[programl.util.py.executor.ExecutorLike] = None, chunksize: Optional[int] = None) -> Union[programl.proto.program_graph_pb2.ProgramGraph, Iterable[programl.proto.program_graph_pb2.ProgramGraph]]

Construct a Program Graph from a string of C/C++ code.

This is a convenience function for generating graphs of simple single-file code snippets. For example:

>>> programl.from_cpp("""
... #include <stdio.h>
...
... int main() {
...   printf("Hello, ProGraML!");
...   return 0;
... }
... """)

This is equivalent to invoking clang with input over stdin:

cat <<EOF | clang -xc++ - -c -o -
#include <stdio.h>

int main() {
    printf("Hello, ProGraML!");
    return 0;
}
EOF

For more control over the clang invocation, see from_clang().

Parameters
  • srcs – A string of C / C++, or an iterable sequence of strings of C / C++.

  • copts – A list of additional command line arguments to pass to clang.

  • system_includes – Detect and pass -isystem arguments to clang using the default search path of the system compiler. See get_system_includes() for details.

  • language – The programming language of srcs. Must be either c++ or c.

  • version – The version of clang to use. See programl.CLANG_VERSIONS for a list of available versions.

  • timeout – The maximum number of seconds to wait for an individual clang invocation before raising an error. If multiple srcs inputs are provided, this timeout is per-input.

  • executor – An executor object with a submit(callable, *args, **kwargs) method that returns a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch the jobs locally, on a cluster, or across threads, depending on the implementation, e.g. concurrent.futures.ThreadPoolExecutor. Defaults to single-threaded execution. This is only used when multiple inputs are given.

  • chunksize – The number of inputs to read and process at a time. A larger chunksize improves parallelism but increases memory consumption as more inputs must be stored in memory.

Returns

If srcs is singular, returns a single programl.ProgramGraph instance. Else returns a generator over programl.ProgramGraph instances.

Raises
  • UnsupportedCompiler – If the requested compiler version is not supported.

  • GraphCreationError – If compilation of the input fails.

  • TimeoutError – If the specified timeout is reached.
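
The call and its documented exceptions can be sketched as follows. This is a minimal illustration, not part of ProGraML: the helper names language_for_path() and graph_from_file() are hypothetical, the suffix-to-language mapping is an assumption, and TimeoutError is assumed to be the Python built-in.

```python
from pathlib import Path


def language_for_path(path: str) -> str:
    """Pick the `language` argument from a file suffix (illustrative assumption)."""
    return "c" if Path(path).suffix == ".c" else "c++"


def graph_from_file(path: str, timeout: int = 60):
    """Read one source file and build a Program Graph, or return None on failure."""
    import programl  # imported lazily; requires the programl package

    src = Path(path).read_text()
    try:
        return programl.from_cpp(
            src, language=language_for_path(path), timeout=timeout
        )
    except (programl.GraphCreationError, TimeoutError) as error:
        # Compilation failed or the per-input timeout was reached.
        print(f"could not build a graph for {path}: {error}")
        return None
```

Returning None on failure is one possible design choice; callers that need to distinguish a compile error from a timeout would catch the two exceptions separately.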

programl.from_clang(args: Union[List[str], Iterable[List[str]]], system_includes: bool = True, version: str = '10', timeout=300, executor: Optional[programl.util.py.executor.ExecutorLike] = None, chunksize: Optional[int] = None) -> Union[programl.proto.program_graph_pb2.ProgramGraph, Iterable[programl.proto.program_graph_pb2.ProgramGraph]]

Run clang and construct a Program Graph from the output.

Example usage:

>>> programl.from_clang(["/path/to/my/app.c", "-DMY_MACRO=3"])

This is equivalent to invoking clang as:

clang -c /path/to/my/app.c -DMY_MACRO=3

Multiple inputs can be passed in a single invocation to be batched and processed in parallel. For example:

>>> with concurrent.futures.ThreadPoolExecutor(max_workers=16) as executor:
...     programl.from_clang(
...         [
...             ["a.cc", "-DMY_MACRO=3"],
...             ["b.cpp"],
...             ["c.c", "-O3", "-std=c99"],
...         ],
...         executor=executor,
...     )

Parameters
  • args – A list of arguments to pass to clang, or an iterable sequence of such argument lists.

  • system_includes – Detect and pass -isystem arguments to clang using the default search path of the system compiler. See get_system_includes() for details.

  • version – The version of clang to use. See programl.CLANG_VERSIONS for a list of available versions.

  • timeout – The maximum number of seconds to wait for an individual clang invocation before raising an error. If multiple inputs are provided, this timeout is per-input.

  • executor – An executor object with a submit(callable, *args, **kwargs) method that returns a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch the jobs locally, on a cluster, or across threads, depending on the implementation, e.g. concurrent.futures.ThreadPoolExecutor. Defaults to single-threaded execution. This is only used when multiple inputs are given.

  • chunksize – The number of inputs to read and process at a time. A larger chunksize improves parallelism but increases memory consumption as more inputs must be stored in memory.

Returns

If args is a single list of arguments, returns a single programl.ProgramGraph instance. Else returns a generator over programl.ProgramGraph instances.

Raises
  • UnsupportedCompiler – If the requested compiler version is not supported.

  • GraphCreationError – If compilation of the input fails.

  • TimeoutError – If the specified timeout is reached.
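
The executor parameter only requires a submit() method that returns a Future-like object. As a rough sketch of that interface, the hypothetical InlineExecutor class below (not part of ProGraML) runs every job immediately in the calling thread and wraps the outcome in a standard-library Future.

```python
from concurrent.futures import Future


class InlineExecutor:
    """Hypothetical executor that runs each submitted job in the calling thread."""

    def submit(self, fn, *args, **kwargs) -> Future:
        future = Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as error:
            # Store the failure so that it is re-raised by future.result().
            future.set_exception(error)
        return future


executor = InlineExecutor()
future = executor.submit(pow, 2, 10)
print(future.done(), future.result())  # True 1024
```

An object like this could stand in wherever an executor is accepted, assuming the library relies only on submit(), done(), and result().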

programl.from_llvm_ir(irs: Union[str, Iterable[str]], timeout=300, version: str = '10', executor: Optional[programl.util.py.executor.ExecutorLike] = None, chunksize: Optional[int] = None) -> Union[programl.proto.program_graph_pb2.ProgramGraph, Iterable[programl.proto.program_graph_pb2.ProgramGraph]]

Construct a Program Graph from a string of LLVM-IR.

This takes as input one or more LLVM-IR strings as generated by llvm-dis from a bitcode file, or from clang using arguments: -emit-llvm -S.

Example usage:

>>> programl.from_llvm_ir("""
... source_filename = "-"
... target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
... target triple = "x86_64-apple-macosx11.0.0"
...
... ; ...
... """)

Multiple inputs can be passed in a single invocation to be batched and processed in parallel. For example:

>>> with concurrent.futures.ThreadPoolExecutor(max_workers=16) as executor:
...     graphs = programl.from_llvm_ir(llvm_ir_strings, executor=executor)

Parameters
  • irs – A string of LLVM-IR, or an iterable sequence of LLVM-IR strings.

  • version – The version of LLVM to use. See programl.LLVM_VERSIONS for a list of available versions.

  • timeout – The maximum number of seconds to wait for an individual graph construction invocation before raising an error. If multiple inputs are provided, this timeout is per-input.

  • executor – An executor object with a submit(callable, *args, **kwargs) method that returns a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch the jobs locally, on a cluster, or across threads, depending on the implementation, e.g. concurrent.futures.ThreadPoolExecutor. Defaults to single-threaded execution. This is only used when multiple inputs are given.

  • chunksize – The number of inputs to read and process at a time. A larger chunksize improves parallelism but increases memory consumption as more inputs must be stored in memory.

Returns

If irs is a single IR, returns a single programl.ProgramGraph instance. Else returns a generator over programl.ProgramGraph instances.

Raises

programl.util.py.cc_system_includes.get_system_includes() -> List[pathlib.Path]

Determine the system include paths for C/C++ compilation jobs.

This uses the system compiler to determine the search paths for C/C++ system headers. By default, c++ is invoked. This can be overridden by setting os.environ["CXX"].

Returns

A list of paths to system header directories.

Raises

OSError – If the compiler fails, or if the search paths cannot be determined.
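
One way such a lookup could work is sketched below; this is an assumption for illustration, not necessarily how get_system_includes() is implemented. The hypothetical helpers parse_search_paths() and query_system_includes() rely on the verbose preprocessor output that GCC and Clang print to stderr when run with -v -E.

```python
import os
import subprocess
from pathlib import Path
from typing import List


def parse_search_paths(compiler_stderr: str) -> List[Path]:
    """Extract the <...> include directories from verbose preprocessor output."""
    paths, in_list = [], False
    for line in compiler_stderr.splitlines():
        if line.startswith("#include <...> search starts here:"):
            in_list = True
        elif line.startswith("End of search list."):
            break
        elif in_list and line.strip():
            # macOS compilers may append " (framework directory)" to an entry.
            paths.append(Path(line.strip().replace(" (framework directory)", "")))
    return paths


def query_system_includes() -> List[Path]:
    """Ask the system compiler ($CXX, else c++) for its header search paths."""
    compiler = os.environ.get("CXX", "c++")
    process = subprocess.run(
        [compiler, "-xc++", "-v", "-E", "-"],
        input="",
        capture_output=True,
        text=True,
        check=False,
    )
    if process.returncode != 0:
        raise OSError(f"{compiler} failed with exit code {process.returncode}")
    return parse_search_paths(process.stderr)
```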

XLA

programl.from_xla_hlo_proto(hlos: Union[programl.third_party.tensorflow.xla_pb2.HloProto, Iterable[programl.third_party.tensorflow.xla_pb2.HloProto]], timeout=300, executor: Optional[programl.util.py.executor.ExecutorLike] = None, chunksize: Optional[int] = None) -> Union[programl.proto.program_graph_pb2.ProgramGraph, Iterable[programl.proto.program_graph_pb2.ProgramGraph]]

Construct a Program Graph from an XLA HLO protocol buffer.

Parameters
  • hlos – A HloProto, or an iterable sequence of HloProto instances.

  • timeout – The maximum number of seconds to wait for an individual graph construction invocation before raising an error. If multiple inputs are provided, this timeout is per-input.

  • executor – An executor object with a submit(callable, *args, **kwargs) method that returns a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch the jobs locally, on a cluster, or across threads, depending on the implementation, e.g. concurrent.futures.ThreadPoolExecutor. Defaults to single-threaded execution. This is only used when multiple inputs are given.

  • chunksize – The number of inputs to read and process at a time. A larger chunksize improves parallelism but increases memory consumption as more inputs must be stored in memory.

Returns

If hlos is a single input, returns a single programl.ProgramGraph instance. Else returns a generator over programl.ProgramGraph instances.

Raises
  • GraphCreationError – If graph construction fails.

  • TimeoutError – If the specified timeout is reached.
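
The memory trade-off behind chunksize can be illustrated with a generic chunking generator. The chunked() helper below is hypothetical and is not taken from ProGraML; it only shows that at most chunksize inputs are materialized at a time, so a larger value holds more inputs in memory while exposing more work to the executor at once.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(inputs: Iterable[T], chunksize: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `chunksize` items from `inputs`."""
    if chunksize < 1:
        raise ValueError("chunksize must be at least 1")
    iterator = iter(inputs)
    while True:
        # Only this chunk is held in memory; the rest of `inputs` stays lazy.
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


print(list(chunked(range(5), 2)))  # [[0, 1], [2, 3], [4]]
```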

exception programl.GraphCreationError

Exception raised if a graph creation op fails.

exception programl.UnsupportedCompiler

Exception raised if the requested compiler is not supported.

Graph Transform Ops

The graph transform ops are used to modify or convert Program Graphs to another representation.

DGL

programl.to_dgl(graphs: Union[programl.proto.program_graph_pb2.ProgramGraph, Iterable[programl.proto.program_graph_pb2.ProgramGraph]], timeout: int = 300, executor: Optional[programl.util.py.executor.ExecutorLike] = None, chunksize: Optional[int] = None) -> Union[dgl.heterograph.DGLHeteroGraph, Iterable[dgl.heterograph.DGLHeteroGraph]]

Convert one or more Program Graphs to DGLGraphs.

Parameters
  • graphs – A Program Graph, or a sequence of Program Graphs.

  • timeout – The maximum number of seconds to wait for an individual graph conversion before raising an error. If multiple inputs are provided, this timeout is per-input.

  • executor – An executor object with a submit(callable, *args, **kwargs) method that returns a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch the jobs locally, on a cluster, or across threads, depending on the implementation, e.g. concurrent.futures.ThreadPoolExecutor. Defaults to single-threaded execution. This is only used when multiple inputs are given.

  • chunksize – The number of inputs to read and process at a time. A larger chunksize improves parallelism but increases memory consumption as more inputs must be stored in memory. This is only used when multiple inputs are given.

Returns

If a single input is provided, returns a single dgl.DGLGraph. Else returns an iterable sequence of dgl.DGLGraph instances.

Raises
  • GraphTransformError – If graph conversion fails.

  • TimeoutError – If the specified timeout is reached.

NetworkX

programl.to_networkx(graphs: Union[programl.proto.program_graph_pb2.ProgramGraph, Iterable[programl.proto.program_graph_pb2.ProgramGraph]], timeout: int = 300, executor: Optional[programl.util.py.executor.ExecutorLike] = None, chunksize: Optional[int] = None) -> Union[networkx.classes.multidigraph.MultiDiGraph, Iterable[networkx.classes.multidigraph.MultiDiGraph]]

Convert one or more Program Graphs to NetworkX MultiDiGraphs.

Parameters
  • graphs – A Program Graph, or a sequence of Program Graphs.

  • timeout – The maximum number of seconds to wait for an individual graph conversion before raising an error. If multiple inputs are provided, this timeout is per-input.

  • executor – An executor object with a submit(callable, *args, **kwargs) method that returns a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch the jobs locally, on a cluster, or across threads, depending on the implementation, e.g. concurrent.futures.ThreadPoolExecutor. Defaults to single-threaded execution. This is only used when multiple inputs are given.

  • chunksize – The number of inputs to read and process at a time. A larger chunksize improves parallelism but increases memory consumption as more inputs must be stored in memory. This is only used when multiple inputs are given.

Returns

If a single input is provided, returns a single nx.MultiDiGraph. Else returns an iterable sequence of nx.MultiDiGraph instances.

Raises
  • GraphTransformError – If graph conversion fails.

  • TimeoutError – If the specified timeout is reached.

Graphviz

programl.to_dot(graphs: Union[programl.proto.program_graph_pb2.ProgramGraph, Iterable[programl.proto.program_graph_pb2.ProgramGraph]], timeout: int = 300, executor: Optional[programl.util.py.executor.ExecutorLike] = None, chunksize: Optional[int] = None) -> Union[str, Iterable[str]]

Convert one or more Program Graphs to DOT Graph Description Language.

This produces a DOT source string representing the input graph. This can then be rendered using the graphviz command line tools, or parsed using pydot.

Parameters
  • graphs – A Program Graph, or a sequence of Program Graphs.

  • timeout – The maximum number of seconds to wait for an individual graph conversion before raising an error. If multiple inputs are provided, this timeout is per-input.

  • executor – An executor object with a submit(callable, *args, **kwargs) method that returns a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch the jobs locally, on a cluster, or across threads, depending on the implementation, e.g. concurrent.futures.ThreadPoolExecutor. Defaults to single-threaded execution. This is only used when multiple inputs are given.

  • chunksize – The number of inputs to read and process at a time. A larger chunksize improves parallelism but increases memory consumption as more inputs must be stored in memory. This is only used when multiple inputs are given.

Returns

A graphviz dot string when a single input is provided, else an iterable sequence of graphviz dot strings.

Raises
  • GraphTransformError – If graph conversion fails.

  • TimeoutError – If the specified timeout is reached.
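
Rendering the returned DOT string with the Graphviz command line tools might look like the sketch below. The render_dot() helper is hypothetical, and the hand-written DOT string merely stands in for the output of to_dot(). If the dot binary is not on the PATH, only the .dot file is written.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Optional


def render_dot(dot_source: str, dot_path: Path) -> Optional[Path]:
    """Write DOT source to dot_path and, if Graphviz is installed, render a PNG."""
    dot_path.write_text(dot_source)
    if shutil.which("dot") is None:
        return None  # Graphviz not found; skip rendering
    png_path = dot_path.with_suffix(".png")
    subprocess.run(
        ["dot", "-Tpng", str(dot_path), "-o", str(png_path)], check=True
    )
    return png_path


# Usage with a hand-written DOT string standing in for the output of to_dot().
example_dir = Path(tempfile.mkdtemp())
example_png = render_dot("digraph { a -> b; }", example_dir / "graph.dot")
```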

JSON

programl.to_json(graphs: Union[programl.proto.program_graph_pb2.ProgramGraph, Iterable[programl.proto.program_graph_pb2.ProgramGraph]], timeout: int = 300, executor: Optional[programl.util.py.executor.ExecutorLike] = None, chunksize: Optional[int] = None) -> Union[Dict[str, Any], Iterable[Dict[str, Any]]]

Convert one or more Program Graphs to JSON node-link data.

Parameters
  • graphs – A Program Graph, or a sequence of Program Graphs.

  • timeout – The maximum number of seconds to wait for an individual graph conversion before raising an error. If multiple inputs are provided, this timeout is per-input.

  • executor – An executor object with a submit(callable, *args, **kwargs) method that returns a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch the jobs locally, on a cluster, or across threads, depending on the implementation, e.g. concurrent.futures.ThreadPoolExecutor. Defaults to single-threaded execution. This is only used when multiple inputs are given.

  • chunksize – The number of inputs to read and process at a time. A larger chunksize improves parallelism but increases memory consumption as more inputs must be stored in memory. This is only used when multiple inputs are given.

Returns

If a single input is provided, returns a single JSON dictionary. Else returns an iterable sequence of JSON dictionaries.

Raises
  • GraphTransformError – If graph conversion fails.

  • TimeoutError – If the specified timeout is reached.
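
Node-link data is a plain dictionary, so it can be inspected with the standard library alone. The sketch below assumes the common node-link layout used by NetworkX, with a "nodes" list and a "links" list whose entries carry "source" and "target" keys; the example dictionary is hand-built for illustration and is not real to_json() output.

```python
import json
from collections import Counter
from typing import Any, Dict


def out_degrees(node_link: Dict[str, Any]) -> Counter:
    """Count outgoing edges per node id in a node-link dictionary."""
    return Counter(link["source"] for link in node_link["links"])


# Hand-built stand-in; real graphs carry more attributes per node and edge.
example = {
    "nodes": [{"id": 0}, {"id": 1}, {"id": 2}],
    "links": [
        {"source": 0, "target": 1},
        {"source": 0, "target": 2},
        {"source": 1, "target": 2},
    ],
}

degrees = out_degrees(example)
serialized = json.dumps(example)  # the dictionary is directly JSON-serializable
```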

exception programl.GraphTransformError

Exception raised if a graph transform op fails.

Graph Serialization

Graph serialization ops are used for storing or transferring Program Graphs.

File

programl.save_graphs(path: pathlib.Path, graphs: Iterable[programl.proto.program_graph_pb2.ProgramGraph], compression: Optional[str] = 'gz') -> None

Save a sequence of program graphs to a file.

Parameters
  • path – The file to write.

  • graphs – A sequence of Program Graphs.

  • compression – Either gz for GZip compression (the default), or None for no compression. Compression increases the cost of serializing and deserializing but can greatly reduce the size of the serialized graphs.

Raises

TypeError – If an unsupported compression is given.

programl.load_graphs(path: pathlib.Path, idx_list: Optional[List[int]] = None, compression: Optional[str] = 'gz') -> List[programl.proto.program_graph_pb2.ProgramGraph]

Load program graphs from a file.

Parameters
  • path – The file to read from.

  • idx_list – A zero-based list of graph indices to return. If not provided, all graphs are loaded.

  • compression – Either gz for GZip compression (the default), or None for no compression. Compression increases the cost of serializing and deserializing but can greatly reduce the size of the serialized graphs.

Returns

A sequence of Program Graphs.

Raises
  • TypeError – If an unsupported compression is given.

  • GraphCreationError – If deserialization fails.

Byte Array

programl.to_bytes(graphs: Iterable[programl.proto.program_graph_pb2.ProgramGraph], compression: Optional[str] = 'gz') -> bytes

Serialize a sequence of Program Graphs to a byte array.

Parameters
  • graphs – A sequence of Program Graphs.

  • compression – Either gz for GZip compression (the default), or None for no compression. Compression increases the cost of serializing and deserializing but can greatly reduce the size of the serialized graphs.

Returns

The serialized program graphs.

programl.from_bytes(data: bytes, idx_list: Optional[List[int]] = None, compression: Optional[str] = 'gz') -> List[programl.proto.program_graph_pb2.ProgramGraph]

Deserialize Program Graphs from a byte array.

Parameters
  • data – The serialized Program Graphs.

  • idx_list – A zero-based list of graph indices to return. If not provided, all graphs are returned.

  • compression – Either gz for GZip compression (the default), or None for no compression. Compression increases the cost of serializing and deserializing but can greatly reduce the size of the serialized graphs.

Returns

A list of Program Graphs.

Raises

GraphCreationError – If deserialization fails.
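
The size benefit of the gz option can be illustrated with the standard-library gzip module. This sketch does not use ProGraML's serialization code: the repetitive byte string is a made-up stand-in for serialized graphs, chosen because repeated structure compresses well.

```python
import gzip

# Made-up, repetitive stand-in payload; not an actual serialized Program Graph.
payload = b'node { text: "add" }\n' * 1000

compressed = gzip.compress(payload)    # extra CPU cost when serializing
restored = gzip.decompress(compressed)  # extra CPU cost when deserializing

print(len(payload), len(compressed))
```

The round trip is lossless, so the choice between gz and None is purely a trade of CPU time against storage or transfer size.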

String

programl.to_string(graphs: Iterable[programl.proto.program_graph_pb2.ProgramGraph]) -> str

Serialize a sequence of Program Graphs to a human-readable string.

The generated string has a JSON-like syntax that is designed for human readability. This is the least compact form of serialization.

Parameters

graphs – A sequence of Program Graphs.

Returns

The serialized program graphs.

programl.from_string(string: str, idx_list: Optional[List[int]] = None) -> List[programl.proto.program_graph_pb2.ProgramGraph]

Deserialize Program Graphs from a human-readable string.

Parameters
  • string – The serialized Program Graphs.

  • idx_list – A zero-based list of graph indices to return. If not provided, all graphs are returned.

Returns

A list of Program Graphs.

Raises

GraphCreationError – If deserialization fails.