Python API Reference
ProGraML is a graph-based program representation for data flow analysis and compiler optimizations.
The API is divided into three types of operations: graph creation, graph transformation, and graph serialization, all available under the programl namespace.
ProGraML was first described in this paper:
Cummins, C., Fisches, Z., Ben-Nun, T., Hoefler, T., O’Boyle, M., and Leather, H. “ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations.” In 38th International Conference on Machine Learning (ICML).
Graph Creation Ops
Graph creation operations are used to construct Program Graphs from source code or compiler intermediate representations (IRs).
LLVM / Clang
- programl.from_cpp(srcs: Union[str, Iterable[str]], copts: Optional[List[str]] = None, system_includes: bool = True, language: str = 'c++', version: str = '10', timeout=300, executor: Optional[programl.util.py.executor.ExecutorLike] = None, chunksize: Optional[int] = None) → Union[programl.proto.program_graph_pb2.ProgramGraph, Iterable[programl.proto.program_graph_pb2.ProgramGraph]]
Construct a Program Graph from a string of C/C++ code.
This is a convenience function for generating graphs of simple single-file code snippets. For example:
>>> programl.from_cpp("""
... #include <stdio.h>
...
... int main() {
...   printf("Hello, ProGraML!");
...   return 0;
... }
... """)
This is equivalent to invoking clang with input over stdin:
cat <<EOF | clang -xc++ - -c -o -
#include <stdio.h>
int main() {
  printf("Hello, ProGraML!");
  return 0;
}
EOF
For more control over the clang invocation, see from_clang().
- Parameters
srcs – A string of C / C++, or an iterable sequence of strings of C / C++.
copts – A list of additional command line arguments to pass to clang.
system_includes – Detect and pass -isystem arguments to clang using the default search path of the system compiler. See get_system_includes() for details.
language – The programming language of srcs. Must be either c++ or c.
version – The version of clang to use. See programl.CLANG_VERSIONS for a list of available versions.
timeout – The maximum number of seconds to wait for an individual clang invocation before raising an error. If multiple srcs inputs are provided, this timeout is per-input.
executor – An executor object with a submit(callable, *args, **kwargs) method that returns a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch the execution of jobs locally, on a cluster, or with multithreading, depending on the implementation, e.g. concurrent.futures.ThreadPoolExecutor. Defaults to single-threaded execution. This is only used when multiple inputs are given.
chunksize – The number of inputs to read and process at a time. A larger chunksize improves parallelism but increases memory consumption as more inputs must be stored in memory.
- Returns
If srcs is singular, returns a single programl.ProgramGraph instance. Else returns a generator over programl.ProgramGraph instances.
- Raises
UnsupportedCompiler – If the requested compiler version is not supported.
GraphCreationError – If compilation of the input fails.
TimeoutError – If the specified timeout is reached.
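The executor contract described above is small: a submit() method that returns an object with done() and result(). As a minimal sketch of that contract (InlineExecutor, InlineFuture, and run_all are hypothetical names used for illustration and are not part of ProGraML), the following shows that a hand-written object and concurrent.futures.ThreadPoolExecutor can be driven through the same three methods:

```python
from concurrent.futures import ThreadPoolExecutor


class InlineFuture:
    """Future-like holder for a job that has already finished."""

    def __init__(self, value):
        self._value = value

    def done(self) -> bool:
        return True

    def result(self):
        return self._value


class InlineExecutor:
    """Hypothetical executor that runs each job immediately in the calling thread."""

    def submit(self, fn, *args, **kwargs):
        return InlineFuture(fn(*args, **kwargs))


def run_all(executor, fn, inputs):
    """Dispatch fn over inputs using only submit() and result()."""
    futures = [executor.submit(fn, item) for item in inputs]
    return [future.result() for future in futures]


sources = ["int x;", "int main() {}"]
inline_results = run_all(InlineExecutor(), len, sources)
with ThreadPoolExecutor(max_workers=2) as pool:
    pooled_results = run_all(pool, len, sources)
```

An inline executor of this kind can be useful when debugging, since every job runs in the calling thread. Whether ProGraML uses any Future methods beyond the two listed above is not stated in this reference.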
- programl.from_clang(args: Union[List[str], Iterable[List[str]]], system_includes: bool = True, version: str = '10', timeout=300, executor: Optional[programl.util.py.executor.ExecutorLike] = None, chunksize: Optional[int] = None) → Union[programl.proto.program_graph_pb2.ProgramGraph, Iterable[programl.proto.program_graph_pb2.ProgramGraph]]
Run clang and construct a Program Graph from the output.
Example usage:
>>> programl.from_clang(["/path/to/my/app.c", "-DMY_MACRO=3"])
This is equivalent to invoking clang as:
clang -c /path/to/my/app.c -DMY_MACRO=3
Multiple inputs can be passed in a single invocation to be batched and processed in parallel. For example:
>>> with concurrent.futures.ThreadPoolExecutor(max_workers=16) as executor:
...     programl.from_clang(
...         [
...             ["a.cc", "-DMY_MACRO=3"],
...             ["b.cpp"],
...             ["c.c", "-O3", "-std=c99"],
...         ],
...         executor=executor,
...     )
- Parameters
args – A list of arguments to pass to clang, or an iterable sequence of arguments to pass to clang.
system_includes – Detect and pass -isystem arguments to clang using the default search path of the system compiler. See get_system_includes() for details.
version – The version of clang to use. See programl.CLANG_VERSIONS for a list of available versions.
timeout – The maximum number of seconds to wait for an individual clang invocation before raising an error. If multiple inputs are provided, this timeout is per-input.
executor – An executor object with a submit(callable, *args, **kwargs) method that returns a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch the execution of jobs locally, on a cluster, or with multithreading, depending on the implementation, e.g. concurrent.futures.ThreadPoolExecutor. Defaults to single-threaded execution. This is only used when multiple inputs are given.
chunksize – The number of inputs to read and process at a time. A larger chunksize improves parallelism but increases memory consumption as more inputs must be stored in memory.
- Returns
If args is a single list of arguments, returns a single programl.ProgramGraph instance. Else returns a generator over programl.ProgramGraph instances.
- Raises
UnsupportedCompiler – If the requested compiler version is not supported.
GraphCreationError – If compilation of the input fails.
TimeoutError – If the specified timeout is reached.
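The return convention above depends on whether args is one command line or a collection of them. This reference does not say how ProGraML tells the two apart, so the following is only a sketch of one plausible rule (a flat list of strings counts as a single invocation). The helper names is_single_invocation and for_each_invocation are hypothetical.

```python
def is_single_invocation(args) -> bool:
    """Treat a flat list of strings as one clang command line (assumed rule)."""
    return isinstance(args, list) and all(isinstance(arg, str) for arg in args)


def for_each_invocation(args, fn):
    """Return one result for a single invocation, else a lazy generator of results."""
    if is_single_invocation(args):
        return fn(args)
    return (fn(invocation) for invocation in args)


# One command line in, one result out.
single = for_each_invocation(["a.c", "-O3"], " ".join)

# Several command lines in, a generator out. Nothing runs until it is consumed.
many = for_each_invocation([["a.c"], ["b.c", "-O1"]], " ".join)
```

Because the multi-input form is lazy, callers that need all results at once would wrap it in list().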
- programl.from_llvm_ir(irs: Union[str, Iterable[str]], timeout=300, version: str = '10', executor: Optional[programl.util.py.executor.ExecutorLike] = None, chunksize: Optional[int] = None) → Union[programl.proto.program_graph_pb2.ProgramGraph, Iterable[programl.proto.program_graph_pb2.ProgramGraph]]
Construct a Program Graph from a string of LLVM-IR.
This takes as input one or more LLVM-IR strings as generated by llvm-dis from a bitcode file, or from clang using the arguments -emit-llvm -S.
Example usage:
>>> programl.from_llvm_ir("""
... source_filename = "-"
... target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
... target triple = "x86_64-apple-macosx11.0.0"
...
... ; ...
... """)
Multiple inputs can be passed in a single invocation to be batched and processed in parallel. For example:
>>> with concurrent.futures.ThreadPoolExecutor(max_workers=16) as executor:
...     graphs = programl.from_llvm_ir(llvm_ir_strings, executor=executor)
- Parameters
irs – A string of LLVM-IR, or an iterable sequence of LLVM-IR strings.
version – The version of LLVM to use. See programl.LLVM_VERSIONS for a list of available versions.
timeout – The maximum number of seconds to wait for an individual graph construction invocation before raising an error. If multiple inputs are provided, this timeout is per-input.
executor – An executor object with a submit(callable, *args, **kwargs) method that returns a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch the execution of jobs locally, on a cluster, or with multithreading, depending on the implementation, e.g. concurrent.futures.ThreadPoolExecutor. Defaults to single-threaded execution. This is only used when multiple inputs are given.
chunksize – The number of inputs to read and process at a time. A larger chunksize improves parallelism but increases memory consumption as more inputs must be stored in memory.
- Returns
If irs is a single IR, returns a single programl.ProgramGraph instance. Else returns a generator over programl.ProgramGraph instances.
- Raises
UnsupportedCompiler – If the requested LLVM version is not supported.
GraphCreationError – If graph construction fails.
TimeoutError – If the specified timeout is reached.
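The chunksize trade-off described above (larger chunks expose more parallel work but hold more inputs in memory) can be illustrated with a generic batching helper. This is a sketch of the general technique, not ProGraML's implementation, and chunked is a hypothetical name.

```python
from itertools import islice


def chunked(inputs, chunksize):
    """Yield lists of at most chunksize items, reading the input lazily."""
    iterator = iter(inputs)
    while True:
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


# Only one chunk of inputs is materialised in memory at a time.
chunks = list(chunked(range(5), 2))
```

With chunksize=2, at most two inputs are held at once; raising it would let an executor work on more inputs concurrently at the cost of memory.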
- programl.util.py.cc_system_includes.get_system_includes() → List[pathlib.Path]
Determine the system include paths for C/C++ compilation jobs.
This uses the system compiler to determine the search paths for C/C++ system headers. By default, c++ is invoked. This can be overridden by setting os.environ["CXX"].
- Returns
A list of paths to system header directories.
- Raises
OSError – If the compiler fails, or if the search paths cannot be determined.
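For intuition about how search paths can be recovered from a system compiler, the sketch below parses the verbose output that GCC and Clang print when run with -v, where the header directories are listed between two marker lines. This reference does not specify how get_system_includes() works internally, so the parsing approach, the parse_include_paths name, and the sample text are all assumptions for illustration.

```python
from pathlib import Path
from typing import List


def parse_include_paths(compiler_output: str) -> List[Path]:
    """Extract the <...> search list from GCC/Clang-style verbose output."""
    paths = []
    in_search_list = False
    for line in compiler_output.splitlines():
        if line.startswith("#include <...> search starts here:"):
            in_search_list = True
        elif line.startswith("End of search list."):
            break
        elif in_search_list:
            # Each directory is printed on its own line with leading whitespace.
            paths.append(Path(line.strip()))
    return paths


# Hand-written sample in the format printed by GCC and Clang.
SAMPLE_OUTPUT = """\
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/include
End of search list.
"""

include_paths = parse_include_paths(SAMPLE_OUTPUT)
```

If the markers are missing, the function returns an empty list; a caller could treat that case as an error, in the spirit of the OSError documented above.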
XLA
- programl.from_xla_hlo_proto(hlos: Union[programl.third_party.tensorflow.xla_pb2.HloProto, Iterable[programl.third_party.tensorflow.xla_pb2.HloProto]], timeout=300, executor: Optional[programl.util.py.executor.ExecutorLike] = None, chunksize: Optional[int] = None) → Union[programl.proto.program_graph_pb2.ProgramGraph, Iterable[programl.proto.program_graph_pb2.ProgramGraph]]
Construct a Program Graph from an XLA HLO protocol buffer.
- Parameters
hlos – A HloProto, or an iterable sequence of HloProto instances.
timeout – The maximum number of seconds to wait for an individual graph construction invocation before raising an error. If multiple inputs are provided, this timeout is per-input.
executor – An executor object with a submit(callable, *args, **kwargs) method that returns a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch the execution of jobs locally, on a cluster, or with multithreading, depending on the implementation, e.g. concurrent.futures.ThreadPoolExecutor. Defaults to single-threaded execution. This is only used when multiple inputs are given.
chunksize – The number of inputs to read and process at a time. A larger chunksize improves parallelism but increases memory consumption as more inputs must be stored in memory.
- Returns
If hlos is a single input, returns a single programl.ProgramGraph instance. Else returns a generator over programl.ProgramGraph instances.
- Raises
GraphCreationError – If graph construction fails.
TimeoutError – If the specified timeout is reached.
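The timeout described above applies to each input rather than to the batch as a whole. The sketch below illustrates that idea with the standard library, bounding the wait on each job separately. It is not ProGraML's implementation: run_with_timeout and slow_identity are hypothetical names, and the sketch records a marker string where the functions documented here would raise TimeoutError.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeoutError


def run_with_timeout(fn, inputs, timeout):
    """Run fn on every input, waiting at most timeout seconds for each result."""
    outcomes = []
    with ThreadPoolExecutor(max_workers=len(inputs)) as executor:
        futures = [executor.submit(fn, item) for item in inputs]
        for future in futures:
            try:
                outcomes.append(future.result(timeout=timeout))
            except FutureTimeoutError:
                # One slow input does not prevent the others from completing.
                outcomes.append("timeout")
    return outcomes


def slow_identity(seconds):
    """Stand-in for a graph construction job that takes `seconds` to finish."""
    time.sleep(seconds)
    return seconds


outcomes = run_with_timeout(slow_identity, [0.0, 0.5], timeout=0.1)
```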
Graph Transform Ops
The graph transform ops are used to modify or convert Program Graphs to another representation.
DGL
- programl.to_dgl(graphs: Union[programl.proto.program_graph_pb2.ProgramGraph, Iterable[programl.proto.program_graph_pb2.ProgramGraph]], timeout: int = 300, executor: Optional[programl.util.py.executor.ExecutorLike] = None, chunksize: Optional[int] = None) → Union[dgl.heterograph.DGLHeteroGraph, Iterable[dgl.heterograph.DGLHeteroGraph]]
Convert one or more Program Graphs to DGLGraphs.
- Parameters
graphs – A Program Graph, or a sequence of Program Graphs.
timeout – The maximum number of seconds to wait for an individual graph conversion before raising an error. If multiple inputs are provided, this timeout is per-input.
executor – An executor object with a submit(callable, *args, **kwargs) method that returns a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch the execution of jobs locally, on a cluster, or with multithreading, depending on the implementation, e.g. concurrent.futures.ThreadPoolExecutor. Defaults to single-threaded execution. This is only used when multiple inputs are given.
chunksize – The number of inputs to read and process at a time. A larger chunksize improves parallelism but increases memory consumption as more inputs must be stored in memory. This is only used when multiple inputs are given.
- Returns
If a single input is provided, returns a single dgl.DGLGraph. Else returns an iterable sequence of dgl.DGLGraph instances.
- Raises
GraphTransformError – If graph conversion fails.
TimeoutError – If the specified timeout is reached.
NetworkX
- programl.to_networkx(graphs: Union[programl.proto.program_graph_pb2.ProgramGraph, Iterable[programl.proto.program_graph_pb2.ProgramGraph]], timeout: int = 300, executor: Optional[programl.util.py.executor.ExecutorLike] = None, chunksize: Optional[int] = None) → Union[networkx.classes.multidigraph.MultiDiGraph, Iterable[networkx.classes.multidigraph.MultiDiGraph]]
Convert one or more Program Graphs to NetworkX MultiDiGraphs.
- Parameters
graphs – A Program Graph, or a sequence of Program Graphs.
timeout – The maximum number of seconds to wait for an individual graph conversion before raising an error. If multiple inputs are provided, this timeout is per-input.
executor – An executor object with a submit(callable, *args, **kwargs) method that returns a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch the execution of jobs locally, on a cluster, or with multithreading, depending on the implementation, e.g. concurrent.futures.ThreadPoolExecutor. Defaults to single-threaded execution. This is only used when multiple inputs are given.
chunksize – The number of inputs to read and process at a time. A larger chunksize improves parallelism but increases memory consumption as more inputs must be stored in memory. This is only used when multiple inputs are given.
- Returns
If a single input is provided, returns a single nx.MultiDiGraph. Else returns an iterable sequence of nx.MultiDiGraph instances.
- Raises
GraphTransformError – If graph conversion fails.
TimeoutError – If the specified timeout is reached.
Graphviz
- programl.to_dot(graphs: Union[programl.proto.program_graph_pb2.ProgramGraph, Iterable[programl.proto.program_graph_pb2.ProgramGraph]], timeout: int = 300, executor: Optional[programl.util.py.executor.ExecutorLike] = None, chunksize: Optional[int] = None) → Union[str, Iterable[str]]
Convert one or more Program Graphs to DOT Graph Description Language.
This produces a DOT source string representing the input graph. This can then be rendered using the graphviz command line tools, or parsed using pydot.
- Parameters
graphs – A Program Graph, or a sequence of Program Graphs.
timeout – The maximum number of seconds to wait for an individual graph conversion before raising an error. If multiple inputs are provided, this timeout is per-input.
executor – An executor object with a submit(callable, *args, **kwargs) method that returns a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch the execution of jobs locally, on a cluster, or with multithreading, depending on the implementation, e.g. concurrent.futures.ThreadPoolExecutor. Defaults to single-threaded execution. This is only used when multiple inputs are given.
chunksize – The number of inputs to read and process at a time. A larger chunksize improves parallelism but increases memory consumption as more inputs must be stored in memory. This is only used when multiple inputs are given.
- Returns
A graphviz dot string when a single input is provided, else an iterable sequence of graphviz dot strings.
- Raises
GraphTransformError – If graph conversion fails.
TimeoutError – If the specified timeout is reached.
JSON
- programl.to_json(graphs: Union[programl.proto.program_graph_pb2.ProgramGraph, Iterable[programl.proto.program_graph_pb2.ProgramGraph]], timeout: int = 300, executor: Optional[programl.util.py.executor.ExecutorLike] = None, chunksize: Optional[int] = None) → Union[Dict[str, Any], Iterable[Dict[str, Any]]]
Convert one or more Program Graphs to JSON node-link data.
- Parameters
graphs – A Program Graph, or a sequence of Program Graphs.
timeout – The maximum number of seconds to wait for an individual graph conversion before raising an error. If multiple inputs are provided, this timeout is per-input.
executor – An executor object with a submit(callable, *args, **kwargs) method that returns a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch the execution of jobs locally, on a cluster, or with multithreading, depending on the implementation, e.g. concurrent.futures.ThreadPoolExecutor. Defaults to single-threaded execution. This is only used when multiple inputs are given.
chunksize – The number of inputs to read and process at a time. A larger chunksize improves parallelism but increases memory consumption as more inputs must be stored in memory. This is only used when multiple inputs are given.
- Returns
If a single input is provided, returns a single JSON dictionary. Else returns an iterable sequence of JSON dictionaries.
- Raises
GraphTransformError – If graph conversion fails.
TimeoutError – If the specified timeout is reached.
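Node-link data is conventionally a dictionary with a list of nodes and a list of links, as in NetworkX's node-link JSON format. The sketch below walks a hand-written dictionary of that shape using only the standard library. The dictionary is a stand-in rather than real to_json() output, and the "text" node attribute and the successors helper are assumptions for illustration.

```python
import json
from collections import defaultdict

# Hand-written stand-in in the node-link layout; not produced by ProGraML.
node_link = {
    "directed": True,
    "multigraph": True,
    "graph": {},
    "nodes": [
        {"id": 0, "text": "[external]"},
        {"id": 1, "text": "add"},
        {"id": 2, "text": "ret"},
    ],
    "links": [
        {"source": 0, "target": 1},
        {"source": 1, "target": 2},
        {"source": 1, "target": 2},  # parallel edges are allowed in a multigraph
    ],
}


def successors(data):
    """Build a node id -> list of successor ids mapping from the links."""
    adjacency = defaultdict(list)
    for link in data["links"]:
        adjacency[link["source"]].append(link["target"])
    return dict(adjacency)


adjacency = successors(node_link)

# Plain dictionaries of this kind survive a JSON round trip unchanged.
round_tripped = json.loads(json.dumps(node_link))
```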
Graph Serialization
Graph serialization ops are used for storing or transferring Program Graphs.
File
- programl.save_graphs(path: pathlib.Path, graphs: Iterable[programl.proto.program_graph_pb2.ProgramGraph], compression: Optional[str] = 'gz') → None
Save a sequence of program graphs to a file.
- Parameters
path – The file to write.
graphs – A sequence of Program Graphs.
compression – Either gz for GZip compression (the default), or None for no compression. Compression increases the cost of serializing and deserializing but can greatly reduce the size of the serialized graphs.
- Raises
TypeError – If an unsupported compression is given.
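The size-versus-cost trade-off of the gz option can be seen with Python's gzip module on any repetitive payload. The bytes below are a stand-in, not an actual serialized Program Graph, so the compression ratio is illustrative only.

```python
import gzip

# Repetitive stand-in payload; real serialized graphs will compress differently.
payload = b'node { text: "add" }\n' * 1000

compressed = gzip.compress(payload)    # extra CPU work when writing
restored = gzip.decompress(compressed)  # extra CPU work when reading

ratio = len(compressed) / len(payload)
```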
- programl.load_graphs(path: pathlib.Path, idx_list: Optional[List[int]] = None, compression: Optional[str] = 'gz') → List[programl.proto.program_graph_pb2.ProgramGraph]
Load program graphs from a file.
- Parameters
path – The file to read from.
idx_list – A zero-based list of graph indices to return. If not provided, all graphs are loaded.
compression – Either gz for GZip compression (the default), or None for no compression. Compression increases the cost of serializing and deserializing but can greatly reduce the size of the serialized graphs.
- Returns
A sequence of Program Graphs.
- Raises
TypeError – If an unsupported compression is given.
GraphCreationError – If deserialization fails.
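The idx_list parameter selects graphs by zero-based position. A minimal sketch of that selection on a plain list follows; select is a hypothetical helper, and returning items in the order requested is an assumption, since this reference does not state how ordering or repeated indices are handled.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def select(items: Sequence[T], idx_list: Optional[List[int]] = None) -> List[T]:
    """Return every item when idx_list is None, else the items at those indices."""
    if idx_list is None:
        return list(items)
    # Assumed behaviour: results follow the order given in idx_list.
    return [items[index] for index in idx_list]


stored = ["graph0", "graph1", "graph2"]
everything = select(stored)
subset = select(stored, idx_list=[2, 0])
```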
Byte Array
- programl.to_bytes(graphs: Iterable[programl.proto.program_graph_pb2.ProgramGraph], compression: Optional[str] = 'gz') → bytes
Serialize a sequence of Program Graphs to a byte array.
- Parameters
graphs – A sequence of Program Graphs.
compression – Either gz for GZip compression (the default), or None for no compression. Compression increases the cost of serializing and deserializing but can greatly reduce the size of the serialized graphs.
- Returns
The serialized program graphs.
- programl.from_bytes(data: bytes, idx_list: Optional[List[int]] = None, compression: Optional[str] = 'gz') → List[programl.proto.program_graph_pb2.ProgramGraph]
Deserialize Program Graphs from a byte array.
- Parameters
data – The serialized Program Graphs.
idx_list – A zero-based list of graph indices to return. If not provided, all graphs are returned.
compression – Either gz for GZip compression (the default), or None for no compression. Compression increases the cost of serializing and deserializing but can greatly reduce the size of the serialized graphs.
- Returns
A list of Program Graphs.
- Raises
GraphCreationError – If deserialization fails.
String
- programl.to_string(graphs: Iterable[programl.proto.program_graph_pb2.ProgramGraph]) → str
Serialize a sequence of Program Graphs to a human-readable string.
The generated string has a JSON-like syntax that is designed for human readability. This is the least compact form of serialization.
- Parameters
graphs – A sequence of Program Graphs.
- Returns
The serialized program graphs.
- programl.from_string(string: str, idx_list: Optional[List[int]] = None) → List[programl.proto.program_graph_pb2.ProgramGraph]
Deserialize Program Graphs from a human-readable string.
- Parameters
string – The serialized Program Graphs.
idx_list – A zero-based list of graph indices to return. If not provided, all graphs are returned.
- Returns
A list of Program Graphs.
- Raises
GraphCreationError – If deserialization fails.