Protocol Buffers

ProGraML uses Protocol Buffers for representing the structured data of program graphs.

The Program Graph

These protocol buffer definitions are available in:

  • Python: from programl.proto import *

  • C++: #include "programl/proto/program_graph.pb.h"

struct ProgramGraph

A program graph.

Public Members

repeated Node node = 1

The nodes of the program.

repeated Edge edge = 2

The edge relations between nodes.

repeated Function function = 4

The functions that are defined in the program.

repeated Module module = 5

The modules defined in the program.

Features features = 6

A <key, value> mapping of graph-level features.

struct Node

A node represents an instruction, variable, or constant.

A conformant node must:

  • Have the type field set.

  • Have the text field set.

Public Types

enum Type

The type of node.

Values:

enumerator INSTRUCTION

An instruction.

enumerator VARIABLE

A variable.

enumerator CONSTANT

A constant.

Public Members

Type type = 1

The type of the node.

string text = 2

The text of a node.

This is the raw representation of a node, such as the contents of a statement, or the name of an identifier.

int32 function = 4

An index into the parent ProgramGraph message’s function list indicating the source Function for this node.

int32 block = 7

The basic block of this node.

For IRs with a basic block abstraction, this value can be used to group nodes by the basic block that they are defined in. This value is optional, and when set, is used to define an ID for the block. IDs should be unique across the entire program, i.e. when two nodes have the same block, they should also have the same Function.

Features features = 8

A <key, value> mapping of features for this node.
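The conformance rules and the block field can be illustrated with a minimal sketch. It uses plain Python dataclasses as stand-ins for the generated protobuf classes, so the class definitions, the enum values, the use of None for "not set", and the helper names is_conformant and group_by_block are all assumptions made for illustration, not part of the ProGraML API.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NodeType(Enum):
    # Illustrative values only; the real enum numbers live in the .proto file.
    INSTRUCTION = 0
    VARIABLE = 1
    CONSTANT = 2


@dataclass
class Node:
    """Plain-Python stand-in for the Node message (not the generated class)."""
    type: Optional[NodeType] = None
    text: str = ""
    function: int = 0
    block: Optional[int] = None  # Optional field; None models "not set".


def is_conformant(node: Node) -> bool:
    # A conformant node has both the type and the text fields set.
    return node.type is not None and node.text != ""


def group_by_block(nodes):
    """Map each block ID to the indices of the nodes defined in that block."""
    groups = defaultdict(list)
    for index, node in enumerate(nodes):
        if node.block is not None:
            groups[node.block].append(index)
    return dict(groups)


nodes = [
    Node(NodeType.INSTRUCTION, "add", function=0, block=0),
    Node(NodeType.INSTRUCTION, "ret", function=0, block=0),
    Node(NodeType.INSTRUCTION, "br", function=1, block=1),
    Node(NodeType.VARIABLE, "i32"),
]
blocks = group_by_block(nodes)
```

Because block IDs are meant to be unique across the program, grouping by block alone is enough here; nodes that share a block also share a function.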

struct Edge

An edge is a relation between two Nodes in a ProgramGraph.

A conformant edge must:

  • Have the flow field set.

  • Have source and target field values that are indices into the parent ProgramGraph message’s Node list.

  • Have a position of zero if the flow is CALL.

Public Types

enum Flow

The edge flow type.

Values:

enumerator CONTROL

A control flow relation.

enumerator DATA

A data flow relation.

enumerator CALL

A call relation.

Public Members

Flow flow = 1

The type of relation of this edge.

int32 position = 2

A numeric position for this edge, used to differentiate, for example, multiple incoming data edges to an instruction, ordered by their operand order.

int32 source = 3

An index into the parent ProgramGraph message’s node list for the source of this relation.

int32 target = 4

An index into the parent ProgramGraph message’s node list for the target of this relation.

Features features = 5

A <key, value> mapping of features for this edge.
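The three conformance rules for edges, and the use of position to recover operand order, can be sketched as follows. The Edge dataclass and Flow enum are plain Python stand-ins for the generated classes, and the helpers is_conformant and operands_of are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Flow(Enum):
    # Illustrative values only; the real enum numbers live in the .proto file.
    CONTROL = 0
    DATA = 1
    CALL = 2


@dataclass
class Edge:
    """Plain-Python stand-in for the Edge message (not the generated class)."""
    flow: Optional[Flow] = None  # None models "not set".
    position: int = 0
    source: int = 0
    target: int = 0


def is_conformant(edge: Edge, node_count: int) -> bool:
    # Rule 1: the flow field must be set.
    if edge.flow is None:
        return False
    # Rule 2: source and target must index into the graph's node list.
    if not (0 <= edge.source < node_count and 0 <= edge.target < node_count):
        return False
    # Rule 3: CALL edges must have a position of zero.
    if edge.flow is Flow.CALL and edge.position != 0:
        return False
    return True


def operands_of(edges, target: int):
    """Source indices of the DATA edges into `target`, in position order."""
    incoming = [e for e in edges if e.flow is Flow.DATA and e.target == target]
    return [e.source for e in sorted(incoming, key=lambda e: e.position)]


edges = [
    Edge(Flow.DATA, position=1, source=2, target=0),
    Edge(Flow.DATA, position=0, source=1, target=0),
    Edge(Flow.CALL, position=0, source=0, target=3),
]
operands = operands_of(edges, target=0)
```

Here node 0 has two incoming data edges, and sorting by position yields its operands in order regardless of the order in which the edges are stored.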

struct Function

A function in a ProgramGraph.

A function contains one or more INSTRUCTION Nodes.

Public Members

string name = 1

The name of the function.

int32 module = 2

The source module of the function, as an index into the parent ProgramGraph message’s Module list.

Features features = 3

A <key, value> mapping of features for this function.

struct Module

A module in a ProgramGraph.

A module represents a logical grouping of functions within a ProgramGraph, usually equivalent to a Translation Unit.

Public Members

string name = 1

The name of the module.

Features features = 2

A <key, value> mapping of features for this module.
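Nodes, functions, and modules refer to each other by list index rather than by name. A minimal sketch of following that chain from a node to the name of its module, using plain dataclasses as stand-ins for the generated messages (the helper module_name_of and the sample names are hypothetical):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Module:
    name: str


@dataclass
class Function:
    name: str
    module: int  # Index into ProgramGraph.module.


@dataclass
class Node:
    text: str
    function: int  # Index into ProgramGraph.function.


@dataclass
class ProgramGraph:
    """Plain-Python stand-in holding only the lists needed for this example."""
    node: List[Node] = field(default_factory=list)
    function: List[Function] = field(default_factory=list)
    module: List[Module] = field(default_factory=list)


def module_name_of(graph: ProgramGraph, node_index: int) -> str:
    # Follow node -> function -> module using the stored indices.
    node = graph.node[node_index]
    function = graph.function[node.function]
    return graph.module[function.module].name


graph = ProgramGraph(
    node=[Node("ret", function=1)],
    function=[Function("main", module=0), Function("helper", module=1)],
    module=[Module("main.c"), Module("util.c")],
)
name = module_name_of(graph, 0)
```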

Features

These protocol buffer definitions are available in:

  • Python: from programl.proto import *

  • C++: #include "programl/third_party/tensorflow/features.pb.h"

struct Feature

Containers for non-sequential data.

Public Members

oneof kind { BytesList bytes_list = 1 … }

Each feature can be exactly one kind.

struct FeatureList

Containers for sequential data.

A FeatureList contains lists of Features. These may hold zero or more Feature values.

FeatureLists are organized into categories by name. The FeatureLists message contains the mapping from name to FeatureList.

Public Members

repeated Feature feature = 1

A list of Feature messages.

struct FeatureLists

Public Members

map<string, FeatureList> feature_list = 1

Map from feature name to feature list.

struct BytesList

Containers to hold repeated fundamental values.

Public Members

repeated bytes value = 1

Features data.

struct Features

Public Members

map<string, Feature> feature = 1

Map from feature name to feature.

Util

These protocol buffer definitions are available in:

  • Python: from programl.proto import *

  • C++: #include "programl/proto/util.pb.h"

struct ProgramGraphOptions

Options used to generate a program graph.

Public Members

bool strict = 3

If set, the program graph builder will reject graphs where:

  1. A module contains no nodes.

  2. A function contains no nodes.

  3. A node is unconnected.
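The three conditions above can be sketched as a standalone check. This is not the program graph builder's implementation: the function strict_violations, its flattened inputs, and the simplification that every node belongs to a function are assumptions made for illustration.

```python
def strict_violations(node_functions, function_modules, module_count, edges):
    """Return the reasons a graph would fail the three strict-mode checks.

    node_functions:   function index for each node (one entry per node).
    function_modules: module index for each function (one entry per function).
    module_count:     number of modules in the graph.
    edges:            (source, target) pairs of node indices.
    """
    problems = []

    # Functions that own at least one node, and the modules of those functions.
    used_functions = set(node_functions)
    used_modules = {function_modules[f] for f in used_functions}

    for module in range(module_count):
        if module not in used_modules:
            problems.append(f"module {module} contains no nodes")

    for function in range(len(function_modules)):
        if function not in used_functions:
            problems.append(f"function {function} contains no nodes")

    # A node is connected if it appears as an endpoint of any edge.
    connected = {endpoint for edge in edges for endpoint in edge}
    for node in range(len(node_functions)):
        if node not in connected:
            problems.append(f"node {node} is unconnected")

    return problems


bad = strict_violations(
    node_functions=[0, 0, 0],
    function_modules=[0, 1],
    module_count=2,
    edges=[(0, 1)],
)
good = strict_violations(
    node_functions=[0, 1],
    function_modules=[0, 0],
    module_count=1,
    edges=[(0, 1)],
)
```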

bool instructions_only = 1

Generate only nodes for instructions.

bool ignore_call_returns = 2

Omit return call edges from call sites.

int32 opt_level = 4

The optimization level when generating an IR from a source file.

string ir_path = 10

The path of an IR to read.

struct ProgramGraphList

A list of program graphs.

Public Members

Features context = 1

A <key, value> mapping of features for this ProgramGraph list.

repeated ProgramGraph graph = 2

A list of ProgramGraph messages.

struct ProgramGraphFeatures

Features describing a program.

Public Members

FeatureLists node_features = 1

A list of features corresponding to a ProgramGraph’s list of Node messages.

FeatureLists edge_features = 2

A list of features corresponding to a ProgramGraph’s list of Edge messages.

FeatureLists function_features = 3

A list of features corresponding to a ProgramGraph’s list of Function messages.

FeatureLists module_features = 4

A list of features corresponding to a ProgramGraph’s list of Module messages.

Features features = 5

A set of graph-level features.
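One plausible reading of "corresponding to" is positional: entry i of each named feature list describes element i of the matching ProgramGraph list. Under that assumption, which this document does not state explicitly, a consistency check can be sketched with a plain dict standing in for the FeatureLists map. The helper misaligned_features and the feature names are hypothetical.

```python
def misaligned_features(element_count, feature_lists):
    """Names of feature lists whose length differs from the element count.

    feature_lists models FeatureLists.feature_list as a dict from feature
    name to a list of per-element values (bytes, as held by a BytesList).
    """
    return sorted(
        name
        for name, values in feature_lists.items()
        if len(values) != element_count
    )


# Hypothetical per-node features for a graph with three nodes.
node_features = {
    "data_flow_value": [b"\x01", b"\x00", b"\x01"],
    "label": [b"a", b"b"],  # One value short: does not cover node 2.
}
mismatched = misaligned_features(3, node_features)
```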

struct ProgramGraphFeaturesList

A list of program graph features.

Public Members

Features context = 1

A <key, value> mapping of features for this ProgramGraphFeatures list.

repeated ProgramGraphFeatures graph = 2

A list of ProgramGraphFeatures messages.

struct Ir

A compiler intermediate representation.

Public Types

enum Type

The type of IR.

Values:

enumerator UNKNOWN
enumerator LLVM
enumerator XLA_HLO

Public Members

Type type = 1

The type of IR.

int64 compiler_version = 2

The compiler version, as a single integer.

Major + minor versions should be converted to this single number, e.g. 6.0.0 -> 600.
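The only mapping given above is 6.0.0 -> 600. One encoding that is consistent with it, shown here as an assumption rather than the documented scheme, gives one decimal digit each to the minor and patch components. The function name encode_compiler_version is hypothetical.

```python
def encode_compiler_version(major: int, minor: int, patch: int) -> int:
    """Pack a version triple into one integer, e.g. (6, 0, 0) -> 600.

    Assumed scheme: major * 100 + minor * 10 + patch, so minor and patch
    must each fit in a single decimal digit to keep the result unambiguous.
    """
    if major < 0 or not (0 <= minor <= 9) or not (0 <= patch <= 9):
        raise ValueError("version components out of range for this encoding")
    return major * 100 + minor * 10 + patch


version = encode_compiler_version(6, 0, 0)
```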

string cmd = 3

The command that was used to produce this IR.

string text = 4

The text of the IR.

struct IrList

A list of compiler IRs.

Public Members

repeated Ir ir = 1

A list of Ir messages.

struct SourceFile

A source file.

Public Types

enum Language

The source programming language.

Values:

enumerator UNKNOWN
enumerator C
enumerator CXX
enumerator OPENCL
enumerator SWIFT
enumerator HASKELL
enumerator FORTRAN

Public Members

Language language = 2

The source programming language.

string relpath = 1

The relative path of the file.

string text = 3

The text of the file.

struct Repo

A repository of source files.

Public Members

string url = 1

The URL of the repository.

string sha1 = 2

The sha1 of the repository’s HEAD commit.

int64 created_ms_timestamp = 3

The timestamp at which this repository was created, in milliseconds.

struct NodeIndexList

A node map is used to translate node indices between ProgramGraph instances.

Public Members

repeated int32 node = 1

The keys are node indices in the old graph representation; the values are node indices in the new graph representation.
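Since the field is a repeated int32, the "keys" are taken here to be positions in the list: entry i holds the new index of old node i. A minimal sketch of applying such a list to rewrite edge endpoints, with a plain Python list standing in for the message and a hypothetical helper translate_edges:

```python
def translate_edges(edges, node_index_list):
    """Rewrite (source, target) pairs from old node indices to new ones.

    node_index_list[i] is the new index of the node that had index i in the
    old graph, mirroring the NodeIndexList.node field.
    """
    return [(node_index_list[s], node_index_list[t]) for s, t in edges]


old_to_new = [2, 0, 1]          # Old node 0 -> 2, old node 1 -> 0, old node 2 -> 1.
old_edges = [(0, 1), (1, 2)]
new_edges = translate_edges(old_edges, old_to_new)
```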