"test/vscode:/vscode.git/clone" did not exist on "0536f2b3123f6ad0a0c3598f97459a70a0a55fb0"
Unverified Commit cc098f4d authored by Umang Yadav's avatar Umang Yadav Committed by GitHub
Browse files

Add flag for tuning in migraphx-driver (#1519)

* Add driver flag "--exhaustive-tune" to enable tuning, add support for the same in C/C++ and python API
parent 102c6bdb
...@@ -137,7 +137,7 @@ An `argument <migraphx::argument>` can handle memory buffers from either the GPU ...@@ -137,7 +137,7 @@ An `argument <migraphx::argument>` can handle memory buffers from either the GPU
By default when running the `program <migraphx::program>`, buffers are allocated on the corresponding target. By default when running the `program <migraphx::program>`, buffers are allocated on the corresponding target.
When compiling for the CPU, the buffers by default will be allocated on the CPU. When compiling for the CPU, the buffers by default will be allocated on the CPU.
When compiling for the GPU, the buffers by default will be allocated on the GPU. When compiling for the GPU, the buffers by default will be allocated on the GPU.
With the option ``offloaf_copy=true`` set while compiling for the GPU, the buffers will be located on the CPU. With the option ``offload_copy=true`` set while compiling for the GPU, the buffers will be located on the CPU.
Importing From ONNX Importing From ONNX
......
...@@ -28,6 +28,10 @@ Enable implicit offload copying ...@@ -28,6 +28,10 @@ Enable implicit offload copying
Disable fast math optimization Disable fast math optimization
.. option:: --exhaustive-tune
Perform an exhaustive search to find the fastest version of generated kernels for selected backend
.. option:: --fp16 .. option:: --fp16
Quantize for fp16 Quantize for fp16
......
...@@ -215,13 +215,14 @@ program ...@@ -215,13 +215,14 @@ program
:rtype: list[shape] :rtype: list[shape]
.. py:method:: compile(t, offload_copy=True, fast_math=True) .. py:method:: compile(t, offload_copy=True, fast_math=True, exhaustive_tune=False)
Compiles the program for the target and optimizes it. Compiles the program for the target and optimizes it.
:param target t: This is the target to compile the program for. :param target t: This is the target to compile the program for.
:param bool offload_copy: For targets with offloaded memory(such as the gpu), this will insert instructions during compilation to copy the input parameters to the offloaded memory and to copy the final result from the offloaded memory back to main memory. :param bool offload_copy: For targets with offloaded memory(such as the gpu), this will insert instructions during compilation to copy the input parameters to the offloaded memory and to copy the final result from the offloaded memory back to main memory.
:param bool fast_math: Optimize math functions to use faster approximate versions. There may be slight accuracy degredation when enabled. :param bool fast_math: Optimize math functions to use faster approximate versions. There may be slight accuracy degredation when enabled.
:param exhaustive_tune: Flag to enable exhaustive search to find the fastest version of generated kernels for selected backend.
.. py:method:: get_main_module() .. py:method:: get_main_module()
......
...@@ -10,52 +10,53 @@ The MIGraphX driver is installed with MIGraphX and can be found in `/opt/rocm/bi ...@@ -10,52 +10,53 @@ The MIGraphX driver is installed with MIGraphX and can be found in `/opt/rocm/bi
See below for a comprehensive list of commands and option arguments, as well as some usage examples. See below for a comprehensive list of commands and option arguments, as well as some usage examples.
### Commands ### Commands
| Command | Description | | Command | Description |
| --- | ---| | ------- | -------------------------------------------------------------------------- |
| op | When followed by the option --list or -l, prints all operators of MIGraphX | | op | When followed by the option --list or -l, prints all operators of MIGraphX |
| params | Prints the input and output parameter shapes | | params | Prints the input and output parameter shapes |
| run | Compiles, allocates parameters, evaluates, and prints input graph | | run | Compiles, allocates parameters, evaluates, and prints input graph |
| read | Loads and prints input graph | | read | Loads and prints input graph |
| compile | Compiles and prints input graph | | compile | Compiles and prints input graph |
| verify | Runs reference and GPU implementations and checks outputs for consistency | | verify | Runs reference and GPU implementations and checks outputs for consistency |
| perf | Compiles and runs input graph then prints performance report | | perf | Compiles and runs input graph then prints performance report |
### Options ### Options
| Option | Description | | Option | Description |
| --- | --- | | ---------------------------------------- | --------------------------------------------------------- |
| --help \| -h | Show help | | --help \| -h | Show help |
| --model <resnet50\|inceptionv3\|alexnet> | Loads one of the three default models | | --model <resnet50\|inceptionv3\|alexnet> | Loads one of the three default models |
| --onnx | Load file as onnx graph | | --onnx | Load file as onnx graph |
| --tf | Load file as a tensorflow graph | | --tf | Load file as a tensorflow graph |
| --migraphx | Load file as a migraphx graph | | --migraphx | Load file as a migraphx graph |
| --migraphx-json | Load file as a migraphx JSON graph | | --migraphx-json | Load file as a migraphx JSON graph |
| --batch | Set batch size for the model | | --batch | Set batch size for the model |
| --nhwc | Treat tensorflow format as nhwc | | --nhwc | Treat tensorflow format as nhwc |
| --nchw | Treat tensorflow format as nchw | | --nchw | Treat tensorflow format as nchw |
| --skip-unknown-operators | Skip unknown operators when parsing and continue to parse | | --skip-unknown-operators | Skip unknown operators when parsing and continue to parse |
| --trim \| -t | Trim instructions from the end | | --trim \| -t | Trim instructions from the end |
| --optimize \| -O | Optimize when reading | | --optimize \| -O | Optimize when reading |
| --graphviz \| -g | Print out a graphviz representation | | --graphviz \| -g | Print out a graphviz representation |
| --brief | Make the output brief | | --brief | Make the output brief |
| --cpp | Print out the program as cpp program | | --cpp | Print out the program as cpp program |
| --json | Print out program as json | | --json | Print out program as json |
| --text | Print out program in text format | | --text | Print out program in text format |
| --binary | Print out program in binary format | | --binary | Print out program in binary format |
| --output \| -o | Output to file | | --output \| -o | Output to file |
| --fill0 | Fill parameter with 0s | | --fill0 | Fill parameter with 0s |
| --fill1 | Fill parameter with 1s | | --fill1 | Fill parameter with 1s |
| --gpu | Compile on the gpu | | --gpu | Compile on the gpu |
| --cpu | Compile on the cpu | | --cpu | Compile on the cpu |
| --ref | Compile on the reference implementation | | --ref | Compile on the reference implementation |
| --enable-offload-copy | Enable implicit offload copying | | --enable-offload-copy | Enable implicit offload copying |
| --disable-fast-math | Disable fast math optimization | | --disable-fast-math | Disable fast math optimization |
| --fp16 | Quantize for fp16 | | --exhaustive-tune | Enable exhaustive search to find fastest kernel |
| --int8 | Quantize for int8 | | --fp16 | Quantize for fp16 |
| --tolerance | Tolerance for errors | | --int8 | Quantize for int8 |
| --per-instruction \| -i | Verify each instruction | | --tolerance | Tolerance for errors |
| --reduce \| -r | Reduce program and verify | | --per-instruction \| -i | Verify each instruction |
| --iterations \| -n | Number of iterations to run for perf report | | --reduce \| -r | Reduce program and verify |
| --list \| -l | List all the operators of MIGraphX | | --iterations \| -n | Number of iterations to run for perf report |
| --list \| -l | List all the operators of MIGraphX |
## Usage Examples ## Usage Examples
The examples below supply a simple MNIST ConvNet as the input graph. Models of higher complexity will have considerably larger outputs in most cases. The examples below supply a simple MNIST ConvNet as the input graph. Models of higher complexity will have considerably larger outputs in most cases.
......
...@@ -134,6 +134,11 @@ void set_offload_copy(compile_options& options, bool value) { options.offload_co ...@@ -134,6 +134,11 @@ void set_offload_copy(compile_options& options, bool value) { options.offload_co
void set_fast_math(compile_options& options, bool value) { options.fast_math = value; } void set_fast_math(compile_options& options, bool value) { options.fast_math = value; }
void set_exhaustive_tune_flag(compile_options& options, bool value)
{
options.exhaustive_tune = value;
}
void set_file_format(file_options& options, const char* format) { options.format = format; } void set_file_format(file_options& options, const char* format) { options.format = format; }
void set_default_dim_value(onnx_options& options, size_t value) void set_default_dim_value(onnx_options& options, size_t value)
...@@ -1690,6 +1695,19 @@ migraphx_compile_options_set_fast_math(migraphx_compile_options_t compile_option ...@@ -1690,6 +1695,19 @@ migraphx_compile_options_set_fast_math(migraphx_compile_options_t compile_option
return api_error_result; return api_error_result;
} }
extern "C" migraphx_status
migraphx_compile_options_set_exhaustive_tune_flag(migraphx_compile_options_t compile_options,
bool value)
{
auto api_error_result = migraphx::try_([&] {
if(compile_options == nullptr)
MIGRAPHX_THROW(migraphx_status_bad_param,
"Bad parameter compile_options: Null pointer");
migraphx::set_exhaustive_tune_flag((compile_options->object), (value));
});
return api_error_result;
}
extern "C" migraphx_status extern "C" migraphx_status
migraphx_parse_onnx(migraphx_program_t* out, const char* name, migraphx_onnx_options_t options) migraphx_parse_onnx(migraphx_program_t* out, const char* name, migraphx_onnx_options_t options)
{ {
......
...@@ -427,6 +427,10 @@ migraphx_compile_options_set_offload_copy(migraphx_compile_options_t compile_opt ...@@ -427,6 +427,10 @@ migraphx_compile_options_set_offload_copy(migraphx_compile_options_t compile_opt
migraphx_status migraphx_compile_options_set_fast_math(migraphx_compile_options_t compile_options, migraphx_status migraphx_compile_options_set_fast_math(migraphx_compile_options_t compile_options,
bool value); bool value);
migraphx_status
migraphx_compile_options_set_exhaustive_tune_flag(migraphx_compile_options_t compile_options,
bool value);
migraphx_status migraphx_status
migraphx_parse_onnx(migraphx_program_t* out, const char* name, migraphx_onnx_options_t options); migraphx_parse_onnx(migraphx_program_t* out, const char* name, migraphx_onnx_options_t options);
......
...@@ -1015,6 +1015,12 @@ struct compile_options : MIGRAPHX_HANDLE_BASE(compile_options) ...@@ -1015,6 +1015,12 @@ struct compile_options : MIGRAPHX_HANDLE_BASE(compile_options)
{ {
call(&migraphx_compile_options_set_fast_math, this->get_handle_ptr(), value); call(&migraphx_compile_options_set_fast_math, this->get_handle_ptr(), value);
} }
/// Set or un-set exhaustive search to find fastest kernel
void set_exhaustive_tune_flag(bool value = true)
{
call(&migraphx_compile_options_set_exhaustive_tune_flag, this->get_handle_ptr(), value);
}
}; };
/// A program represents the all computation graphs to be compiled and executed /// A program represents the all computation graphs to be compiled and executed
......
...@@ -354,6 +354,9 @@ def compile_options(h): ...@@ -354,6 +354,9 @@ def compile_options(h):
h.method('set_fast_math', h.method('set_fast_math',
api.params(value='bool'), api.params(value='bool'),
invoke='migraphx::set_fast_math($@)') invoke='migraphx::set_fast_math($@)')
h.method('set_exhaustive_tune_flag',
api.params(value='bool'),
invoke='migraphx::set_exhaustive_tune_flag($@)')
api.add_function('migraphx_parse_onnx', api.add_function('migraphx_parse_onnx',
......
...@@ -326,8 +326,7 @@ struct compiler ...@@ -326,8 +326,7 @@ struct compiler
loader l; loader l;
program_params parameters; program_params parameters;
compiler_target ct; compiler_target ct;
bool offload_copy = false; compile_options co;
bool fast_math = true;
precision quantize = precision::fp32; precision quantize = precision::fp32;
std::vector<std::string> fill0; std::vector<std::string> fill0;
...@@ -337,19 +336,26 @@ struct compiler ...@@ -337,19 +336,26 @@ struct compiler
l.parse(ap); l.parse(ap);
parameters.parse(ap); parameters.parse(ap);
ct.parse(ap); ct.parse(ap);
ap(offload_copy, ap(co.offload_copy,
{"--enable-offload-copy"}, {"--enable-offload-copy"},
ap.help("Enable implicit offload copying"), ap.help("Enable implicit offload copying"),
ap.set_value(true)); ap.set_value(true));
ap(fast_math, ap(co.fast_math,
{"--disable-fast-math"}, {"--disable-fast-math"},
ap.help("Disable fast math optimization"), ap.help("Disable fast math optimization"),
ap.set_value(false)); ap.set_value(false));
ap(co.exhaustive_tune,
{"--exhaustive-tune"},
ap.help("Exhastively search for best tuning parameters for kernels"),
ap.set_value(true));
ap(quantize, {"--fp16"}, ap.help("Quantize for fp16"), ap.set_value(precision::fp16)); ap(quantize, {"--fp16"}, ap.help("Quantize for fp16"), ap.set_value(precision::fp16));
ap(quantize, {"--int8"}, ap.help("Quantize for int8"), ap.set_value(precision::int8)); ap(quantize, {"--int8"}, ap.help("Quantize for int8"), ap.set_value(precision::int8));
} }
auto params(const program& p) { return parameters.generate(p, ct.get_target(), offload_copy); } auto params(const program& p)
{
return parameters.generate(p, ct.get_target(), co.offload_copy);
}
program compile() program compile()
{ {
...@@ -366,10 +372,7 @@ struct compiler ...@@ -366,10 +372,7 @@ struct compiler
{ {
quantize_int8(p, t, {params(p)}); quantize_int8(p, t, {params(p)});
} }
compile_options options; p.compile(t, co);
options.offload_copy = offload_copy;
options.fast_math = fast_math;
p.compile(t, options);
l.save(p); l.save(p);
return p; return p;
} }
...@@ -402,60 +405,41 @@ struct params : command<params> ...@@ -402,60 +405,41 @@ struct params : command<params>
struct verify : command<verify> struct verify : command<verify>
{ {
loader l; compiler c;
program_params parameters;
compiler_target ct;
double tolerance = 80; double tolerance = 80;
bool per_instruction = false; bool per_instruction = false;
bool reduce = false; bool reduce = false;
bool offload_copy = false;
bool fast_math = true;
precision quantize = precision::fp32;
void parse(argument_parser& ap) void parse(argument_parser& ap)
{ {
l.parse(ap); c.parse(ap);
parameters.parse(ap);
ct.parse(ap);
ap(offload_copy,
{"--enable-offload-copy"},
ap.help("Enable implicit offload copying"),
ap.set_value(true));
ap(fast_math,
{"--disable-fast-math"},
ap.help("Disable fast math optimization"),
ap.set_value(false));
ap(tolerance, {"--tolerance"}, ap.help("Tolerance for errors")); ap(tolerance, {"--tolerance"}, ap.help("Tolerance for errors"));
ap(per_instruction, ap(per_instruction,
{"-i", "--per-instruction"}, {"-i", "--per-instruction"},
ap.help("Verify each instruction"), ap.help("Verify each instruction"),
ap.set_value(true)); ap.set_value(true));
ap(reduce, {"-r", "--reduce"}, ap.help("Reduce program and verify"), ap.set_value(true)); ap(reduce, {"-r", "--reduce"}, ap.help("Reduce program and verify"), ap.set_value(true));
ap(quantize, {"--fp16"}, ap.help("Quantize for fp16"), ap.set_value(precision::fp16));
} }
void run() void run()
{ {
auto p = l.load(); auto p = c.l.load();
l.save(p); c.l.save(p);
std::cout << p << std::endl; std::cout << p << std::endl;
compile_options options; auto t = c.ct.get_target();
options.offload_copy = offload_copy; auto m = c.parameters.generate(p, t, true);
options.fast_math = fast_math;
auto t = ct.get_target();
auto m = parameters.generate(p, t, true);
if(per_instruction) if(per_instruction)
{ {
verify_instructions(p, t, options, quantize, tolerance); verify_instructions(p, t, c.co, c.quantize, tolerance);
} }
else if(reduce) else if(reduce)
{ {
verify_reduced_program(p, t, options, quantize, m, tolerance); verify_reduced_program(p, t, c.co, c.quantize, m, tolerance);
} }
else else
{ {
verify_program(l.file, p, t, options, quantize, m, tolerance); verify_program(c.l.file, p, t, c.co, c.quantize, m, tolerance);
} }
} }
}; };
......
...@@ -108,8 +108,6 @@ target get_target(bool gpu) ...@@ -108,8 +108,6 @@ target get_target(bool gpu)
return make_target("cpu"); return make_target("cpu");
} }
void compile_program(program& p, bool gpu) { p.compile(get_target(gpu)); }
} // namespace MIGRAPHX_INLINE_NS } // namespace MIGRAPHX_INLINE_NS
} // namespace driver } // namespace driver
} // namespace migraphx } // namespace migraphx
...@@ -37,7 +37,6 @@ parameter_map create_param_map(const program& p, const target& t, bool offload = ...@@ -37,7 +37,6 @@ parameter_map create_param_map(const program& p, const target& t, bool offload =
parameter_map fill_param_map(parameter_map& m, const program& p, bool gpu); parameter_map fill_param_map(parameter_map& m, const program& p, bool gpu);
parameter_map create_param_map(const program& p, bool gpu = true); parameter_map create_param_map(const program& p, bool gpu = true);
target get_target(bool gpu); target get_target(bool gpu);
void compile_program(program& p, bool gpu = true);
} // namespace MIGRAPHX_INLINE_NS } // namespace MIGRAPHX_INLINE_NS
} // namespace driver } // namespace driver
......
...@@ -32,8 +32,9 @@ inline namespace MIGRAPHX_INLINE_NS { ...@@ -32,8 +32,9 @@ inline namespace MIGRAPHX_INLINE_NS {
struct compile_options struct compile_options
{ {
bool offload_copy = false; bool offload_copy = false;
bool fast_math = true; bool fast_math = true;
bool exhaustive_tune = false;
tracer trace{}; tracer trace{};
}; };
......
...@@ -66,6 +66,7 @@ any_ptr get_queue_context(T&) ...@@ -66,6 +66,7 @@ any_ptr get_queue_context(T&)
{ {
return {}; return {};
} }
template <class T> template <class T>
void wait_for_context(T&, any_ptr) void wait_for_context(T&, any_ptr)
{ {
...@@ -302,7 +303,7 @@ struct context ...@@ -302,7 +303,7 @@ struct context
PrivateDetailTypeErasedT value, PrivateDetailTypeErasedT value,
typename std::enable_if<not std::is_reference<PrivateDetailTypeErasedU>::value, typename std::enable_if<not std::is_reference<PrivateDetailTypeErasedU>::value,
int>::type* = nullptr) noexcept int>::type* = nullptr) noexcept
: private_detail_te_value(value) : private_detail_te_value(std::move(value))
{ {
} }
...@@ -412,6 +413,7 @@ inline const ValueType& any_cast(const context& x) ...@@ -412,6 +413,7 @@ inline const ValueType& any_cast(const context& x)
#endif #endif
inline void migraphx_to_value(value& v, const context& ctx) { v = ctx.to_value(); } inline void migraphx_to_value(value& v, const context& ctx) { v = ctx.to_value(); }
inline void migraphx_from_value(const value& v, context& ctx) { ctx.from_value(v); } inline void migraphx_from_value(const value& v, context& ctx) { ctx.from_value(v); }
#endif #endif
......
...@@ -210,17 +210,15 @@ void program::compile(const target& t, compile_options options) ...@@ -210,17 +210,15 @@ void program::compile(const target& t, compile_options options)
assert(not this->is_compiled()); assert(not this->is_compiled());
this->impl->target_name = t.name(); this->impl->target_name = t.name();
this->impl->ctx = t.get_context(); this->impl->ctx = t.get_context();
if(enabled(MIGRAPHX_TRACE_COMPILE{})) if(enabled(MIGRAPHX_TRACE_COMPILE{}))
options.trace = tracer{std::cout}; options.trace = tracer{std::cout};
options.trace(*this); options.trace(*this);
options.trace(); options.trace();
auto&& passes = t.get_passes(this->impl->ctx, options); auto&& passes = t.get_passes(this->impl->ctx, options);
run_passes(*this, passes, options.trace); run_passes(*this, passes, options.trace);
auto mods = this->get_modules(); auto mods = this->get_modules();
// Validate and finalize // Validate and finalize
for(const auto& mod : reverse(mods)) for(const auto& mod : reverse(mods))
{ {
......
...@@ -329,15 +329,21 @@ MIGRAPHX_PYBIND11_MODULE(migraphx, m) ...@@ -329,15 +329,21 @@ MIGRAPHX_PYBIND11_MODULE(migraphx, m)
.def("is_compiled", &migraphx::program::is_compiled) .def("is_compiled", &migraphx::program::is_compiled)
.def( .def(
"compile", "compile",
[](migraphx::program& p, const migraphx::target& t, bool offload_copy, bool fast_math) { [](migraphx::program& p,
const migraphx::target& t,
bool offload_copy,
bool fast_math,
bool exhaustive_tune) {
migraphx::compile_options options; migraphx::compile_options options;
options.offload_copy = offload_copy; options.offload_copy = offload_copy;
options.fast_math = fast_math; options.fast_math = fast_math;
options.exhaustive_tune = exhaustive_tune;
p.compile(t, options); p.compile(t, options);
}, },
py::arg("t"), py::arg("t"),
py::arg("offload_copy") = true, py::arg("offload_copy") = true,
py::arg("fast_math") = true) py::arg("fast_math") = true,
py::arg("exhaustive_tune") = false)
.def("get_main_module", [](const migraphx::program& p) { return p.get_main_module(); }) .def("get_main_module", [](const migraphx::program& p) { return p.get_main_module(); })
.def( .def(
"create_module", "create_module",
......
...@@ -216,6 +216,10 @@ struct context ...@@ -216,6 +216,10 @@ struct context
return *current_device; return *current_device;
} }
bool get_exhaustive_tune_flag() const { return exhaustive_tune; }
void set_exhaustive_tune_flag(bool t) { exhaustive_tune = t; }
hip_device::stream& get_stream() { return get_current_device().get_stream(); } hip_device::stream& get_stream() { return get_current_device().get_stream(); }
hip_device::stream& get_stream(std::size_t n) { return get_current_device().get_stream(n); } hip_device::stream& get_stream(std::size_t n) { return get_current_device().get_stream(n); }
...@@ -338,7 +342,8 @@ struct context ...@@ -338,7 +342,8 @@ struct context
// TODO: Make this a vector to support multiple devices // TODO: Make this a vector to support multiple devices
std::shared_ptr<hip_device> current_device; std::shared_ptr<hip_device> current_device;
std::vector<shared<hip_event_ptr>> events; std::vector<shared<hip_event_ptr>> events;
bool measure_perf = false; bool exhaustive_tune = false;
bool measure_perf = false;
// for event perf timing // for event perf timing
shared<hip_event_ptr> start_event = nullptr; shared<hip_event_ptr> start_event = nullptr;
shared<hip_event_ptr> stop_event = nullptr; shared<hip_event_ptr> stop_event = nullptr;
......
...@@ -175,8 +175,9 @@ struct miopen_convolution ...@@ -175,8 +175,9 @@ struct miopen_convolution
auto* miopen_stream_handle = ctx.get_stream().get_miopen(); auto* miopen_stream_handle = ctx.get_stream().get_miopen();
solution_ptr = find_solution(miopen_stream_handle, conv_problem.get()); solution_ptr = find_solution(
auto status = miopenGetSolutionWorkspaceSize(solution_ptr.get(), &workspace_size); miopen_stream_handle, conv_problem.get(), ctx.get_exhaustive_tune_flag());
auto status = miopenGetSolutionWorkspaceSize(solution_ptr.get(), &workspace_size);
if(status != miopenStatusSuccess) if(status != miopenStatusSuccess)
MIGRAPHX_THROW("MIOpen" + op.name() + " : failed to get solution's workspace size"); MIGRAPHX_THROW("MIOpen" + op.name() + " : failed to get solution's workspace size");
...@@ -233,7 +234,7 @@ struct miopen_convolution ...@@ -233,7 +234,7 @@ struct miopen_convolution
&perf, &perf,
workspace.implicit(), workspace.implicit(),
workspace_size, workspace_size,
false); ctx.get_exhaustive_tune_flag());
if(status != miopenStatusSuccess) if(status != miopenStatusSuccess)
MIGRAPHX_THROW("MIOpen " + op.name() + " : find convolution failed"); MIGRAPHX_THROW("MIOpen " + op.name() + " : find convolution failed");
algo = perf.fwd_algo; algo = perf.fwd_algo;
......
...@@ -75,12 +75,19 @@ using miopen_find_options = MIGRAPHX_MANAGE_PTR(miopenFindOptions_t, miopenDestr ...@@ -75,12 +75,19 @@ using miopen_find_options = MIGRAPHX_MANAGE_PTR(miopenFindOptions_t, miopenDestr
using miopen_problem = MIGRAPHX_MANAGE_PTR(miopenProblem_t, miopenDestroyProblem); using miopen_problem = MIGRAPHX_MANAGE_PTR(miopenProblem_t, miopenDestroyProblem);
using miopen_solution = MIGRAPHX_MANAGE_PTR(miopenSolution_t, miopenDestroySolution); using miopen_solution = MIGRAPHX_MANAGE_PTR(miopenSolution_t, miopenDestroySolution);
inline miopen_solution find_solution(miopenHandle_t handle, miopenProblem_t problem) inline miopen_solution
find_solution(miopenHandle_t handle, miopenProblem_t problem, bool tune = false)
{ {
miopenSolution_t solution; miopenSolution_t solution;
size_t found = 0; size_t found = 0;
auto status = miopenFindSolutions(handle, problem, nullptr, &solution, &found, 1); miopen_find_options fo = nullptr;
auto result = miopen_solution{solution}; if(tune)
{
fo = make_obj<miopen_find_options>(&miopenCreateFindOptions);
miopenSetFindOptionTuning(fo.get(), 1);
}
auto status = miopenFindSolutions(handle, problem, fo.get(), &solution, &found, 1);
auto result = miopen_solution{solution};
if(status != miopenStatusSuccess or found == 0) if(status != miopenStatusSuccess or found == 0)
MIGRAPHX_THROW("MIOpen miopenFindSolutions failed"); MIGRAPHX_THROW("MIOpen miopenFindSolutions failed");
return result; return result;
......
...@@ -83,8 +83,7 @@ struct miopen_apply ...@@ -83,8 +83,7 @@ struct miopen_apply
auto& ctx = get_context(); auto& ctx = get_context();
int8_x4_format = get_int8_x4_format(ctx); int8_x4_format = get_int8_x4_format(ctx);
compute_fp32 = get_compute_fp32_flag(); compute_fp32 = get_compute_fp32_flag();
offload_copy = (mod->name() == "main") ? pass->offload_copy : false;
offload_copy = (mod->name() == "main") ? pass->offload_copy : false;
add_generic_op("contiguous"); add_generic_op("contiguous");
......
...@@ -91,6 +91,7 @@ pass enable_pass(bool enabled, pass p) ...@@ -91,6 +91,7 @@ pass enable_pass(bool enabled, pass p)
std::vector<pass> target::get_passes(migraphx::context& gctx, const compile_options& options) const std::vector<pass> target::get_passes(migraphx::context& gctx, const compile_options& options) const
{ {
auto& ctx = any_cast<context>(gctx); auto& ctx = any_cast<context>(gctx);
ctx.set_exhaustive_tune_flag(options.exhaustive_tune);
std::set<shape::type_t> unsupported_types(shape::types().begin(), shape::types().end()); std::set<shape::type_t> unsupported_types(shape::types().begin(), shape::types().end());
unsupported_types.erase(shape::type_t::float_type); unsupported_types.erase(shape::type_t::float_type);
unsupported_types.erase(shape::type_t::half_type); unsupported_types.erase(shape::type_t::half_type);
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment