Commits · add_centos_docker · gaoqiong / MIGraphX

12 May, 2023 1 commit

Use oneTBB instead of tbb for centos · c25c7a9b

Ted Themistokleous authored May 12, 2023

This builds on Debian, but CentOS is still running into issues building with the
newer version of gcc.

c25c7a9b

10 May, 2023 2 commits

Work in progress Update find_package for TBB due to CENTOS/RHEL packaging · e49295b3
Ted Themistokleous authored Apr 14, 2023

e49295b3

Add Dockerfile and prereqs to create a container for CentOS 8 · 03bcd72f

Ted Themistokleous authored Apr 13, 2023

Currently ROCm doesn't have a release later than 5.2.5 for CentOS, but this
is still useful to test on.

Needed to add rocblas-devel and miopen-devel so they are properly picked up by
cmake.

Will reuse this work for RHEL and other OS builds once I can confirm make analyze
and make checks work with all debug and non debug builds

03bcd72f

17 Apr, 2023 24 commits

Remove libtbb-dev for package checks · 28b9cab5
Ted Themistokleous authored Apr 04, 2023

28b9cab5

Use std::greater to compare pair instead of using lambda for boxes heap · ca48aac2

Ted Themistokleous authored Mar 29, 2023

Don't reinvent the wheel, just use std::greater<****> since it mirrors the
behavior of the previous priority_queue we replaced here.

ca48aac2
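
A minimal sketch of the idea in this commit, not the actual MIGraphX code: `std::pair` already compares lexicographically, so sorting (score, index) pairs with `std::greater<>` puts the highest score first, the same element a default `std::priority_queue` would expose via `top()`. The alias `scored_box` and the function `sort_by_score_desc` are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical alias: a (score, box index) pair.
using scored_box = std::pair<double, std::int64_t>;

// Sort so the highest score comes first. std::pair compares by score first,
// then by index, so std::greater<> replaces a hand-written lambda.
inline std::vector<scored_box> sort_by_score_desc(std::vector<scored_box> boxes)
{
    std::sort(boxes.begin(), boxes.end(), std::greater<>{});
    return boxes;
}
```

On equal scores the pair with the larger index sorts first, which is also the pair a max-heap of pairs would return first.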

Add libtbb2 as prereq for cmake and packages alongside libtbb-dev · 6e337a2f
Ted Themistokleous authored Mar 28, 2023

6e337a2f
Cleanup and comment out huge test cases · f4550aab
Ted Themistokleous authored Mar 28, 2023
```
These still seem to stall. Commenting out to make sure we can get a proper
CI run of this.
```
f4550aab
Add compare function to sort() in filter_boxes_by_score() · ac3e44ec
Ted Themistokleous authored Mar 28, 2023
```
This wasn't sorting correctly without this and failing other tests.
```
ac3e44ec
Use generate for boxes and scores in tests · 14618b67
Ted Themistokleous authored Mar 28, 2023

14618b67
Remove caching area for boxes upon batch_box run · 514d616f
Ted Themistokleous authored Mar 28, 2023
```
Removes caching from this and may lead to errors down the road
```
514d616f
Remove libtbb-dev install from docker file and update package deps · f908781c
Ted Themistokleous authored Mar 28, 2023
```
Already installed via install_prereqs.sh for libtbb-dev
```
f908781c
Use copy_if to "pop" front of vector instead of using erase() · d52969d7
Ted Themistokleous authored Mar 28, 2023
```
Allows us to continually filter out the top value, as a pop, by starting the copy_if
just one index after it.
```
d52969d7
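
The "pop via copy_if" idea can be sketched as follows, under the assumption that the candidates live in a sorted `std::vector` whose front element is the current top box. The helper name `pop_front_and_filter` and its signature are hypothetical and do not come from the MIGraphX source.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: returns every element after the first that satisfies
// `keep`. Starting the copy one past begin() drops the front element without
// calling erase(), which would shift the rest of the vector.
template <class T, class Pred>
std::vector<T> pop_front_and_filter(const std::vector<T>& sorted, Pred keep)
{
    std::vector<T> remaining;
    if(sorted.empty())
        return remaining;
    remaining.reserve(sorted.size() - 1);
    std::copy_if(std::next(sorted.begin()), sorted.end(), std::back_inserter(remaining), keep);
    return remaining;
}
```

In an NMS loop, `keep` would be the predicate that rejects boxes suppressed by the current top box, so each pass both removes the top and filters the rest.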

Move sort and area calculations to batch_box() creation · 450d164a

Ted Themistokleous authored Mar 28, 2023

Offload these calculations when the batch box is created since we're now
copying by value, no need to recalculate these parameters.

Reduces the work repeated for the top_box selected but still leverages parallelism
for each subsequent box compared as our lambda in copy_if calls batch_box()
prior to suppress_by_iou

450d164a

Pass by value to suppress_by_iou() · 347e79c1
Ted Themistokleous authored Mar 28, 2023
```
Make copies here since we're doing this calc in parallel
```
347e79c1
Use std::back_inserter instead of push_back in filter_boxes_by_score() · 8cced061
Ted Themistokleous authored Mar 28, 2023
```
Less code, simple to read.
```
8cced061
Add libtbb-dev to install_prereqs.sh · 7785e68e
Ted Themistokleous authored Mar 28, 2023
```
Needed to support std::execution::par, used for parallel computation.
```
7785e68e
Revert "Create shape_for_each_threaded() to parallelize f() calls" · bf446165
Ted Themistokleous authored Mar 28, 2023
```
This reverts commit aa91c4db7551ad69b6141597483d7c980d40d466.
```
bf446165

Use copy_if and parallel execution by leveraging TBB. · 5ec4b513

Ted Themistokleous authored Mar 27, 2023

- Add support for TBB in MIGraphX
- Add include for TBB in DockerFile
- Replace inner loop with copy_if and use std::execution::par to filter
- Change heap to vector and sort in parallel in filter_boxes_per_score()

With the help of Paul this cuts down NMS in ref from around 43-44s to about 2s

5ec4b513

Use int64_t to track selected_boxes_inside_class · 422d2c73

Ted Themistokleous authored Mar 27, 2023

This cleans up the compute_nms signature and avoids using additional memory by no
longer storing every pair result twice, only to have it cleared on each shape_for_each() run.

422d2c73

Create shape_for_each_threaded() to parallelize f() calls · 680ae7cc

Ted Themistokleous authored Mar 27, 2023

Allows us to transform to get the proper input then spawn a thread to call
f() in a threaded fashion. Useful if we have many batches/classes for our
runs.

680ae7cc

Add no filter case in filter_boxes_by_score() if score_threshold is zero · 49e6abca

Ted Themistokleous authored Mar 27, 2023

This avoids performing N comparisons for the given batch if the score
threshold used is less than zero. This allows us to simply std::transform
all boxes without performing a bunch of needless compares, constructing a
std::pair of box score, idx directly.

49e6abca
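
A rough sketch of the no-filter fast path described above. The function name `filter_by_score_sketch`, the alias `scored_index`, and the choice of a strict `>` comparison with "threshold not greater than zero" as the no-filter condition are all assumptions for illustration. The real `filter_boxes_by_score()` signature and boundary handling may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical alias: (score, box index).
using scored_index = std::pair<double, std::int64_t>;

inline std::vector<scored_index> filter_by_score_sketch(const std::vector<double>& scores,
                                                        double score_threshold)
{
    // Indices 0..N-1, so each pair can be built from its index alone.
    std::vector<std::int64_t> indices(scores.size());
    std::iota(indices.begin(), indices.end(), std::int64_t{0});

    auto to_pair = [&](std::int64_t i) {
        return scored_index{scores[static_cast<std::size_t>(i)], i};
    };

    std::vector<scored_index> result;
    result.reserve(scores.size());
    if(score_threshold > 0.0)
    {
        // Filtering path: one comparison per box (strict '>' is an assumption).
        for(auto i : indices)
        {
            if(scores[static_cast<std::size_t>(i)] > score_threshold)
                result.push_back(to_pair(i));
        }
    }
    else
    {
        // No-filter path: build every pair directly, with no comparisons.
        std::transform(indices.begin(), indices.end(), std::back_inserter(result), to_pair);
    }
    return result;
}
```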

Change up tests to huge of static and random data · 7f705c1d
Ted Themistokleous authored Mar 27, 2023
```
Remove the need to use gpu, switch this to ref.
change names to reflect static vs random data
```
7f705c1d

Huge test cases that capture error state of NMS with single huge batch size · 5f61e85f

Ted Themistokleous authored Mar 27, 2023

In this case we have a batch size with no bound on the score threshold.
We end up evaluating a single huge batch on its own.

The concern here is that this should run all the way through without completely
stalling or running intractably in a single-threaded fashion, as it currently does.

5f61e85f

Suppress IOU based on box references instead of copying over boxes · 14b20fdf

Ted Themistokleous authored Mar 27, 2023

This saves us two copies of the entire box class per call and instead
works on references to the objects that are created within the loops

14b20fdf

Create next_box from next_top_score once during IOU suppression · aa3fba66

Ted Themistokleous authored Mar 27, 2023

We're continually creating/destroying batch box in the while() check as we
run through the boxes_heap() by calling batch_box() constantly.

Make this next_box and only calculate it before we pop that box from the boxes_heap.

This should get rid of the function-call overhead of constant calls in the case
of a large batch size.

aa3fba66

Add early return for suppress function based on box area · 7b42f05c

Ted Themistokleous authored Mar 27, 2023

Just return early if either box has zero area. Searching for the intersection
and union is logically irrelevant here.

7b42f05c
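
To illustrate the early return, here is a small intersection-over-union check, assuming corner-format boxes with `x1 <= x2` and `y1 <= y2`. The `box` struct and `suppress_by_iou_sketch` are hypothetical stand-ins rather than the MIGraphX definitions.

```cpp
#include <algorithm>

// Hypothetical box type in corner format; x1 <= x2 and y1 <= y2 are assumed.
struct box
{
    double x1, y1, x2, y2;
    double area() const { return (x2 - x1) * (y2 - y1); }
};

// Returns true when the IoU of `a` and `b` exceeds iou_threshold.
inline bool suppress_by_iou_sketch(const box& a, const box& b, double iou_threshold)
{
    const double area_a = a.area();
    const double area_b = b.area();
    // Early return: a zero-area box has no intersection with anything,
    // so the intersection and union never need to be computed.
    if(area_a <= 0.0 || area_b <= 0.0)
        return false;

    const double iw = std::min(a.x2, b.x2) - std::max(a.x1, b.x1);
    const double ih = std::min(a.y2, b.y2) - std::max(a.y1, b.y1);
    if(iw <= 0.0 || ih <= 0.0)
        return false; // boxes do not overlap

    const double inter = iw * ih;
    const double uni   = area_a + area_b - inter;
    return inter / uni > iou_threshold;
}
```

The zero-area check also guards the final division, since `uni` is strictly positive once both areas are.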

expose enum datatypes to python api (#1655) · 42685803

shivadbhavsar authored Apr 17, 2023

Exposes the shape::type_t values for use by the Python API; this is required by torch_migraphx to support torchbench models.

42685803

13 Apr, 2023 1 commit
- [mlir] Adding quantizelinear, dequantizelinear and quant_convolution support (#1675) · 7b2a5ccf
  Zhuoran Yin authored Apr 13, 2023
  
  7b2a5ccf
12 Apr, 2023 3 commits
- Print out pass name when tracing passes (#1667) · 551b927c
  Paul Fultz II authored Apr 12, 2023
  
  551b927c
- Updates to README (#1671) · ec4b79c2
  Paul Fultz II authored Apr 12, 2023
```
This removes the --cxx flags from the rbuild commands since they are not necessary. Also adds a section about using rbuild to set up an environment for development.
```
  ec4b79c2
- Update workflow to support rocm image overwrite (#1662) · 851f8f3e
  Djordje Petrovic authored Apr 12, 2023
  
  851f8f3e
11 Apr, 2023 3 commits
- Onnxruntime Weekly Sync 2023-04-07 (#1676) · cc8dda73
  github-actions[bot] authored Apr 11, 2023
  
  cc8dda73
- Enable tidy on gpu driver (#1659) · 3385dcc8
  Paul Fultz II authored Apr 11, 2023
  
  3385dcc8
- Update name of github action script (#1624) · 744c6ab7
  Ted Themistokleous authored Apr 11, 2023
  
  744c6ab7
10 Apr, 2023 3 commits
- Always build ref target when building MIGraphX (#1636) · cce35871
  Umang Yadav authored Apr 10, 2023
  
  cce35871
- Fix 2 input broadcast bug for dynamic batch and output parameter ordering (#1669) · d3eb5609
  Charlie Lin authored Apr 10, 2023
```
Adds a matcher to split_single_dyn_dim to find all broadcast or multibroadcast with two static shape inputs and replaces the instruction with the one input version.
Sorts the get_output_parameters() list to ensure the correct ordering. (Was getting an error for some models.)
```
  d3eb5609
- Add dockerignore file (#1661) · 2e754cdd
  Paul Fultz II authored Apr 10, 2023
  
  2e754cdd
09 Apr, 2023 1 commit
- Enable hiprtc by default (#1658) · db6c75e7
  Paul Fultz II authored Apr 09, 2023
```
* Enable hiprtc by default
```
  db6c75e7
07 Apr, 2023 1 commit

Require the same type for the inputs and scales for QuantizeLinear (#1642) · f6e22d56

Paul Fultz II authored Apr 06, 2023

Converts can be inserted when the scales and input differ in the onnx file (we are already doing this implicit conversion in the ref implementation). This will also improve the compile time of quantizelinear.hpp since we can remove the nested visit method.

f6e22d56

06 Apr, 2023 1 commit

Driver dynamic batch update (#1652) · adccec52

Charlie Lin authored Apr 06, 2023

Examples:

bin/driver verify /codes/onnx_models/resnet50-v1-7/resnet50-v1-7.onnx --split-single-dyn-dim --batch 3 --dyn-input-dim @data "[{min:1, max:4}, 3, 224, 224]"

bin/driver compile /codes/onnx_models/resnet50-v1-7/resnet50-v1-7.onnx --split-single-dyn-dim --default-dyn-dim "{min:1, max:10}" --output resnet50_batch1-10.mxr

bin/driver perf resnet50_batch1-10.mxr --batch 4

adccec52