"...composable_kernel_rocm.git" did not exist on "36c7ce4e0eef86df186f8d796d7e177b8b13df92"
- 12 May, 2023 1 commit
Ted Themistokleous authored
This builds on Debian; CentOS is still running into issues when building with the newer version of gcc.
-
- 10 May, 2023 2 commits
Ted Themistokleous authored
-
Ted Themistokleous authored
Currently ROCm doesn't have a release later than 5.2.5 for CentOS, but it is still useful to test on. Needed to add rocblas-devel and miopen-devel so they are picked up properly by CMake. Will reuse this work for RHEL and other OS builds once I can confirm that make analyze and make check work with all debug and non-debug builds.
-
- 17 Apr, 2023 24 commits
Ted Themistokleous authored
-
Ted Themistokleous authored
Don't reinvent the wheel; just use std::greater<****>, since it mirrors the behavior of the priority_queue we replaced here.
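A minimal sketch of why std::greater fits, assuming the old code drained a default std::priority_queue (a max-heap) of (score, index) pairs. The template argument above is elided, so `score_idx`, `heap_order`, and `sorted_order` are hypothetical names for illustration: sorting a vector once with std::greater<> yields the same largest-first order in which the heap pops.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical (score, index) element; the real element type is elided above.
using score_idx = std::pair<double, std::int64_t>;

// Old approach: push everything into a max-heap and pop until empty.
std::vector<score_idx> heap_order(const std::vector<score_idx>& in)
{
    std::priority_queue<score_idx> pq(in.begin(), in.end());
    std::vector<score_idx> out;
    while(!pq.empty())
    {
        out.push_back(pq.top());
        pq.pop();
    }
    return out;
}

// New approach: sort a vector once with std::greater<>, giving the same
// largest-first order without a hand-written comparator.
std::vector<score_idx> sorted_order(std::vector<score_idx> v)
{
    std::sort(v.begin(), v.end(), std::greater<>{});
    return v;
}
```

Because std::pair compares lexicographically, ties on score are broken by the larger index in both versions, so the two orders match element for element.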
-
Ted Themistokleous authored
-
Ted Themistokleous authored
These still seem to stall. Commenting them out to make sure we can get a proper CI run of this.
-
Ted Themistokleous authored
Without this, results weren't sorted correctly and other tests were failing.
-
Ted Themistokleous authored
-
Ted Themistokleous authored
Removes caching from this, which may lead to errors down the road.
-
Ted Themistokleous authored
libtbb-dev is already installed via install_prereqs.sh.
-
Ted Themistokleous authored
Allows us to continually filter out the top value, acting as a pop, by starting the copy_if one index after it.
-
Ted Themistokleous authored
Offload these calculations to when the batch box is created. Since we're now copying by value, there is no need to recalculate these parameters. This reduces the repeated work for the selected top_box while still leveraging parallelism for each subsequent box compared, since our lambda in copy_if calls batch_box() prior to suppress_by_iou.
-
Ted Themistokleous authored
Make copies here since we're doing this calculation in parallel.
-
Ted Themistokleous authored
Less code, simpler to read.
-
Ted Themistokleous authored
Needed to support std::execution::par, which is used for parallel computation.
-
Ted Themistokleous authored
This reverts commit aa91c4db7551ad69b6141597483d7c980d40d466.
-
Ted Themistokleous authored
- Add support for TBB in MIGraphX
- Add include for TBB in Dockerfile
- Replace inner loop with copy_if and use std::execution::par to filter
- Change heap to vector and sort in parallel in filter_boxes_per_score()

With the help of Paul, this cuts NMS in ref from around 43-44 s down to about 2 s.
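The filtering approach described in these commits can be sketched as a sequential greedy NMS over a score-sorted vector. This is a sketch under stated assumptions, not MIGraphX's code: `box`, `area`, `iou`, and `nms` are hypothetical names, boxes are assumed to be in corner form, and a box is assumed to be suppressed when its IoU with the kept box exceeds the threshold. Each pass keeps the front (highest-score) candidate and runs copy_if only over the elements after it, which both "pops" the top box and drops anything overlapping it too much.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical corner-form box (x1, y1, x2, y2).
struct box
{
    float x1, y1, x2, y2;
};

static float area(const box& b)
{
    return std::max(0.0f, b.x2 - b.x1) * std::max(0.0f, b.y2 - b.y1);
}

static float iou(const box& a, const box& b)
{
    float iw    = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    float ih    = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    float inter = iw * ih;
    float uni   = area(a) + area(b) - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: scores[i] belongs to boxes[i]; returns kept indices, highest score first.
std::vector<std::size_t>
nms(const std::vector<box>& boxes, const std::vector<float>& scores, float iou_threshold)
{
    // (score, index) candidates sorted descending, in place of a max-heap.
    std::vector<std::pair<float, std::size_t>> cand;
    for(std::size_t i = 0; i < boxes.size(); ++i)
        cand.emplace_back(scores[i], i);
    std::sort(cand.begin(), cand.end(), std::greater<>{});

    std::vector<std::size_t> kept;
    while(!cand.empty())
    {
        const box& top = boxes[cand.front().second];
        kept.push_back(cand.front().second);

        // Filter starting one past the top element, which also "pops" it.
        // copy_if is stable, so the survivors stay sorted by score.
        std::vector<std::pair<float, std::size_t>> rest(cand.size() - 1);
        auto last = std::copy_if(std::next(cand.begin()),
                                 cand.end(),
                                 rest.begin(),
                                 [&](const auto& c) { return iou(top, boxes[c.second]) <= iou_threshold; });
        rest.erase(last, rest.end());
        cand = std::move(rest);
    }
    return kept;
}
```

Writing into a pre-sized range and erasing the tail, rather than using std::back_inserter, keeps the copy_if call valid if std::execution::par is passed as its first argument, because the parallel overloads require forward iterators. With libstdc++, those parallel overloads need <execution> and a TBB backend, which matches the TBB dependency added here.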
-
Ted Themistokleous authored
This cleans up the compute_nms signature and stops using additional memory by no longer storing every pair result twice, which just got cleared on each shape_for_each() run.
-
Ted Themistokleous authored
Allows us to transform to get the proper input, then spawn a thread to call f() for it. Useful if we have many batches/classes in our runs.
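A minimal sketch of the threaded dispatch, assuming the inputs have already been prepared (for example with std::transform) and that one std::thread is spawned per input. `run_threaded` is a hypothetical name; the commit's actual helper may be structured differently.

```cpp
#include <thread>
#include <vector>

// Hypothetical helper: call f(input) for every prepared input on its own thread,
// then wait for all of them. f must be safe to call concurrently.
template <class T, class F>
void run_threaded(const std::vector<T>& inputs, F f)
{
    std::vector<std::thread> threads;
    threads.reserve(inputs.size());
    for(const T& in : inputs)
        threads.emplace_back([&f, &in] { f(in); });
    for(auto& t : threads)
        t.join();
}
```

One thread per item is the simplest form; with a very large number of batches/classes, a thread pool or a parallel algorithm would bound the number of threads created.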
-
Ted Themistokleous authored
This avoids performing N comparisons for the given batch if the score threshold is less than zero. Instead, we can simply std::transform all boxes, constructing each std::pair of (box score, idx) directly, without a bunch of needless compares.
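A sketch of this shortcut under two assumptions: scores are never negative, so a negative threshold keeps every box, and a box is otherwise kept when its score is strictly greater than the threshold. `candidates` and `score_idx` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

using score_idx = std::pair<float, std::size_t>;

// Hypothetical helper: build (score, index) candidates for one batch/class.
std::vector<score_idx> candidates(const std::vector<float>& scores, float score_threshold)
{
    std::vector<score_idx> out;
    if(score_threshold < 0.0f)
    {
        // Assumes scores are never negative, so every box passes: skip the N
        // comparisons and construct the pairs directly with std::transform.
        std::vector<std::size_t> idx(scores.size());
        std::iota(idx.begin(), idx.end(), std::size_t{0});
        out.resize(scores.size());
        std::transform(idx.begin(), idx.end(), out.begin(), [&](std::size_t i) {
            return score_idx{scores[i], i};
        });
        return out;
    }
    // Otherwise keep only boxes whose score exceeds the threshold.
    for(std::size_t i = 0; i < scores.size(); ++i)
    {
        if(scores[i] > score_threshold)
            out.emplace_back(scores[i], i);
    }
    return out;
}
```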
-
Ted Themistokleous authored
Remove the need to use the GPU; switch this to ref. Change names to reflect static vs. random data.
-
Ted Themistokleous authored
In this case we have a batch size with no bound on the score threshold, so we end up evaluating a single huge batch on its own. The concern is that this should run all the way through without completely stalling; currently it runs intractably in a single-threaded fashion.
-
Ted Themistokleous authored
This saves two copies of the entire box class in this call and instead works on references to the objects created within the loops.
-
Ted Themistokleous authored
We're continually creating/destroying a batch box in the while() check as we run through boxes_heap by calling batch_box() constantly. Make this next_box and only calculate it before we pop that box from boxes_heap. This should remove the overhead of constant function calls in the case of a large batch size.
-
Ted Themistokleous authored
Just return early if either box has zero area. Searching for the intersection and union is logically irrelevant in that case.
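A minimal IoU sketch with the early return, assuming corner-form boxes whose area is computed once at construction (in the spirit of the batch box changes in this series). `box` and `iou` are hypothetical names, and returning 0 for a degenerate box is an assumed convention.

```cpp
#include <algorithm>

// Hypothetical corner-form box; the area is computed once at construction so
// repeated IoU checks against it do not recompute it.
struct box
{
    float x1, y1, x2, y2;
    float area;

    box(float ax1, float ay1, float ax2, float ay2)
        : x1(std::min(ax1, ax2)),
          y1(std::min(ay1, ay2)),
          x2(std::max(ax1, ax2)),
          y2(std::max(ay1, ay2)),
          area((x2 - x1) * (y2 - y1))
    {
    }
};

// Intersection over union of two boxes.
float iou(const box& a, const box& b)
{
    // Early return: if either box has zero area there is nothing to intersect,
    // so skip the intersection/union computation entirely.
    if(a.area <= 0.0f || b.area <= 0.0f)
        return 0.0f;

    float iw    = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    float ih    = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    float inter = iw * ih;
    return inter / (a.area + b.area - inter);
}
```

For two 2x2 boxes offset by one unit in each direction, the intersection is 1 and the union is 4 + 4 - 1 = 7, giving an IoU of 1/7.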
-
shivadbhavsar authored
Expose the shape::type_t values for use by the Python API; this is required by torch_migraphx to support torchbench models.
-
- 13 Apr, 2023 1 commit
Zhuoran Yin authored
-
- 12 Apr, 2023 3 commits
Paul Fultz II authored
-
Paul Fultz II authored
This removes the --cxx flags from the rbuild commands since they are not necessary. Also adds a section about using rbuild to set up a development environment.
-
Djordje Petrovic authored
-
- 11 Apr, 2023 3 commits
github-actions[bot] authored
-
Paul Fultz II authored
-
Ted Themistokleous authored
-
- 10 Apr, 2023 3 commits
Umang Yadav authored
-
Charlie Lin authored
Adds a matcher to split_single_dyn_dim that finds all broadcast or multibroadcast instructions with two static-shape inputs and replaces each with the one-input version. Sorts the get_output_parameters() list to ensure the correct ordering (we were getting an error for some models).
-
Paul Fultz II authored
-
- 09 Apr, 2023 1 commit
Paul Fultz II authored
* Enable hiprtc by default
-
- 07 Apr, 2023 1 commit
Paul Fultz II authored
Converts can be inserted when the scales and input differ in the ONNX file (we are already doing this implicit conversion in the ref implementation). This will also improve the compile time of quantizelinear.hpp, since we can remove the nested visit method.
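For reference, ONNX QuantizeLinear computes y = saturate(round(x / y_scale) + y_zero_point), rounding half to even. The following is a minimal sketch assuming a per-tensor float scale and an int8 output, with the input explicitly converted to the scale's type first, as an inserted convert would do. `quantize_linear` is a hypothetical name, not the contents of quantizelinear.hpp.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical sketch: per-tensor QuantizeLinear to int8 with a float scale.
template <class T>
std::vector<std::int8_t> quantize_linear(const std::vector<T>& x, float scale, std::int8_t zero_point)
{
    std::vector<std::int8_t> y;
    y.reserve(x.size());
    for(const T& v : x)
    {
        // Explicit convert of the input to the scale type.
        float xf = static_cast<float>(v);
        // std::nearbyint rounds half to even under the default rounding mode.
        float q = std::nearbyint(xf / scale) + zero_point;
        // Saturate to the int8 range before narrowing.
        q = std::clamp(q, -128.0f, 127.0f);
        y.push_back(static_cast<std::int8_t>(q));
    }
    return y;
}
```

With a scale of 2, inputs 3 and 5 both quantize to 2 (1.5 and 2.5 round to the even neighbor), while 1000 and -1000 saturate to 127 and -128.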
-
- 06 Apr, 2023 1 commit
Charlie Lin authored
Examples:
bin/driver verify /codes/onnx_models/resnet50-v1-7/resnet50-v1-7.onnx --split-single-dyn-dim --batch 3 --dyn-input-dim @data "[{min:1, max:4}, 3, 224, 224]"
bin/driver compile /codes/onnx_models/resnet50-v1-7/resnet50-v1-7.onnx --split-single-dyn-dim --default-dyn-dim "{min:1, max:10}" --output resnet50_batch1-10.mxr
bin/driver perf resnet50_batch1-10.mxr --batch 4