Commit 48bef3a7, authored by silentghoul-spec, committed by GitHub

Fixed bug to calculate correct xpath_sub_list in MarkupLMTokenizer (#22302)



Fixed a bug so that xpath_subs_list is calculated correctly in MarkupLMTokenizer. Previously, xpath_subs_list was sliced from xpath_tags_list, so it held the same values as xpath_tags_list.
Co-authored-by: dusejat <dusejat@amazon.com>
parent 4e94c6c0
@@ -301,7 +301,7 @@ class MarkupLMTokenizer(PreTrainedTokenizer):
 xpath_subs_list.append(min(self.max_width, sub))
 xpath_tags_list = xpath_tags_list[: self.max_depth]
-xpath_subs_list = xpath_tags_list[: self.max_depth]
+xpath_subs_list = xpath_subs_list[: self.max_depth]
 xpath_tags_list += [self.pad_tag_id] * (self.max_depth - len(xpath_tags_list))
 xpath_subs_list += [self.pad_width] * (self.max_depth - len(xpath_subs_list))
@@ -275,7 +275,7 @@ class MarkupLMTokenizerFast(PreTrainedTokenizerFast):
 xpath_subs_list.append(min(self.max_width, sub))
 xpath_tags_list = xpath_tags_list[: self.max_depth]
-xpath_subs_list = xpath_tags_list[: self.max_depth]
+xpath_subs_list = xpath_subs_list[: self.max_depth]
 xpath_tags_list += [self.pad_tag_id] * (self.max_depth - len(xpath_tags_list))
 xpath_subs_list += [self.pad_width] * (self.max_depth - len(xpath_subs_list))
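To make the effect of the one-line change easier to see, here is a minimal standalone sketch of the truncate-and-pad step from the hunks above. It is not the tokenizer's actual method: the function name `truncate_and_pad`, the `buggy` flag, and the small values for `max_depth`, `pad_tag_id`, and `pad_width` are assumptions chosen for illustration only.

```python
def truncate_and_pad(xpath_tags_list, xpath_subs_list, max_depth,
                     pad_tag_id, pad_width, buggy=False):
    """Truncate both lists to max_depth, then pad them to exactly max_depth.

    Hypothetical helper mirroring the four lines shown in the diff.
    With buggy=True it reproduces the old behavior, where the subscript
    list was sliced from the tag list by mistake.
    """
    tags = xpath_tags_list[:max_depth]
    # Old code sliced xpath_tags_list here; the fix slices xpath_subs_list.
    source = xpath_tags_list if buggy else xpath_subs_list
    subs = source[:max_depth]
    # Pad each list to max_depth with its own padding value.
    tags = tags + [pad_tag_id] * (max_depth - len(tags))
    subs = subs + [pad_width] * (max_depth - len(subs))
    return tags, subs


# Illustrative inputs only: three path steps with made-up tag ids and subscripts.
tags_in = [10, 20, 30]
subs_in = [0, 2, 1]

fixed_tags, fixed_subs = truncate_and_pad(
    tags_in, subs_in, max_depth=5, pad_tag_id=99, pad_width=7
)
buggy_tags, buggy_subs = truncate_and_pad(
    tags_in, subs_in, max_depth=5, pad_tag_id=99, pad_width=7, buggy=True
)

print(fixed_subs)  # [0, 2, 1, 7, 7]
print(buggy_subs)  # [10, 20, 30, 7, 7]
```

In the buggy variant the real subscripts are discarded and replaced by the tag ids, while the tag list itself is unaffected, which matches the commit message's description.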
This source diff could not be displayed because it is too large.