Unverified Commit 89bc3079 authored by Nicolas Hug, committed by GitHub

Unify parameters formatting in docstrings (#3268)

parent e04de77c
...@@ -238,8 +238,10 @@ maskUtils = mask_util
     def loadRes(self, resFile):
         """
         Load result file and return a result api object.
-        :param resFile (str) : file name of result file
-        :return: res (obj) : result api object
+        Args:
+            resFile (str): file name of result file
+        Returns:
+            res (obj): result api object
         """
         res = COCO()
         res.dataset['images'] = [img for img in self.dataset['images']]
...
...@@ -181,17 +181,14 @@ def _read_video_from_file( ...@@ -181,17 +181,14 @@ def _read_video_from_file(
Reads a video from a file, returning both the video frames as well as Reads a video from a file, returning both the video frames as well as
the audio frames the audio frames
Args Args:
---------- filename (str): path to the video file
filename : str seek_frame_margin (double, optional): seeking frame in the stream is imprecise. Thus,
path to the video file when video_start_pts is specified, we seek the pts earlier by seek_frame_margin seconds
seek_frame_margin: double, optional read_video_stream (int, optional): whether read video stream. If yes, set to 1. Otherwise, 0
seeking frame in the stream is imprecise. Thus, when video_start_pts video_width/video_height/video_min_dimension/video_max_dimension (int): together decide
is specified, we seek the pts earlier by seek_frame_margin seconds the size of decoded frames:
read_video_stream: int, optional
whether read video stream. If yes, set to 1. Otherwise, 0
video_width/video_height/video_min_dimension/video_max_dimension: int
together decide the size of decoded frames
- When video_width = 0, video_height = 0, video_min_dimension = 0, - When video_width = 0, video_height = 0, video_min_dimension = 0,
and video_max_dimension = 0, keep the original frame resolution and video_max_dimension = 0, keep the original frame resolution
- When video_width = 0, video_height = 0, video_min_dimension != 0, - When video_width = 0, video_height = 0, video_min_dimension != 0,
...@@ -214,30 +211,19 @@ def _read_video_from_file(
           and video_max_dimension = 0, resize the frame so that frame
           video_width and video_height are set to $video_width and
           $video_height, respectively
-    video_pts_range : list(int), optional
-        the start and end presentation timestamp of video stream
-    video_timebase: Fraction, optional
-        a Fraction rational number which denotes timebase in video stream
-    read_audio_stream: int, optional
-        whether read audio stream. If yes, set to 1. Otherwise, 0
-    audio_samples: int, optional
-        audio sampling rate
-    audio_channels: int optional
-        audio channels
-    audio_pts_range : list(int), optional
-        the start and end presentation timestamp of audio stream
-    audio_timebase: Fraction, optional
-        a Fraction rational number which denotes time base in audio stream
-    Returns
-    -------
-    vframes : Tensor[T, H, W, C]
-        the `T` video frames
-    aframes : Tensor[L, K]
-        the audio frames, where `L` is the number of points and
-        `K` is the number of audio_channels
-    info : Dict
-        metadata for the video and audio. Can contain the fields video_fps (float)
-        and audio_fps (int)
+        video_pts_range (list(int), optional): the start and end presentation timestamp of video stream
+        video_timebase (Fraction, optional): a Fraction rational number which denotes timebase in video stream
+        read_audio_stream (int, optional): whether read audio stream. If yes, set to 1. Otherwise, 0
+        audio_samples (int, optional): audio sampling rate
+        audio_channels (int, optional): audio channels
+        audio_pts_range (list(int), optional): the start and end presentation timestamp of audio stream
+        audio_timebase (Fraction, optional): a Fraction rational number which denotes time base in audio stream
+    Returns:
+        vframes (Tensor[T, H, W, C]): the `T` video frames
+        aframes (Tensor[L, K]): the audio frames, where `L` is the number of points and
+            `K` is the number of audio_channels
+        info (Dict): metadata for the video and audio. Can contain the fields video_fps (float)
+            and audio_fps (int)
     """
     _validate_pts(video_pts_range)
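The timebase parameters above are easiest to see with a small worked example: an integer pts value only becomes a time once it is multiplied by the stream's timebase fraction. A minimal pure-Python sketch of that relationship, using the standard library's `fractions.Fraction` (the helper names here are illustrative, not torchvision API):

```python
from fractions import Fraction

def pts_to_seconds(pts: int, timebase: Fraction) -> Fraction:
    """Convert a presentation timestamp to seconds: seconds = pts * timebase."""
    return pts * timebase

def seconds_to_pts(seconds, timebase: Fraction) -> int:
    """Convert a time in seconds back to an integer pts in the given timebase."""
    return int(seconds / timebase)

# A common video timebase is 1/90000 (the 90 kHz MPEG clock).
tb = Fraction(1, 90000)
assert pts_to_seconds(90000, tb) == 1
assert seconds_to_pts(Fraction(1, 2), tb) == 45000
```

This is why `seek_frame_margin` is given in seconds while `video_pts_range` is given in pts: they live on the two sides of this conversion.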
...@@ -345,17 +331,15 @@ def _read_video_from_memory( ...@@ -345,17 +331,15 @@ def _read_video_from_memory(
the audio frames the audio frames
This function is torchscriptable. This function is torchscriptable.
Args Args:
---------- video_data (data type could be 1) torch.Tensor, dtype=torch.int8 or 2) python bytes):
video_data : data type could be 1) torch.Tensor, dtype=torch.int8 or 2) python bytes
compressed video content stored in either 1) torch.Tensor 2) python bytes compressed video content stored in either 1) torch.Tensor 2) python bytes
seek_frame_margin: double, optional seek_frame_margin (double, optional): seeking frame in the stream is imprecise.
seeking frame in the stream is imprecise. Thus, when video_start_pts is specified, Thus, when video_start_pts is specified, we seek the pts earlier by seek_frame_margin seconds
we seek the pts earlier by seek_frame_margin seconds read_video_stream (int, optional): whether read video stream. If yes, set to 1. Otherwise, 0
read_video_stream: int, optional video_width/video_height/video_min_dimension/video_max_dimension (int): together decide
whether read video stream. If yes, set to 1. Otherwise, 0 the size of decoded frames:
video_width/video_height/video_min_dimension/video_max_dimension: int
together decide the size of decoded frames
- When video_width = 0, video_height = 0, video_min_dimension = 0, - When video_width = 0, video_height = 0, video_min_dimension = 0,
and video_max_dimension = 0, keep the original frame resolution and video_max_dimension = 0, keep the original frame resolution
- When video_width = 0, video_height = 0, video_min_dimension != 0, - When video_width = 0, video_height = 0, video_min_dimension != 0,
...@@ -378,27 +362,19 @@ def _read_video_from_memory( ...@@ -378,27 +362,19 @@ def _read_video_from_memory(
and video_max_dimension = 0, resize the frame so that frame and video_max_dimension = 0, resize the frame so that frame
video_width and video_height are set to $video_width and video_width and video_height are set to $video_width and
$video_height, respectively $video_height, respectively
video_pts_range : list(int), optional video_pts_range (list(int), optional): the start and end presentation timestamp of video stream
the start and end presentation timestamp of video stream video_timebase_numerator / video_timebase_denominator (float, optional): a rational
video_timebase_numerator / video_timebase_denominator: optional number which denotes timebase in video stream
a rational number which denotes timebase in video stream read_audio_stream (int, optional): whether read audio stream. If yes, set to 1. Otherwise, 0
read_audio_stream: int, optional audio_samples (int, optional): audio sampling rate
whether read audio stream. If yes, set to 1. Otherwise, 0 audio_channels (int optional): audio audio_channels
audio_samples: int, optional audio_pts_range (list(int), optional): the start and end presentation timestamp of audio stream
audio sampling rate audio_timebase_numerator / audio_timebase_denominator (float, optional):
audio_channels: int optional
audio audio_channels
audio_pts_range : list(int), optional
the start and end presentation timestamp of audio stream
audio_timebase_numerator / audio_timebase_denominator: optional
a rational number which denotes time base in audio stream a rational number which denotes time base in audio stream
Returns Returns:
------- vframes (Tensor[T, H, W, C]): the `T` video frames
vframes : Tensor[T, H, W, C] aframes (Tensor[L, K]): the audio frames, where `L` is the number of points and
the `T` video frames
aframes : Tensor[L, K]
the audio frames, where `L` is the number of points and
`K` is the number of channels `K` is the number of channels
""" """
...
...@@ -119,18 +119,14 @@ def encode_png(input: torch.Tensor, compression_level: int = 6) -> torch.Tensor:
     Takes an input tensor in CHW layout and returns a buffer with the contents
     of its corresponding PNG file.
-    Parameters
-    ----------
-    input: Tensor[channels, image_height, image_width]
-        int8 image tensor of `c` channels, where `c` must be 3 or 1.
-    compression_level: int
-        Compression factor for the resulting file, it must be a number
-        between 0 and 9. Default: 6
-    Returns
-    -------
-    output: Tensor[1]
-        A one dimensional int8 tensor that contains the raw bytes of the
-        PNG file.
+    Args:
+        input (Tensor[channels, image_height, image_width]): int8 image tensor of
+            `c` channels, where `c` must be 3 or 1.
+        compression_level (int): Compression factor for the resulting file, it must be a number
+            between 0 and 9. Default: 6
+    Returns:
+        output (Tensor[1]): A one dimensional int8 tensor that contains the raw bytes of the
+            PNG file.
     """
     output = torch.ops.image.encode_png(input, compression_level)
...@@ -142,14 +138,11 @@ def write_png(input: torch.Tensor, filename: str, compression_level: int = 6): ...@@ -142,14 +138,11 @@ def write_png(input: torch.Tensor, filename: str, compression_level: int = 6):
Takes an input tensor in CHW layout (or HW in the case of grayscale images) Takes an input tensor in CHW layout (or HW in the case of grayscale images)
and saves it in a PNG file. and saves it in a PNG file.
Parameters Args:
---------- input (Tensor[channels, image_height, image_width]): int8 image tensor of
input: Tensor[channels, image_height, image_width] `c` channels, where `c` must be 1 or 3.
int8 image tensor of `c` channels, where `c` must be 1 or 3. filename (str): Path to save the image.
filename: str compression_level (int): Compression factor for the resulting file, it must be a number
Path to save the image.
compression_level: int
Compression factor for the resulting file, it must be a number
between 0 and 9. Default: 6 between 0 and 9. Default: 6
""" """
output = encode_png(input, compression_level) output = encode_png(input, compression_level)
...@@ -182,18 +175,14 @@ def encode_jpeg(input: torch.Tensor, quality: int = 75) -> torch.Tensor: ...@@ -182,18 +175,14 @@ def encode_jpeg(input: torch.Tensor, quality: int = 75) -> torch.Tensor:
Takes an input tensor in CHW layout and returns a buffer with the contents Takes an input tensor in CHW layout and returns a buffer with the contents
of its corresponding JPEG file. of its corresponding JPEG file.
Parameters Args:
---------- input (Tensor[channels, image_height, image_width])): int8 image tensor of
input: Tensor[channels, image_height, image_width]) `c` channels, where `c` must be 1 or 3.
int8 image tensor of `c` channels, where `c` must be 1 or 3. quality (int): Quality of the resulting JPEG file, it must be a number between
quality: int
Quality of the resulting JPEG file, it must be a number between
1 and 100. Default: 75 1 and 100. Default: 75
Returns Returns:
------- output (Tensor[1]): A one dimensional int8 tensor that contains the raw bytes of the
output: Tensor[1]
A one dimensional int8 tensor that contains the raw bytes of the
JPEG file. JPEG file.
""" """
if quality < 1 or quality > 100: if quality < 1 or quality > 100:
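The `if quality < 1 or quality > 100` check visible at the end of this hunk enforces the documented range. The same validation can be sketched in isolation in pure Python (the function name and error message here are illustrative, not the torchvision source):

```python
def validate_jpeg_quality(quality: int) -> int:
    # Mirrors the documented contract of encode_jpeg/write_jpeg:
    # quality must be a number between 1 and 100 (default 75).
    if quality < 1 or quality > 100:
        raise ValueError("quality must be between 1 and 100, got {}".format(quality))
    return quality
```

Validating up front, before handing the tensor to the C++ op, gives the caller a clear Python-level error instead of a failure deep inside the image codec.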
...@@ -208,14 +197,11 @@ def write_jpeg(input: torch.Tensor, filename: str, quality: int = 75): ...@@ -208,14 +197,11 @@ def write_jpeg(input: torch.Tensor, filename: str, quality: int = 75):
""" """
Takes an input tensor in CHW layout and saves it in a JPEG file. Takes an input tensor in CHW layout and saves it in a JPEG file.
Parameters Args:
---------- input (Tensor[channels, image_height, image_width]): int8 image tensor of `c`
input: Tensor[channels, image_height, image_width] channels, where `c` must be 1 or 3.
int8 image tensor of `c` channels, where `c` must be 1 or 3. filename (str): Path to save the image.
filename: str quality (int): Quality of the resulting JPEG file, it must be a number
Path to save the image.
quality: int
Quality of the resulting JPEG file, it must be a number
between 1 and 100. Default: 75 between 1 and 100. Default: 75
""" """
output = encode_jpeg(input, quality) output = encode_jpeg(input, quality)
...@@ -230,20 +216,16 @@ def decode_image(input: torch.Tensor, mode: ImageReadMode = ImageReadMode.UNCHAN ...@@ -230,20 +216,16 @@ def decode_image(input: torch.Tensor, mode: ImageReadMode = ImageReadMode.UNCHAN
Optionally converts the image to the desired format. Optionally converts the image to the desired format.
The values of the output tensor are uint8 between 0 and 255. The values of the output tensor are uint8 between 0 and 255.
Parameters Args:
---------- input (Tensor): a one dimensional uint8 tensor containing the raw bytes of the
input: Tensor
a one dimensional uint8 tensor containing the raw bytes of the
PNG or JPEG image. PNG or JPEG image.
mode: ImageReadMode mode (ImageReadMode): the read mode used for optionally converting the image.
the read mode used for optionally converting the image.
Default: `ImageReadMode.UNCHANGED`. Default: `ImageReadMode.UNCHANGED`.
See `ImageReadMode` class for more information on various See `ImageReadMode` class for more information on various
available modes. available modes.
Returns Returns:
------- output (Tensor[image_channels, image_height, image_width])
output: Tensor[image_channels, image_height, image_width]
""" """
output = torch.ops.image.decode_image(input, mode.value) output = torch.ops.image.decode_image(input, mode.value)
return output return output
...@@ -255,19 +237,15 @@ def read_image(path: str, mode: ImageReadMode = ImageReadMode.UNCHANGED) -> torc ...@@ -255,19 +237,15 @@ def read_image(path: str, mode: ImageReadMode = ImageReadMode.UNCHANGED) -> torc
Optionally converts the image to the desired format. Optionally converts the image to the desired format.
The values of the output tensor are uint8 between 0 and 255. The values of the output tensor are uint8 between 0 and 255.
Parameters Args:
---------- path (str): path of the JPEG or PNG image.
path: str mode (ImageReadMode): the read mode used for optionally converting the image.
path of the JPEG or PNG image.
mode: ImageReadMode
the read mode used for optionally converting the image.
Default: `ImageReadMode.UNCHANGED`. Default: `ImageReadMode.UNCHANGED`.
See `ImageReadMode` class for more information on various See `ImageReadMode` class for more information on various
available modes. available modes.
Returns Returns:
------- output (Tensor[image_channels, image_height, image_width])
output: Tensor[image_channels, image_height, image_width]
""" """
data = read_file(path) data = read_file(path)
return decode_image(data, mode) return decode_image(data, mode)
...@@ -63,27 +63,18 @@ def write_video(
     """
     Writes a 4d tensor in [T, H, W, C] format in a video file
-    Parameters
-    ----------
-    filename : str
-        path where the video will be saved
-    video_array : Tensor[T, H, W, C]
-        tensor containing the individual frames, as a uint8 tensor in [T, H, W, C] format
-    fps : Number
-        video frames per second
-    video_codec : str
-        the name of the video codec, i.e. "libx264", "h264", etc.
-    options : Dict
-        dictionary containing options to be passed into the PyAV video stream
-    audio_array : Tensor[C, N]
-        tensor containing the audio, where C is the number of channels and N is the
-        number of samples
-    audio_fps : Number
-        audio sample rate, typically 44100 or 48000
-    audio_codec : str
-        the name of the audio codec, i.e. "mp3", "aac", etc.
-    audio_options : Dict
-        dictionary containing options to be passed into the PyAV audio stream
+    Args:
+        filename (str): path where the video will be saved
+        video_array (Tensor[T, H, W, C]): tensor containing the individual frames,
+            as a uint8 tensor in [T, H, W, C] format
+        fps (Number): video frames per second
+        video_codec (str): the name of the video codec, i.e. "libx264", "h264", etc.
+        options (Dict): dictionary containing options to be passed into the PyAV video stream
+        audio_array (Tensor[C, N]): tensor containing the audio, where C is the number of channels
+            and N is the number of samples
+        audio_fps (Number): audio sample rate, typically 44100 or 48000
+        audio_codec (str): the name of the audio codec, i.e. "mp3", "aac", etc.
+        audio_options (Dict): dictionary containing options to be passed into the PyAV audio stream
     """
     _check_av_available()
     video_array = torch.as_tensor(video_array, dtype=torch.uint8).numpy()
...@@ -251,28 +242,20 @@ def read_video( ...@@ -251,28 +242,20 @@ def read_video(
Reads a video from a file, returning both the video frames as well as Reads a video from a file, returning both the video frames as well as
the audio frames the audio frames
Parameters Args:
---------- filename (str): path to the video file
filename : str start_pts (int if pts_unit = 'pts', float / Fraction if pts_unit = 'sec', optional):
path to the video file The start presentation time of the video
start_pts : int if pts_unit = 'pts', optional end_pts (int if pts_unit = 'pts', float / Fraction if pts_unit = 'sec', optional):
float / Fraction if pts_unit = 'sec', optional The end presentation time
the start presentation time of the video pts_unit (str, optional): unit in which start_pts and end_pts values will be interpreted,
end_pts : int if pts_unit = 'pts', optional either 'pts' or 'sec'. Defaults to 'pts'.
float / Fraction if pts_unit = 'sec', optional
the end presentation time Returns:
pts_unit : str, optional vframes (Tensor[T, H, W, C]): the `T` video frames
unit in which start_pts and end_pts values will be interpreted, either 'pts' or 'sec'. Defaults to 'pts'. aframes (Tensor[K, L]): the audio frames, where `K` is the number of channels and `L` is the
Returns
-------
vframes : Tensor[T, H, W, C]
the `T` video frames
aframes : Tensor[K, L]
the audio frames, where `K` is the number of channels and `L` is the
number of points number of points
info : Dict info (Dict): metadata for the video and audio. Can contain the fields video_fps (float)
metadata for the video and audio. Can contain the fields video_fps (float)
and audio_fps (int) and audio_fps (int)
""" """
...@@ -368,20 +351,15 @@ def read_video_timestamps(filename: str, pts_unit: str = "pts") -> Tuple[List[in ...@@ -368,20 +351,15 @@ def read_video_timestamps(filename: str, pts_unit: str = "pts") -> Tuple[List[in
Note that the function decodes the whole video frame-by-frame. Note that the function decodes the whole video frame-by-frame.
Parameters Args:
---------- filename (str): path to the video file
filename : str pts_unit (str, optional): unit in which timestamp values will be returned
path to the video file either 'pts' or 'sec'. Defaults to 'pts'.
pts_unit : str, optional
unit in which timestamp values will be returned either 'pts' or 'sec'. Defaults to 'pts'. Returns:
pts (List[int] if pts_unit = 'pts', List[Fraction] if pts_unit = 'sec'):
Returns
-------
pts : List[int] if pts_unit = 'pts'
List[Fraction] if pts_unit = 'sec'
presentation timestamps for each one of the frames in the video. presentation timestamps for each one of the frames in the video.
video_fps : float, optional video_fps (float, optional): the frame rate for the video
the frame rate for the video
""" """
from torchvision import get_video_backend from torchvision import get_video_backend
...
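`read_video_timestamps` returns one pts per frame plus the stream's frame rate; when a backend does not report `video_fps`, an estimate can be derived from the timestamps themselves. A minimal sketch of that idea in pure Python (a hypothetical helper, not part of torchvision):

```python
from fractions import Fraction

def estimate_fps(pts_seconds):
    """Estimate frames per second from sorted per-frame timestamps in seconds
    (e.g. what read_video_timestamps returns with pts_unit='sec').
    Returns None when fewer than two distinct timestamps are available."""
    if len(pts_seconds) < 2:
        return None
    duration = pts_seconds[-1] - pts_seconds[0]
    if duration == 0:
        return None
    # (n - 1) frame intervals span `duration` seconds.
    return (len(pts_seconds) - 1) / duration
```

Using `Fraction` timestamps keeps the estimate exact: fifty frames spaced 1/25 s apart yield exactly 25 fps rather than a float approximation.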
...@@ -18,10 +18,6 @@ def _make_divisible(v: float, divisor: int, min_value: Optional[int] = None) ->
     It ensures that all layers have a channel number that is divisible by 8
     It can be seen here:
     https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
-    :param v:
-    :param divisor:
-    :param min_value:
-    :return:
     """
     if min_value is None:
         min_value = divisor
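Since the empty `:param:` entries are dropped here rather than reformatted, the rounding rule is worth spelling out. A pure-Python sketch of the channel-rounding logic from the linked TensorFlow reference (an illustration of that logic, not the torchvision source):

```python
def make_divisible(v, divisor, min_value=None):
    """Round `v` to the nearest multiple of `divisor`, never going below
    `min_value` and never dropping more than 10% below the original value."""
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Rounding to the nearest multiple must not shrink v by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v
```

For example, 37 channels with divisor 8 round to 40, while 3 channels are clamped up to the divisor itself.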
...
...@@ -20,21 +20,14 @@ def nms(boxes: Tensor, scores: Tensor, iou_threshold: float) -> Tensor:
     not guaranteed to be the same between CPU and GPU. This is similar
     to the behavior of argsort in PyTorch when repeated values are present.
-    Parameters
-    ----------
-    boxes : Tensor[N, 4]
-        boxes to perform NMS on. They
-        are expected to be in (x1, y1, x2, y2) format
-    scores : Tensor[N]
-        scores for each one of the boxes
-    iou_threshold : float
-        discards all overlapping
-        boxes with IoU > iou_threshold
-    Returns
-    -------
-    keep : Tensor
-        int64 tensor with the indices
-        of the elements that have been kept
-        by NMS, sorted in decreasing order of scores
+    Args:
+        boxes (Tensor[N, 4]): boxes to perform NMS on. They
+            are expected to be in (x1, y1, x2, y2) format
+        scores (Tensor[N]): scores for each one of the boxes
+        iou_threshold (float): discards all overlapping boxes with IoU > iou_threshold
+
+    Returns:
+        keep (Tensor): int64 tensor with the indices
+            of the elements that have been kept
+            by NMS, sorted in decreasing order of scores
     """
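The docstring describes greedy non-maximum suppression. The documented behavior can be sketched in pure Python (an illustration of the algorithm, not the torchvision CPU/CUDA kernel):

```python
def box_iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def greedy_nms(boxes, scores, iou_threshold):
    """Return kept indices sorted by decreasing score: repeatedly keep the
    highest-scoring box and discard any remaining box whose IoU with it
    exceeds iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if box_iou(boxes[i], boxes[j]) <= iou_threshold]
    return keep
```

For example, two boxes at (0, 0, 10, 10) and (1, 1, 11, 11) overlap with IoU of roughly 0.68, so with a 0.5 threshold only the higher-scoring one survives.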
...@@ -55,23 +48,15 @@ def batched_nms( ...@@ -55,23 +48,15 @@ def batched_nms(
Each index value correspond to a category, and NMS Each index value correspond to a category, and NMS
will not be applied between elements of different categories. will not be applied between elements of different categories.
Parameters Args:
---------- boxes (Tensor[N, 4]): boxes where NMS will be performed. They
boxes : Tensor[N, 4]
boxes where NMS will be performed. They
are expected to be in (x1, y1, x2, y2) format are expected to be in (x1, y1, x2, y2) format
scores : Tensor[N] scores (Tensor[N]): scores for each one of the boxes
scores for each one of the boxes idxs (Tensor[N]): indices of the categories for each one of the boxes.
idxs : Tensor[N] iou_threshold (float): discards all overlapping boxes with IoU > iou_threshold
indices of the categories for each one of the boxes.
iou_threshold : float Returns:
discards all overlapping boxes keep (Tensor): int64 tensor with the indices of
with IoU > iou_threshold
Returns
-------
keep : Tensor
int64 tensor with the indices of
the elements that have been kept by NMS, sorted the elements that have been kept by NMS, sorted
in decreasing order of scores in decreasing order of scores
""" """
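`batched_nms` differs from plain `nms` only in that suppression never crosses category boundaries. That contract can be sketched in pure Python (an illustration of the documented behavior; the real implementation avoids the per-pair check by offsetting each category's boxes so they can never overlap):

```python
def _iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if inter else 0.0

def batched_greedy_nms(boxes, scores, idxs, iou_threshold):
    """Greedy NMS that only suppresses within a category: a kept box can
    discard another box only when both share the same `idxs` value.
    Returns kept indices sorted by decreasing score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order
                 if idxs[j] != idxs[i] or _iou(boxes[i], boxes[j]) <= iou_threshold]
    return keep
```

Two heavily overlapping boxes thus both survive when they carry different category indices, while the lower-scoring one is dropped when they share a category.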
...