Unverified commit 89bc3079 authored by Nicolas Hug, committed by GitHub

Unify parameters formatting in docstrings (#3268)

parent e04de77c
......@@ -238,8 +238,10 @@ maskUtils = mask_util
def loadRes(self, resFile):
"""
Load result file and return a result api object.
:param resFile (str) : file name of result file
:return: res (obj) : result api object
Args:
resFile (str): file name of result file
Returns:
res (obj): result api object
"""
res = COCO()
res.dataset['images'] = [img for img in self.dataset['images']]
......
......@@ -181,17 +181,14 @@ def _read_video_from_file(
Reads a video from a file, returning both the video frames as well as
the audio frames
Args
----------
filename : str
path to the video file
seek_frame_margin: double, optional
seeking frame in the stream is imprecise. Thus, when video_start_pts
is specified, we seek the pts earlier by seek_frame_margin seconds
read_video_stream: int, optional
whether read video stream. If yes, set to 1. Otherwise, 0
video_width/video_height/video_min_dimension/video_max_dimension: int
together decide the size of decoded frames
Args:
filename (str): path to the video file
seek_frame_margin (double, optional): seeking frame in the stream is imprecise. Thus,
when video_start_pts is specified, we seek the pts earlier by seek_frame_margin seconds
read_video_stream (int, optional): whether to read the video stream. If yes, set to 1; otherwise, 0
video_width/video_height/video_min_dimension/video_max_dimension (int): together decide
the size of decoded frames:
- When video_width = 0, video_height = 0, video_min_dimension = 0,
and video_max_dimension = 0, keep the original frame resolution
- When video_width = 0, video_height = 0, video_min_dimension != 0,
......@@ -214,30 +211,19 @@ def _read_video_from_file(
and video_max_dimension = 0, resize the frame so that frame
video_width and video_height are set to $video_width and
$video_height, respectively
video_pts_range : list(int), optional
the start and end presentation timestamp of video stream
video_timebase: Fraction, optional
a Fraction rational number which denotes timebase in video stream
read_audio_stream: int, optional
whether read audio stream. If yes, set to 1. Otherwise, 0
audio_samples: int, optional
audio sampling rate
audio_channels: int optional
audio channels
audio_pts_range : list(int), optional
the start and end presentation timestamp of audio stream
audio_timebase: Fraction, optional
a Fraction rational number which denotes time base in audio stream
video_pts_range (list(int), optional): the start and end presentation timestamp of video stream
video_timebase (Fraction, optional): a Fraction rational number which denotes timebase in video stream
read_audio_stream (int, optional): whether to read the audio stream. If yes, set to 1; otherwise, 0
audio_samples (int, optional): audio sampling rate
audio_channels (int, optional): audio channels
audio_pts_range (list(int), optional): the start and end presentation timestamp of audio stream
audio_timebase (Fraction, optional): a Fraction rational number which denotes time base in audio stream
Returns
-------
vframes : Tensor[T, H, W, C]
the `T` video frames
aframes : Tensor[L, K]
the audio frames, where `L` is the number of points and
vframes (Tensor[T, H, W, C]): the `T` video frames
aframes (Tensor[L, K]): the audio frames, where `L` is the number of points and
`K` is the number of audio_channels
info : Dict
metadata for the video and audio. Can contain the fields video_fps (float)
info (Dict): metadata for the video and audio. Can contain the fields video_fps (float)
and audio_fps (int)
"""
_validate_pts(video_pts_range)
......@@ -345,17 +331,15 @@ def _read_video_from_memory(
the audio frames
This function is torchscriptable.
Args
----------
video_data : data type could be 1) torch.Tensor, dtype=torch.int8 or 2) python bytes
Args:
video_data (torch.Tensor of dtype torch.int8, or python bytes):
    compressed video content, stored in either a torch.Tensor or python bytes
seek_frame_margin: double, optional
seeking frame in the stream is imprecise. Thus, when video_start_pts is specified,
we seek the pts earlier by seek_frame_margin seconds
read_video_stream: int, optional
whether read video stream. If yes, set to 1. Otherwise, 0
video_width/video_height/video_min_dimension/video_max_dimension: int
together decide the size of decoded frames
seek_frame_margin (double, optional): seeking frame in the stream is imprecise.
Thus, when video_start_pts is specified, we seek the pts earlier by seek_frame_margin seconds
read_video_stream (int, optional): whether to read the video stream. If yes, set to 1; otherwise, 0
video_width/video_height/video_min_dimension/video_max_dimension (int): together decide
the size of decoded frames:
- When video_width = 0, video_height = 0, video_min_dimension = 0,
and video_max_dimension = 0, keep the original frame resolution
- When video_width = 0, video_height = 0, video_min_dimension != 0,
......@@ -378,27 +362,19 @@ def _read_video_from_memory(
and video_max_dimension = 0, resize the frame so that frame
video_width and video_height are set to $video_width and
$video_height, respectively
video_pts_range : list(int), optional
the start and end presentation timestamp of video stream
video_timebase_numerator / video_timebase_denominator: optional
a rational number which denotes timebase in video stream
read_audio_stream: int, optional
whether read audio stream. If yes, set to 1. Otherwise, 0
audio_samples: int, optional
audio sampling rate
audio_channels: int optional
audio audio_channels
audio_pts_range : list(int), optional
the start and end presentation timestamp of audio stream
audio_timebase_numerator / audio_timebase_denominator: optional
video_pts_range (list(int), optional): the start and end presentation timestamp of video stream
video_timebase_numerator / video_timebase_denominator (float, optional): a rational
number which denotes timebase in video stream
read_audio_stream (int, optional): whether to read the audio stream. If yes, set to 1; otherwise, 0
audio_samples (int, optional): audio sampling rate
audio_channels (int, optional): number of audio channels
audio_pts_range (list(int), optional): the start and end presentation timestamp of audio stream
audio_timebase_numerator / audio_timebase_denominator (float, optional):
a rational number which denotes time base in audio stream
Returns
-------
vframes : Tensor[T, H, W, C]
the `T` video frames
aframes : Tensor[L, K]
the audio frames, where `L` is the number of points and
Returns:
vframes (Tensor[T, H, W, C]): the `T` video frames
aframes (Tensor[L, K]): the audio frames, where `L` is the number of points and
`K` is the number of channels
"""
......
......@@ -119,18 +119,14 @@ def encode_png(input: torch.Tensor, compression_level: int = 6) -> torch.Tensor:
Takes an input tensor in CHW layout and returns a buffer with the contents
of its corresponding PNG file.
Parameters
----------
input: Tensor[channels, image_height, image_width]
int8 image tensor of `c` channels, where `c` must 3 or 1.
compression_level: int
Compression factor for the resulting file, it must be a number
Args:
input (Tensor[channels, image_height, image_width]): int8 image tensor of
`c` channels, where `c` must be 3 or 1.
compression_level (int): Compression factor for the resulting file, it must be a number
between 0 and 9. Default: 6
Returns
-------
output: Tensor[1]
A one dimensional int8 tensor that contains the raw bytes of the
Returns:
output (Tensor[1]): A one dimensional int8 tensor that contains the raw bytes of the
PNG file.
"""
output = torch.ops.image.encode_png(input, compression_level)
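PNG pixel data is stored in a zlib/DEFLATE stream, so `compression_level` behaves like zlib's usual 0-9 level knob: it is always lossless and only trades encoding time against file size. A small stdlib-only illustration of that tradeoff, using `zlib` directly rather than the PNG encoder:

```python
import zlib

# Repetitive payload, standing in for typical image scanline data.
payload = bytes(range(256)) * 64

fast = zlib.compress(payload, level=1)  # fastest, largest output
best = zlib.compress(payload, level=9)  # slowest, smallest output

assert len(best) <= len(fast)
assert zlib.decompress(best) == payload  # lossless at every level
```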
......@@ -142,14 +138,11 @@ def write_png(input: torch.Tensor, filename: str, compression_level: int = 6):
Takes an input tensor in CHW layout (or HW in the case of grayscale images)
and saves it in a PNG file.
Parameters
----------
input: Tensor[channels, image_height, image_width]
int8 image tensor of `c` channels, where `c` must be 1 or 3.
filename: str
Path to save the image.
compression_level: int
Compression factor for the resulting file, it must be a number
Args:
input (Tensor[channels, image_height, image_width]): int8 image tensor of
`c` channels, where `c` must be 1 or 3.
filename (str): Path to save the image.
compression_level (int): Compression factor for the resulting file, it must be a number
between 0 and 9. Default: 6
"""
output = encode_png(input, compression_level)
......@@ -182,18 +175,14 @@ def encode_jpeg(input: torch.Tensor, quality: int = 75) -> torch.Tensor:
Takes an input tensor in CHW layout and returns a buffer with the contents
of its corresponding JPEG file.
Parameters
----------
input: Tensor[channels, image_height, image_width])
int8 image tensor of `c` channels, where `c` must be 1 or 3.
quality: int
Quality of the resulting JPEG file, it must be a number between
Args:
input (Tensor[channels, image_height, image_width]): int8 image tensor of
`c` channels, where `c` must be 1 or 3.
quality (int): Quality of the resulting JPEG file, it must be a number between
1 and 100. Default: 75
Returns
-------
output: Tensor[1]
A one dimensional int8 tensor that contains the raw bytes of the
Returns:
output (Tensor[1]): A one dimensional int8 tensor that contains the raw bytes of the
JPEG file.
"""
if quality < 1 or quality > 100:
......@@ -208,14 +197,11 @@ def write_jpeg(input: torch.Tensor, filename: str, quality: int = 75):
"""
Takes an input tensor in CHW layout and saves it in a JPEG file.
Parameters
----------
input: Tensor[channels, image_height, image_width]
int8 image tensor of `c` channels, where `c` must be 1 or 3.
filename: str
Path to save the image.
quality: int
Quality of the resulting JPEG file, it must be a number
Args:
input (Tensor[channels, image_height, image_width]): int8 image tensor of `c`
channels, where `c` must be 1 or 3.
filename (str): Path to save the image.
quality (int): Quality of the resulting JPEG file, it must be a number
between 1 and 100. Default: 75
"""
output = encode_jpeg(input, quality)
......@@ -230,20 +216,16 @@ def decode_image(input: torch.Tensor, mode: ImageReadMode = ImageReadMode.UNCHAN
Optionally converts the image to the desired format.
The values of the output tensor are uint8 between 0 and 255.
Parameters
----------
input: Tensor
a one dimensional uint8 tensor containing the raw bytes of the
Args:
input (Tensor): a one dimensional uint8 tensor containing the raw bytes of the
PNG or JPEG image.
mode: ImageReadMode
the read mode used for optionally converting the image.
mode (ImageReadMode): the read mode used for optionally converting the image.
Default: `ImageReadMode.UNCHANGED`.
See `ImageReadMode` class for more information on various
available modes.
Returns
-------
output: Tensor[image_channels, image_height, image_width]
Returns:
output (Tensor[image_channels, image_height, image_width])
"""
output = torch.ops.image.decode_image(input, mode.value)
return output
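`decode_image` has to tell PNG from JPEG before dispatching to the right decoder. torchvision does this in C++, but the idea is plain magic-byte sniffing; a hypothetical pure-Python sketch (the function name is illustrative, not part of torchvision):

```python
def sniff_image_format(data: bytes) -> str:
    """Guess the container format from its leading magic bytes.

    Illustrative only -- torchvision's real dispatch happens in C++.
    """
    if data[:8] == b"\x89PNG\r\n\x1a\n":  # fixed 8-byte PNG signature
        return "png"
    if data[:2] == b"\xff\xd8":           # JPEG SOI marker
        return "jpeg"
    return "unknown"
```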
......@@ -255,19 +237,15 @@ def read_image(path: str, mode: ImageReadMode = ImageReadMode.UNCHANGED) -> torc
Optionally converts the image to the desired format.
The values of the output tensor are uint8 between 0 and 255.
Parameters
----------
path: str
path of the JPEG or PNG image.
mode: ImageReadMode
the read mode used for optionally converting the image.
Args:
path (str): path of the JPEG or PNG image.
mode (ImageReadMode): the read mode used for optionally converting the image.
Default: `ImageReadMode.UNCHANGED`.
See `ImageReadMode` class for more information on various
available modes.
Returns
-------
output: Tensor[image_channels, image_height, image_width]
Returns:
output (Tensor[image_channels, image_height, image_width])
"""
data = read_file(path)
return decode_image(data, mode)
......@@ -63,27 +63,18 @@ def write_video(
"""
Writes a 4d tensor in [T, H, W, C] format in a video file
Parameters
----------
filename : str
path where the video will be saved
video_array : Tensor[T, H, W, C]
tensor containing the individual frames, as a uint8 tensor in [T, H, W, C] format
fps : Number
video frames per second
video_codec : str
the name of the video codec, i.e. "libx264", "h264", etc.
options : Dict
dictionary containing options to be passed into the PyAV video stream
audio_array : Tensor[C, N]
tensor containing the audio, where C is the number of channels and N is the
number of samples
audio_fps : Number
audio sample rate, typically 44100 or 48000
audio_codec : str
the name of the audio codec, i.e. "mp3", "aac", etc.
audio_options : Dict
dictionary containing options to be passed into the PyAV audio stream
Args:
filename (str): path where the video will be saved
video_array (Tensor[T, H, W, C]): tensor containing the individual frames,
as a uint8 tensor in [T, H, W, C] format
fps (Number): video frames per second
video_codec (str): the name of the video codec, e.g. "libx264" or "h264"
options (Dict): dictionary containing options to be passed into the PyAV video stream
audio_array (Tensor[C, N]): tensor containing the audio, where C is the number of channels
and N is the number of samples
audio_fps (Number): audio sample rate, typically 44100 or 48000
audio_codec (str): the name of the audio codec, e.g. "mp3" or "aac"
audio_options (Dict): dictionary containing options to be passed into the PyAV audio stream
"""
_check_av_available()
video_array = torch.as_tensor(video_array, dtype=torch.uint8).numpy()
......@@ -251,28 +242,20 @@ def read_video(
Reads a video from a file, returning both the video frames as well as
the audio frames
Parameters
----------
filename : str
path to the video file
start_pts : int if pts_unit = 'pts', optional
float / Fraction if pts_unit = 'sec', optional
the start presentation time of the video
end_pts : int if pts_unit = 'pts', optional
float / Fraction if pts_unit = 'sec', optional
the end presentation time
pts_unit : str, optional
unit in which start_pts and end_pts values will be interpreted, either 'pts' or 'sec'. Defaults to 'pts'.
Returns
-------
vframes : Tensor[T, H, W, C]
the `T` video frames
aframes : Tensor[K, L]
the audio frames, where `K` is the number of channels and `L` is the
Args:
filename (str): path to the video file
start_pts (int if pts_unit = 'pts', float / Fraction if pts_unit = 'sec', optional):
the start presentation time of the video
end_pts (int if pts_unit = 'pts', float / Fraction if pts_unit = 'sec', optional):
the end presentation time
pts_unit (str, optional): unit in which start_pts and end_pts values will be interpreted,
either 'pts' or 'sec'. Defaults to 'pts'.
Returns:
vframes (Tensor[T, H, W, C]): the `T` video frames
aframes (Tensor[K, L]): the audio frames, where `K` is the number of channels and `L` is the
number of points
info : Dict
metadata for the video and audio. Can contain the fields video_fps (float)
info (Dict): metadata for the video and audio. Can contain the fields video_fps (float)
and audio_fps (int)
"""
......@@ -368,20 +351,15 @@ def read_video_timestamps(filename: str, pts_unit: str = "pts") -> Tuple[List[in
Note that the function decodes the whole video frame-by-frame.
Parameters
----------
filename : str
path to the video file
pts_unit : str, optional
unit in which timestamp values will be returned either 'pts' or 'sec'. Defaults to 'pts'.
Returns
-------
pts : List[int] if pts_unit = 'pts'
List[Fraction] if pts_unit = 'sec'
Args:
filename (str): path to the video file
pts_unit (str, optional): unit in which timestamp values will be returned
either 'pts' or 'sec'. Defaults to 'pts'.
Returns:
pts (List[int] if pts_unit = 'pts', List[Fraction] if pts_unit = 'sec'):
presentation timestamps for each one of the frames in the video.
video_fps : float, optional
the frame rate for the video
video_fps (float, optional): the frame rate for the video
"""
from torchvision import get_video_backend
......
......@@ -18,10 +18,6 @@ def _make_divisible(v: float, divisor: int, min_value: Optional[int] = None) ->
It ensures that all layers have a channel number that is divisible by 8.
It can be seen here:
https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
:param v:
:param divisor:
:param min_value:
:return:
"""
if min_value is None:
min_value = divisor
......
......@@ -20,21 +20,14 @@ def nms(boxes: Tensor, scores: Tensor, iou_threshold: float) -> Tensor:
not guaranteed to be the same between CPU and GPU. This is similar
to the behavior of argsort in PyTorch when repeated values are present.
Parameters
----------
boxes : Tensor[N, 4])
boxes to perform NMS on. They
Args:
boxes (Tensor[N, 4]): boxes to perform NMS on. They
are expected to be in (x1, y1, x2, y2) format
scores : Tensor[N]
scores for each one of the boxes
iou_threshold : float
discards all overlapping
boxes with IoU > iou_threshold
Returns
-------
keep : Tensor
int64 tensor with the indices
scores (Tensor[N]): scores for each one of the boxes
iou_threshold (float): discards all overlapping boxes with IoU > iou_threshold
Returns:
keep (Tensor): int64 tensor with the indices
of the elements that have been kept
by NMS, sorted in decreasing order of scores
"""
......@@ -55,23 +48,15 @@ def batched_nms(
Each index value corresponds to a category, and NMS
will not be applied between elements of different categories.
Parameters
----------
boxes : Tensor[N, 4]
boxes where NMS will be performed. They
Args:
boxes (Tensor[N, 4]): boxes where NMS will be performed. They
are expected to be in (x1, y1, x2, y2) format
scores : Tensor[N]
scores for each one of the boxes
idxs : Tensor[N]
indices of the categories for each one of the boxes.
iou_threshold : float
discards all overlapping boxes
with IoU > iou_threshold
Returns
-------
keep : Tensor
int64 tensor with the indices of
scores (Tensor[N]): scores for each one of the boxes
idxs (Tensor[N]): indices of the categories for each one of the boxes.
iou_threshold (float): discards all overlapping boxes with IoU > iou_threshold
Returns:
keep (Tensor): int64 tensor with the indices of
the elements that have been kept by NMS, sorted
in decreasing order of scores
"""
......
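`batched_nms` avoids a per-category Python loop with a coordinate-offset trick: shift each box by `idxs * (max_coordinate + 1)`, so boxes from different categories land in disjoint regions of the plane and a single class-agnostic NMS pass can never suppress across categories. The offset construction, sketched in plain Python:

```python
def offset_boxes(boxes, idxs):
    # Shift every box by (category index) * (max coordinate + 1) so that
    # boxes with different category indices cannot overlap; one plain NMS
    # pass over the shifted boxes then respects category boundaries.
    max_coord = max(c for box in boxes for c in box)
    offset = max_coord + 1.0
    return [[c + idxs[i] * offset for c in box] for i, box in enumerate(boxes)]
```

Two identical boxes in different categories end up far apart: `offset_boxes([[0, 0, 10, 10], [0, 0, 10, 10]], [0, 1])` shifts the second box to `[11.0, 11.0, 21.0, 21.0]`, clear of the first.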