Artificial Intelligence | ShowMeAI News Digest #2022.06.05

The ShowMeAI Daily series has been fully upgraded! It covers AI topics across Tools & Frameworks | Projects & Code | Blogs & Shares | Data & Resources | Research & Papers. Click to view the article history list, and subscribe to the topic #ShowMeAI资讯日报 inside the official account to receive the latest daily updates. Click Topic Collections & Monthly E-magazine to quickly browse the full set of each topic.

1. Tools & Frameworks


Tool library: ivy – a unified machine learning framework

tags: [Machine Learning]

Aims to support all frameworks; it currently supports JAX, TensorFlow, PyTorch, MXNet, and NumPy

‘ivy – The Unified Machine Learning Framework’

GitHub:github.com/unifyai/ivy
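
A minimal usage sketch (assumed, not taken from the repo's docs; ivy's function names have changed across versions, e.g. older releases used `ivy.set_framework` instead of `ivy.set_backend`). The point is that the same ivy code runs on whichever backend is selected:

```python
# Hypothetical minimal example: one piece of ivy code, multiple backends.
import ivy

ivy.set_backend("torch")            # or "tensorflow", "jax", "numpy", "mxnet"

x = ivy.array([[1.0, 2.0], [3.0, 4.0]])
w = ivy.array([[0.5], [0.25]])
y = ivy.matmul(x, w)                # executed by the selected backend
print(ivy.to_numpy(y))              # [[1.0], [2.5]]
```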


Tool: pytest-memray – a memory profiling plugin for Python tests

tags: [Memory Profiling]

‘pytest-memray – Memray is a memory profiler for Python’ by bloomberg

GitHub:github.com/bloomberg/p…
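
A small usage sketch based on the plugin's documented `--memray` flag and `limit_memory` marker; the allocation sizes below are placeholders:

```python
# Run with:  pytest --memray test_alloc.py
import pytest


@pytest.mark.limit_memory("24 MB")        # fail the test if it allocates more than this
def test_allocations():
    data = [b"x" * 1024 for _ in range(1024)]   # roughly 1 MB of small allocations
    assert len(data) == 1024
```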

Tool: Obsei – a low-code automation tool for text analysis

tags: [Low-Code, Automation, Text Analysis]

‘Obsei: Observe, Analyze and Inform – Obsei is intended to be an automation tool for text analysis need.’ by Lalit Pagaria

GitHub:github.com/lalitpagari…


Tool library: Frontend – a COCO image labelling tool (front end)

tags: [Data Labelling, Image Labelling]

‘COCO Image Labelling Tool – Frontend (FE)’ by NAVER AI

GitHub:github.com/naver-ai/co…


2. Projects & Code


Project: 90+ hands-on Python data science projects

tags: [Data Science]

《90+ Data Science Projects You Can Try with Python》by Aman Kharwal

Link:python.plainenglish.io/85-data-sci…

Project: CMeKG – tools, code, and models for a Chinese medical knowledge graph

tags: [Healthcare, Knowledge Graph]

GitHub:github.com/king-yyf/CM…

3. Blogs & Shares


Blog: Considerations for deploying machine learning models in production

tags: [Deployment, Machine Learning]

《Considerations for Deploying Machine Learning Models in Production》by Jules S. Damji, Michael Galarnyk

Link:towardsdatascience.com/considerati…

Free book: Efficient Deep Learning

tags: [Deep Learning]

《Efficient Deep Learning》by Gaurav Menghani, Naresh Singh

Link:efficientdlbook.com/

4. Data & Resources


Resource: Made With ML – a community for sharing machine learning projects (code, blog posts, etc.)

tags: [Machine Learning]

“Made With ML – Share what you’ve made with ML”

Link:madewithml.com/

Resource: PhD-level course materials for Econometrics

tags: [Finance, Economics, Course Materials]

‘Econometrics – Slides for the PhD level course in Econometrics at the Tinbergen Institute, Amsterdam’ by Stanislav Avdeev

GitHub:github.com/stnavdeev/e…

Resource: a partial list of companies in China that support remote work

tags: [News]

GitHub:github.com/LinuxSuRen/…


Resource: a large list of semiconductor startups

tags: [News]

‘awesome-semiconductor-startups – List of awesome semiconductor startups’ by Andreas Olofsson

GitHub:github.com/aolofsson/a…

Resource: the machine learning job market in 2022

tags: [Jobs, News]

《All Roads Lead to Rome: The Machine Learning Job Market in 2022》by Eric Jang

Link:evjang.com/2022/04/25/…

5. Research & Papers


You can click here and reply with the keyword 日报 to get the curated June paper collection for free.

Paper: BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework

Title: BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework

Date: 27 May 2022

Field: Computer Vision

Tasks: 3D Object Detection, Autonomous Driving, Object Detection

Paper: arxiv.org/abs/2205.13…

Code: github.com/adlab-autod…

Authors: TingTing Liang, Hongwei Xie, Kaicheng Yu, Zhongyu Xia, Zhiwei Lin, Yongtao Wang, Tao Tang, Bing Wang, Zhi Tang

Summary: Fusing the camera and LiDAR information has become a de-facto standard for 3D object detection tasks.

Abstract: Fusing the camera and LiDAR information has become a de-facto standard for 3D object detection tasks. Current methods rely on point clouds from the LiDAR sensor as queries to leverage the feature from the image space. However, people discover that this underlying assumption makes the current fusion framework infeasible to produce any prediction when there is a LiDAR malfunction, regardless of minor or major. This fundamentally limits the deployment capability to realistic autonomous driving scenarios. In contrast, we propose a surprisingly simple yet novel fusion framework, dubbed BEVFusion, whose camera stream does not depend on the input of LiDAR data, thus addressing the downside of previous methods. We empirically show that our framework surpasses the state-of-the-art methods under the normal training settings. Under the robustness training settings that simulate various LiDAR malfunctions, our framework significantly surpasses the state-of-the-art methods by 15.7% to 28.9% mAP. To the best of our knowledge, we are the first to handle realistic LiDAR malfunction and can be deployed to realistic scenarios without any post-processing procedure. The code is available at github.com/ADLab-AutoD… .
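
To make the key idea concrete, here is a hypothetical PyTorch sketch (not the authors' code; module names, channel sizes, and the zero-filling fallback are assumptions) of a BEV-level fusion block whose camera branch still yields usable features when the LiDAR branch is missing:

```python
import torch
import torch.nn as nn


class NaiveBEVFusion(nn.Module):
    """Illustrative fusion of camera and LiDAR BEV feature maps.

    The camera stream is always computed; if the LiDAR BEV features are
    unavailable (sensor malfunction), a zero tensor is fused instead, so the
    detector can still produce predictions from the camera alone.
    """

    def __init__(self, cam_channels=80, lidar_channels=128, out_channels=256):
        super().__init__()
        self.lidar_channels = lidar_channels
        self.fuse = nn.Sequential(
            nn.Conv2d(cam_channels + lidar_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, cam_bev, lidar_bev=None):
        # cam_bev:   (B, cam_channels,   H, W) BEV features from the camera branch
        # lidar_bev: (B, lidar_channels, H, W) or None when the LiDAR fails
        if lidar_bev is None:
            b, _, h, w = cam_bev.shape
            lidar_bev = cam_bev.new_zeros(b, self.lidar_channels, h, w)
        return self.fuse(torch.cat([cam_bev, lidar_bev], dim=1))
```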


Paper: Voxel Field Fusion for 3D Object Detection

Title: Voxel Field Fusion for 3D Object Detection

Date: 31 May 2022

Field: Computer Vision

Tasks: 3D Object Detection, Data Augmentation, Object Detection

Paper: arxiv.org/abs/2205.15…

Code: github.com/dvlab-resea…

Authors: Yanwei Li, Xiaojuan Qi, Yukang Chen, LiWei Wang, Zeming Li, Jian Sun, Jiaya Jia

Summary: In this work, we present a conceptually simple yet effective framework for cross-modality 3D object detection, named voxel field fusion.

Abstract: In this work, we present a conceptually simple yet effective framework for cross-modality 3D object detection, named voxel field fusion. The proposed approach aims to maintain cross-modality consistency by representing and fusing augmented image features as a ray in the voxel field. To this end, the learnable sampler is first designed to sample vital features from the image plane that are projected to the voxel grid in a point-to-ray manner, which maintains the consistency in feature representation with spatial context. In addition, ray-wise fusion is conducted to fuse features with the supplemental context in the constructed voxel field. We further develop mixed augmentor to align feature-variant transformations, which bridges the modality gap in data augmentation. The proposed framework is demonstrated to achieve consistent gains in various benchmarks and outperforms previous fusion-based methods on KITTI and nuScenes datasets. Code is made available at github.com/dvlab-resea… .


Paper: Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

Title: Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

Date: 2 Jun 2022

Field: Speech

Tasks: Automatic Speech Recognition, Speech Recognition

Paper: arxiv.org/abs/2206.00…

Code: github.com/kssteven418…

Authors: Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer

Summary: After reexamining the design choices for both the macro and micro-architecture of Conformer, we propose the Squeezeformer model, which consistently outperforms the state-of-the-art ASR models under the same training schemes.

Abstract: The recently proposed Conformer model has become the de facto backbone model for various downstream speech tasks based on its hybrid attention-convolution architecture that captures both local and global features. However, through a series of systematic studies, we find that the Conformer architecture’s design choices are not optimal. After reexamining the design choices for both the macro and micro-architecture of Conformer, we propose the Squeezeformer model, which consistently outperforms the state-of-the-art ASR models under the same training schemes. In particular, for the macro-architecture, Squeezeformer incorporates (i) the Temporal U-Net structure, which reduces the cost of the multi-head attention modules on long sequences, and (ii) a simpler block structure of feed-forward module, followed up by multi-head attention or convolution modules, instead of the Macaron structure proposed in Conformer. Furthermore, for the micro-architecture, Squeezeformer (i) simplifies the activations in the convolutional block, (ii) removes redundant Layer Normalization operations, and (iii) incorporates an efficient depth-wise downsampling layer to efficiently sub-sample the input signal. Squeezeformer achieves state-of-the-art results of 7.5%, 6.5%, and 6.0% word-error-rate on Librispeech test-other without external language models. This is 3.1%, 1.4%, and 0.6% better than Conformer-CTC with the same number of FLOPs. Our code is open-sourced and available online.
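
As one illustration of the micro-architecture changes, the sketch below shows a stride-2 depthwise 1D convolution that sub-samples a feature sequence along time. It is a hypothetical PyTorch rendering, not the released Squeezeformer code; the kernel size and shapes are assumptions:

```python
import torch
import torch.nn as nn


class DepthwiseTemporalDownsample(nn.Module):
    """Hypothetical depthwise downsampling: halves the sequence length with a
    stride-2 depthwise 1D convolution (one filter per channel)."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.dw_conv = nn.Conv1d(
            channels, channels, kernel_size,
            stride=2, padding=kernel_size // 2, groups=channels,
        )

    def forward(self, x):
        # x: (batch, time, channels) -> (batch, time // 2, channels)
        x = x.transpose(1, 2)          # Conv1d expects (batch, channels, time)
        x = self.dw_conv(x)
        return x.transpose(1, 2)


x = torch.randn(4, 100, 144)           # e.g. 100 frames of 144-dim acoustic features
print(DepthwiseTemporalDownsample(144)(x).shape)   # torch.Size([4, 50, 144])
```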


Paper: Zero-Shot Text-to-Image Generation

Title: Zero-Shot Text-to-Image Generation

Date: 24 Feb 2021

Field: Computer Vision, Natural Language Processing

Tasks: Image Generation, Text-to-Image Generation, Zero-Shot Text-to-Image Generation

Paper: arxiv.org/abs/2102.12092

Code: github.com/openai/DALL… , github.com/lucidrains/… , github.com/borisdayma/… , github.com/kakaobrain/… , github.com/xyzforever/…

Authors: Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever

Summary: Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset.

Abstract: Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset. These assumptions might involve complex architectures, auxiliary losses, or side information such as object part labels or segmentation masks supplied during training. We describe a simple approach for this task based on a transformer that autoregressively models the text and image tokens as a single stream of data. With sufficient data and scale, our approach is competitive with previous domain-specific models when evaluated in a zero-shot fashion.
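
The core modeling idea, text and image tokens treated as one autoregressive stream, can be sketched roughly as below. This is a toy PyTorch example; the vocabulary sizes, layer counts, and the shared-index trick are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the actual model uses a BPE text vocabulary and a
# discrete VAE codebook of image tokens.
TEXT_VOCAB, IMAGE_VOCAB, D_MODEL = 16384, 8192, 512


class TinyTextToImageLM(nn.Module):
    """Toy autoregressive transformer over one stream of [text tokens | image tokens]."""

    def __init__(self):
        super().__init__()
        # One shared index space: image token i is offset by TEXT_VOCAB.
        self.embed = nn.Embedding(TEXT_VOCAB + IMAGE_VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=8, batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, TEXT_VOCAB + IMAGE_VOCAB)

    def forward(self, text_ids, image_ids):
        tokens = torch.cat([text_ids, image_ids + TEXT_VOCAB], dim=1)
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.decoder(self.embed(tokens), mask=causal)
        return self.head(h)            # next-token logits over the joint vocabulary


logits = TinyTextToImageLM()(torch.zeros(2, 16, dtype=torch.long),
                             torch.zeros(2, 64, dtype=torch.long))
print(logits.shape)                    # torch.Size([2, 80, 24576])
```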


Paper: OnePose: One-Shot Object Pose Estimation without CAD Models

Title: OnePose: One-Shot Object Pose Estimation without CAD Models

Date: 24 May 2022

Field: Computer Vision

Tasks: 6D Pose Estimation, Graph Attention, Pose Estimation, Visual Localization

Paper: arxiv.org/abs/2205.12…

Code: github.com/zju3dv/OneP…

Authors: Jiaming Sun, ZiHao Wang, Siyu Zhang, Xingyi He, Hongcheng Zhao, Guofeng Zhang, Xiaowei Zhou

Summary: We propose a new method named OnePose for object pose estimation.

Abstract: We propose a new method named OnePose for object pose estimation. Unlike existing instance-level or category-level methods, OnePose does not rely on CAD models and can handle objects in arbitrary categories without instance- or category-specific network training. OnePose draws the idea from visual localization and only requires a simple RGB video scan of the object to build a sparse SfM model of the object. Then, this model is registered to new query images with a generic feature matching network. To mitigate the slow runtime of existing visual localization methods, we propose a new graph attention network that directly matches 2D interest points in the query image with the 3D points in the SfM model, resulting in efficient and robust pose estimation. Combined with a feature-based pose tracker, OnePose is able to stably detect and track 6D poses of everyday household objects in real-time. We also collected a large-scale dataset that consists of 450 sequences of 150 objects.


Paper: Improved Vector Quantized Diffusion Models

Title: Improved Vector Quantized Diffusion Models

Date: 31 May 2022

Field: Computer Vision

Tasks: Denoising, Image Generation

Paper: arxiv.org/abs/2205.16…

Code: github.com/microsoft/v…

Authors: Zhicong Tang, Shuyang Gu, Jianmin Bao, Dong Chen, Fang Wen

Summary: When trained on ImageNet, we dramatically improve the FID score from 11.89 to 4.83, demonstrating the superiority of our proposed techniques.

Abstract: Vector quantized diffusion (VQ-Diffusion) is a powerful generative model for text-to-image synthesis, but sometimes can still generate low-quality samples or weakly correlated images with text input. We find these issues are mainly due to the flawed sampling strategy. In this paper, we propose two important techniques to further improve the sample quality of VQ-Diffusion. 1) We explore classifier-free guidance sampling for discrete denoising diffusion model and propose a more general and effective implementation of classifier-free guidance. 2) We present a high-quality inference strategy to alleviate the joint distribution issue in VQ-Diffusion. Finally, we conduct experiments on various datasets to validate their effectiveness and show that the improved VQ-Diffusion suppresses the vanilla version by large margins. We achieve an 8.44 FID score on MSCOCO, surpassing VQ-Diffusion by 5.42 FID score. When trained on ImageNet, we dramatically improve the FID score from 11.89 to 4.83, demonstrating the superiority of our proposed techniques.
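
For reference, classifier-free guidance in its generic form mixes conditional and unconditional predictions with a guidance scale; the paper proposes a more general variant for discrete diffusion, so the snippet below is only the textbook formulation (function name and tensor shapes are assumptions):

```python
import torch


def classifier_free_guidance(cond_logits, uncond_logits, guidance_scale=3.0):
    """Textbook classifier-free guidance on per-token logits.
    guidance_scale = 1.0 recovers the purely conditional prediction;
    larger values push sampling further toward the text condition."""
    return uncond_logits + guidance_scale * (cond_logits - uncond_logits)


# Toy usage: (batch, num_image_tokens, codebook_size) logits from two forward passes.
cond = torch.randn(2, 256, 1024)
uncond = torch.randn(2, 256, 1024)
probs = torch.softmax(classifier_free_guidance(cond, uncond, 3.0), dim=-1)
```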


Paper: Optimizing Relevance Maps of Vision Transformers Improves Robustness

Title: Optimizing Relevance Maps of Vision Transformers Improves Robustness

Date: 2 Jun 2022

Field: Computer Vision

Tasks: Image Recognition, Model Understanding, Image Classification

Paper: arxiv.org/abs/2206.01…

Code: github.com/hila-chefer…

Authors: Hila Chefer, Idan Schwartz, Lior Wolf

Summary: It has been observed that visual classification models often rely mostly on the image background, neglecting the foreground, which hurts their robustness to distribution changes.

Abstract: It has been observed that visual classification models often rely mostly on the image background, neglecting the foreground, which hurts their robustness to distribution changes. To alleviate this shortcoming, we propose to monitor the model’s relevancy signal and manipulate it such that the model is focused on the foreground object. This is done as a finetuning step, involving relatively few samples consisting of pairs of images and their associated foreground masks. Specifically, we encourage the model’s relevancy map (i) to assign lower relevance to background regions, (ii) to consider as much information as possible from the foreground, and (iii) we encourage the decisions to have high confidence. When applied to Vision Transformer (ViT) models, a marked improvement in robustness to domain shifts is observed. Moreover, the foreground masks can be obtained automatically, from a self-supervised variant of the ViT model itself; therefore no additional supervision is required.
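
A rough sketch of what such a relevance-guided finetuning objective could look like; the term weights, names, and the entropy-based confidence term are assumptions, not the authors' exact loss:

```python
import torch
import torch.nn.functional as F


def relevance_loss(relevance_map, fg_mask, logits, w_bg=1.0, w_fg=1.0, w_conf=0.1):
    """relevance_map: (B, H, W) values in [0, 1]; fg_mask: (B, H, W) binary foreground mask;
    logits: (B, num_classes) classification logits."""
    bg_term = (relevance_map * (1 - fg_mask)).mean()            # (i) low relevance on the background
    fg_term = ((1 - relevance_map) * fg_mask).mean()            # (ii) high relevance on the foreground
    probs = F.softmax(logits, dim=-1)
    conf_term = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()   # (iii) low entropy, i.e. confident
    return w_bg * bg_term + w_fg * fg_term + w_conf * conf_term
```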


Paper: Group R-CNN for Weakly Semi-supervised Object Detection with Points

Title: Group R-CNN for Weakly Semi-supervised Object Detection with Points

Date: 12 May 2022

Field: Computer Vision

Tasks: Object Detection, Representation Learning, Semi-Supervised Object Detection

Paper: arxiv.org/abs/2205.05…

Code: github.com/jshilong/gr…

Authors: Shilong Zhang, Zhuoran Yu, Liyang Liu, Xinjiang Wang, Aojun Zhou, Kai Chen

Summary: The core of this task is to train a point-to-box regressor on well-labeled images that can be used to predict credible bounding boxes for each point annotation.

Abstract: We study the problem of weakly semi-supervised object detection with points (WSSOD-P), where the training data is combined by a small set of fully annotated images with bounding boxes and a large set of weakly-labeled images with only a single point annotated for each instance. The core of this task is to train a point-to-box regressor on well-labeled images that can be used to predict credible bounding boxes for each point annotation. We challenge the prior belief that existing CNN-based detectors are not compatible with this task. Based on the classic R-CNN architecture, we propose an effective point-to-box regressor: Group R-CNN. Group R-CNN first uses instance-level proposal grouping to generate a group of proposals for each point annotation and thus can obtain a high recall rate. To better distinguish different instances and improve precision, we propose instance-level proposal assignment to replace the vanilla assignment strategy adopted in the original R-CNN methods. As naive instance-level assignment brings converging difficulty, we propose instance-aware representation learning which consists of instance-aware feature enhancement and instance-aware parameter generation to overcome this issue. Comprehensive experiments on the MS-COCO benchmark demonstrate the effectiveness of our method. Specifically, Group R-CNN significantly outperforms the prior method Point DETR by 3.9 mAP with 5% well-labeled images, which is the most challenging scenario. The source code can be found at github.com/jshilong/Gr…


Paper: Learning to Untangle Genome Assembly with Graph Convolutional Networks

Title: Learning to Untangle Genome Assembly with Graph Convolutional Networks

Date: 1 Jun 2022

Field: Biotechnology, Genomics

Tasks: Graph Neural Networks

Paper: arxiv.org/abs/2206.00…

Code: github.com/lvrcek/gnno…

Authors: Lovro Vrček, Xavier Bresson, Thomas Laurent, Martin Schmitz, Mile Šikić

Summary: In this work, we explore a different approach to the central part of the genome assembly task that consists of untangling a large assembly graph from which a genomic sequence needs to be reconstructed.

Abstract: A quest to determine the complete sequence of a human DNA from telomere to telomere started three decades ago and was finally completed in 2021. This accomplishment was a result of a tremendous effort of numerous experts who engineered various tools and performed laborious manual inspection to achieve the first gapless genome sequence. However, such method can hardly be used as a general approach to assemble different genomes, especially when the assembly speed is critical given the large amount of data. In this work, we explore a different approach to the central part of the genome assembly task that consists of untangling a large assembly graph from which a genomic sequence needs to be reconstructed. Our main motivation is to reduce human-engineered heuristics and use deep learning to develop more generalizable reconstruction techniques. Precisely, we introduce a new learning framework to train a graph convolutional network to resolve assembly graphs by finding a correct path through them. The training is supervised with a dataset generated from the resolved CHM13 human sequence and tested on assembly graphs built using real human PacBio HiFi reads. Experimental results show that a model, trained on simulated graphs generated solely from a single chromosome, is able to remarkably resolve all other chromosomes. Moreover, the model outperforms hand-crafted heuristics from a state-of-the-art de novo assembler on the same graphs. Reconstructed chromosomes with graph networks are more accurate on nucleotide level, report lower number of contigs, higher genome reconstructed fraction and NG50/NGA50 assessment metrics.
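
To illustrate the mechanism at its simplest, the sketch below scores the edges of a graph with a bare-bones graph convolution so that high-scoring edges can be kept when walking the assembly graph. It is a toy PyTorch example under assumed shapes and names, not the paper's model:

```python
import torch
import torch.nn as nn


class SimpleGCNLayer(nn.Module):
    """Bare-bones graph convolution: aggregate neighbour features with a
    normalised adjacency matrix, then apply a linear map."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, node_feats, adj):
        # node_feats: (N, in_dim); adj: (N, N) adjacency with self-loops, row-normalised
        return torch.relu(self.linear(adj @ node_feats))


class EdgeScorer(nn.Module):
    """Scores each directed edge; high-scoring edges are the candidates kept
    when walking a path through the assembly graph."""

    def __init__(self, dim=32):
        super().__init__()
        self.gcn = SimpleGCNLayer(dim, dim)
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, node_feats, adj, edge_index):
        h = self.gcn(node_feats, adj)
        src, dst = edge_index                      # two (E,) tensors of node indices per edge
        return self.score(torch.cat([h[src], h[dst]], dim=-1)).squeeze(-1)
```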


Paper: Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training

Title: Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training

Date: 28 Oct 2021

Field: Deep Learning

Paper: arxiv.org/abs/2110.14…

Code: github.com/hpcaitech/c…

Authors: Zhengda Bian, Hongxin Liu, Boxiang Wang, Haichen Huang, Yongbin Li, Chuanrui Wang, Fan Cui, Yang You

Summary: The Transformer architecture has improved the performance of deep learning models in domains such as Computer Vision and Natural Language Processing.

Abstract: The Transformer architecture has improved the performance of deep learning models in domains such as Computer Vision and Natural Language Processing. Together with better performance come larger model sizes. This imposes challenges to the memory wall of the current accelerator hardware such as GPU. It is never ideal to train large models such as Vision Transformer, BERT, and GPT on a single GPU or a single machine. There is an urgent demand to train models in a distributed environment. However, distributed training, especially model parallelism, often requires domain expertise in computer systems and architecture. It remains a challenge for AI researchers to implement complex distributed training solutions for their models. In this paper, we introduce Colossal-AI, which is a unified parallel training system designed to seamlessly integrate different paradigms of parallelization techniques including data parallelism, pipeline parallelism, multiple tensor parallelism, and sequence parallelism. Colossal-AI aims to support the AI community to write distributed models in the same way as how they write models normally. This allows them to focus on developing the model architecture and separates the concerns of distributed training from the development process. The documentation can be found at www.colossalai.org and the source code can be found at github.com/hpcaitech/C…
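
For context on the simplest of these paradigms, data parallelism, here is a plain-PyTorch DistributedDataParallel sketch. It is deliberately not Colossal-AI's own API (which wraps these details); the file name, model, and tensor sizes are placeholders:

```python
# Plain-PyTorch illustration of data parallelism (not Colossal-AI's API).
# Launch with e.g.:  torchrun --nproc_per_node=2 ddp_example.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    dist.init_process_group("nccl")                     # one process per GPU
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    model = DDP(nn.Linear(1024, 10).cuda(rank), device_ids=[rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    x = torch.randn(32, 1024, device=f"cuda:{rank}")    # each rank sees its own shard of data
    loss = model(x).sum()
    loss.backward()                                     # gradients are all-reduced across ranks
    optimizer.step()
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```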


Paper: BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection

Title: BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection

Date: 31 Mar 2022

Field: Computer Vision

Tasks: 3D Object Detection, Object Detection

Paper: arxiv.org/abs/2203.17…

Code: github.com/HuangJunJie…

Authors: JunJie Huang, Guan Huang

Summary: Single frame data contains finite information which limits the performance of the existing vision-based multi-camera 3D object detection paradigms.

Abstract: Single frame data contains finite information which limits the performance of the existing vision-based multi-camera 3D object detection paradigms. For fundamentally pushing the performance boundary in this area, a novel paradigm dubbed BEVDet4D is proposed to lift the scalable BEVDet paradigm from the spatial-only 3D space to the spatial-temporal 4D space. We upgrade the naive BEVDet framework with a few modifications just for fusing the feature from the previous frame with the corresponding one in the current frame. In this way, with negligible additional computing budget, we enable BEVDet4D to access the temporal cues by querying and comparing the two candidate features. Beyond this, we simplify the task of velocity prediction by removing the factors of ego-motion and time in the learning target. As a result, BEVDet4D with robust generalization performance reduces the velocity error by up to -62.9%. This makes the vision-based methods, for the first time, become comparable with those relied on LiDAR or radar in this aspect. On challenge benchmark nuScenes, we report a new record of 54.5% NDS with the high-performance configuration dubbed BEVDet4D-Base, which surpasses the previous leading method BEVDet-Base by +7.3% NDS.
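
A toy sketch of the temporal-fusion step is shown below; the alignment of the previous frame's features to the current ego pose is omitted, and the names, channel counts, and first-frame fallback are assumptions rather than the released code:

```python
import torch
import torch.nn as nn


class TemporalBEVFusion(nn.Module):
    """Toy fusion of the previous frame's BEV feature with the current one.
    In practice the previous feature must first be warped into the current ego frame."""

    def __init__(self, channels=64):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, bev_t, bev_prev=None):
        if bev_prev is None:                       # first frame: reuse the current feature
            bev_prev = bev_t
        return self.fuse(torch.cat([bev_prev, bev_t], dim=1))
```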


We are ShowMeAI, dedicated to spreading high-quality AI content, sharing industry solutions, and using knowledge to accelerate every step of technical growth! Click to view the article history list, and subscribe to the topic #ShowMeAI资讯日报 inside the official account to receive the latest daily updates. Click Topic Collections & Monthly E-magazine to quickly browse the full set of each topic.


  • Author: 韩信子@ShowMeAI
  • Article history list
  • Topic Collections & Monthly E-magazine
  • Notice: All rights reserved. For reposting, please contact the platform and the author and credit the source.
  • Replies are welcome. Please like the post and leave comments recommending valuable articles, tools, or suggestions; we will reply as soon as possible!