The ShowMeAI Daily series has been fully upgraded! It covers AI, Tools & Frameworks, Projects & Code, Articles & Sharing, Data & Resources, Research & Papers, and more. Click the article history list to browse past issues, subscribe to the topic #ShowMeAI资讯日报 in the official WeChat account to receive the latest daily pushes, and click the topic collections & e-monthly digest to browse complete collections by topic.
1. Tools & Frameworks
Library: PaddleMM – a multimodal learning toolkit
PaddleMM is built primarily on Baidu's PaddlePaddle platform and also provides a PyTorch-compatible version. It offers a library of models for joint multimodal learning and cross-modal learning, providing efficient solutions for handling multimodal data such as images and text and helping multimodal learning applications land in practice.
‘PaddleMM – Multi-Modal learning toolkit based on PaddlePaddle and PyTorch, supporting multiple applications such as multi-modal classification, cross-modal retrieval and image caption.’ by njustkmg
GitHub: github.com/njustkmg/Pa…
Library: pytorch-lifestream – a PyTorch library for building embeddings on discrete event sequences with self-supervision
‘pytorch-lifestream – A library built upon PyTorch for building embeddings on discrete event sequences using self-supervision’ by Dmitri Babaev
GitHub: github.com/dllllb/pyto…
Library: PyG – a PyTorch library for graph neural networks (GNNs)
‘PyG (PyTorch Geometric) – a library built upon PyTorch to easily write and train Graph Neural Networks (GNNs) for a wide range of applications related to structured data’
GitHub: github.com/pyg-team/pytorch_geometric
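As a quick illustration of the library's workflow, here is a minimal sketch of a two-layer GCN on a made-up toy graph (the node features, edges, and labels below are invented for demonstration):

```python
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

# Toy graph: 3 nodes, 4 directed edges, 8-dim node features, 2 classes.
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)
x = torch.randn(3, 8)
y = torch.tensor([0, 1, 0])
data = Data(x=x, edge_index=edge_index, y=y)

class GCN(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, num_classes)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)

model = GCN(in_dim=8, hidden_dim=16, num_classes=2)
logits = model(data.x, data.edge_index)   # [num_nodes, num_classes]
loss = F.cross_entropy(logits, data.y)    # node-classification loss
loss.backward()
```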
Tool: Macaron – a visual component editor for web development
‘Macaron – Visual component editor for Web development’ by macaron-elements
GitHub: github.com/macaron-ele…
Tool: Please – a beautiful, minimalist new-tab-style page for the terminal
‘Please – Minimalistic New Tab Page with a greeting, date and time, inspirational quotes and your personal tasks and to-do list’ by NayamAmarshe
GitHub: github.com/NayamAmarsh…
2. Projects & Code
Winning-solution code for the NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video
‘How We Win the NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video’ by Qunliang Xing
GitHub: github.com/ryanxingql/…
3. Articles & Sharing
Free book: The Mathematical Engineering of Deep Learning
《The Mathematical Engineering of Deep Learning》by Benoit Liquet
Link: deeplearningmath.org/
4. Data & Resources
Resource list: Awesome Software Architecture – a curated list of resources on software architecture
‘Awesome Software Architecture – A curated list of awesome articles, videos, and other resources to learn and practice software architecture, patterns, and principles.’ by Mehdi Hadeli
GitHub: github.com/mehdihadeli…
Resource list: a collection of papers on domain-adaptive object detection
‘awesome-domain-adaptation-object-detection – A collection of papers about domain adaptation object detection’ by wangs311
GitHub: github.com/wangs311/aw…
Resource list: Awesome Deepfakes Detection – a curated list of papers, tools, and code for deepfake detection
‘Awesome Deepfakes Detection – A list of tools, papers and code related to Deepfake Detection.’ by Daichi Zhang
GitHub: github.com/Daisy-Zhang…
5. Research & Papers
Reply with the keyword 日报 in the official WeChat account to get the curated June paper collection for free.
Paper: LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks
Title: LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks
Date: 14 Jun 2022
Field: Natural Language Processing
Tasks: Classification, Pretrained Language Models
Paper link: arxiv.org/abs/2206.06…
Code: github.com/uw-madison-…
Authors: Tuan Dinh, Yuchen Zeng, Ruisu Zhang, Ziqian Lin, Michael Gira, Shashank Rajput, Jy-yong Sohn, Dimitris Papailiopoulos, Kangwook Lee
Summary: LIFT does not make any changes to the model architecture or loss function, and it solely relies on the natural language interface, enabling “no-code machine learning with LMs.”
Abstract: Fine-tuning pretrained language models (LMs) without making any architectural changes has become a norm for learning various language downstream tasks. However, for non-language downstream tasks, a common practice is to employ task-specific designs for input, output layers, and loss functions. For instance, it is possible to fine-tune an LM into an MNIST classifier by replacing the word embedding layer with an image patch embedding layer, the word token output layer with a 10-way output layer, and the word prediction loss with a 10-way classification loss, respectively. A natural question arises: can LM fine-tuning solve non-language downstream tasks without changing the model architecture or loss function? To answer this, we propose Language-Interfaced Fine-Tuning (LIFT) and study its efficacy and limitations by conducting an extensive empirical study on a suite of non-language classification and regression tasks. LIFT does not make any changes to the model architecture or loss function, and it solely relies on the natural language interface, enabling “no-code machine learning with LMs.” We find that LIFT performs relatively well across a wide range of low-dimensional classification and regression tasks, matching the performances of the best baselines in many cases, especially for the classification tasks. We report the experimental results on the fundamental properties of LIFT, including its inductive bias, sample efficiency, ability to extrapolate, robustness to outliers and label noise, and generalization. We also analyze a few properties/techniques specific to LIFT, e.g., context-aware learning via appropriate prompting, quantification of predictive uncertainty, and two-stage fine-tuning. Our code is available at github.com/UW-Madison-…
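The language-interface idea is easy to see with a small sketch: each feature vector is serialized into a sentence and its label into a short completion, and the resulting text pairs go through ordinary LM fine-tuning. The template below is a hypothetical illustration, not the exact prompt format used in the paper:

```python
# Hypothetical serialization of tabular examples into prompt/completion pairs
# for LM fine-tuning; LIFT's actual templates may differ in wording.
def to_prompt(features, feature_names):
    parts = [f"{name} is {value}" for name, value in zip(feature_names, features)]
    return "Given that " + ", ".join(parts) + ", what is the class?"

def to_completion(label, class_names):
    return f" {class_names[label]}"

feature_names = ["sepal length", "sepal width", "petal length", "petal width"]
class_names = ["setosa", "versicolor", "virginica"]

prompt = to_prompt([5.1, 3.5, 1.4, 0.2], feature_names)
completion = to_completion(0, class_names)
# prompt:     "Given that sepal length is 5.1, ..., what is the class?"
# completion: " setosa"
# The (prompt, completion) pairs are then fine-tuned with the standard
# next-token prediction loss; no architecture or loss change is required.
```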
Paper: Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Title: Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Date: 15 Jun 2022
Field: Natural Language Processing
Tasks: Image Captioning, Object Detection, Phrase Grounding, Question Answering, Referring Expression, Referring Expression Comprehension, Visual Question Answering (VQA)
Paper link: arxiv.org/abs/2206.07…
Code: github.com/microsoft/f…
Authors: Zi-Yi Dou, Aishwarya Kamath, Zhe Gan, Pengchuan Zhang, Jianfeng Wang, Linjie Li, Zicheng Liu, Ce Liu, Yann LeCun, Nanyun Peng, Jianfeng Gao, Lijuan Wang
Summary: Vision-language (VL) pre-training has recently received considerable attention.
Abstract: Vision-language (VL) pre-training has recently received considerable attention. However, most existing end-to-end pre-training approaches either only aim to tackle VL tasks such as image-text retrieval, visual question answering (VQA) and image captioning that test high-level understanding of images, or only target region-level understanding for tasks such as phrase grounding and object detection. We present FIBER (Fusion-In-the-Backbone-based transformER), a new VL model architecture that can seamlessly handle both these types of tasks. Instead of having dedicated transformer layers for fusion after the uni-modal backbones, FIBER pushes multimodal fusion deep into the model by inserting cross-attention into the image and text backbones, bringing gains in terms of memory and performance. In addition, unlike previous work that is either only pre-trained on image-text data or on fine-grained data with box-level annotations, we present a two-stage pre-training strategy that uses both these kinds of data efficiently: (i) coarse-grained pre-training based on image-text data; followed by (ii) fine-grained pre-training based on image-text-box data. We conduct comprehensive experiments on a wide range of VL tasks, ranging from VQA, image captioning, and retrieval, to phrase grounding, referring expression comprehension, and object detection. Using deep multimodal fusion coupled with the two-stage pre-training, FIBER provides consistent performance improvements over strong baselines across all tasks, often outperforming methods using magnitudes more data. Code is available at github.com/microsoft/F…
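The fusion-in-the-backbone idea, inserting gated cross-attention into the unimodal backbones instead of stacking dedicated fusion layers on top of them, can be sketched in plain PyTorch. This is a simplified conceptual sketch, not FIBER's actual block design:

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """One backbone block with an added, gated cross-attention branch
    (conceptual sketch of fusion in the backbone; FIBER's real blocks differ)."""
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(dim) for _ in range(3))
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.gate = nn.Parameter(torch.zeros(1))  # fusion starts "switched off"

    def forward(self, x, other):
        # x: tokens of this modality; other: tokens of the other modality
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h)[0]
        x = x + self.gate * self.cross_attn(self.norm2(x), other, other)[0]
        return x + self.mlp(self.norm3(x))

image_tokens = torch.randn(2, 196, 768)   # e.g., ViT patch tokens
text_tokens = torch.randn(2, 32, 768)     # e.g., BERT token embeddings
block = FusionBlock(dim=768)
fused_image = block(image_tokens, text_tokens)  # image tokens attend to text
```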
Paper: Variable Bitrate Neural Fields
Title: Variable Bitrate Neural Fields
Date: 15 Jun 2022
Paper link: arxiv.org/abs/2206.07…
Code: github.com/nv-tlabs/vq…
Authors: Towaki Takikawa, Alex Evans, Jonathan Tremblay, Thomas Müller, Morgan McGuire, Alec Jacobson, Sanja Fidler
Summary: Neural approximations of scalar and vector fields, such as signed distance functions and radiance fields, have emerged as accurate, high-quality representations.
Abstract: Neural approximations of scalar and vector fields, such as signed distance functions and radiance fields, have emerged as accurate, high-quality representations. State-of-the-art results are obtained by conditioning a neural approximation with a lookup from trainable feature grids that take on part of the learning task and allow for smaller, more efficient neural networks. Unfortunately, these feature grids usually come at the cost of significantly increased memory consumption compared to stand-alone neural network models. We present a dictionary method for compressing such feature grids, reducing their memory consumption by up to 100x and permitting a multiresolution representation which can be useful for out-of-core streaming. We formulate the dictionary optimization as a vector-quantized auto-decoder problem which lets us learn end-to-end discrete neural representations in a space where no direct supervision is available and with dynamic topology and structure. Our source code will be available at github.com/nv-tlabs/vq…
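The compression idea, storing a small integer index into a learned codebook for each grid cell instead of a full feature vector, can be sketched as a vector-quantized grid in PyTorch. The shapes and the straight-through estimator below are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

class VQFeatureGrid(nn.Module):
    """Conceptual sketch: each grid cell selects one entry of a small learned
    codebook, so only an index per cell (e.g. 8 bits) needs to be stored."""
    def __init__(self, grid_size=32, codebook_size=256, feat_dim=16):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(grid_size ** 3, codebook_size))
        self.codebook = nn.Parameter(torch.randn(codebook_size, feat_dim) * 0.01)

    def forward(self):
        soft = torch.softmax(self.logits, dim=-1)
        hard = torch.zeros_like(soft).scatter_(-1, soft.argmax(-1, keepdim=True), 1.0)
        if self.training:
            # Straight-through: hard selection forward, soft gradient backward.
            weights = hard + soft - soft.detach()
        else:
            weights = hard
        return weights @ self.codebook  # [grid_size**3, feat_dim] decoded features

grid = VQFeatureGrid()
features = grid()  # per-cell features, fed to a small downstream MLP / renderer
```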
Paper: CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers
Title: CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers
Date: 28 Apr 2022
Field: Computer Vision, Natural Language Processing
Tasks: Image Generation, Language Modelling, Super-Resolution, Text-to-Image Generation
Paper link: arxiv.org/abs/2204.14…
Code: github.com/thudm/cogvi…
Authors: Ming Ding, Wendi Zheng, Wenyi Hong, Jie Tang
Summary: The development of transformer-based text-to-image models is impeded by slow generation and complexity for high-resolution images.
Abstract: The development of the transformer-based text-to-image models are impeded by its slow generation and complexity for high-resolution images. In this work, we put forward a solution based on hierarchical transformers and local parallel auto-regressive generation. We pretrain a 6B-parameter transformer with a simple and flexible self-supervised task, Cross-modal general language model (CogLM), and finetune it for fast super-resolution. The new text-to-image system, CogView2, shows very competitive generation compared to concurrent state-of-the-art DALL-E-2, and naturally supports interactive text-guided editing on images.
Paper: Scaling Up Models and Data with t5x and seqio
Title: Scaling Up Models and Data with t5x and seqio
Date: 31 Mar 2022
Field: Natural Language Processing
Tasks: Language Modelling
Paper link: arxiv.org/abs/2203.17…
Code: github.com/google-rese… , github.com/google/seqi…
Authors: Adam Roberts, Hyung Won Chung, Anselm Levskaya, Gaurav Mishra, James Bradbury, Daniel Andor, Sharan Narang, Brian Lester, Colin Gaffney, Afroz Mohiuddin, Curtis Hawthorne, Aitor Lewkowycz, Alex Salcianu, Marc van Zee, Jacob Austin, Sebastian Goodman, Livio Baldini Soares, Haitang Hu, Sasha Tsvyashchenko, Aakanksha Chowdhery, Jasmijn Bastings, Jannis Bulian, Xavier Garcia, Jianmo Ni, Andrew Chen, Kathleen Kenealy, Jonathan H. Clark, Stephan Lee, Dan Garrette, James Lee-Thorp, Colin Raffel, Noam Shazeer, Marvin Ritter, Maarten Bosma, Alexandre Passos, Jeremy Maitin-Shepard, Noah Fiedel, Mark Omernick, Brennan Saeta, Ryan Sepassi, Alexander Spiridonov, Joshua Newlan, Andrea Gesmundo
Summary: Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves.
Abstract: Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we present two software libraries that ease these issues: t5x simplifies the process of building and training large language models at scale while maintaining ease of use, and seqio provides a task-based API for simple creation of fast and reproducible training data and evaluation pipelines. These open-source libraries have been used to train models with hundreds of billions of parameters on datasets with multiple terabytes of training data. Along with the libraries, we release configurations and instructions for T5-like encoder-decoder models as well as GPT-like decoder-only architectures. t5x and seqio are open source and available at github.com/google-rese… and github.com/google/seqi… respectively.
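For readers who want a feel for the task-based API, a seqio task registration typically looks roughly like the following. The task name, dataset name, and vocabulary path below are placeholders, and the call pattern follows seqio's documented usage rather than anything specific to this paper; check the repository for the current API:

```python
import functools
import seqio

# Placeholder vocabulary; point this at a real SentencePiece model file.
vocab = seqio.SentencePieceVocabulary("/path/to/spm.model")

seqio.TaskRegistry.add(
    "my_text_task",  # hypothetical task name
    source=seqio.TfdsDataSource(tfds_name="c4/en:3.0.1"),
    preprocessors=[
        functools.partial(
            seqio.preprocessors.rekey,
            key_map={"inputs": None, "targets": "text"}),
        seqio.preprocessors.tokenize,
        seqio.preprocessors.append_eos,
    ],
    output_features={
        "inputs": seqio.Feature(vocabulary=vocab),
        "targets": seqio.Feature(vocabulary=vocab),
    },
)

# The registered task can then be materialized as a tf.data pipeline:
ds = seqio.get_mixture_or_task("my_text_task").get_dataset(
    sequence_length={"inputs": 512, "targets": 512}, split="train")
```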
Paper: OmniXAI: A Library for Explainable AI
Title: OmniXAI: A Library for Explainable AI
Date: 1 Jun 2022
Field: Explainable AI (model interpretability)
Tasks: Counterfactual Explanation, Decision Making, Feature Engineering, Interpretable Machine Learning, Time Series
Paper link: arxiv.org/abs/2206.01…
Code: github.com/salesforce/…
Authors: Wenzhuo Yang, Hung Le, Silvio Savarese, Steven C. H. Hoi
Summary: We introduce OmniXAI (short for Omni eXplainable AI), an open-source Python library of eXplainable AI (XAI), which offers omni-way explainable AI capabilities and various interpretable machine learning techniques to address the pain points of understanding and interpreting the decisions made by machine learning (ML) in practice.
Abstract: We introduce OmniXAI (short for Omni eXplainable AI), an open-source Python library of eXplainable AI (XAI), which offers omni-way explainable AI capabilities and various interpretable machine learning techniques to address the pain points of understanding and interpreting the decisions made by machine learning (ML) in practice. OmniXAI aims to be a one-stop comprehensive library that makes explainable AI easy for data scientists, ML researchers and practitioners who need explanation for various types of data, models and explanation methods at different stages of ML process (data exploration, feature engineering, model development, evaluation, and decision-making, etc). In particular, our library includes a rich family of explanation methods integrated in a unified interface, which supports multiple data types (tabular data, images, texts, time-series), multiple types of ML models (traditional ML in Scikit-learn and deep learning models in PyTorch/TensorFlow), and a range of diverse explanation methods including “model-specific” and “model-agnostic” ones (such as feature-attribution explanation, counterfactual explanation, gradient-based explanation, etc). For practitioners, the library provides an easy-to-use unified interface to generate the explanations for their applications by only writing a few lines of codes, and also a GUI dashboard for visualization of different explanations for more insights about decisions. In this technical report, we present OmniXAI’s design principles, system architectures, and major functionalities, and also demonstrate several example use cases across different types of data, tasks, and models.
Paper: VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution
Title: VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution
Venue: CVPR 2022
Field: Computer Vision
Tasks: Space-Time Video Super-Resolution, Super-Resolution, Video Super-Resolution
Paper link: arxiv.org/abs/2206.04…
Code: github.com/picsart-ai-…
Authors: Zeyuan Chen, Yinbo Chen, Jingwen Liu, Xingqian Xu, Vidit Goel, Zhangyang Wang, Humphrey Shi, Xiaolong Wang
Summary: The learned implicit neural representation can be decoded to videos of arbitrary spatial resolution and frame rate.
Abstract: Videos typically record the streaming and continuous visual data as discrete consecutive frames. Since the storage cost is expensive for videos of high fidelity, most of them are stored in a relatively low resolution and frame rate. Recent works of Space-Time Video Super-Resolution (STVSR) are developed to incorporate temporal interpolation and spatial super-resolution in a unified framework. However, most of them only support a fixed up-sampling scale, which limits their flexibility and applications. In this work, instead of following the discrete representations, we propose Video Implicit Neural Representation (VideoINR), and we show its applications for STVSR. The learned implicit neural representation can be decoded to videos of arbitrary spatial resolution and frame rate. We show that VideoINR achieves competitive performances with state-of-the-art STVSR methods on common up-sampling scales and significantly outperforms prior works on continuous and out-of-training-distribution scales. Our project page is at zeyuan-chen.com/VideoINR/
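The core of an implicit video representation is a network that can be queried at continuous space-time coordinates, so frames of any resolution can be decoded at any time stamp. The sketch below illustrates that generic idea with a plain coordinate MLP; it is not the actual VideoINR architecture:

```python
import torch
import torch.nn as nn

class CoordMLP(nn.Module):
    """Generic implicit neural representation: maps a continuous space-time
    coordinate (x, y, t) to an RGB value. Illustration only."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid())

    def forward(self, coords):  # coords: [N, 3] with values in [0, 1]
        return self.net(coords)

def decode_frame(model, height, width, t):
    """Sample the representation on an arbitrary pixel grid at an arbitrary time."""
    ys, xs = torch.meshgrid(
        torch.linspace(0, 1, height), torch.linspace(0, 1, width), indexing="ij")
    ts = torch.full_like(xs, t)
    coords = torch.stack([xs, ys, ts], dim=-1).reshape(-1, 3)
    return model(coords).reshape(height, width, 3)

model = CoordMLP()
frame = decode_frame(model, height=480, width=640, t=0.25)  # any size, any time
```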
Paper: Receding Moving Object Segmentation in 3D LiDAR Data Using Sparse 4D Convolutions
Title: Receding Moving Object Segmentation in 3D LiDAR Data Using Sparse 4D Convolutions
Date: 8 Jun 2022
Field: Computer Vision
Tasks: Autonomous Vehicles, Pose Estimation, Semantic Segmentation
Paper link: arxiv.org/abs/2206.04…
Code: github.com/prbonn/4dmo…
Authors: Benedikt Mersch, Xieyuanli Chen, Ignacio Vizzo, Lucas Nunes, Jens Behley, Cyrill Stachniss
Summary: A key challenge for autonomous vehicles is to navigate in unseen dynamic environments.
Abstract: A key challenge for autonomous vehicles is to navigate in unseen dynamic environments. Separating moving objects from static ones is essential for navigation, pose estimation, and understanding how other traffic participants are likely to move in the near future. In this work, we tackle the problem of distinguishing 3D LiDAR points that belong to currently moving objects, like walking pedestrians or driving cars, from points that are obtained from non-moving objects, like walls but also parked cars. Our approach takes a sequence of observed LiDAR scans and turns them into a voxelized sparse 4D point cloud. We apply computationally efficient sparse 4D convolutions to jointly extract spatial and temporal features and predict moving object confidence scores for all points in the sequence. We develop a receding horizon strategy that allows us to predict moving objects online and to refine predictions on the go based on new observations. We use a binary Bayes filter to recursively integrate new predictions of a scan resulting in more robust estimation. We evaluate our approach on the SemanticKITTI moving object segmentation challenge and show more accurate predictions than existing methods. Since our approach only operates on the geometric information of point clouds over time, it generalizes well to new, unseen environments, which we evaluate on the Apollo dataset.
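The recursive fusion of per-scan predictions with a binary Bayes filter is usually implemented in log-odds form; a generic sketch (not the authors' exact code) looks like this:

```python
import numpy as np

def logit(p, eps=1e-6):
    """Convert a probability to log-odds, clipped for numerical stability."""
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p / (1.0 - p))

def bayes_filter_update(log_odds, p_moving, prior=0.5):
    """Fuse a new per-point 'moving' probability into the running log-odds belief."""
    return log_odds + logit(p_moving) - logit(prior)

# Running belief for 5 LiDAR points, starting at the prior (log-odds 0).
log_odds = np.zeros(5)
for p_scan in [np.array([0.9, 0.2, 0.6, 0.8, 0.4]),
               np.array([0.8, 0.1, 0.7, 0.9, 0.5])]:  # network outputs per scan
    log_odds = bayes_filter_update(log_odds, p_scan)

posterior = 1.0 / (1.0 + np.exp(-log_odds))  # fused moving-object probability
is_moving = posterior > 0.5
```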
We are ShowMeAI, dedicated to spreading high-quality AI content, sharing industry solutions, and using knowledge to accelerate every step of technical growth! Click the article history list to view past issues, subscribe to the topic #ShowMeAI资讯日报 in the official WeChat account to receive the latest daily pushes, and click the topic collections & e-monthly digest to browse complete collections by topic.
- Author: 韩信子@ShowMeAI
- Article history list
- Topic collections & e-monthly digest
- Notice: All rights reserved. To republish, please contact the platform and the author and credit the source.
- Comments are welcome: please like the post and recommend valuable articles, tools, or suggestions in the comments; we will reply as soon as we can!