Supersaturated design (SSD) has received much recent interest because of its potential in factor screening experiments. In this paper, we provide equivalent conditions for two columns to be fully aliased and consequently propose methods for constructing E(f_NOD)- and χ²-optimal mixed-level SSDs without fully aliased columns, via equidistant designs and difference matrices. The methods can be easily performed, and many new optimal mixed-level SSDs have been obtained. Furthermore, it is proved that the nonorthogonality between columns of the resulting design is well controlled by the source designs. A rather complete list of newly generated optimal mixed-level SSDs is tabulated for practical use.
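For reference, one standard formulation of the E(f_NOD) criterion for mixed-level designs (the notation below is the common one from the SSD literature, not necessarily the paper's) is:

```latex
% One standard form of E(f_NOD) for an n-run design with m columns,
% where column k has q_k levels and n_{ab}^{(i,j)} counts the runs with
% level a in column i and level b in column j.
\[
  f_{NOD}^{(i,j)} = \sum_{a=1}^{q_i}\sum_{b=1}^{q_j}
      \left( n_{ab}^{(i,j)} - \frac{n}{q_i q_j} \right)^{2},
  \qquad
  E(f_{NOD}) = \binom{m}{2}^{-1} \sum_{1 \le i < j \le m} f_{NOD}^{(i,j)}.
\]
```

Here two columns are said to be fully aliased when one can be obtained from the other by permuting levels, so that the level of one determines the level of the other; this is the degenerate case the constructions above are designed to avoid.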
Nested space-filling designs are nested designs with attractive low-dimensional stratification. Such designs are gaining popularity in statistics, applied mathematics and engineering. Their applications include multi-fidelity computer models, stochastic optimization problems, multi-level fitting of nonparametric functions, and linking parameters. We propose methods for constructing several new classes of nested space-filling designs. These methods are based on a new group projection and other algebraic techniques. The constructed designs can accommodate a nested structure with an arbitrary number of layers and are more flexible in run size than the existing families of nested space-filling designs. As a byproduct, the proposed methods can also be used to obtain sliced space-filling designs that are appealing for conducting computer experiments with both qualitative and quantitative factors.
General minimum lower order confounding (GMC) is a newly proposed design criterion that aims at keeping the lower order effects unaliased with one another to the extent possible. This paper shows that for 5N/16 < n ≤ N/2, 9N/32 < n ≤ 5N/16, and 17N/64 < n ≤ 9N/32, all GMC designs with N runs and n two-level factors are projections of maximal designs with N/2, 5N/16, and 9N/32 factors, respectively. Furthermore, it provides immediate approaches to constructing these GMC designs from the respective maximal designs; these approaches can produce many more GMC designs than the existing computer search method.
Fractional factorial split-plot (FFSP) designs have received much attention in recent years. In this article, the matrix representation for FFSP designs with multi-level factors is first developed, which is an extension of the one proposed by Bingham and Sitter (1999b, Ann. Statist. 27: 1240–1255) for the two-level case. Based on this representation, periodicity results of maximum resolution and minimum aberration for such designs are derived. Differences between FFSP designs with multi-level factors and those with two-level factors are highlighted.
Lifelong event detection aims to incrementally update a model with new event types and data while retaining its capability on previously learned old types. One critical challenge is that the model catastrophically forgets old types when continually trained on new data. In this paper, we introduce Episodic Memory Prompts (EMP) to explicitly preserve the learned task-specific knowledge. Our method adopts continuous prompts for each task, which are optimized to instruct the model's predictions and to learn event-specific representations. The EMPs learned in previous tasks are carried along with the model in subsequent tasks and serve as a memory module that retains old knowledge and transfers it to new tasks. Experimental results demonstrate the effectiveness of our method. Furthermore, we conduct a comprehensive analysis of the new and old event types in lifelong learning.
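As a structural sketch of how task-specific continuous prompts can be carried along and prepended to the input embeddings, here is a minimal PyTorch illustration; the dimensions, initialization, and training objective are our own placeholders rather than the paper's settings.

```python
import torch
import torch.nn as nn

class EpisodicMemoryPrompts(nn.Module):
    """Per-task continuous prompts prepended to the token embeddings (structural sketch)."""
    def __init__(self, hidden_dim: int, prompt_len: int = 5):
        super().__init__()
        self.hidden_dim, self.prompt_len = hidden_dim, prompt_len
        self.prompts = nn.ParameterList()          # one learnable prompt block per task

    def add_task(self):
        self.prompts.append(nn.Parameter(torch.randn(self.prompt_len, self.hidden_dim) * 0.02))

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # Prompts from all tasks learned so far are carried along and prepended to every input.
        batch = token_embeds.size(0)
        carried = torch.cat(list(self.prompts), dim=0).unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([carried, token_embeds], dim=1)

# Toy usage: two tasks, a batch of 2 sequences of length 7 with hidden size 32.
emp = EpisodicMemoryPrompts(hidden_dim=32)
emp.add_task(); emp.add_task()
print(emp(torch.randn(2, 7, 32)).shape)   # torch.Size([2, 17, 32])
```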
Combinations of drugs are now ubiquitous in treating complex diseases such as cancer and HIV due to their potential for enhanced efficacy and reduced side effects. Traditional drug-combination experiments focus primarily on the dose effects of the constituent drugs. However, with the doses of the drugs held fixed, different sequences of drug administration may also affect the efficacy endpoint. Such drug effects are called order effects. The common order-effect linear models are usually inadequate for analyzing combination experiments due to the nonlinear relationships and complex interactions among drugs. In this article, we propose a random field model for order-effect modeling. The model is flexible, allowing nonlinearities and interaction effects to be incorporated with a small number of model parameters. Moreover, we propose a carefully constructed experimental design that collects good-quality data for modeling the order effects of drugs with a reasonable run size. A real-data analysis and simulation studies demonstrate that the proposed design and model are effective in predicting the optimal drug administration sequences.
Drawing samples from a target distribution is essential for statistical computation when an analytical solution is infeasible. Many existing sampling methods are prone to getting trapped in local modes, or depend strongly on the proposal distribution, when the target distribution is complicated. In this article, the Global Likelihood Sampler (GLS) is proposed to tackle these problems, and the GL bootstrap is used to assess the Monte Carlo error. GLS takes advantage of a randomly shifted low-discrepancy point set to sufficiently explore the structure of the target distribution. It is efficient for multimodal and high-dimensional distributions and easy to implement. It is shown that the empirical cumulative distribution function of the samples converges uniformly to the target distribution under some conditions. Convergence of the approximate sampling distribution of the sample mean based on the GL bootstrap is also obtained. Moreover, numerical experiments and a real application show the effectiveness, robustness, and speed of GLS compared with some common methods, illustrating that GLS can be a competitive alternative to existing sampling methods. Supplementary materials for this article are available online.
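A simplified illustration of the idea described above: spread a low-discrepancy point set over the support and draw samples with probability proportional to the (unnormalized) target density. This is our own sketch under simplifying assumptions (a bounded rectangular support, a scrambled Sobol' set standing in for the randomly shifted point set, and multinomial resampling), not the authors' exact GLS algorithm.

```python
import numpy as np
from scipy.stats import qmc

def gls_style_sample(log_density, lower, upper, n_points=2**12, n_draws=1000, seed=0):
    """Rough sketch: cover a box with a scrambled Sobol' set and resample by target density."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    d = lower.size
    # Low-discrepancy candidate set (scrambling plays the role of the random shift here).
    u = qmc.Sobol(d, scramble=True, seed=seed).random(n_points)
    x = lower + u * (upper - lower)
    # Weights proportional to the unnormalized target density at the candidates.
    logw = np.array([log_density(xi) for xi in x])
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Resample candidates with probability proportional to their density values.
    idx = rng.choice(n_points, size=n_draws, p=w)
    return x[idx]

# Toy usage: a bimodal one-dimensional target on [-6, 6].
log_f = lambda x: np.logaddexp(-0.5 * (x[0] - 2)**2, -0.5 * (x[0] + 2)**2)
samples = gls_style_sample(log_f, lower=[-6.0], upper=[6.0])
print(samples.mean(), samples.std())
```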
Chain-of-Thought (CoT) prompting enables large language models to solve complex reasoning problems by generating intermediate steps. However, confined by its inherent single-pass and sequential generation process, CoT relies heavily on the initial decisions, so errors in early steps accumulate and affect the final answers. In contrast, humans adopt recursive thinking when tackling complex reasoning problems, i.e., iteratively breaking the original problem into approachable sub-problems and aggregating their answers to resolve the original one. Inspired by this human cognitive process, we propose SOCRATIC QUESTIONING, a divide-and-conquer style algorithm that mimics the recursive thinking process. Specifically, SOCRATIC QUESTIONING leverages large language models to raise and answer sub-questions until enough information is collected to tackle the original question. Unlike CoT, SOCRATIC QUESTIONING explicitly navigates the thinking space, stimulates effective recursive thinking, and is more robust to errors in the thinking process. Extensive experiments on several complex reasoning tasks, including MMLU, MATH, LogiQA, and visual question answering, demonstrate significant performance improvements over state-of-the-art prompting methods such as CoT and Tree-of-Thought. Qualitative analysis shows that the intermediate reasoning steps elicited by SOCRATIC QUESTIONING resemble humans' recursive thinking process on complex reasoning problems.
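A minimal sketch of the divide-and-conquer control flow described above; the `ask_llm` callable, the prompts, and the stopping rule are our own placeholders, not the paper's implementation.

```python
from typing import Callable

def socratic_answer(question: str, ask_llm: Callable[[str], str],
                    depth: int = 0, max_depth: int = 2) -> str:
    """Recursively raise and answer sub-questions, then aggregate (simplified sketch)."""
    if depth >= max_depth:
        return ask_llm(f"Answer directly: {question}")
    # Ask the model to propose sub-questions; an empty list means it can answer directly.
    raw = ask_llm(f"List the sub-questions needed to answer: {question}\n(one per line, or NONE)")
    subs = [s.strip() for s in raw.splitlines() if s.strip() and s.strip().upper() != "NONE"]
    if not subs:
        return ask_llm(f"Answer directly: {question}")
    # Recursively answer each sub-question, then aggregate the collected information.
    facts = [f"Q: {s}\nA: {socratic_answer(s, ask_llm, depth + 1, max_depth)}" for s in subs]
    return ask_llm("Using these facts:\n" + "\n".join(facts) + f"\nAnswer: {question}")

# Toy stand-in model so the sketch runs without any API.
demo_llm = lambda prompt: "NONE" if "sub-questions" in prompt else "42"
print(socratic_answer("What is six times seven?", demo_llm))
```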
The Shapley value is originally a concept in econometrics for fairly distributing both gains and costs to players in a coalition game. In recent decades, its application has been extended to other areas such as marketing, engineering, and machine learning. For example, it produces reasonable solutions for problems in sensitivity analysis, local model explanation for interpretable machine learning, node importance in social networks, attribution models, etc. However, the Shapley value can be very expensive to compute. Specifically, in a d-player coalition game, calculating a Shapley value requires the evaluation of d! or 2^d marginal contribution values, depending on whether the permutation or the combination formulation of the Shapley value is used. Hence, it becomes infeasible to calculate the Shapley value when d is reasonably large. A common remedy is to take a random sample of the permutations as a surrogate for the complete list of permutations. We find that an advanced sampling scheme can be designed to yield much more accurate estimates of the Shapley value than simple random sampling (SRS). Our sampling scheme is based on combinatorial structures from the field of design of experiments (DOE), particularly order-of-addition experimental designs for studying how the ordering of components affects the output. We show that the obtained estimates are unbiased and can sometimes deterministically recover the original Shapley value. Both theoretical and simulation results show that our DOE-based sampling scheme outperforms SRS in terms of estimation accuracy. Surprisingly, it is also slightly faster than SRS. Lastly, real-data analyses are conducted for the C. elegans nervous system and the 9/11 terrorist network. Supplementary materials for this article are available online.
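For context on the permutation formulation and the SRS baseline mentioned above, here is a minimal simple-random-sampling estimator of Shapley values; the paper's DOE-based scheme replaces these random permutations with a structured order-of-addition design, which is not reproduced here.

```python
import numpy as np

def shapley_srs(value_fn, d, n_perms=2000, seed=0):
    """Estimate Shapley values by averaging marginal contributions over random permutations (SRS)."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(d)
    for _ in range(n_perms):
        perm = rng.permutation(d)
        coalition, prev = [], value_fn(frozenset())
        for player in perm:
            coalition.append(player)
            cur = value_fn(frozenset(coalition))
            phi[player] += cur - prev        # marginal contribution of `player` in this ordering
            prev = cur
    return phi / n_perms

# Toy 3-player game: value = sum of players' weights, so the Shapley values equal the weights.
weights = np.array([1.0, 2.0, 3.0])
v = lambda s: sum(weights[i] for i in s)
print(shapley_srs(v, d=3))   # approximately [1, 2, 3]
```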
Class-incremental learning (CIL) aims to develop a learning system that can continually learn new classes from a data stream without forgetting previously learned classes. When learning classes incrementally, the classifier must be constantly updated to incorporate new classes, and the resulting drift in the decision boundary may lead to severe forgetting. This fundamental challenge, however, has not yet been studied extensively, especially in the setting where no samples from old classes are stored for rehearsal. In this paper, we take a closer look at how the drift in the classifier leads to forgetting and accordingly design four simple yet (super-)effective solutions to alleviate classifier drift: an Individual Classifiers with Frozen Feature Extractor (ICE) framework, in which we individually train a classifier for each learning session, and its three variants ICE-PL, ICE-O, and ICE-PL&O, which further use the logits of previously learned classes from old sessions or a constant logit of an Other class as a constraint on the learning of new classifiers. Extensive experiments and analysis on 6 class-incremental information extraction tasks demonstrate that our solutions, especially ICE-O, consistently show significant improvement over the previous state-of-the-art approaches with up to 44.7% absolute F-score gain, providing a strong baseline and insights for future research on class-incremental learning.
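A minimal PyTorch sketch of the ICE idea as we read it from the abstract: a frozen feature extractor with one independently trained classifier head per session, with all heads concatenated at inference. The dimensions and the missing training loop are illustrative, and the extra logit constraints of the ICE-PL, ICE-O, and ICE-PL&O variants are omitted.

```python
import torch
import torch.nn as nn

class ICE(nn.Module):
    """Frozen shared encoder plus one independently trained linear head per learning session."""
    def __init__(self, encoder: nn.Module, feat_dim: int):
        super().__init__()
        self.encoder, self.feat_dim = encoder, feat_dim
        for p in self.encoder.parameters():        # freeze the feature extractor
            p.requires_grad_(False)
        self.heads = nn.ModuleList()

    def add_session(self, n_new_classes: int) -> nn.Module:
        head = nn.Linear(self.feat_dim, n_new_classes)
        self.heads.append(head)
        return head                                 # only this head is optimized in the new session

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.encoder(x)
        return torch.cat([head(feats) for head in self.heads], dim=-1)  # logits for all classes so far

# Toy usage: 16-dim inputs, 64-dim features, two sessions introducing 3 and 2 new classes.
model = ICE(encoder=nn.Sequential(nn.Linear(16, 64), nn.ReLU()), feat_dim=64)
model.add_session(3)
model.add_session(2)
print(model(torch.randn(4, 16)).shape)   # torch.Size([4, 5])
```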
In this paper, we investigate function computation problems under different security conditions over a network with multiple source nodes and a single sink node that desires a function of all source messages without error. A wiretapper has access to some edges of the network. Based on different practical requirements, we consider two security conditions, referred to as secure and user secure, respectively. The main parameter of interest is the computing rate, i.e., the average number of times the target function can be computed securely or user securely, without error, per use of the network. In the secure case, a new upper bound that is tighter than the previous one is provided for arithmetic sum functions and arbitrary networks. Moreover, we show that the improved upper bound is tight for tree-like networks. In the user secure case, we give a necessary and sufficient condition for the existence of user secure network codes and obtain an upper bound on the computation capacity.
We propose attribute-aware multimodal entity linking, where the input is a mention described with text and an image, and the goal is to predict the corresponding target entity from a multimodal knowledge base (KB) in which each entity is also described with a text description, a visual image, and a set of attributes and values. To support this research, we construct AMELI, a large-scale dataset consisting of 18,472 reviews and 35,598 products. To establish baseline performance on AMELI, we experiment with current state-of-the-art multimodal entity linking approaches and our enhanced attribute-aware model, and we demonstrate the importance of incorporating attribute information into the entity linking process. To the best of our knowledge, we are the first to build a benchmark dataset and solutions for the attribute-aware multimodal entity linking task. The datasets and code will be made publicly available.
The spread of COVID-19 makes it essential to investigate its prevalence. In such investigations, as far as we know, the widely used sampling methods do not make sufficient use of the information about the numbers of previously diagnosed cases, which provides a priori information about the true numbers of infections. This motivates us to develop a new two-stage sampling method in this paper, which utilizes the information about the distributions of both the population and the diagnosed cases to investigate the prevalence more efficiently. Global likelihood sampling, a robust and efficient sampler for drawing samples from any probability density function, is used in our sampling strategy, so our new method can automatically adapt to the complicated distributions of the population and the diagnosed cases. Moreover, the corresponding estimation method is simple, which facilitates practical implementation. Some recommendations for practical implementation are given. Finally, several simulations and a practical example verify its efficiency.
Several network communication problems, such as coded caching and distributed computation, are highly related. Centralized coded caching focuses on reducing the network burden at peak times in a wireless network system, while coded distributed computation studies the tradeoff between computation and communication in distributed systems. In this paper, motivated by the study of the only rainbow $3$-term arithmetic progressions set, we propose a unified framework for constructing coded caching schemes. This framework builds bridges between coded caching schemes and many combinatorial objects, thanks to the freedom in the choices of families and operations. We prove that any scheme based on a placement delivery array (PDA) can be represented by a rainbow scheme under this framework, and many other known schemes can also be included in it. Moreover, we present a new coded caching scheme with linear subpacketization and near-constant rate using the only rainbow $3$-term arithmetic progressions set. Next, we modify the framework to make it applicable to the distributed computing problem. We present a new transmission scheme in the shuffle phase and show that, in certain cases, it can have a lower communication load than schemes based on PDAs or resolvable designs with the same number of files.
Sequential Latin hypercube designs have recently received great attention for computer experiments. Much of the work has been restricted to invariant spaces. The related systematic construction methods are inflexible, while algorithmic methods are ineffective for large designs. For such designs in space contraction, systematic construction methods have not yet been investigated. This paper proposes a new method for constructing sequential Latin hypercube designs via good lattice point sets in a variety of experimental spaces. These designs are called sequential good lattice point sets. Moreover, we provide fast and efficient approaches for identifying the (nearly) optimal sequential good lattice point sets under a given criterion. Combining this with the linear level permutation technique, we obtain a class of asymptotically optimal sequential Latin hypercube designs in invariant spaces, where the $L_1$-distance in each stage is either optimal or asymptotically optimal. Numerical results demonstrate that the sequential good lattice point set has a better space-filling property than the existing sequential Latin hypercube designs in the invariant space. It is also shown that sequential good lattice point sets have lower computational complexity and greater adaptability.
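For reference, the good lattice point construction underlying these designs maps run $i$ to $(i h_1 \bmod n, \ldots, i h_s \bmod n)$ for a generator vector $h$ with entries coprime to $n$. A minimal version (with an illustrative generator vector, not one chosen by the paper's criteria) is:

```python
import numpy as np

def good_lattice_points(n: int, h) -> np.ndarray:
    """n-run good lattice point set with generator vector h; each column is a
    permutation of {0, ..., n-1} when the entries of h are coprime to n."""
    h = np.asarray(h)
    runs = np.arange(1, n + 1).reshape(-1, 1)
    levels = np.mod(runs * h, n)
    return (levels + 0.5) / n            # midpoint mapping into the unit cube

# Illustrative 7-run, 3-factor design; 1, 2, and 3 are all coprime to 7.
print(good_lattice_points(7, h=[1, 2, 3]))
```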
Biomedical entity linking and event extraction are two crucial tasks that support text understanding and retrieval in the biomedical domain. These two tasks intrinsically benefit each other: entity linking disambiguates biomedical concepts by referring to external knowledge bases, and the domain knowledge in turn provides additional clues for understanding and extracting biological processes, while event extraction identifies the key trigger and the entities involved in each biological process, capturing structural context that helps disambiguate the biomedical entities. However, previous research typically solves these two tasks separately or in a pipeline, leading to error propagation. Moreover, it is even more challenging to solve the two tasks jointly, as no existing dataset contains annotations for both. To address these challenges, we propose joint biomedical entity linking and event extraction, regarding the event structures and the entity references in knowledge bases as latent variables and updating the two task-specific models in a hard Expectation-Maximization (EM) fashion: (1) predicting the missing variables for each partially annotated dataset based on the current task-specific models, and (2) updating the parameters of each model on the corresponding pseudo-completed dataset. Experimental results on two benchmark datasets, Genia 2011 for event extraction and BC4GO for entity linking, show that our joint framework significantly improves the model for each individual task and outperforms strong baselines for both tasks. We will make the code and model checkpoints publicly available once the paper is accepted.
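The hard-EM alternation described above has a simple structure; the sketch below uses hypothetical stand-in model objects and field names purely to make the control flow concrete, and is not the paper's implementation.

```python
def hard_em_joint_training(el_model, ee_model, el_data, ee_data, n_rounds=3):
    """Alternate between (E) filling in each dataset's missing annotations with the other
    model's predictions and (M) retraining each model on its pseudo-completed dataset."""
    for _ in range(n_rounds):
        # E-step: predict the missing latent variables for each partially annotated dataset.
        el_completed = [dict(ex, events=ee_model.predict(ex["text"])) for ex in el_data]
        ee_completed = [dict(ex, entity_links=el_model.predict(ex["text"])) for ex in ee_data]
        # M-step: update the parameters of each model on its pseudo-completed dataset.
        el_model.fit(el_completed)
        ee_model.fit(ee_completed)
    return el_model, ee_model

# Stand-in models and data so the sketch runs; real models would be the two task-specific networks.
class StubModel:
    def predict(self, text): return []
    def fit(self, data): pass

hard_em_joint_training(StubModel(), StubModel(),
                       [{"text": "IL-2 activates T cells."}],
                       [{"text": "BMP-4 induces apoptosis."}])
```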
Automatically generating scripts (i.e., sequences of key steps described in text) from video demonstrations and reasoning about the subsequent steps are crucial for modern AI virtual assistants that guide humans through completing everyday tasks, especially unfamiliar ones. However, current methods for generative script learning rely heavily on well-structured preceding steps described in text and/or images, or are limited to a certain domain, resulting in a disparity with real-world user scenarios. To address these limitations, we present a new benchmark challenge, MultiScript, with two new tasks on task-oriented multimodal script learning: (1) multimodal script generation, and (2) subsequent step prediction. For both tasks, the input consists of a target task name and a video illustrating what has been done to complete the target task, and the expected output is (1) a sequence of structured step descriptions in text based on the demonstration video, and (2) a single text description of the subsequent step, respectively. Built from WikiHow, MultiScript covers multimodal scripts in videos and text descriptions for over 6,655 human everyday tasks across 19 diverse domains. To establish baseline performance on MultiScript, we propose two knowledge-guided multimodal generative frameworks that incorporate task-related knowledge prompted from large language models such as Vicuna. Experimental results show that our proposed approaches significantly improve over the competitive baselines.
Natural Language Generation (NLG) typically involves evaluating the generated text in various aspects (e.g., consistency and naturalness) to obtain a comprehensive assessment. However, multi-aspect evaluation remains challenging, as it may require the evaluator to generalize to any given evaluation aspect, even one absent during training. In this paper, we introduce X-Eval, a two-stage instruction tuning framework for evaluating text in both seen and unseen aspects customized by end users. X-Eval consists of two learning stages: a vanilla instruction tuning stage that improves the model's ability to follow evaluation instructions, and an enhanced instruction tuning stage that exploits the connections between fine-grained evaluation aspects to better assess text quality. To support the training of X-Eval, we collect AspectInstruct, the first instruction tuning dataset tailored for multi-aspect NLG evaluation, spanning 27 diverse evaluation aspects and 65 tasks. To enhance task diversity, we devise an augmentation strategy that converts human rating annotations into diverse forms of NLG evaluation tasks, including scoring, comparison, ranking, and Boolean question answering. Extensive experiments across three essential categories of NLG tasks (dialogue generation, summarization, and data-to-text), coupled with 21 aspects in meta-evaluation, demonstrate that X-Eval enables even a lightweight language model to achieve a comparable, if not higher, correlation with human judgments compared to state-of-the-art NLG evaluators such as GPT-4.
This paper examines an efficient method for quasi-random sampling of copulas in Monte Carlo computations. Traditional methods, like the conditional distribution method (CDM), have limitations when dealing with high-dimensional or implicit copulas, i.e., copulas that cannot be accurately represented by existing parametric families. Instead, this paper proposes the use of generative models, such as Generative Adversarial Networks (GANs), to generate quasi-random samples for any copula. GANs are a type of implicit generative model used to learn the distribution of complex data, thus facilitating easy sampling. In our study, GANs are employed to learn the mapping from a uniform distribution to copulas. Once this mapping is learned, obtaining quasi-random samples from the copula only requires inputting quasi-random samples from the uniform distribution. This approach offers a more flexible way to sample from any copula. Additionally, we provide a theoretical analysis of quasi-Monte Carlo estimators based on quasi-random samples of copulas. Through simulated and practical applications, particularly in the field of risk management, we validate the proposed method and demonstrate its superiority over various existing methods.
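To make the sampling pipeline concrete, here is the classical special case in which the map from uniforms to the copula is known explicitly (a Gaussian copula); the proposed method would replace this explicit map with a trained GAN generator, which is not reproduced here.

```python
import numpy as np
from scipy.stats import norm, qmc

def quasi_gaussian_copula(n, corr, seed=0):
    """Quasi-random samples from a Gaussian copula: push a scrambled Sobol' set
    through norm.ppf, correlate via Cholesky, and map back to uniforms with norm.cdf."""
    d = corr.shape[0]
    u = qmc.Sobol(d, scramble=True, seed=seed).random(n)   # low-discrepancy uniforms
    z = norm.ppf(u) @ np.linalg.cholesky(corr).T           # impose the correlation structure
    return norm.cdf(z)                                     # marginals back to U(0, 1)

corr = np.array([[1.0, 0.7], [0.7, 1.0]])
samples = quasi_gaussian_copula(1024, corr)
print(np.corrcoef(norm.ppf(samples).T))   # roughly recovers the 0.7 correlation
```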
Interleaved text-and-image generation has been an intriguing research direction, where the models are required to generate both images and text pieces in an arbitrary order. Despite the emerging advancements in interleaved generation, the progress in its evaluation still significantly lags behind. Existing evaluation benchmarks do not support arbitrarily interleaved images and text for both inputs and outputs, and they only cover a limited number of domains and use cases. Also, current works predominantly use similarity-based metrics which fall short in assessing the quality in open-ended scenarios. To this end, we introduce InterleavedBench, the first benchmark carefully curated for the evaluation of interleaved text-and-image generation. InterleavedBench features a rich array of tasks to cover diverse real-world use cases. In addition, we present InterleavedEval, a strong reference-free metric powered by GPT-4o to deliver accurate and explainable evaluation. We carefully define five essential evaluation aspects for InterleavedEval, including text quality, perceptual quality, image coherence, text-image coherence, and helpfulness, to ensure a comprehensive and fine-grained assessment. Through extensive experiments and rigorous human evaluation, we show that our benchmark and metric can effectively evaluate the existing models with a strong correlation with human judgments, surpassing previous reference-based metrics. We also provide substantial findings and insights to foster future research in interleaved generation and its evaluation.
The discovery of novel mechanical metamaterials, whose properties are dominated by their engineered structures rather than chemical composition, is a knowledge-intensive and resource-demanding process. To accelerate the design of novel metamaterials, we present MetaScientist, a human-in-the-loop system that integrates advanced AI capabilities with expert oversight across two primary phases: (1) hypothesis generation, where the system performs complex reasoning to generate novel and scientifically sound hypotheses, supported by domain-specific foundation models and inductive biases retrieved from existing literature; and (2) 3D structure synthesis, where a 3D structure is synthesized with a novel 3D diffusion model based on the textual hypothesis and refined with an LLM-based refinement model to achieve better structural properties. At each phase, domain experts iteratively validate the system outputs and provide feedback and supplementary materials to ensure that the outputs align with scientific principles and human preferences. Through extensive evaluation by human scientists, MetaScientist is able to deliver novel and valid mechanical metamaterial designs that have the potential to be highly impactful in the metamaterial field.
Structured image understanding, such as interpreting tables and charts, requires strategically refocusing across various structures and texts within an image, forming a reasoning sequence to arrive at the final answer. However, current multimodal large language models (LLMs) lack this multihop selective attention capability. In this work, we introduce ReFocus, a simple yet effective framework that equips multimodal LLMs with the ability to generate "visual thoughts" by performing visual editing on the input image through code, shifting and refining their visual focus. Specifically, ReFocus enables multimodal LLMs to generate Python code to call tools and modify the input image, sequentially drawing boxes, highlighting sections, and masking out areas, thereby enhancing the visual reasoning process. We experiment on a wide range of structured image understanding tasks involving tables and charts. ReFocus largely improves performance on all tasks over GPT-4o without visual editing, yielding an average gain of 11.0% on table tasks and 6.8% on chart tasks. We present an in-depth analysis of the effects of different visual edits, and of the reasons why ReFocus can improve performance without introducing additional information. Further, we collect a 14k training set using ReFocus and show that such visual chain-of-thought with intermediate information offers better supervision than standard VQA data, yielding an 8.0% average gain over the same model trained with QA pairs and 2.6% over CoT.
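The abstract states that the model emits Python code that draws boxes, highlights sections, and masks areas of the input image; a minimal example of such an edit, with a synthetic image and made-up coordinates rather than actual ReFocus output, looks like:

```python
from PIL import Image, ImageDraw

# Synthetic stand-in for an input chart or table image.
img = Image.new("RGB", (400, 300), "white")
draw = ImageDraw.Draw(img, "RGBA")

# Draw a box around a region of interest (coordinates are illustrative).
draw.rectangle([50, 40, 180, 260], outline=(255, 0, 0, 255), width=3)
# Highlight another region with a translucent overlay.
draw.rectangle([200, 40, 330, 260], fill=(255, 255, 0, 80))
# Mask out an irrelevant area entirely.
draw.rectangle([340, 40, 390, 260], fill=(255, 255, 255, 255))

img.save("refocused_input.png")   # the edited image is then fed back to the multimodal LLM
```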
Structured image understanding, such as interpreting tables and charts, requires strategically refocusing across various structures and texts within an image, forming a reasoning sequence to arrive at the final answer. … Structured image understanding, such as interpreting tables and charts, requires strategically refocusing across various structures and texts within an image, forming a reasoning sequence to arrive at the final answer. However, current multimodal large language models (LLMs) lack this multihop selective attention capability. In this work, we introduce ReFocus, a simple yet effective framework that equips multimodal LLMs with the ability to generate "visual thoughts" by performing visual editing on the input image through code, shifting and refining their visual focuses. Specifically, ReFocus enables multimodal LLMs to generate Python codes to call tools and modify the input image, sequentially drawing boxes, highlighting sections, and masking out areas, thereby enhancing the visual reasoning process. We experiment upon a wide range of structured image understanding tasks involving tables and charts. ReFocus largely improves performance on all tasks over GPT-4o without visual editing, yielding an average gain of 11.0% on table tasks and 6.8% on chart tasks. We present an in-depth analysis of the effects of different visual edits, and reasons why ReFocus can improve the performance without introducing additional information. Further, we collect a 14k training set using ReFocus, and prove that such visual chain-of-thought with intermediate information offers a better supervision than standard VQA data, reaching a 8.0% average gain over the same model trained with QA pairs and 2.6% over CoT.
The discovery of novel mechanical metamaterials, whose properties are dominated by their engineered structures rather than chemical composition, is a knowledge-intensive and resource-demanding process. To accelerate the design of novel … The discovery of novel mechanical metamaterials, whose properties are dominated by their engineered structures rather than chemical composition, is a knowledge-intensive and resource-demanding process. To accelerate the design of novel metamaterials, we present MetaScientist, a human-in-the-loop system that integrates advanced AI capabilities with expert oversight with two primary phases: (1) hypothesis generation, where the system performs complex reasoning to generate novel and scientifically sound hypotheses, supported with domain-specific foundation models and inductive biases retrieved from existing literature; (2) 3D structure synthesis, where a 3D structure is synthesized with a novel 3D diffusion model based on the textual hypothesis and refined it with a LLM-based refinement model to achieve better structure properties. At each phase, domain experts iteratively validate the system outputs, and provide feedback and supplementary materials to ensure the alignment of the outputs with scientific principles and human preferences. Through extensive evaluation from human scientists, MetaScientist is able to deliver novel and valid mechanical metamaterial designs that have the potential to be highly impactful in the metamaterial field.
Interleaved text-and-image generation has been an intriguing research direction, where the models are required to generate both images and text pieces in an arbitrary order. Despite the emerging advancements in … Interleaved text-and-image generation has been an intriguing research direction, where the models are required to generate both images and text pieces in an arbitrary order. Despite the emerging advancements in interleaved generation, the progress in its evaluation still significantly lags behind. Existing evaluation benchmarks do not support arbitrarily interleaved images and text for both inputs and outputs, and they only cover a limited number of domains and use cases. Also, current works predominantly use similarity-based metrics which fall short in assessing the quality in open-ended scenarios. To this end, we introduce InterleavedBench, the first benchmark carefully curated for the evaluation of interleaved text-and-image generation. InterleavedBench features a rich array of tasks to cover diverse real-world use cases. In addition, we present InterleavedEval, a strong reference-free metric powered by GPT-4o to deliver accurate and explainable evaluation. We carefully define five essential evaluation aspects for InterleavedEval, including text quality, perceptual quality, image coherence, text-image coherence, and helpfulness, to ensure a comprehensive and fine-grained assessment. Through extensive experiments and rigorous human evaluation, we show that our benchmark and metric can effectively evaluate the existing models with a strong correlation with human judgments surpassing previous reference-based metrics. We also provide substantial findings and insights to foster future research in interleaved generation and its evaluation.
Automatically generating scripts (i.e. sequences of key steps described in text) from video demonstrations and reasoning about the subsequent steps are crucial to the modern AI virtual assistants to guide … Automatically generating scripts (i.e. sequences of key steps described in text) from video demonstrations and reasoning about the subsequent steps are crucial to the modern AI virtual assistants to guide humans to complete everyday tasks, especially unfamiliar ones. However, current methods for generative script learning rely heavily on well-structured preceding steps described in text and/or images or are limited to a certain domain, resulting in a disparity with real-world user scenarios. To address these limitations, we present a new benchmark challenge – MULTISCRIPT, with two new tasks on task-oriented multimodal script learning: (1) multimodal script generation, and (2) subsequent step prediction. For both tasks, the input consists of a target task name and a video illustrating what has been done to complete the target task, and the expected output is (1) a sequence of structured step descriptions in text based on the demonstration video, and (2) a single text description for the subsequent step, respectively. Built from WikiHow, MULTISCRIPT covers multimodal scripts in videos and text descriptions for over 6,655 human everyday tasks across 19 diverse domains. To establish baseline performance on MULTISCRIPT, we propose two knowledge-guided multimodal generative frameworks that incorporate the task-related knowledge prompted from large language models such as Vicuna. Experimental results show that our proposed approaches significantly improve over the competitive baselines.
This paper examines an efficient method for quasi-random sampling of copulas in Monte Carlo computations. Traditional methods, like conditional distribution methods (CDM), have limitations when dealing with high-dimensional or implicit … This paper examines an efficient method for quasi-random sampling of copulas in Monte Carlo computations. Traditional methods, like conditional distribution methods (CDM), have limitations when dealing with high-dimensional or implicit copulas, which refer to those that cannot be accurately represented by existing parametric copulas. Instead, this paper proposes the use of generative models, such as Generative Adversarial Networks (GANs), to generate quasi-random samples for any copula. GANs are a type of implicit generative models used to learn the distribution of complex data, thus facilitating easy sampling. In our study, GANs are employed to learn the mapping from a uniform distribution to copulas. Once this mapping is learned, obtaining quasi-random samples from the copula only requires inputting quasi-random samples from the uniform distribution. This approach offers a more flexible method for any copula. Additionally, we provide theoretical analysis of quasi-Monte Carlo estimators based on quasi-random samples of copulas. Through simulated and practical applications, particularly in the field of risk management, we validate the proposed method and demonstrate its superiority over various existing methods.
Shapley value is originally a concept in econometrics to fairly distribute both gains and costs to players in a coalition game. In the recent decades, its application has been extended … Shapley value is originally a concept in econometrics to fairly distribute both gains and costs to players in a coalition game. In the recent decades, its application has been extended to other areas such as marketing, engineering and machine learning. For example, it produces reasonable solutions for problems in sensitivity analysis, local model explanation toward the interpretable machine learning, node importance in social network, attribution models, etc. However, it could be very expensive to compute the Shapley value. Specifically, in a d-player coalition game, calculating a Shapley value requires the evaluation of d! or 2d marginal contribution values, depending on whether we are taking the permutation or combination formulation of the Shapley value. Hence, it becomes infeasible to calculate the Shapley value when d is reasonably large. A common remedy is to take a random sample of the permutations to surrogate for the complete list of permutations. We find an advanced sampling scheme can be designed to yield much more accurate estimation of the Shapley value than the simple random sampling (SRS). Our sampling scheme is based on combinatorial structures in the field of design of experiments (DOE), particularly the order-of-addition experimental designs for the study of how the orderings of components would affect the output. We show that the obtained estimates are unbiased, and can sometimes deterministically recover the original Shapley value. Both theoretical and simulations results show that our DOE-based sampling scheme outperforms SRS in terms of estimation accuracy. Surprisingly, it is also slightly faster than SRS. Lastly, real data analysis is conducted for the C. elegans nervous system and the 9/11 terrorist network. Supplementary materials for this article are available online.
The spread of COVID-19 makes it essential to investigate its prevalence. In such investigation research, as far as we know, the widely-used sampling methods didn’t use the information sufficiently about … The spread of COVID-19 makes it essential to investigate its prevalence. In such investigation research, as far as we know, the widely-used sampling methods didn’t use the information sufficiently about the numbers of the previously diagnosed cases, which provides a priori information about the true numbers of infections. This motivates us to develop a new, two-stage sampling method in this paper, which utilizes the information about the distributions of both population and diagnosed cases, to investigate the prevalence more efficiently. The global likelihood sampling, a robust and efficient sampler to draw samples from any probability density function, is used in our sampling strategy, and thus, our new method can automatically adapt to the complicated distributions of population and diagnosed cases. Moreover, the corresponding estimating method is simple, which facilitates the practical implementation. Some recommendations for practical implementation are given. Finally, several simulations and a practical example verify its efficiency.
Combinations of drugs are now ubiquitous in treating complex diseases such as cancer and HIV due to their potential for enhanced efficacy and reduced side effects. The traditional combination experiments … Combinations of drugs are now ubiquitous in treating complex diseases such as cancer and HIV due to their potential for enhanced efficacy and reduced side effects. The traditional combination experiments of drugs focus primarily on the dose effects of the constituent drugs. However, with the doses of drugs remaining unchanged, different sequences of drug administration may also affect the efficacy endpoint. Such drug effects shall be called as order effects. The common order‐effect linear models are usually inadequate for analyzing combination experiments due to the nonlinear relationships and complex interactions among drugs. In this article, we propose a random field model for order‐effect modeling. This model is flexible, allowing nonlinearities, and interaction effects to be incorporated with a small number of model parameters. Moreover, we propose a subtle experimental design that will collect good quality data for modeling the order effects of drugs with a reasonable run size. A real‐data analysis and simulation studies are given to demonstrate that the proposed design and model are effective in predicting the optimal drug sequences in administration.
Drawing samples from a target distribution is essential for statistical computations when the analytical solution is infeasible. Many existing sampling methods may be easy to fall into the local mode … Drawing samples from a target distribution is essential for statistical computations when the analytical solution is infeasible. Many existing sampling methods may be easy to fall into the local mode or strongly depend on the proposal distribution when the target distribution is complicated. In this article, the Global Likelihood Sampler (GLS) is proposed to tackle these problems and the GL bootstrap is used to assess the Monte Carlo error. GLS takes the advantage of the randomly shifted low-discrepancy point set to sufficiently explore the structure of the target distribution. It is efficient for multimodal and high-dimensional distributions and easy to implement. It is shown that the empirical cumulative distribution function of the samples uniformly converges to the target distribution under some conditions. The convergence for the approximate sampling distribution of the sample mean based on the GL bootstrap is also obtained. Moreover, numerical experiments and a real application are conducted to show the effectiveness, robustness, and speediness of GLS compared with some common methods. It illustrates that GLS can be a competitive alternative to existing sampling methods. Supplementary materials for this article are available online.
Sequential Latin hypercube designs have recently received great attention for computer experiments. Much of the work has been restricted to invariant spaces. The related systematic construction methods are inflexible while … Sequential Latin hypercube designs have recently received great attention for computer experiments. Much of the work has been restricted to invariant spaces. The related systematic construction methods are inflexible while algorithmic methods are ineffective for large designs. For such designs in space contraction, systematic construction methods have not been investigated yet. This paper proposes a new method for constructing sequential Latin hypercube designs via good lattice point sets in a variety of experimental spaces. These designs are called sequential good lattice point sets. Moreover, we provide fast and efficient approaches for identifying the (nearly) optimal sequential good lattice point sets under a given criterion. Combining with the linear level permutation technique, we obtain a class of asymptotically optimal sequential Latin hypercube designs in invariant spaces where the $L_1$-distance in each stage is either optimal or asymptotically optimal. Numerical results demonstrate that the sequential good lattice point set has a better space-filling property than the existing sequential Latin hypercube designs in the invariant space. It is also shown that the sequential good lattice point sets have less computational complexity and more adaptability.
Biomedical entity linking and event extraction are two crucial tasks for text understanding and retrieval in the biomedical domain. The two tasks intrinsically benefit each other: entity linking disambiguates biomedical concepts by referring to external knowledge bases, and this domain knowledge provides additional clues for understanding and extracting biological processes, while event extraction identifies the key trigger and entities involved in each biological process, capturing structural context that helps disambiguate the biomedical entities. However, previous research typically solves the two tasks separately or in a pipeline, leading to error propagation. Moreover, solving the two tasks jointly is even more challenging because no existing dataset contains annotations for both. To address these challenges, we propose joint biomedical entity linking and event extraction by regarding the event structures and entity references in knowledge bases as latent variables and updating the two task-specific models in a hard Expectation-Maximization (EM) fashion: (1) predicting the missing variables for each partially annotated dataset based on the current two task-specific models, and (2) updating the parameters of each model on the corresponding pseudo-completed dataset. Experimental results on two benchmark datasets, Genia 2011 for event extraction and BC4GO for entity linking, show that our joint framework significantly improves the model for each individual task and outperforms strong baselines for both tasks. We will make the code and model checkpoints publicly available once the paper is accepted.
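The alternation in steps (1) and (2) can be written as a short, generic hard-EM training loop. The sketch below is schematic: the model objects and the `e_step`/`m_step` callables are hypothetical placeholders supplied by the user, not the authors' implementation.

```python
# Schematic hard-EM loop for two task-specific models trained on partially annotated data.
# `e_step` fills in the missing annotations (latent variables) with the current models;
# `m_step` retrains each model on its pseudo-completed dataset. Both are user-supplied.
from typing import Any, Callable, Tuple

def hard_em(
    el_model: Any,            # entity-linking model
    ee_model: Any,            # event-extraction model
    el_data: Any,             # dataset annotated for entity linking only
    ee_data: Any,             # dataset annotated for event extraction only
    e_step: Callable[..., Tuple[Any, Any]],
    m_step: Callable[..., Tuple[Any, Any]],
    n_rounds: int = 5,
) -> Tuple[Any, Any]:
    for _ in range(n_rounds):
        # E-step: predict the missing variables for each partially annotated dataset.
        el_completed, ee_completed = e_step(el_model, ee_model, el_data, ee_data)
        # M-step: update the parameters of each model on its pseudo-completed dataset.
        el_model, ee_model = m_step(el_model, ee_model, el_completed, ee_completed)
    return el_model, ee_model

# Trivial stubs, just to show that the control flow runs; real steps would call the models.
hard_em(None, None, [], [],
        e_step=lambda el_m, ee_m, el_d, ee_d: (el_d, ee_d),
        m_step=lambda el_m, ee_m, el_d, ee_d: (el_m, ee_m))
```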
We propose attribute-aware multimodal entity linking, where the input is a mention described with text and an image, and the goal is to predict the corresponding target entity from a multimodal knowledge base (KB) in which each entity is also described with a text description, a visual image, and a set of attributes and values. To support this research, we construct AMELI, a large-scale dataset consisting of 18,472 reviews and 35,598 products. To establish baseline performance on AMELI, we experiment with current state-of-the-art multimodal entity linking approaches and our enhanced attribute-aware model, and demonstrate the importance of incorporating attribute information into the entity linking process. To the best of our knowledge, we are the first to build a benchmark dataset and solutions for the attribute-aware multimodal entity linking task. The dataset and code will be made publicly available.
Chain-of-Thought (CoT) prompting enables large language models to solve complex reasoning problems by generating intermediate steps. However, confined by its inherent single-pass and sequential generation process, CoT relies heavily on the initial decisions, so errors in early steps accumulate and affect the final answers. In contrast, humans adopt recursive thinking when tackling complex reasoning problems, i.e., iteratively breaking the original problem into approachable sub-problems and aggregating their answers to resolve the original one. Inspired by this cognitive process, we propose SOCRATIC QUESTIONING, a divide-and-conquer algorithm that mimics recursive thinking. Specifically, SOCRATIC QUESTIONING leverages large language models to raise and answer sub-questions until enough information is collected to tackle the original question. Unlike CoT, SOCRATIC QUESTIONING explicitly navigates the thinking space, stimulates effective recursive thinking, and is more robust to errors in the thinking process. Extensive experiments on several complex reasoning tasks, including MMLU, MATH, LogiQA, and visual question answering, demonstrate significant performance improvements over state-of-the-art prompting methods such as CoT and Tree-of-Thought. Qualitative analysis shows that the intermediate reasoning steps elicited by SOCRATIC QUESTIONING resemble humans' recursive thinking process on complex reasoning problems.
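The recursive control flow can be sketched in a few lines. This is a schematic mock-up of the divide-and-conquer idea, not the authors' SOCRATIC QUESTIONING implementation; `ask_llm` is a hypothetical stand-in for any chat-completion API and is stubbed out so the snippet runs as written.

```python
# Recursive divide-and-conquer sketch: try to answer directly; otherwise raise
# sub-questions, solve them recursively, and aggregate their answers.
def ask_llm(prompt: str) -> str:
    return "UNSURE" if "Answer directly" in prompt else "stub answer"   # placeholder stub

def socratic(question: str, depth: int = 0, max_depth: int = 2) -> str:
    answer = ask_llm(f"Answer directly if you are confident, else say UNSURE.\nQ: {question}")
    if "UNSURE" not in answer or depth >= max_depth:
        return answer
    subs = ask_llm(f"List the sub-questions needed to answer: {question}").splitlines()
    sub_answers = [f"{q} -> {socratic(q, depth + 1, max_depth)}" for q in subs if q.strip()]
    context = "\n".join(sub_answers)
    return ask_llm(f"Using these sub-answers:\n{context}\nNow answer: {question}")

print(socratic("Is 2^31 - 1 prime?"))
```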
Class-incremental learning (CIL) aims to develop a learning system that can continually learn new classes from a data stream without forgetting previously learned classes. When learning classes incrementally, the classifier must be constantly updated to incorporate new classes, and the resulting drift in the decision boundary may lead to severe forgetting. This fundamental challenge has not yet been studied extensively, especially in the setting where no samples from old classes are stored for rehearsal. In this paper, we take a closer look at how classifier drift leads to forgetting and, accordingly, design four simple yet (super-) effective solutions to alleviate it: an Individual Classifiers with Frozen Feature Extractor (ICE) framework, in which we train a separate classifier for each learning session, and its three variants ICE-PL, ICE-O, and ICE-PL&O, which further use the logits of previously learned classes from old sessions or a constant logit of an "Other" class as constraints when learning new classifiers. Extensive experiments and analysis on 6 class-incremental information extraction tasks demonstrate that our solutions, especially ICE-O, consistently and significantly improve over previous state-of-the-art approaches, with up to 44.7% absolute F-score gain, providing a strong baseline and insights for future research on class-incremental learning.
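A toy version of the ICE idea, one classifier per session on top of frozen features, can be mocked up as follows; the random projection stands in for a frozen pretrained encoder, and the Gaussian data and session split are fabricated for illustration only.

```python
# Toy ICE mock-up: a frozen "feature extractor" plus an individually trained classifier
# per learning session; at test time the most confident classifier across sessions wins.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
proj = rng.normal(size=(5, 32))                          # frozen feature extractor (stand-in)
featurize = lambda X: np.tanh(X @ proj)
means = {c: rng.normal(scale=3.0, size=5) for c in range(4)}

def sample(cls, n=200):
    return rng.normal(loc=means[cls], scale=0.5, size=(n, 5))

sessions = [[0, 1], [2, 3]]                              # two incremental learning sessions
classifiers = []
for classes in sessions:
    X = np.vstack([sample(c) for c in classes])
    y = np.repeat(classes, 200)
    clf = LogisticRegression(max_iter=1000).fit(featurize(X), y)
    classifiers.append(clf)                              # trained once, never revisited

def predict(x):
    scored = []
    for clf in classifiers:
        p = clf.predict_proba(featurize(x[None, :]))[0]
        scored.append((p.max(), clf.classes_[int(np.argmax(p))]))
    return max(scored)[1]                                # most confident class overall

x_test = sample(3, n=1)[0]
print("predicted:", predict(x_test), "true: 3")
```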
The Shapley value is originally a concept in cooperative game theory for fairly distributing both gains and costs among the players of a coalition game. In recent decades, its application has been extended to other areas such as marketing, engineering, and machine learning. For example, it produces reasonable solutions for problems in sensitivity analysis, local model explanation in interpretable machine learning, node importance in social networks, attribution models, and more. However, its heavy computational burden has long been recognized but rarely investigated. Specifically, in a $d$-player coalition game, calculating a Shapley value requires the evaluation of $d!$ or $2^d$ marginal contributions, depending on whether the permutation or combination formulation of the Shapley value is used. Hence it becomes infeasible to calculate the Shapley value when $d$ is reasonably large. A common remedy is to take a random sample of permutations as a surrogate for the complete list of permutations. We find that a more structured sampling scheme can be designed to yield much more accurate estimates of the Shapley value than simple random sampling (SRS). Our sampling scheme is based on combinatorial structures from the design of experiments (DOE), particularly order-of-addition designs for studying how the ordering of components affects the output. We show that the resulting estimates are unbiased and can sometimes deterministically recover the original Shapley value. Both theoretical and simulation results show that our DOE-based sampling scheme outperforms SRS in terms of estimation accuracy. Surprisingly, it is also slightly faster than SRS. Lastly, real-data analyses are conducted for the C. elegans nervous system and the 9/11 terrorist network.
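For reference, the exact permutation formulation and the plain SRS baseline mentioned above look like the following on a toy game; the DOE-based scheme proposed in the paper is not reproduced here, and the value function is an arbitrary illustrative choice.

```python
# Exact Shapley values (d! permutations) and the plain SRS permutation estimator
# for a toy 4-player game with v(S) = (sum of weights in S)^2.
import itertools
import math
import numpy as np

def marginal_sweep(value, perm, phi):
    coalition = set()
    for player in perm:
        before = value(coalition)
        coalition.add(player)
        phi[player] += value(coalition) - before   # marginal contribution of `player`

def exact_shapley(value, d):
    phi = np.zeros(d)
    for perm in itertools.permutations(range(d)):
        marginal_sweep(value, perm, phi)
    return phi / math.factorial(d)

def srs_shapley(value, d, n_perms=200, seed=0):
    rng = np.random.default_rng(seed)
    phi = np.zeros(d)
    for _ in range(n_perms):
        marginal_sweep(value, rng.permutation(d), phi)
    return phi / n_perms

w = np.array([1.0, 2.0, 3.0, 4.0])
value = lambda S: float(np.sum(w[list(S)])) ** 2
print(exact_shapley(value, 4))     # exact values from all 24 permutations
print(srs_shapley(value, 4))       # SRS estimate from 200 random permutations
```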
Automatically generating scripts (i.e., sequences of key steps described in text) from video demonstrations, and reasoning about the subsequent steps, are crucial for modern AI virtual assistants that guide humans through everyday tasks, especially unfamiliar ones. However, current methods for generative script learning rely heavily on well-structured preceding steps described in text and/or images, or are limited to a certain domain, resulting in a disparity with real-world user scenarios. To address these limitations, we present a new benchmark challenge, MultiScript, with two new tasks on task-oriented multimodal script learning: (1) multimodal script generation, and (2) subsequent step prediction. For both tasks, the input consists of a target task name and a video illustrating what has been done to complete the target task, and the expected output is (1) a sequence of structured step descriptions in text based on the demonstration video, and (2) a single text description of the subsequent step, respectively. Built from WikiHow, MultiScript covers multimodal scripts in videos and text descriptions for over 6,655 human everyday tasks across 19 diverse domains. To establish baseline performance on MultiScript, we propose two knowledge-guided multimodal generative frameworks that incorporate task-related knowledge prompted from large language models such as Vicuna. Experimental results show that our proposed approaches significantly improve over the competitive baselines.
Natural Language Generation (NLG) typically involves evaluating the generated text in various aspects (e.g., consistency and naturalness) to obtain a comprehensive assessment. However, multi-aspect evaluation remains challenging, as it may require the evaluator to generalize to any given evaluation aspect, even one absent during training. In this paper, we introduce X-Eval, a two-stage instruction-tuning framework for evaluating text in both seen and unseen aspects customized by end users. X-Eval consists of two learning stages: a vanilla instruction-tuning stage that improves the model's ability to follow evaluation instructions, and an enhanced instruction-tuning stage that exploits the connections between fine-grained evaluation aspects to better assess text quality. To support the training of X-Eval, we collect AspectInstruct, the first instruction-tuning dataset tailored for multi-aspect NLG evaluation, spanning 27 diverse evaluation aspects with 65 tasks. To enhance task diversity, we devise an augmentation strategy that converts human rating annotations into diverse forms of NLG evaluation tasks, including scoring, comparison, ranking, and Boolean question answering. Extensive experiments across three essential categories of NLG tasks (dialogue generation, summarization, and data-to-text), coupled with 21 aspects in meta-evaluation, demonstrate that X-Eval enables even a lightweight language model to achieve a correlation with human judgments comparable to, if not higher than, that of state-of-the-art NLG evaluators such as GPT-4.
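As a concrete (and entirely made-up) illustration of the augmentation idea, a single human rating annotation can be converted into several evaluation task formats; the prompt templates below are placeholders and do not reproduce the actual AspectInstruct templates.

```python
# Toy augmentation: turn one rating annotation (and optionally a second rated text)
# into scoring, Boolean, and comparison evaluation tasks.
def augment(aspect, text_a, rating_a, text_b=None, rating_b=None, threshold=3):
    tasks = {
        "scoring": (f"Rate the {aspect} of the following text from 1 to 5:\n{text_a}",
                    str(rating_a)),
        "boolean": (f"Is the {aspect} of the following text acceptable? Answer yes or no.\n{text_a}",
                    "yes" if rating_a >= threshold else "no"),
    }
    if text_b is not None and rating_b is not None:
        tasks["comparison"] = (
            f"Which response has better {aspect}?\nA: {text_a}\nB: {text_b}",
            "A" if rating_a >= rating_b else "B",
        )
    return tasks

for name, (prompt, answer) in augment("consistency", "Reply A ...", 4, "Reply B ...", 2).items():
    print(f"[{name}] answer: {answer}")
```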
The spread of COVID-19 makes it essential to investigate its prevalence. In such investigations, as far as we know, the widely used sampling methods do not make sufficient use of the numbers of previously diagnosed cases, which provide a priori information about the true numbers of infections. This motivates us to develop a new two-stage sampling method that utilizes the distributions of both the population and the diagnosed cases to investigate the prevalence more efficiently. Global likelihood sampling, a robust and efficient sampler for drawing samples from any probability density function, is used in our sampling strategy; thus, the new method can automatically adapt to the complicated distributions of the population and the diagnosed cases. Moreover, the corresponding estimation method is simple, which facilitates practical implementation. Some recommendations for practical implementation are given. Finally, several simulations and a practical example verify its efficiency.
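The following is a generic two-stage sketch of the idea: first-stage units are drawn with probabilities informed by both population sizes and previously diagnosed cases, and individuals are then drawn within the selected units. The blend weights and counts are invented for illustration; the paper's GLS-based design and its estimator are more refined.

```python
# Generic two-stage sketch: districts drawn with probabilities blending population and
# diagnosed-case information, then simple random sampling of individuals within them.
import numpy as np

rng = np.random.default_rng(1)
population = np.array([50_000, 120_000, 80_000, 30_000])     # district sizes (made up)
diagnosed  = np.array([40, 500, 150, 10])                    # previously diagnosed cases (made up)

# Stage 1: draw districts with probability proportional to a blend of the two sources.
p = 0.5 * population / population.sum() + 0.5 * diagnosed / diagnosed.sum()
districts = rng.choice(len(population), size=2, replace=False, p=p)

# Stage 2: draw individuals within each selected district.
samples = {d: rng.choice(population[d], size=200, replace=False) for d in districts}
print(districts, {d: len(s) for d, s in samples.items()})
```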
Lifelong event detection aims to incrementally update a model with new event types and data while retaining its capability on previously learned types. One critical challenge is that the model may catastrophically forget old types when continually trained on new data. In this paper, we introduce Episodic Memory Prompts (EMP) to explicitly preserve the learned task-specific knowledge. Our method adopts continuous prompts for each task, which are optimized to guide the model's predictions and learn event-specific representations. The EMPs learned in previous tasks are carried along with the model in subsequent tasks and serve as a memory module that retains old knowledge and transfers it to new tasks. Experimental results demonstrate the effectiveness of our method. Furthermore, we conduct a comprehensive analysis of the new and old event types in lifelong learning.
In this paper, we investigate function computation problems under different security conditions over a network with multiple source nodes and a single sink node that wishes to compute a function of all source messages without error. A wiretapper has access to some edges of the network. Based on different practical requirements, we consider two security conditions, called secure and user secure, respectively. The main parameter of interest is the computing rate, i.e., the average number of times the target function can be computed securely (or user securely) without error per use of the network. In the secure case, a new upper bound that is tighter than the previous one is provided for arithmetic sum functions and arbitrary networks. Moreover, we show that the improved upper bound is tight for tree-like networks. In the user secure case, we give a necessary and sufficient condition for the existence of user-secure network codes and obtain an upper bound on the computation capacity.
Several network communication problems, such as coded caching and distributed computation, are closely related. Centralized coded caching focuses on reducing the network burden at peak times in a wireless network system, while coded distributed computation studies the tradeoff between computation and communication in distributed systems. In this paper, motivated by the study of the only rainbow $3$-term arithmetic progressions set, we propose a unified framework for constructing coded caching schemes. This framework builds bridges between coded caching schemes and many combinatorial objects, owing to the freedom in the choice of families and operations. We prove that any scheme based on a placement delivery array (PDA) can be represented by a rainbow scheme under this framework, and many other known schemes can also be included in it. Moreover, we present a new coded caching scheme with linear subpacketization and near-constant rate using the only rainbow $3$-term arithmetic progressions set. Next, we modify the framework to apply to the distributed computing problem. We present a new transmission scheme for the shuffle phase and show that in certain cases it can achieve a lower communication load than schemes based on PDAs or resolvable designs with the same number of files.